A collection of Perl 6 regexes that can invoke each other to accomplish a larger parse.
A grammar is a special kind of class (it has inheritance, is instantiable, has methods, etc.)
Consider an address of the form:
100 E Main St Springfield MA 01234
Consider an address of the form:
100 E Main St Springfield MA 01234
m/
([1-9]\d*)\s+
((?:N|S|E|W)\s+)?
(\w+(?:\s+\w+)*)\s+
(ave|ln|st|rd)\s+
([:alpha:]+(?:\s+[:alpha:]+)*)\s+
([A-Z]{2})\s+
(\d{5}(?:-\d{4})?)
/ix;
Consider an address of the form:
100 E Main St Springfield MA 01234
grammar USMailAddress {
rule TOP { <addr> <city> <state> <zip> }
rule addr { <[1..9]>\d* <direction>?
<streetname> <streettype> }
token direction { N | S | E | W }
token streetname { \w+ [ \s+ \w+ ]* }
token streettype {:i ave | ln | rd | st }
token city { <alpha> [ \s+ <alpha> ]* }
token state { <[A..Z]>**{2} }
token zip { \d**{5} [ - \d**{4} ]? }
}
my $match =
$addr ~~ m/^<USMailAddress::TOP>$/;
if $match {
say "$match<city>, $match<state>";
}
or, more tersely:
if $addr ~~ m/^<USMailAddress::TOP>$/ {
say "$<city>, $<state>";
}
or, more deeply:
say "$<addr><direction>";
A code block is a zero-width assertion that executes arbitrary code
token boolean {
[true | false]
|| { die "That's not a boolean!" }
}
Blocks can refer to previous captures
m/ (\d**{1..3}) { $0 < 256 or fail } /
You can alter what is returned in the match results via 'make'.
Silly example:
token street_abbrev {
| Rd { make 'Road' }
| St { make 'Street' }
| Blvd { make 'Boulevard' }
}
$m = 'Rd' ~~ m/<street_abbrev>/;
say $m<street_abbrev>; # 'Road'
Legitimate example:
rule foldable_constant {
(\d+) [
| + (\d+) { make $0 + $1 }
| - (\d+) { make $0 - $1 }
| * (\d+) { make $0 * $1 }
| / (\d+) { make $0 / $1 }
]
}
'1 + 3' ~~ m/<foldable_constant>/;
say $($/); # '4'
%PDF-1.4 1 0 obj << /Type /Catalog /Metadata 3 0 R /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] >> endobj 3 0 obj << /CreationDate (D:20071111223720Z) /Creator (Adobe Illustrator 10) >> endobj 5 0 obj << /Type /Page /ArtBox [ 135 603.67383 434.4668 679 ] /Contents 74 0 R /MediaBox [ 0 0 612 792 ] /Parent 2 0 R /Resources << /ColorSpace << /CS0 66 0 R /CS1 67 0 R >> /Font << /TT0 68 0 R >> /ProcSet [ /PDF /Text ] >> /Thumb 72 0 R /TrimBox [ 0 0 612 792 ] >> endobj
rule dict { '<<' ( <name> <any> )* '>>' }
rule array { '[' ( <any> )* ']' }
token name { '/' \w+ }
rule reference { \d+ \d+ R }
token string { '(' <-[()]> ')' }
token hexstring { '<' <[\d a..f A..F]>* '>' }
token boolean { [true | false] }
token null { null }
token number { [ + | - ]? [\d+ ['.'\d*]? | '.'\d+ ] }
rule any {
| <dict> | <array> | <reference>
| <number> | <string> | <hexstring>
| <name> | <boolean> | <null>
}
rule object {
\d+ \d+ obj <any> <stream>? endobj
}
rule dict { '<<' ( <name> <any> )* '>>' {*}}
rule array { '[' ( <any> )* ']' {*}}
token name { '/' \w+ {*}}
rule reference { \d+ \d+ R {*}}
token string { '(' <-[()]> ')' {*}}
token hexstring { '<' <[\d a..f A..F]>* '>' {*}}
token boolean { [true | false] {*}}
token null { null {*}}
token number { [ + | - ]? [\d+ ['.'\d*]? | '.'\d+ ] {*}}
rule any {
| <dict> | <array> | <reference>
| <number> | <string> | <hexstring>
| <name> | <boolean> | <null>
}
rule object {
\d+ \d+ obj <any> <stream>? endobj
}
method name($m) {
make PDF::Syntax::Name.new(:name($($m<name_label>)));
}
method number($m) {
make PDF::Syntax::Number.new(:val(+$m));
}
method string($m) {
make PDF::Syntax::String.new(:val($($m<literal_str>)));
}
method boolean($m) {
make $m eq 'true' ?? $PDF::Syntax::Boolean::TRUE
!! $PDF::Syntax::Boolean::FALSE);
}
method null($m) {
make $PDF::Syntax::Null::NULL;
}
method any($m, $key) {
make $($m{$key});
}
method dict($m) {
my %val;
for @($m).list -> $i {
%val{~$($i<name>)} = $($i<any>);
}
make PDF::Syntax::Dictionary.new(:val(%val));
}
method array($m) {
my @val;
for @($m).list -> $i {
@val.push($($i<any>));
}
make PDF::Syntax::Array.new(:val(@val));
}
method object($m) {
my $val = $($m<any>);
if ($m<stream> && !$val.isa(PDF::Syntax::Dictionary)) {
die 'Stream must be prefixed by a dictionary';
}
make PDF::Syntax::Object.new(:val($val),
:stream($($m<stream>)));
}