A collection of Perl 6 regexes that can invoke each other to accomplish a larger parse.
A grammar is a special kind of class (it has inheritance, is instantiable, has methods, etc.)
Consider an address of the form:
100 E Main St Springfield MA 01234
Consider an address of the form:
100 E Main St Springfield MA 01234
m/ ([1-9]\d*)\s+ ((?:N|S|E|W)\s+)? (\w+(?:\s+\w+)*)\s+ (ave|ln|st|rd)\s+ ([:alpha:]+(?:\s+[:alpha:]+)*)\s+ ([A-Z]{2})\s+ (\d{5}(?:-\d{4})?) /ix;
Consider an address of the form:
100 E Main St Springfield MA 01234
grammar USMailAddress { rule TOP { <addr> <city> <state> <zip> } rule addr { <[1..9]>\d* <direction>? <streetname> <streettype> } token direction { N | S | E | W } token streetname { \w+ [ \s+ \w+ ]* } token streettype {:i ave | ln | rd | st } token city { <alpha> [ \s+ <alpha> ]* } token state { <[A..Z]>**{2} } token zip { \d**{5} [ - \d**{4} ]? } }
my $match = $addr ~~ m/^<USMailAddress::TOP>$/; if $match { say "$match<city>, $match<state>"; }or, more tersely:
if $addr ~~ m/^<USMailAddress::TOP>$/ { say "$<city>, $<state>"; }or, more deeply:
say "$<addr><direction>";
A code block is a zero-width assertion that executes arbitrary code
token boolean { [true | false] || { die "That's not a boolean!" } }
Blocks can refer to previous captures
m/ (\d**{1..3}) { $0 < 256 or fail } /
You can alter what is returned in the match results via 'make'.
Silly example:
token street_abbrev { | Rd { make 'Road' } | St { make 'Street' } | Blvd { make 'Boulevard' } } $m = 'Rd' ~~ m/<street_abbrev>/; say $m<street_abbrev>; # 'Road'
Legitimate example:
rule foldable_constant { (\d+) [ | + (\d+) { make $0 + $1 } | - (\d+) { make $0 - $1 } | * (\d+) { make $0 * $1 } | / (\d+) { make $0 / $1 } ] } '1 + 3' ~~ m/<foldable_constant>/; say $($/); # '4'
%PDF-1.4 1 0 obj << /Type /Catalog /Metadata 3 0 R /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] >> endobj 3 0 obj << /CreationDate (D:20071111223720Z) /Creator (Adobe Illustrator 10) >> endobj 5 0 obj << /Type /Page /ArtBox [ 135 603.67383 434.4668 679 ] /Contents 74 0 R /MediaBox [ 0 0 612 792 ] /Parent 2 0 R /Resources << /ColorSpace << /CS0 66 0 R /CS1 67 0 R >> /Font << /TT0 68 0 R >> /ProcSet [ /PDF /Text ] >> /Thumb 72 0 R /TrimBox [ 0 0 612 792 ] >> endobj
rule dict { '<<' ( <name> <any> )* '>>' } rule array { '[' ( <any> )* ']' } token name { '/' \w+ } rule reference { \d+ \d+ R } token string { '(' <-[()]> ')' } token hexstring { '<' <[\d a..f A..F]>* '>' } token boolean { [true | false] } token null { null } token number { [ + | - ]? [\d+ ['.'\d*]? | '.'\d+ ] } rule any { | <dict> | <array> | <reference> | <number> | <string> | <hexstring> | <name> | <boolean> | <null> } rule object { \d+ \d+ obj <any> <stream>? endobj }
rule dict { '<<' ( <name> <any> )* '>>' {*}} rule array { '[' ( <any> )* ']' {*}} token name { '/' \w+ {*}} rule reference { \d+ \d+ R {*}} token string { '(' <-[()]> ')' {*}} token hexstring { '<' <[\d a..f A..F]>* '>' {*}} token boolean { [true | false] {*}} token null { null {*}} token number { [ + | - ]? [\d+ ['.'\d*]? | '.'\d+ ] {*}} rule any { | <dict> | <array> | <reference> | <number> | <string> | <hexstring> | <name> | <boolean> | <null> } rule object { \d+ \d+ obj <any> <stream>? endobj }
method name($m) { make PDF::Syntax::Name.new(:name($($m<name_label>))); } method number($m) { make PDF::Syntax::Number.new(:val(+$m)); } method string($m) { make PDF::Syntax::String.new(:val($($m<literal_str>))); } method boolean($m) { make $m eq 'true' ?? $PDF::Syntax::Boolean::TRUE !! $PDF::Syntax::Boolean::FALSE); } method null($m) { make $PDF::Syntax::Null::NULL; } method any($m, $key) { make $($m{$key}); }
method dict($m) { my %val; for @($m).list -> $i { %val{~$($i<name>)} = $($i<any>); } make PDF::Syntax::Dictionary.new(:val(%val)); } method array($m) { my @val; for @($m).list -> $i { @val.push($($i<any>)); } make PDF::Syntax::Array.new(:val(@val)); } method object($m) { my $val = $($m<any>); if ($m<stream> && !$val.isa(PDF::Syntax::Dictionary)) { die 'Stream must be prefixed by a dictionary'; } make PDF::Syntax::Object.new(:val($val), :stream($($m<stream>))); }