Using Rakudo Grammars

Chris Dolan, madison.pm

cdolan@cpan.org

February 7, 2009

permalink: http://chrisdolan.net/frozenperl/rakudo-grammars.html

Roadmap

  1. Introduce Perl 6 grammars
  2. Perl 6 grammar examples
  3. Describe PDF internals
  4. Show PDF grammar

Perl 6 grammars?

A collection of Perl 6 regexes that can invoke each other to accomplish a larger parse.

A grammar is a special kind of class (it has inheritance, is instantiable, has methods, etc.)
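
Because a grammar is a class, one grammar can subclass another and override individual tokens. A minimal sketch in current Raku syntax (the Greeting and LoudGreeting grammars are hypothetical, invented only for illustration):

```raku
grammar Greeting {
    token TOP  { <word> ',' \s* <name> }
    token word { hello | goodbye }
    token name { \w+ }
}

# Subclass: override one token, inherit TOP and name unchanged.
grammar LoudGreeting is Greeting {
    token word { HELLO | GOODBYE }
}

my $plain = Greeting.parse('hello, world');
my $loud  = LoudGreeting.parse('HELLO, world');
say ~$plain<name>;   # world
say ~$loud<word>;    # HELLO
```

The parent grammar rejects the upper-case input, since its own word token still applies there.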

Perl 6 vs. Perl 5?

Example

Consider an address of the form:
100 E Main St Springfield MA 01234

In Perl 5, this takes one dense regex:

     m/
         ([1-9]\d*)\s+
         ((?:N|S|E|W)\s+)?
         (\w+(?:\s+\w+)*)\s+
         (ave|ln|st|rd)\s+
         ([[:alpha:]]+(?:\s+[[:alpha:]]+)*)\s+
         ([A-Z]{2})\s+
         (\d{5}(?:-\d{4})?)
      /ix;
 

In Perl 6, the same pattern becomes a grammar:

  grammar USMailAddress {
     rule  TOP { <addr> <city> <state> <zip> }

     rule  addr { <[1..9]>\d* <direction>?
                  <streetname> <streettype> }
     token direction { N | S | E | W }
     token streetname { \w+ [ \s+ \w+ ]* }
     token streettype {:i ave | ln | rd | st }
     token city { <alpha>+ [ \s+ <alpha>+ ]* }
     token state { <[A..Z]>**{2} }
     token zip { \d**{5} [ '-' \d**{4} ]? }
  }
 

Using a grammar

  my $match =
       $addr ~~ m/^<USMailAddress::TOP>$/;
  if $match {
     say "$match<city>, $match<state>";
  }
 
or, more tersely:
  if $addr ~~ m/^<USMailAddress::TOP>$/ {
     say "$<city>, $<state>";
  }
 
or, more deeply:
     say "$<addr><direction>";
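
In current Raku releases a grammar is more commonly run through its .parse method, which starts at TOP and requires the whole string to match. A minimal sketch, assuming a cut-down grammar (the hypothetical CityStateZip, limited to single-word cities so that no backtracking is needed):

```raku
grammar CityStateZip {
    rule  TOP   { <city> <state> <zip> }
    token city  { <alpha>+ }
    token state { <[A..Z]> ** 2 }
    token zip   { \d ** 5 [ '-' \d ** 4 ]? }
}

# .parse returns a Match on success and a false value on failure.
my $m = CityStateZip.parse('Springfield MA 01234');
say "$m<city>, $m<state>" if $m;   # Springfield, MA
```

Named captures are read from the returned match object in the same way as from $/ above.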
 

Embedded code

A code block is a zero-width assertion that executes arbitrary code

  token boolean {
     [true | false]
         || { die "That's not a boolean!" }
  }
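
A minimal sketch of how that failure reaches the caller, in current Raku syntax (the BoolLit wrapper grammar is hypothetical; the boolean token is the one above):

```raku
grammar BoolLit {
    token TOP { <boolean> }
    token boolean {
        [ true | false ]
            || { die "That's not a boolean!" }
    }
}

say so BoolLit.parse('true');   # True

# The second alternative is reached only when neither literal matches;
# its block then throws instead of letting the match fail quietly.
try BoolLit.parse('maybe');
say $!.message if $!;
```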
 

Blocks can refer to previous captures

  m/ (\d**{1..3}) { $0 < 256 or fail } /
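
In current Raku the same check is usually written as a <?{ ... }> code assertion, which fails the match when the block returns a false value. A minimal sketch (the octet token name is hypothetical):

```raku
# Up to three digits, accepted only if the captured number is below 256.
my token octet { (\d ** 1..3) <?{ $0 < 256 }> }

say so '255' ~~ /^ <octet> $/;   # True
say so '256' ~~ /^ <octet> $/;   # False
```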
 

Modifying match results

You can alter what is returned in the match results via 'make'.

Silly example:

  token street_abbrev {
    | Rd   { make 'Road' }
    | St   { make 'Street' }
    | Blvd { make 'Boulevard' }
  }
  my $m = 'Rd' ~~ m/<street_abbrev>/;
  say $($m<street_abbrev>);  # 'Road'
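
In current Raku the value stored by make is read back with .made. A minimal sketch (the Street wrapper grammar is hypothetical):

```raku
grammar Street {
    token TOP { <street_abbrev> }
    token street_abbrev {
        | Rd   { make 'Road' }
        | St   { make 'Street' }
        | Blvd { make 'Boulevard' }
    }
}

my $m = Street.parse('Blvd');
say ~$m<street_abbrev>;       # Blvd       (the matched text)
say $m<street_abbrev>.made;   # Boulevard  (the value set by make)
```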
 

Modifying match results

Legitimate example:

  rule foldable_constant {
     (\d+) [
       | '+' (\d+) { make $0 + $1 }
       | '-' (\d+) { make $0 - $1 }
       | '*' (\d+) { make $0 * $1 }
       | '/' (\d+) { make $0 / $1 }
     ]
  }

  '1 + 3' ~~ m/<foldable_constant>/;
  say $($/);  # '4'
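
The same constant folding as a self-contained sketch in current Raku syntax, where the operators are quoted and the folded value is read with .made (the ConstFold grammar name is hypothetical):

```raku
grammar ConstFold {
    rule TOP {
        (\d+) [
          | '+' (\d+) { make $0 + $1 }
          | '-' (\d+) { make $0 - $1 }
          | '*' (\d+) { make $0 * $1 }
          | '/' (\d+) { make $0 / $1 }
        ]
    }
}

say ConstFold.parse('1 + 3').made;   # 4
say ConstFold.parse('6 * 7').made;   # 42
```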
 

PDF internals

%PDF-1.4
1 0 obj
<< /Type /Catalog /Metadata 3 0 R /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Count 1 /Kids [ 5 0 R ] >>
endobj
3 0 obj
<< /CreationDate (D:20071111223720Z)
   /Creator (Adobe Illustrator 10) >>
endobj
5 0 obj
<< /Type /Page /ArtBox [ 135 603.67383 434.4668 679 ]
   /Contents 74 0 R /MediaBox [ 0 0 612 792 ] /Parent 2 0 R
   /Resources << /ColorSpace << /CS0 66 0 R /CS1 67 0 R >>
   /Font << /TT0 68 0 R >> /ProcSet [ /PDF /Text ] >>
   /Thumb 72 0 R /TrimBox [ 0 0 612 792 ] >>
endobj

PDF parsing

   rule  dict      { '<<' ( <name> <any> )* '>>' }
   rule  array     { '[' ( <any> )* ']' }
   token name      { '/' \w+ }
   rule  reference { \d+ \d+ R }
   token string    { '(' <-[()]>* ')' }
   token hexstring { '<' <[\d a..f A..F]>* '>' }
   token boolean   { [true | false] }
   token null      { null }
   token number    { [ '+' | '-' ]? [\d+ ['.'\d*]? | '.'\d+ ] }
   rule any {
      | <dict> | <array> | <reference>
      | <number> | <string> | <hexstring>
      | <name> | <boolean> | <null>
   }
   rule object {
      \d+ \d+ obj <any> <stream>? endobj
   }
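
A reduced version of these rules can be run against the Pages dictionary from the sample file. A minimal sketch in current Raku syntax, assuming that strings, hex strings, booleans, null and streams are left out, and that the any rule is renamed to value (a hypothetical name, chosen to avoid clashing with Raku's built-in any):

```raku
grammar PDFSketch {
    rule  TOP       { <value> }
    rule  dict      { '<<' [ <name> <value> ]* '>>' }
    rule  array     { '[' <value>* ']' }
    token name      { '/' \w+ }
    rule  reference { \d+ \d+ R }
    token number    { [ '+' | '-' ]? [ \d+ [ '.' \d* ]? | '.' \d+ ] }
    # `||` tries the alternatives in order, so a reference ("5 0 R")
    # is attempted before a bare number ("5").
    rule  value     { <dict> || <array> || <reference> || <number> || <name> }
}

my $m = PDFSketch.parse('<< /Type /Pages /Count 1 /Kids [ 5 0 R ] >>');
say $m ?? 'parsed' !! 'failed';
say $m<value><dict><name>.map(*.Str).join(' ') if $m;
```

The sketch uses ordered alternation so the result does not depend on how ties between equally long alternatives are resolved.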

PDF parsing

Adding {*} to a rule calls the action method of the same name when the rule matches:

   rule  dict      { '<<' ( <name> <any> )* '>>' {*}}
   rule  array     { '[' ( <any> )* ']' {*}}
   token name      { '/' \w+ {*}}
   rule  reference { \d+ \d+ R {*}}
   token string    { '(' <-[()]>* ')' {*}}
   token hexstring { '<' <[\d a..f A..F]>* '>' {*}}
   token boolean   { [true | false] {*}}
   token null      { null {*}}
   token number    { [ '+' | '-' ]? [\d+ ['.'\d*]? | '.'\d+ ] {*}}
   rule any {
      | <dict> | <array> | <reference>
      | <number> | <string> | <hexstring>
      | <name> | <boolean> | <null>
   }
   rule object {
      \d+ \d+ obj <any> <stream>? endobj
   }

PDF instantiation

   method name($m) {
      make PDF::Syntax::Name.new(:name($($m<name_label>)));
   }
   method number($m) {
      make PDF::Syntax::Number.new(:val(+$m));
   }
   method string($m) {
      make PDF::Syntax::String.new(:val($($m<literal_str>)));
   }
   method boolean($m) {
      make($m eq 'true' ?? $PDF::Syntax::Boolean::TRUE
                        !! $PDF::Syntax::Boolean::FALSE);
   }
   method null($m) {
      make $PDF::Syntax::Null::NULL;
   }
   method any($m, $key) {
      make $($m{$key});
   }

PDF instantiation

   method dict($m) {
      my %val;
      for @($m).list -> $i {
         %val{~$($i<name>)} = $($i<any>);
      }
      make PDF::Syntax::Dictionary.new(:val(%val));
   }
   method array($m) {
      my @val;
      for @($m).list -> $i {
         @val.push($($i<any>));
      }
      make PDF::Syntax::Array.new(:val(@val));
   }
   method object($m) {
      my $val = $($m<any>);
      if ($m<stream> && !$val.isa(PDF::Syntax::Dictionary)) {
         die 'Stream must be prefixed by a dictionary';
      }
      make PDF::Syntax::Object.new(:val($val),
                                   :stream($($m<stream>)));
   }