PAR + FUSE + PDF

Chris Dolan

Equilibrious LLC

cdolan@cpan.org

June 16, 2008

permalink: http://chrisdolan.net/yapcna2008/par-fuse-pdf.html

Roadmap



  1. Demonstrate filesystem-in-PDF
  2. Define PAR
  3. Define FUSE
  4. Describe PDF implementation
  5. Show FUSE code
  6. Details of PAR

Demonstration...

This application is a Mac Cocoa GUI front end for the CPAN module Fuse::PDF. You can hand it almost any PDF file and it will mount that file under /Volumes.

The command-line version is much more powerful.

What is PAR?


What is FUSE?

Filesystem in Userspace

What is FUSE?

Perl wrapper

What's special about PDF?

PDF internals - Catalog, Pages

%PDF-1.4
1 0 obj
<< /Type /Catalog /FusePDF << /FusePDF_FS 79 0 R >>
   /Metadata 78 0 R /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Count 1 /Kids [ 5 0 R ] >>
endobj
3 0 obj
<< /CreationDate (D:20071111223720Z) /Creator (Adobe Illustrator 10)
   /ModDate (D:20071111163806-06'00') /Producer (Adobe PDF library 5.00) >>
endobj
5 0 obj
<< /Type /Page /ArtBox [ 135 603.67383 434.4668 679 ]
   /Contents 74 0 R /MediaBox [ 0 0 612 792 ] /Parent 2 0 R
   /Resources << /ColorSpace << /CS0 66 0 R /CS1 67 0 R >>
   /Font << /TT0 68 0 R >> /ProcSet [ /PDF /Text ] >>
   /Thumb 72 0 R /TrimBox [ 0 0 612 792 ] >>
endobj

PDF internals - Trailer

trailer
<< /ID [ <3ea45250c17a85697af93ca7662ae46f>
<b6e75d95e111a3302f3c74228156deb2> ]
/Info 3 0 R /Root 1 0 R /Size 80 >>
startxref
265166
%%EOF
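
A reader starts at the end of the file: the number after startxref is the byte offset of the crossref table, and /Root in the trailer points at the Catalog. Here is a minimal sketch of that lookup in plain Perl. The name read_trailer and the 2048-byte tail window are assumptions made for illustration, not part of CAM::PDF or Fuse::PDF, and the sketch ignores incremental updates that leave more than one trailer near the end of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the basic trailer facts out of the tail of a PDF.
# Returns a hash reference, or undef if no startxref/%%EOF pair is found.
sub read_trailer {
    my ($pdf) = @_;

    # Only the last couple of kilobytes are inspected (an assumed window).
    my $start = length($pdf) > 2048 ? length($pdf) - 2048 : 0;
    my $tail  = substr $pdf, $start;

    # Take the last startxref in the tail.
    my @offsets = $tail =~ /startxref\s+(\d+)\s+%%EOF/g;
    return if !@offsets;

    my %trailer = (startxref => $offsets[-1]);
    if ($tail =~ m{/Root\s+(\d+)\s+(\d+)\s+R}) {
        @trailer{qw(root_obj root_gen)} = ($1, $2);
    }
    if ($tail =~ m{/Size\s+(\d+)}) {
        $trailer{size} = $1;
    }
    return \%trailer;
}
```

For the trailer on this slide, the sketch reports a crossref offset of 265166, a Catalog at object 1 generation 0, and a table size of 80.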

PDF internals - Crossref table

xref
0 80
0000000000 65535 f 
0000000012 00000 n 
0000000112 00000 n 
0000000171 00000 n 
0000000007 00001 f 
0000000328 00000 n 
0000000694 00000 n 
0000000008 00001 f 
0000000009 00001 f 
0000000010 00001 f 
0000000011 00001 f 
0000000012 00001 f 
0000000013 00001 f 
0000000014 00001 f 
0000000015 00001 f 
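
Each crossref entry is a 10-digit number, a 5-digit generation, and a flag: n marks an object in use (the number is its byte offset), f marks a free object (the number is the next free object). The following is a minimal sketch of parsing such a table in plain Perl. The name parse_xref and the returned hash keys are assumptions made for illustration, not an API of CAM::PDF.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text of a crossref table into a list of
# hashes with keys obj, value, gen, in_use. For in-use entries "value" is
# the byte offset; for free entries it is the next free object number.
sub parse_xref {
    my ($text) = @_;
    my @entries;
    my $obj;    # object number that the next entry line describes

    for my $line (split /\r\n|\r|\n/, $text) {
        if ($line =~ /^(\d{10}) (\d{5}) ([nf])\s*$/) {
            next if !defined $obj;    # entry before any subsection header
            push @entries, {
                obj    => $obj++,
                value  => $1 + 0,
                gen    => $2 + 0,
                in_use => $3 eq 'n' ? 1 : 0,
            };
        }
        elsif ($line =~ /^(\d+) (\d+)\s*$/) {
            $obj = $1;    # subsection header: first object number, count
        }
    }
    return @entries;
}
```

Run against the table on this slide, objects 1, 2, 3, 5 and 6 come back as in use, and object 4 comes back as free with a pointer to object 7.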

FUSE in Perl

use Fuse;
Fuse::main(
   mountpoint => '/Volumes/mnt',
   getattr  => \&fs_getattr,    readlink => \&fs_readlink,
   getdir   => \&fs_getdir,     mknod    => \&fs_mknod,
   mkdir    => \&fs_mkdir,      unlink   => \&fs_unlink,
   rmdir    => \&fs_rmdir,      symlink  => \&fs_symlink,
   rename   => \&fs_rename,     link     => \&fs_link,
   chmod    => \&fs_chmod,      chown    => \&fs_chown,
   truncate => \&fs_truncate,   utime    => \&fs_utime,
   open     => \&fs_open,       read     => \&fs_read,
   write    => \&fs_write,      statfs   => \&fs_statfs,
   threaded => 0, debug => 1);
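
Every callback registered above returns either data or a negated errno. As an illustration, here is a minimal sketch of what a getattr callback could look like, returning the 13-element stat-style list that the Fuse module expects. The %nodes table and its fields are assumptions made for this sketch; the real Fuse::PDF code stores its tree inside the PDF.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);                     # S_IFDIR, S_IFREG, S_ISREG, ...
use POSIX qw(ENOENT getuid getgid);

# Hypothetical flat in-memory filesystem, keyed by absolute path.
my %nodes = (
    '/'          => { type => 'd', perms => 0755, nlink => 2, mtime => 0 },
    '/hello.txt' => { type => 'f', perms => 0644, nlink => 1, mtime => 0,
                      content => "Hello, PDF!\n" },
);

sub fs_getattr {
    my ($path) = @_;
    my $f = $nodes{$path};
    return -ENOENT() if !ref $f;

    my $mode    = ($f->{type} eq 'd' ? S_IFDIR : S_IFREG) | $f->{perms};
    my $size    = $f->{type} eq 'd' ? 0 : length $f->{content};
    my $blksize = 512;
    my $blocks  = int(($size + $blksize - 1) / $blksize);

    # Same order as Perl's stat(): dev, ino, mode, nlink, uid, gid, rdev,
    # size, atime, mtime, ctime, blksize, blocks.
    return (0, 0, $mode, $f->{nlink}, getuid(), getgid(), 0,
            $size, $f->{mtime}, $f->{mtime}, $f->{mtime}, $blksize, $blocks);
}
```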

FUSE in Perl

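# FUSE read callback: return up to $size bytes of $path, starting at $offset.
sub fs_read {
   my ($path, $size, $offset) = @_;
   my ($f) = _parse_path($path);
   return -$f if !ref $f;    # lookup failed: $f is an errno, returned negated
   return substr $f->{content}, $offset, $size;
}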
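
The read callback relies on a path lookup that returns either a node (a hash reference) or a plain errno number. Here is a minimal sketch of such a lookup over an in-memory tree, with fs_read repeated so the example runs on its own. The tree layout and the body of _parse_path are assumptions made for illustration; the real implementation walks data stored in the PDF.

```perl
use strict;
use warnings;
use POSIX qw(ENOENT ENOTDIR);

# Hypothetical in-memory tree: directories hold a "files" hash,
# regular files hold a "content" string.
my $root = {
    type  => 'd',
    files => {
        'hello.txt' => { type => 'f', content => "Hello, PDF!\n" },
        'docs'      => { type => 'd', files => {} },
    },
};

# Walk the tree one path component at a time. Returns the node on success,
# or a positive errno number on failure (the caller negates it).
sub _parse_path {
    my ($path) = @_;
    my $node = $root;
    for my $name (grep { length } split m{/}, $path) {
        return ENOTDIR() if $node->{type} ne 'd';
        $node = $node->{files}{$name};
        return ENOENT() if !ref $node;
    }
    return $node;
}

sub fs_read {
    my ($path, $size, $offset) = @_;
    my ($f) = _parse_path($path);
    return -$f if !ref $f;
    return substr $f->{content}, $offset, $size;
}
```

With this tree, fs_read('/hello.txt', 5, 0) yields the first five bytes of the file, and a lookup of a missing path yields -ENOENT.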

FUSE in Perl

use POSIX qw(ENOENT ENOTDIR ENOTEMPTY);

sub fs_rmdir {
   my ($path) = @_;
   my ($p, $name) = _parse_path_to_parent($path);
   return -$p if !ref $p;                 # lookup failed: $p is an errno
   my $f = $p->{files}->{$name};
   return -ENOENT() if !ref $f;
   return -ENOTDIR() if 'd' ne $f->{type};
   return -ENOTEMPTY() if 0 != keys %{ $f->{files} };
   delete $p->{files}->{$name};
   $p->{nlink}--;                         # parent loses the child's ".." link
   $p->{mtime} = time;
   return 0;                              # success
}
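
To try the rmdir logic without mounting anything, it can be run against an in-memory tree. The sketch below supplies a hypothetical _parse_path_to_parent, whose body is an assumption since the slide does not show it, and repeats fs_rmdir so the example is self-contained.

```perl
use strict;
use warnings;
use POSIX qw(ENOENT ENOTDIR ENOTEMPTY);

# Hypothetical tree: two subdirectories and one regular file.
my $root = {
    type => 'd', nlink => 4, mtime => 0,
    files => {
        'empty'    => { type => 'd', nlink => 2, mtime => 0, files => {} },
        'full'     => { type => 'd', nlink => 2, mtime => 0, files => {
            'a.txt' => { type => 'f', nlink => 1, mtime => 0, content => 'a' },
        } },
        'note.txt' => { type => 'f', nlink => 1, mtime => 0, content => 'n' },
    },
};

# Returns (parent node, final path component) on success,
# or a single positive errno number on failure.
sub _parse_path_to_parent {
    my ($path) = @_;
    my @names = grep { length } split m{/}, $path;
    return ENOENT() if !@names;          # the root has no parent entry
    my $name = pop @names;
    my $node = $root;
    for my $dir (@names) {
        return ENOTDIR() if $node->{type} ne 'd';
        $node = $node->{files}{$dir};
        return ENOENT() if !ref $node;
    }
    return ENOTDIR() if $node->{type} ne 'd';
    return ($node, $name);
}

sub fs_rmdir {
    my ($path) = @_;
    my ($p, $name) = _parse_path_to_parent($path);
    return -$p if !ref $p;
    my $f = $p->{files}->{$name};
    return -ENOENT() if !ref $f;
    return -ENOTDIR() if 'd' ne $f->{type};
    return -ENOTEMPTY() if 0 != keys %{ $f->{files} };
    delete $p->{files}->{$name};
    $p->{nlink}--;
    $p->{mtime} = time;
    return 0;
}
```

Removing /full fails with -ENOTEMPTY, removing /note.txt fails with -ENOTDIR, and removing /empty succeeds once and then fails with -ENOENT.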

PAR in depth

A .par file is just a .zip file with a MANIFEST file that lists all of the contents (for JAR compatibility) and a META.yml file that includes the name of the default program to start first.

A PAR file can be passive, like JAR, or it may include an executable header.

When launched, a PAR executable first unpacks a small bootstrap portion of itself into /tmp, then runs the unpacked Perl interpreter to finish unpacking.
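
Because a passive .par is an ordinary zip archive, its MANIFEST and META.yml can be read with the zip tools that ship with Perl. Here is a minimal sketch using IO::Uncompress::Unzip. The name zip_members is an assumption made for illustration, and the sketch targets passive .par files; an executable with a prepended header may need a reader that locates the archive inside the file.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: return a hash of member name => uncompressed content.
# $source may be a filename or a reference to a scalar holding the zip bytes.
sub zip_members {
    my ($source) = @_;
    my $u = IO::Uncompress::Unzip->new($source)
        or die "Cannot open archive: $UnzipError\n";

    my %members;
    my $status;
    for ($status = 1; $status > 0; $status = $u->nextStream()) {
        my $name    = $u->getHeaderInfo()->{Name};
        my $content = '';
        my $buf;
        # read() overwrites $buf on each call, so accumulate manually.
        while (($status = $u->read($buf)) > 0) {
            $content .= $buf;
        }
        last if $status < 0;
        $members{$name} = $content;
    }
    die "Error reading archive: $UnzipError\n" if $status < 0;
    return %members;
}
```

For example, calling zip_members('hello.par') on a file built with pp -p and printing the 'META.yml' entry should show the metadata that names the default program.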

PAR in depth, cont.

By default, PAR files unpack into subdirs named with the user's name and the checksum of the PAR itself, so repeated runs launch faster.

% ls -R /tmp/par-chris
par-chris:
cache-42666fcb13a1ae316ad5f97f1b1f7d4f6daa7aeb

par-chris/cache-42666fcb13a1ae316ad5f97f1b1f7d4f6daa7aeb:
015ec12a.pm
04b64db8.pm
05390090.pm
11d0210d.pm
168023f8.bundle
16d5408c.bundle
...
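
The cache directory name above is 40 hex digits, which is the length of a SHA-1 digest. Assuming that is the checksum in use, the path can be sketched as below. The name par_cache_dir and the exact naming scheme are assumptions made for illustration; PAR's own logic may differ between versions and platforms.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Spec;

# Hypothetical helper: build "<tmp>/par-<user>/cache-<sha1 of PAR bytes>".
sub par_cache_dir {
    my ($user, $par_bytes, $tmp) = @_;
    $tmp = File::Spec->tmpdir if !defined $tmp;
    return File::Spec->catdir($tmp, "par-$user",
                              'cache-' . sha1_hex($par_bytes));
}
```

Because the name depends only on the user and the archive contents, a second run of the same executable finds the same directory and can skip unpacking.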
 

PAR benefits for end-users

Perl and source embedded in an executable:

PAR drawbacks

PAR usage

PAR created with pp

% pp hello.pl               # 'hello.pl' -> 'a.out'
% pp -o hello hello.pl      # 'hello.pl' -> 'hello'
                            # (or 'hello.exe' on Win32)
% pp -o hello -e 'print "Hello, World!\n"'
% pp -p -o hello.par -e 'print "Hello, World!\n"'

% pp -I ./lib hello         # Extra include paths
% pp -M Foo::Bar hello      # Extra modules and deps
% pp -X Foo::Bar hello      # Exclude modules
% pp -a data.txt hello      # Additional data files

# Win32 special features
% pp --gui --icon hello.ico -o hello hello.pl
 

PAR usage

% pp -M Fuse::PDF -o mount_pdf.i386 bin/mount_pdf
% zipinfo mount_pdf.i386
drwxr-xr-x       0 b- stor 13-Dec-07 02:12 lib/
drwxr-xr-x       0 b- stor 13-Dec-07 02:12 script/
-rw-r--r--   21565 b- defN 13-Dec-07 02:12 MANIFEST
-rw-r--r--     233 b- defN 13-Dec-07 02:12 META.yml
-rw-r--r--    5672 b- defN 13-Dec-07 02:12 lib/AutoLoader.pm
-rw-r--r--    7302 b- defN 13-Dec-07 02:12 lib/B.pm
-rw-r--r--  123818 b- defN 13-Dec-07 02:12 lib/B/Deparse.pm
-rw-r--r--  118403 b- defN 13-Dec-07 02:12 lib/CAM/PDF.pm
...
-rw-r--r--     787 b- defN 13-Dec-07 02:12 lib/warnings/register.pm
-rw-r--r--     536 b- defN 13-Dec-07 02:12 script/main.pl
-rw-r--r--    2960 b- defN 13-Dec-07 02:12 script/mount_pdf
777 files, 4519959 bytes uncompressed, 1497161 bytes compressed:  66.9%
% ls -hs mount_pdf.i386
2.8M mount_pdf.i386

Conclusion