Category Archives: Perl

CAM::PDF v1.54 fixes appendPDF bug

I maintain the open source CAM::PDF Perl library in my free time. This library, originally authored by me at Clotho Advanced Media starting in 2002, is a high-performance low-level PDF editing tool. It doesn’t have support for sophisticated authoring tasks (see PDF::API2 for that!) but it is good for utility work like concatenating two PDFs together or deleting pages from a document or encrypting a PDF.

I just fixed a bug where appending a big PDF to a small one (“big” in terms of number of internal objects, which correlates with page count or byte count but is subtly different) often went wrong because I generated object ID numbers by simply incrementing a counter in the small PDF. Sometimes, that counter matched IDs in the bigger PDF, but those IDs are supposed to be unique per document, so things went badly. This bug is now fixed by simply taking the max ID of the two docs as the new counter value before incrementing to make a new ID number.
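Conceptually, the fix looks like this ({maxobj} is an illustrative field name, not CAM::PDF’s actual internals):

sub _sync_id_counter {
    my ($self, $other) = @_;
    # seed the counter with the larger document's maximum object ID
    $self->{maxobj} = $other->{maxobj}
        if $other->{maxobj} > $self->{maxobj};
    # every new object then gets a guaranteed-unique ID
    return ++$self->{maxobj};
}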

I never stumbled on the bug in my own work because I always appended (or prepended) the smaller doc to the larger one for performance reasons. CAM::PDF v1.54 is on its way to CPAN as I type this.

I’m grateful to Charlie Katz of the Harvard-Smithsonian Center for Astrophysics for providing me with a simple test case that exhibits the problem!

FLV to SWF and back

During the past couple of weeks, I’ve been working on a library that can convert SWF files to FLV files and vice versa. I’ve finally succeeded and released v0.10 tonight! If you are a media expert (or amateur!) I’d love some help testing this. Currently, I’ve only tested on Mac with FLV and SWF files generated by the On2 Flix Exporter for QuickTime. Media from other sources and testing on other platforms would be greatly appreciated.

The SWF and FLV file formats are media containers invented by Macromedia (now Adobe) for playback in Flash. Each file format has its own purpose, but with regard to just video and audio they are quite similar under the hood:

Both are tag-oriented container formats. FLV supports three types of tags (video, audio and meta) while SWF supports over 60 (video, audio, bytecode, shapes, sprites, fonts, etc) and is still growing. FLV represents a static media stream, like many of the QuickTime and Windows Media audio/video formats. SWF, on the other hand, is animated vector art with code, like SVG. Macromedia has been quite good about documenting the file formats, although they typically delay release of documentation for a generation (i.e. the Flash 6 documentation was published when Flash 7 came out) and require a license agreement.

The two tag types that SWF and FLV have in common (video and audio) share the same binary format for the media encoding, like MP3, H.263, VP6, ADPCM, ScreenVideo, NellyMoser, etc. Consequently, it is feasible to convert audio and video data between the two file formats without requiring time-consuming transcoding or patented media encoding algorithms.

There are several complications involved in the conversion, however, that I had to overcome. For example SWF insists that there may be at most one audio packet per video frame and infers time from the video framerate. FLV on the other hand uses absolute time codes for every packet, whether video or audio.
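Going from SWF to FLV, for instance, means manufacturing those absolute time codes from nothing but the frame counter. Here is a rough sketch; the tag names and structures are simplified for illustration and are not SWF::File’s actual API:

# Illustrative only: derive FLV absolute timestamps (in milliseconds)
# from a SWF's implicit frame clock.
sub swf_to_flv_timestamps {
    my ($frame_rate, @swf_tags) = @_;   # frame rate comes from the SWF header
    my $frame_num = 0;
    my @flv_tags;
    for my $tag (@swf_tags) {
        if ($tag->{type} eq 'ShowFrame') {
            $frame_num++;               # the SWF clock only advances here
        }
        elsif ($tag->{type} eq 'VideoFrame' || $tag->{type} eq 'SoundStreamBlock') {
            # FLV needs an absolute time code on every packet
            my $ms = int($frame_num * 1000 / $frame_rate);
            push @flv_tags, { %{$tag}, timestamp => $ms };
        }
    }
    return @flv_tags;
}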

While I was at it, I threw in an FLV to MP3 converter too since that’s just a subset of the FLV to SWF converter, and was handy for debugging.

I certainly have to give credit to the Ruby FLVTool2 library for helping me get v0.01 of my library (the initial FLV parsing part) out the door. That library helped me discover one place where Macromedia’s documentation was erroneous and helped me understand the undocumented FLV meta tag. Additionally, I must credit the SWF::File Perl library, which handles the SWF read/write portions of my transcoder. Yay open source!

Improving online presentations: Larry Wall at OSDC

The short version

I’ve posted an iPod-compatible video representation (68 MB, mirror, original; or original flickery QuickTime 55 MB, mirror, original) of Larry Wall’s Perl6 presentation at the OSDC::Israel::2006 conference in February. Enjoy.

The long version

Yesterday, an announcement appeared on usePerl.org offering an electronic representation of Larry Wall’s Perl6 presentation at the OSDC::Israel::2006 conference. That representation consisted of three parts: an MP3 recording of Larry speaking, Larry’s original PowerPoint slides, and a text transcript of the MP3.

One of the first comments to appear said that it was difficult to follow along with the slides since the MP3 had little indication of when Larry switched slides. After listening to the audio for a little while, I wholeheartedly agreed and realized that I could help make the experience better with some hard work and some of the tools I’ve built for my company’s MediaLandscape project. What follows are the steps I took to create the movie linked above.

Creating the source presentation

The MediaLandscape tools construct a rich presentation from a collection of media files plus metadata describing them. Usually the media files consist of a single video or audio file plus a collection of slides, supplied as image files, as a slide show (PPT, Keynote, etc.) or as a screen capture. The critical metadata is the date and title of the presentation, the name and optional photo of the speaker (or speakers), the duration of the primary media, and the times at which the slides were displayed. That metadata should be collected in one of several supported XML formats.

Of that data, the only thing I was missing was the slide timing, and I figured I could reconstruct that one missing piece. With that information, I would have a complete source presentation that I could then convert into any of a wide variety of formats.

My first step was to export the PowerPoint slides as PNG files. Fortunately, PowerPoint supports a simple “Save As…” option to export the slides as a directory full of bitmap images. I chose the default of 72 dpi for rasterizing the slides.

Saving PowerPoint slides as PNG files

Next, I moved all of the files into a single folder. I added a mug shot of Larry from the OSDC::IL site and started creating the presentation.xml file (view the final version). I got the presentation date and time from the OSDC schedule, and googled for the time zone information (I hope I got that right). I added Larry’s info and then attacked the slides.

All of the slide information except the timing would be straightforward, so I cobbled together a short Perl program, shown below, to type my XML for me. I put in placeholder times in milliseconds for each slide, just so they’d be in the right order.

#!/usr/bin/perl -w
use strict;

# Emit one <Slide> stanza per exported PNG, with a placeholder time of
# $i seconds so the slides at least sort in the right order.
for my $i (1..164) {
   print <<"EOF";
    <Slide>
      <SlideNumber>$i</SlideNumber>
      <PresentationTime>${i}000</PresentationTime>
      <SlideFull>
        <FileName>Slide$i.png</FileName>
        <Dimensions>
          <Width>720</Width>
          <Height>540</Height>
        </Dimensions>
      </SlideFull>
    </Slide>
EOF
}

Finding the slide times

This next step was the most tedious. I opened all of the slides in Preview.app via open *.png, loaded the presentation.xml file in Emacs and started playing the MP3. Whenever I thought the slide had changed, I paused the MP3 and typed the time into the XML file. About 25% of the time, I could hear when Larry clicked his keyboard to advance the slide. Another 50% of the time, I could tell by Larry’s pauses when he switched slides. The last 25% came from the context of the talk and required me to occasionally rewind the audio a bit to figure out when the slide transition actually happened. In retrospect, the most tiresome part was doing mental math to transform the hh:mm:ss in iTunes to milliseconds for the XML. I probably should have transcribed the hh:mm:ss directly and used code to do the math.
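A helper of just a few lines would have sufficed:

# Convert an iTunes-style h:mm:ss (or mm:ss) position to milliseconds
sub hms_to_ms {
    my ($hms) = @_;
    my ($s, $m, $h) = reverse split /:/, $hms;
    return (($h || 0) * 3600 + ($m || 0) * 60 + $s) * 1000;
}
print hms_to_ms('1:12:00');   # prints 4320000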

This process took a bit over two hours to index the one hour, twelve minute talk. But then I was done! The rest was just running programs.

Creating the output

I then ran the MediaLandscape Publish program on the source presentation, targeting a video output. At the heart of the process is a command line utility I wrote called qtcomposite which takes the XML slide manifest and the media files and outputs a QuickTime reference movie with the slides placed on the timeline of the movie at the appropriate points. The program takes only 1-2 seconds to run, and the output is playable directly in QuickTime (less a few stutters at slide transitions when the PNGs are loading). This let me check and tweak some of the slide times.

Finally, I used QuickTime Pro and the built-in iPod (H.264) codec to export the reference movie as a monolithic, compressed video file. That export took about 10 minutes. I previously had tried using the DivX codec, but that codec mangled the slide timing when erroneously trying to normalize the framerate.

Screenshot of the iPod export dialog in QuickTime Pro

Lastly, I uploaded the video to my webserver and wrote this summary.

Conclusion

Definitely, the most time-consuming step was creating the slide timing. I would have spent over an hour listening to the presentation anyway, since I like hearing Larry speak, so it wasn’t wasted time.

It would have been much more pleasant and accurate if I had computer-recorded timing of the slide transitions from Larry’s original talk. There are several presentation capture solutions that can do just that, and I plan to talk about them in a future post.

Also, this output did not incorporate the transcript. Our engine supports subtitle tracks (optionally as XML or burned into the video) but I was a little too burned out from doing the slide timing manually to also partition the transcript into time-tagged subtitles. Any volunteers? 🙂 We can accept any of the subtitle formats that the Subtitles.pm module supports.

Editorial Notes

After publication, I edited this post a couple of times because I belatedly discovered that my DivX version of the video had the slide timings all wrong in the second half of the presentation. The iPod export codec got everything right. Apologies to readers who don’t have easy access to an H.264 decoder…

Private Regression Tests

This article discusses two classes of regression tests: public ones that are included with published software and private ones that are for the author’s use only. The public ones naturally get more exposure because they impact a wider audience. However, personal-use tests are quite valuable too. In the text below, I explore the variety of private tests that are available to you, the programmer, to help you produce more usable and reliable code. The context and code is specific to Perl, but the concepts are applicable to any programming language.

About regression tests

One of the greatest assets of the Perl community is the strong tradition of providing regression tests for the commonly used software libraries that are available via CPAN (the largest archive of Perl software). A regression test is a brief snippet of working code that uses the software in question and compares the actual results to expected results to see if everything works according to plans. The name “regression test” comes from their ability to easily detect new bugs introduced that cause software quality to regress. A set of good tests that pass can give you increased confidence in the quality of the software you’ve written.
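For example, a minimal test script, here exercising the core List::Util module, looks like this:

use warnings;
use strict;
use Test::More tests => 2;
use List::Util qw(sum);   # the code under test

is(sum(1, 2, 3), 6, 'sum of a small list');
is(sum(-1, 1),   0, 'positive and negative values cancel');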

Most software libraries that you can download from CPAN include these tests in a t/ subdirectory or in a file called test.pl. The programmers who write CPAN libraries typically run these tests to ensure they all pass before uploading the software. They include the same tests in the package you download so that you can run the tests too. Running the tests on your machine may reveal bugs that the programmer didn’t find; maybe you have a different OS or a different Perl version from what the author has.

Public tests

The tests that ship with software are intended to run anywhere. The user of the software may have a rather different operating environment from you, the author. So, you try to write tests for the lowest common denominator. You stick to writing tests that don’t take a long time, don’t access the network more than absolutely necessary, and don’t ask the user for input. Furthermore, you write tests which don’t assume much about what software the user has previously installed. If your software hasn’t insisted that a spellchecker be installed on the user’s computer, for example, your regression tests had best not try to do any spell checking.

This is a good thing, of course. You want your tests to be easy and automatable so that they are run in a wide variety of environments. That way, you can get help from your user community to find bugs that didn’t occur on your own computer.

Private tests

Besides the public tests that your users are expected to run, you can also employ a suite of private tests in your work. These private tests are not included in your final distribution (that is, not listed in the MANIFEST file). Among these are tests that:

  1. require special additional software that’s difficult or expensive to acquire,
  2. require special configuration to run properly,
  3. don’t affect the quality of the final software, or
  4. take too long to run.

Below are several specific tests that you can include in your development. Each exhibits one or more of the above limitations, so you wouldn’t want to include them in a CPAN distribution. In each case, I introduce a short test script that you can include in your t/ subdirectory. Personally, I name these files with a 00_local_ prefix so they always run first and so that I can easily exclude them from my distribution by adding a \d+_local_ line to my MANIFEST.SKIP file.
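The corresponding MANIFEST.SKIP entry is just that regular expression:

# keep the private 00_local_* tests out of the CPAN tarball
\d+_local_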

Example 1: Spell checking your documentation

It’s good to have readable documentation, and correct spelling helps with that readability. Running a traditional spell checker on Perl code is tedious, since the checker will flag nearly every variable and function name in your code. Even if you run it just on the documentation, your example code and subroutine lists will trigger the spell checker.

Test::Spelling is a CPAN module that knows how to parse your files looking only for POD and then excluding everything that looks like code examples (namely, C<> blocks, indented code, and =item headers). It makes good use of Pod::Spell to accomplish this, and then sends all of the remaining prose out to an external spell checker (like the open source aspell or ispell programs). Any misspellings are reported as test failures.

However, a typo in the documentation is not as serious as a typo in the code. For this reason, you should not include a Test::Spelling-based test in your distribution. You wouldn’t want a user to decide not to use your code because a spelling error caused the regression tests to fail.

Test::Spelling permits you to indicate any uncommon words in your documentation that would normally trip up the spell checker. You can list those words either in the test script or in the module file itself.

Here is the listing of my 00_local_spelling.t:

use warnings;
use strict;
use Test::More;
use Test::Spelling;
set_spell_cmd('aspell -l');
add_stopwords(<DATA>);
all_pod_files_spelling_ok();
__DATA__
CGI
CPAN
GPL
Dolan
STDIN
STDOUT

In this test script, I first load the requisite libraries. Note that there is no eval to see if Test::Spelling is loaded. Public tests often use eval statements to be friendly and forgiving to users, but private tests should be harsh and strict instead of forgiving.

After the preliminaries, I use set_spell_cmd() to indicate which external spell checker to use. I chose aspell (with the -l flag, which makes it emit misspellings back to STDOUT) because it has a good reputation and it was easy to install on my Mac via fink install aspell-en.

Next, I use add_stopwords() to tell the script to ignore a list of words that I commonly use that aren’t in the dictionary (“stopwords” is spell-checking jargon, apparently; it was news to me). These words are listed one per line in the __DATA__ section of the program. I can easily retrieve this list as an array with the <DATA> filehandle iterator. If I come across more words that my spellchecker dislikes, I can either add them here or add them to the module file that is being tested (see below).

Finally, my test script calls all_pod_files_spelling_ok(). This function searches through the blib/ subdirectory looking for any file that contains POD. These can be .pm files in the blib/lib/ subdirectory or command-line programs in the blib/script/ subdirectory. For each file that is found, Test::Spelling extracts the documentation, removes all Perl, and sends the words to aspell. Subsequently, aspell reports back any misspellings, or nothing if there are none! If Test::Spelling receives any errors, it echoes them as test failures like below. [This is a real example from the Perl::Critic project that I’m working on. This is the first time I ran a spell checker on it.]

% perl Build test test_files=t/00_local_spelling.t verbose=1
...
#   Failed test 'POD spelling for blib/lib/Perl/Critic/Config.pm'
not ok 3 - POD spelling for blib/lib/Perl/Critic/Config.pm
#   in /Users/chris/perl/lib/perl5/site_perl/Test/Spelling.pm at line 72.
# Errors:
#     PBP
#     PERLCRITIC
#     Thalhammer
#     inlucde
#     ommit
...
Failed 1/1 test scripts, 0.00% okay. 55/56 subtests failed, 1.79% okay.

As you can see, some of these are obvious typos (like “ommit”) while others are names (like “Thalhammer”, the last name of Jeffrey Ryan Thalhammer, the primary Perl::Critic author).

The typos should be fixed, naturally, and the others should be flagged as harmless. I will add “Thalhammer” to my global list of words, since I’m working with Jeff a lot these days. On the other hand, “PERLCRITIC” is specific to just this project, so I will add that word to the lib/Perl/Critic/Config.pm file itself.

First the global change. I simply append a line to the test script listed above:

...
__DATA__
CGI
CPAN
GPL
Dolan
STDIN
STDOUT
Thalhammer

Next, I edit lib/Perl/Critic/Config.pm. I fix the “inlucde” and “ommit” typos easily. Then I add some stopwords as follows. First, I find the first instance of any POD by searching for =head. The stopwords only apply to POD that comes after them, so I must be sure to add them at the beginning. Here’s the relevant section of Config.pm before my edits:

...
1;
__END__

=pod

=head1 NAME

Perl::Critic::Config - Load Perl::Critic user-preferences

=head1 DESCRIPTION
...

and after:

...
1;
__END__

=pod

=for stopwords PBP PERLCRITIC

=head1 NAME

Perl::Critic::Config - Load Perl::Critic user-preferences

=head1 DESCRIPTION
...

The only change is the =for stopwords ... line. This must be all on one line (see Pod::Spell for more details) and should be a whitespace-separated list of words.

After making those changes and running the spell test again, I see the following happy output:

% perl Build test test_files=t/00_local_spelling.t verbose=1
...
ok 3 - POD spelling for blib/lib/Perl/Critic/Config.pm
...
Failed 1/1 test scripts, 0.00% okay. 33/56 subtests failed, 41.07% okay.

I found that the global fix for “Thalhammer” by itself fixed 21 of the failing tests. Yay! So, these are easy problems to solve.

To recap, why not include this test in the distribution? Most users do not have aspell installed, let alone Test::Spelling. And even more of them don’t care that “Thalhammer” isn’t in the dictionary (sorry, Jeff). On the other hand, this simple test should remain in your t/ arsenal to catch any future typos you may make.

Example 2: Huge PDF tests

I am the primary author of CAM::PDF, a low-level PDF editing toolkit. The PDF specification is very rational and straightforward, but it’s huge and has many addenda. Writing tests to cover all of the nuances of Adobe’s intentions for the document format is prohibitive. To make my life easy, one of the tests I wrote simply reads, manipulates and rewrites the specification document itself, which is a 14 MB, 1172-page PDF. I figured that if my code can work with that file, it can work with more than half of the PDFs in the world.
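A greatly simplified sketch of that test (the real t/pdf.t checks much more per page):

use warnings;
use strict;
use Test::More;
use CAM::PDF;

my $doc = CAM::PDF->new('t/PDFReference15_v5.pdf')
    or die "$CAM::PDF::errstr\n";
plan tests => $doc->numPages();
for my $p (1 .. $doc->numPages()) {
    # getPageContent() forces a full parse of each page's content stream
    my $content = $doc->getPageContent($p);
    ok(defined $content, "parsed content of page $p");
}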

However, there are several problems with that test that make it prohibitive to be a public test. First, it’s expensive: it takes tens of minutes on my dual-G5 Mac. Second, it uses a copyrighted document that I am not permitted to redistribute. Third, even if I could distribute it, CPAN users would rightly berate me for adding a 14 MB file to the download just for the test.

So, I made this a private test. My t/pdf.t file computes its test plan dynamically. If the t/PDFReference15_v5.pdf file is present, over 4000 tests are added to the list: four for each page of the document. I accomplished this by declaring use Test::More without the common (tests => <number>) suffix. Instead, I later use the following code to declare my test plan:

...
# the plan depends on how many optional test documents were found
my $tests = 2 + @testdocs * 33 + @testpages * 4 + @impages * 4;
plan tests => $tests;
...

In the distributed version of CAM::PDF, I only include three short PDFs for a total of five pages and 129 tests. I think that’s a much more reasonable burden for end-user testing, and it still covers a good fraction of the CAM::PDF code.

Example 3: Test::Distribution, Test::Pod and Test::Pod::Coverage

The Test::Distribution module from CPAN is handy: it contains a collection of sanity checks that ensure you haven’t made any blunders before releasing your package to the world:

  1. It checks that your MANIFEST is present and not broken
  2. It checks that your README is present
  3. It checks that your Changes or ChangeLog is present
  4. It checks that your Build.PL or Makefile.PL is present
  5. It checks that all of your .pm files will compile with perl -c
  6. It checks that you specified a $VERSION
  7. It checks that your POD is not broken
  8. It checks that your POD describes all of your functions

Test::Pod and Test::Pod::Coverage do the same as tests number 7 and 8 above, respectively.

Here is an example 00_local_distribution.t:

use Test::More;
eval { require Test::Distribution; };
plan skip_all => 'Optional Test::Distribution not installed' if ($@);
import Test::Distribution;

Most CPAN authors would agree that a distribution must pass these tests to be considered competent. However, once your distribution passes them and is uploaded, these tests are really a waste of time (except perhaps the perl -c test, which is presumably implied by your other tests anyway!). If the README is missing, should perl Build.PL && perl Build test && perl Build install really fail for end users? I say no, because the README is non-critical. Therefore, tests like these should be private tests.

Some members of the perl-qa email list and the cpants.perl.org community believe that tests like Test::Pod and Test::Pod::Coverage should be public and included in the distribution. Those advocates say that rewarding authors with “Kwalitee” points when they include POD tests in their CPAN packages will encourage them to write better POD, which is better for everyone.

I strongly agree with that goal, but I believe that the means to achieve that goal is flawed. Presumably once the author has achieved 100% POD coverage, it will remain 100% for all users. Therefore, anyone beyond the author who runs Test::Pod is just wasting their CPU time.

However, an alternative effective method of encouraging authors to strive for 100% POD coverage has not yet been discovered, so I do agree that the Kwalitee reward is an acceptable temporary means. Hopefully we can find something better.

Example 4: Version numbers

Several Perl community members, notably including Damian Conway in his “Perl Best Practices” book, have advocated that all .pm files uploaded to CPAN should have a $VERSION number. The common practice, in contrast, is that at least one .pm file should have a $VERSION, and omitting $VERSION in any other subsidiary .pm files is OK. The advocates of the ubiquitous $VERSION say that all files should be versioned so downstream programmers can test that they have the exact revision of the code that they need.
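Perl supports exactly that check on the use line (Foo::Bar is a placeholder module name):

use Foo::Bar 1.23;   # dies at compile time unless $Foo::Bar::VERSION >= 1.23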

If you agree that $VERSION numbers should be everywhere and, further, that they should be the same in all .pm files, then it’s nice to have a test to ensure that uniformity. The following homebrew test checks all Perl files in the blib/ subdirectory for $VERSION numbers. It would be really nice if someone were to write a Test::* module to accomplish this (hint, hint).

Here is 00_local_versionsync.t. Please note that its detection of $VERSION works for my coding style, but is far from universal. If you derive $VERSION from an RCS $Revision$ keyword, for example, this is a bad solution for you. Nonetheless, it’s a good example of a private test.

use warnings;
use strict;
use File::Find;
use File::Slurp;
use Test::More qw(no_plan);

my $last_version = undef;
find({wanted => \&check_version, no_chdir => 1}, 'blib');
if (! defined $last_version) {
    fail('Failed to find any files with $VERSION');
}

sub check_version {
    # $_ is the full path to the file
    return if (! m{blib/script/}xms && ! m{\.pm \z}xms);

    my $content = read_file($_);

    # only look at perl scripts, not sh scripts
    return if (m{blib/script/}xms && $content !~ m/\A \#![^\r\n]+?perl/xms);

    my @version_lines = $content =~ m/ ( [^\n]* \$VERSION [^\n]* ) /gxms;
    if (@version_lines == 0) {
        fail($_);
    }
    for my $line (@version_lines) {
        if (!defined $last_version) {
            # remember the first $VERSION line seen; all others must match it
            $last_version = $line;
            pass($_);
        }
        else {
            is($line, $last_version, $_);
        }
    }
}

This test should be private because if it passes on the author’s computer, it should always pass, and the consequences of a failure are minor compared to an algorithmic error.

Example 5: Copyright statements

Licensing is a big deal. As much as people wish they didn’t have to worry about it, they do. Unfortunately, a small subset of CPAN authors have been careless about licensing their code and have omitted any declaration of license or copyright. The lack of a license statement often means that the code cannot be redistributed. So, forget using apt-get or rpm or fink or even use PAR qw(...) to get the software in those cases. Minimally, all CPAN modules should have a LICENSE file or a license statement included in README.

Since Perl files can be installed in widely varying places in end users’ computers, and because README files usually are not installed, I assert that license statements should be present in all .pm and .pl files that are included in a distribution. This way, if a user wants to check if he can give a .pm file to a friend or include it in a .par file, he can just check the .pm instead of having to go back to CPAN (or perhaps even BackPAN).

The following 00_local_copyright.t doesn’t do all that yet, unfortunately. It simply checks README and all files in the blib/ subdirectory for a recognizable “Copyright YYYY” or “Copyright YYYY-YYYY” statement and ensures that “YYYY” must be the current year. It further ensures that at least one copyright statement is found. A better solution would be to ensure that the machine-readable META.yml file has a license: ... field, but the copyright year is important too.

#!perl -w

use warnings;
use strict;
use File::Find;
use File::Slurp;
use Test::More qw(no_plan);

my $this_year = [localtime]->[5]+1900;
my $copyrights_found = 0;
find({wanted => \&check_file, no_chdir => 1}, 'blib');
for (grep {/^readme/i} read_dir('.')) {
    check_file();
}
ok($copyrights_found != 0, 'found a copyright statement');

sub check_file {
    # $_ is the path to a filename, relative to the root of the
    # distribution

    # Only test plain files
    return if (! -f $_);

    # Filter the list of filenames
    return if (! m,^(?: README.*         # docs
                     |  .*/scripts/[^/]+ # programs
                     |  .*/script/[^/]+  # programs
                     |  .*/bin/[^/]+     # programs
                     |  .*\.(?: pl       # program ext
                             |  pm       # module ext
                             |  html     # doc ext
                             |  3pm      # doc ext
                             |  3        # doc ext
                             |  1        # doc ext
                            )
                    )$,xms);

    my $content = read_file($_);
    my @copyright_years = $content =~ m/
                                       (?: copyright | \(c\) )
                                       \s+
                                       (?: \d{4} \- )?
                                       (\d{4})
                                       /gixms;
    if (0 < grep {$_ ne $this_year} @copyright_years) {
        fail("$_ copyrights: @copyright_years");
    }
    elsif (0 == @copyright_years) {
        pass("$_, no copyright found");
    }
    else {
        pass($_);
    }
    $copyrights_found += @copyright_years;
}

Example 6: Test::Perl::Critic

The final example I will present is also the most comprehensive. The Perl::Critic project, along with the Test::Perl::Critic wrapper, is a framework for enforcing a Perl coding style. Inspired by Damian Conway’s “Perl Best Practices” book, this package will parse all of your Perl files in the blib/ subdirectory and evaluate them against a collection of “Policy” modules that you specify. For example, the TestingAndDebugging::RequirePackageStricture policy insists that all files must include use strict;. At the opposite extreme of acceptance, Miscellanea::RequireRcsKeywords insists that you have $Revision:...$ somewhere in your documentation, as filled in by CVS, SVN or the like. The project has grown to include optional policies beyond what Conway has recommended.

This package is incredibly useful for enforcing style guidelines, whether they be personal or corporate. The policies, being stricter than Perl itself, have even helped me find some subtle bugs in my code (notably via the Subroutines::RequireFinalReturn policy).

Like many of the other tests mentioned in this article, this is clearly an author-time test. The style guidelines are usually not critical to functional code, but instead emphasize readability and maintainability.

Furthermore, the Perl::Critic results may not be repeatable from machine to machine. At three months old as of this writing, the project is still moving very quickly. New policies are being added frequently, so code that passed, say, Perl::Critic v0.12 may well fail under v0.13. Furthermore, the current code uses Module::Pluggable, so third-party policies may be added at runtime. This makes Perl::Critic a very flexible and useful tool for the author, but a highly unpredictable target for an arbitrary end user.

Here is my 00_local_perlcritic.t:

use warnings;
use strict;

our @pcargs;
BEGIN
{
   my $rc = 't/perlcriticrc';
   @pcargs = -f $rc ? (-profile => $rc) : ();
}
use Test::Perl::Critic (@pcargs);
all_critic_ok();

The only unusual feature of this file is the BEGIN block. At runtime, this test checks for the presence of a t/perlcriticrc configuration file and, if present, it tells Test::Perl::Critic to use that configuration. Lacking that file, Perl::Critic will use its default policy. While it is carefully selected, that default policy is unlikely to be perfect for most authors.

My habit is to enable most of the Perl::Critic policies and then selectively disable problematic ones. For example, my CAM::PDF module uses camelCase subroutine names, which are forbidden by NamingConventions::ProhibitMixedCaseSubs. It would be prohibitive to rename the subroutines since I have lots of deployed code that uses that module. So, in my t/perlcriticrc file, I disable that policy like so:

[-NamingConventions::ProhibitMixedCaseSubs]

Conclusion

I have presented several examples of Perl regression tests that are of primary use to the software author, not to the author’s end users. These private tests share a set of common principles that differ from the public tests that are to be distributed with the rest of the code:

  1. They can be strict and unforgiving because test failures have a small price for the author
  2. They can employ any supporting modules or programs that the author has at his disposal
  3. They should only test aspects of the software that are non-critical or will not vary from computer to computer (like code style or documentation)

My hope is that a dialogue begins in the Perl community to support and encourage the creation and use of private tests.

One possibility is a common framework for private tests that asserts which tests have succeeded in a YAML format. For example, if I have a private Pod::Coverage test which adds a line to a tests.yml file like pod_coverage: ok, then I could omit the pod-coverage.t file from my distribution and still get credit for having high “Kwalitee” in my work. Furthermore, this can extend to even more author-dependent tests like code coverage via Devel::Cover. Wouldn’t it be great if CPAN authors could assert devel_cover: 100% in a machine-readable way?
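A hypothetical tests.yml (the format here is invented for illustration) could be as simple as:

# tests.yml, written by the author's private test run
pod_coverage: ok
devel_cover: 100%
spelling: ok
perl_critic: ok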

Perhaps there could even be a public author.t file that declares these successes in a human-readable way like so:

t/author.t..........1..16
ok 1 - Test::Pod
ok 2 - Test::Pod::Coverage
ok 3 - Devel::Cover 100%
ok 4 - Test::Spelling
ok 5 - Test::Perl::Critic, 57 policies in effect
ok 6 - Test::Distribution
ok 8 - Version numbers all match
ok 9 - Copyright date is 2005
ok 10 - Works on darwin
ok 11 - Works on linux
not ok 12 - Works on win32 # TODO needs some Windows lovin'!
ok 13 - Works on Perl 5.005
ok 14 - Works on Perl 5.6.1
ok 15 - Works on Perl 5.8.7
ok 16 - Works under mod_perl 1.x

PAR: Packaging for Perl applications

What is the best way to distribute a GUI application to users?

The three main choices are via an installer, via a standalone executable or via source. These choices vary a lot across platforms. Windows prefers installers, especially .msi files. Macs are quite happy with .app files, which are usually shipped on disk images. Most Linux variants use installers (.deb and .rpm) but some prefer source (e.g. Gentoo).

What if that application is written in Perl?

Perl is not typically considered a GUI language, but it does have bindings for GUI toolkits including Tk, wxWindows, Qt and GTK. Perl can be useful in the GUI realm as a rapid-development foundation or simply to add a couple of dialogs to a mostly-background process. A great barrier to entry, however, is that most platforms do not bundle these GUI toolkits with Perl, and some platforms do not bundle Perl at all. Perl itself is most often distributed via installers, but the add-on modules that usually accompany any sophisticated Perl project are typically delivered as source. This poses a problem for most Windows users and many Mac users, for whom this is too low-level a task to be tolerated. Only in the sysadmin-rich world of Linux and other Unixes are sudo cpan install Foo commands routinely tolerated.

The PAR project attempts to create a solution for bundling the myriad files that usually compose a Perl application into a manageable monolith. The initial effort was modelled closely on the JAR concept that has proven to be a success in the Java community. As such, PAR files are simply ZIP files with manifests. If you have PAR installed on your computer, you can write Perl code that looks like this:

#!perl -w
use PAR 'foo.par';
use Foo;
...

and if Foo.pm is enclosed inside the foo.par file, it will be compiled from that source. Even more interesting, you can say:

#!perl -w
use PAR 'http://www.example.com/foo.par';
use Foo;
...

which will cause the foo.par archive to be downloaded and cached locally.

You may have noticed the sticky phrase above “If you have PAR installed…” That is a catch-22 of sorts. PAR helps users to skip the software installation steps, but first they have to … wait for it … install software.

To get around this, PAR takes another page from the ZIP playbook: self-extracting executables. The PAR distribution comes with a program called pp that allows a developer to wrap the core of Perl and any additional project-specific Perl modules into a PAR file with a main.pl and a .exe header to bootstrap the whole thing. What this gets you (on Windows, in this example) is something like a Perl.exe with all of its modules embedded inside.

Here’s a simple example. Consider your basic Hello World application:

---- hello.pl ----
#!perl -w
use strict;
use Tk;
my $mw = MainWindow->new;
$mw->Label(-text => 'Hello, world!')->pack;
$mw->Button(-text => 'Quit', -command => sub { exit })->pack;
MainLoop;

On a Mac, you have to have Tk installed (perhaps via fink install tk-pm586 if you’re on Tiger) and X11 running (perhaps via open /Applications/Utilities/X11.app). When you do so and run perl hello.pl you get something like this:

helloworld.pl screenshot

Now, say you want to give this cool new application to other Mac users. Telling them to first install Fink, Tk and X11 just for “Hello, World!” is ludicrous. Instead, you can build an executable like so:

/sw/bin/pp -o hello hello.pl

That creates a 3 MB executable called hello that includes the entire Perl and Tk. Send it to a friend who has a Mac (and X11, since we used a version of Tk that isn’t Aqua-friendly) and they can run it. If I were to make a Windows version of this it would be even easier on end users — on Windows, Tk binds directly to the native GUI so even the X11 prerequisite is not required.

Another benefit is version independence. The executable above is built against Perl 5.8.6 on Mac OS X 10.4. It should also work well on 10.3 or 10.2, even though those OSes shipped with older versions of Perl, because every part of 5.8.6 that was needed for Hello World is included in the EXE.

If you download that executable, you can open it with any Zip tool. For example:

% zipinfo hello
Archive:  hello   3013468 bytes   689 files
drwxr-xr-x  2.0 unx        0 b- stor 23-Oct-05 14:21 lib/
drwxr-xr-x  2.0 unx        0 b- stor 23-Oct-05 14:21 script/
-rw-r--r--  2.0 unx    20016 b- defN 23-Oct-05 14:21 MANIFEST
-rw-r--r--  2.0 unx      210 b- defN 23-Oct-05 14:21 META.yml
-rw-r--r--  2.0 unx     4971 b- defN 23-Oct-05 14:21 lib/AutoLoader.pm
-rw-r--r--  2.0 unx     4145 b- defN 23-Oct-05 14:21 lib/Carp.pm
... [snipped 679 lines] ...
-rw-r--r--  2.0 unx    12966 b- defN 23-Oct-05 14:21 lib/warnings.pm
-rw-r--r--  2.0 unx      787 b- defN 23-Oct-05 14:21 lib/warnings/register.pm
-rw-r--r--  2.0 unx      186 t- defN 23-May-05 22:22 script/hello.pl
-rw-r--r--  2.0 unx      262 b- defN 23-Oct-05 14:21 script/main.pl
689 files, 2742583 bytes uncompressed, 1078413 bytes compressed:  60.7%

(Note: you may see that the file sizes don’t match. That’s because the EXE also contains the whole Perl interpreter outside of the ZIP portion. That adds an extra 200% to file size in this case.)

Is it fast? No, the files need to be unzipped prior to use (which happens automatically, of course). Is it compact? No, 3 MB for Hello World is almost silly. But is it convenient? Yes. And that is often the most important quality when shipping software to users.

An interesting consequence of this distribution model is that the executable contains all of the source code. For some companies this may represent a problem (with some possible solutions listed at par.perl.org) but it is also a benefit in that you may satisfy any GPL requirements without having to offer a separate source download.

An important note for Windows users: thanks to ActiveState.com, you do not need a C compiler to build Perl yourself. They provide an installable package which includes Tk pre-built. See the links on par.perl.org for pre-compiled installers for PAR.