Private Regression Tests

This article discusses two classes of regression tests: public ones that are included with published software and private ones that are for the author’s use only. The public ones naturally get more exposure because they impact a wider audience. However, personal-use tests are quite valuable too. In the text below, I explore the variety of private tests that are available to you, the programmer, to help you produce more usable and reliable code. The context and code are specific to Perl, but the concepts are applicable to any programming language.

About regression tests

One of the greatest assets of the Perl community is the strong tradition of providing regression tests for the commonly used software libraries that are available via CPAN (the largest archive of Perl software). A regression test is a brief snippet of working code that uses the software in question and compares the actual results to expected results to see if everything works according to plan. The name “regression test” comes from these tests’ ability to detect newly introduced bugs that cause software quality to regress. A set of good tests that pass can give you increased confidence in the quality of the software you’ve written.

Most software libraries that you can download from CPAN include these tests in a t/ subdirectory or in a file called test.pl. The programmers who write CPAN libraries typically run these tests to ensure they all pass before uploading the software. They include the same tests in the package you download so that you can run the tests too. Running the tests on your machine may reveal bugs that the programmer didn’t find; maybe you have a different OS or a different Perl version from what the author has.

Public tests

The tests that ship with software are intended to run anywhere. The user of the software may have a rather different operating environment from you, the author. So, you try to write tests for the lowest common denominator. You stick to writing tests that don’t take a long time, don’t access the network more than absolutely necessary, and don’t ask the user for input. Furthermore, you write tests which don’t assume much about what software the user has previously installed. If your software hasn’t insisted that a spellchecker be installed on the user’s computer, for example, your regression tests had best not try to do any spell checking.

This is a good thing, of course. You want your tests to be easy and automatable so that they are run in a wide variety of environments. That way, you can get help from your user community to find bugs that didn’t occur on your own computer.

Private tests

Besides the public tests that your users are expected to run, you can also employ a suite of private tests in your work. These private tests are not included in your final distribution (that is, not listed in the MANIFEST file). Among these are tests that:

  1. require special additional software that’s difficult or expensive to acquire,
  2. require special configuration to run properly,
  3. don’t affect the quality of the final software, or
  4. take too long to run.

Below are several specific tests that you can include in your development. Each exhibits one or more of the above limitations, so you wouldn’t want to include them in a CPAN distribution. In each case, I introduce a short test script that you can include in your t/ subdirectory. Personally, I name these files with a 00_local_ prefix so they always run first and so that I can easily exclude them from my distribution by adding a \d+_local_ line to my MANIFEST.SKIP file.
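To see what that MANIFEST.SKIP line catches, here is a minimal sketch that applies the same pattern to a few file names. The file names and the shipped_files() helper are hypothetical. The sketch assumes the pattern is an ordinary Perl regular expression matched against each relative path, which is how ExtUtils::Manifest treats MANIFEST.SKIP entries.

```perl
use strict;
use warnings;

# The pattern from the MANIFEST.SKIP line described above.
my $skip = qr/\d+_local_/;

# Hypothetical file names, for illustration only.
my @files = qw(
    t/00_local_spelling.t
    t/00_local_perlcritic.t
    t/01_basic.t
    lib/My/Module.pm
);

# Return the candidates that do NOT match the skip pattern,
# i.e. the files that would still be shipped.
sub shipped_files {
    my ($pattern, @candidates) = @_;
    return grep { $_ !~ $pattern } @candidates;
}

print "$_\n" for shipped_files($skip, @files);
```

Only the two files without the numbered _local_ prefix are printed, so the private tests stay out of the distribution.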

Example 1: Spell checking your documentation

It’s good to have readable documentation, and correct spelling helps with that readability. Running a traditional spell checker on Perl code is tedious, since the checker will flag almost every variable and function name in your code. Even if you run it just on the documentation, your example code and subroutine lists will trigger the spell checker.

Test::Spelling is a CPAN module that knows how to parse your files looking only for POD and then excluding everything that looks like code examples (namely, C<> blocks, indented code, and =item headers). It makes good use of Pod::Spell to accomplish this, and then sends all of the remaining prose out to an external spell checker (like the open source aspell or ispell programs). Any misspellings are reported as test failures.

However, a typo in the documentation is not as serious as a typo in the code. For this reason, you should not include a Test::Spelling-based test in your distribution. You wouldn’t want a user to decide not to use your code because a spelling error caused the regression tests to fail.

Test::Spelling permits you to indicate any uncommon words in your documentation that would normally trip up the spell checker. You can list those words either in the test script or in the module file itself.

Here is the listing of my 00_local_spelling.t:

use warnings;
use strict;
use Test::More;
use Test::Spelling;
set_spell_cmd('aspell -l');
add_stopwords(<DATA>);
all_pod_files_spelling_ok();
__DATA__
CGI
CPAN
GPL
Dolan
STDIN
STDOUT

In this test script, I first load the requisite libraries. Note that there is no eval to see if Test::Spelling is loaded. Public tests often use eval statements to be friendly and forgiving to users, but private tests should be harsh and strict instead of forgiving.

After the preliminaries, I use set_spell_cmd() to indicate which external spell checker it should use. I chose aspell (with the needed -l flag that makes it emit misspellings back to STDOUT) because it has a good reputation and it was easy to install on my Mac via fink install aspell-en.

Next, I use add_stopwords() to tell the script to ignore a list of words that I commonly use that aren’t in the dictionary (“stopwords” is spell checking jargon, apparently; it was news to me). These words are listed one per line in the __DATA__ section of the program. I can easily retrieve this list as an array with the <DATA> filehandle iterator. If I come across more words that my spellchecker dislikes, I can either add them here or add them to the module file that is being tested (see below).

Finally, my test script calls all_pod_files_spelling_ok(). This function searches through the blib/ subdirectory looking for any file that contains POD. These can be .pm files in the blib/lib/ subdirectory or command-line programs in the blib/script/ subdirectory. For each file that it finds, Test::Spelling extracts the documentation, removes all Perl, and sends the words to aspell. Then aspell reports back any misspellings, or nothing if there are none. If Test::Spelling receives any errors, it echoes them as test failures like below. [This is a real example from the Perl::Critic project that I’m working on. This is the first time I ran a spell checker on it.]

% perl Build test test_files=t/00_local_spelling.t verbose=1
...
#   Failed test 'POD spelling for blib/lib/Perl/Critic/Config.pm'
not ok 3 - POD spelling for blib/lib/Perl/Critic/Config.pm
#   in /Users/chris/perl/lib/perl5/site_perl/Test/Spelling.pm at line 72.
# Errors:
#     PBP
#     PERLCRITIC
#     Thalhammer
#     inlucde
#     ommit
...
Failed 1/1 test scripts, 0.00% okay. 55/56 subtests failed, 1.79% okay.

As you can see, some of these are obvious typos (like “ommit”) while others are names (like “Thalhammer”, the last name of Jeffrey Ryan Thalhammer, the primary Perl::Critic author).

The typos should be fixed, naturally, and the others should be flagged as harmless. I will add “Thalhammer” to my global list of words, since I’m working with Jeff a lot these days. On the other hand, “PERLCRITIC” is specific to just this project, so I will add that word to the lib/Perl/Critic/Config.pm file itself.

First the global change. I simply append a line to the test script listed above:

...
__DATA__
CGI
CPAN
GPL
Dolan
STDIN
STDOUT
Thalhammer

Next, I edit lib/Perl/Critic/Config.pm. I fix the “inlucde” and “ommit” typos easily. Then I add some stopwords as follows. First, I find the first instance of any POD by searching for =head. The stopwords only apply to POD that comes after them, so I must be sure to add them at the beginning. Here’s the relevant section of Config.pm before my edits:

...
1;
__END__

=pod

=head1 NAME

Perl::Critic::Config - Load Perl::Critic user-preferences

=head1 DESCRIPTION
...

and after:

...
1;
__END__

=pod

=for stopwords PBP PERLCRITIC

=head1 NAME

Perl::Critic::Config - Load Perl::Critic user-preferences

=head1 DESCRIPTION
...

The only change is the =for stopwords ... line. This must be all on one line (see Pod::Spell for more details) and should be a whitespace-separated list of words.

After making those changes and running the spell test again, I see the following happy output:

% perl Build test test_files=t/00_local_spelling.t verbose=1
...
ok 3 - POD spelling for blib/lib/Perl/Critic/Config.pm
...
Failed 1/1 test scripts, 0.00% okay. 33/56 subtests failed, 41.07% okay.

I found that the global fix for “Thalhammer” by itself fixed 21 of the failing tests. Yay! So, these are easy problems to solve.

To recap, why not include this test in the distribution? Most users do not have aspell installed, let alone Test::Spelling. And even more of them don’t care that “Thalhammer” isn’t in the dictionary (sorry, Jeff). On the other hand, this simple test should remain in your t/ arsenal to catch any future typos you may make.

Example 2: Huge PDF tests

I am the primary author of CAM::PDF, a low-level PDF editing toolkit. The PDF specification is very rational and straightforward, but it’s huge and has many addenda. Writing tests to cover all of the nuances of Adobe’s intentions for the document format is prohibitive. To make my life easy, one of the tests I wrote is to read, manipulate and rewrite the specification document itself, which is a 14 MB, 1172-page PDF. I figured that if my code can work with that file, it can work with more than half of the PDFs in the world.

However, there are several problems with that test that make it unsuitable as a public test. First, it’s expensive: it takes tens of minutes on my dual-G5 Mac. Second, it uses a copyrighted document that I am not permitted to redistribute. Third, even if I could distribute it, CPAN users would rightly berate me for adding a 14 MB file to the download just for the test.

So, I made this a private test. My t/pdf.t file computes its test plan dynamically. If the t/PDFReference15_v5.pdf file is present, over 4000 tests are added to the list: four for each page of the document. I accomplished this by declaring use Test::More without the common (tests => <number>) suffix. Instead, I later use the following code to declare my test plan:

...
my $tests = 2 + @testdocs * 33 + @testpages * 4 + @impages * 4;
plan tests => $tests;
...

In the distributed version of CAM::PDF, I only include three short PDFs for a total of five pages and 129 tests. I think that’s a much more reasonable burden for end-user testing, and it still covers a good fraction of the CAM::PDF code.
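The arithmetic behind that plan can be checked with a short sketch. The function name plan_size() is hypothetical. The counts for the distributed version (three documents, five pages) come from the text above; the figure of two entries in @impages is inferred from the stated total of 129 rather than stated outright.

```perl
use strict;
use warnings;

# Same formula as the plan shown above, with the counts passed in explicitly.
sub plan_size {
    my ($docs, $pages, $impages) = @_;
    return 2 + $docs * 33 + $pages * 4 + $impages * 4;
}

# Distributed version: 3 documents, 5 pages, and (inferred) 2 image pages.
my $public = plan_size(3, 5, 2);
print "public plan: $public tests\n";

# Each page of the 1172-page specification contributes 4 tests of its own.
my $per_page_total = 1172 * 4;
print "per-page tests added by the specification: $per_page_total\n";
```

Because the plan is computed from whatever documents are present, the same t/pdf.t file serves as both the small public test and the large private one.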

Example 3: Test::Distribution, Test::Pod and Test::Pod::Coverage

The Test::Distribution module from CPAN is handy: it contains a collection of sanity checks that ensure you haven’t made any blunders before releasing your package to the world:

  1. It checks that your MANIFEST is present and not broken
  2. It checks that your README is present
  3. It checks that your Changes or ChangeLog is present
  4. It checks that your Build.PL or Makefile.PL is present
  5. It checks that all of your .pm files will compile with perl -c
  6. It checks that you specified a $VERSION
  7. It checks that your POD is not broken
  8. It checks that your POD describes all of your functions

Test::Pod and Test::Pod::Coverage do the same as tests number 7 and 8 above, respectively.

Here is an example 00_local_distribution.t:

use Test::More;
eval { require Test::Distribution; };
plan skip_all => 'Optional Test::Distribution not installed' if ($@);
import Test::Distribution;

Most CPAN authors would agree that a distribution must pass these tests to be considered competent. However, once your distribution passes them and is uploaded, these tests are really a waste of time (except perhaps the perl -c test, which is presumably implied by your other tests!). If the README is missing, should perl Build.PL && perl Build test && perl Build install really fail for end users? I say no because the README is non-critical. Therefore, tests like these should be private tests.

Some members of the perl-qa email list and the cpants.perl.org community believe that tests like Test::Pod and Test::Pod::Coverage should be public and included in the distribution. Those advocates say that rewarding authors with “Kwalitee” points when they include POD tests in their CPAN packages will encourage them to write better POD, which is better for everyone.

I strongly agree with that goal, but I believe that the means to achieve that goal is flawed. Presumably once the author has achieved 100% POD coverage, it will remain 100% for all users. Therefore, anyone beyond the author who runs Test::Pod is just wasting their CPU time.

However, an alternative effective method of encouraging authors to strive for 100% POD coverage has not yet been discovered, so I do agree that the Kwalitee reward is an acceptable temporary means. Hopefully we can find something better.

Example 4: Version numbers

Several Perl community members, notably including Damian Conway in his “Perl Best Practices” book, have advocated that all .pm files uploaded to CPAN should have a $VERSION number. Instead, the common practice is that at least one .pm file should have a $VERSION, but omitting $VERSION in any other subsidiary .pm files is OK. The advocates of the ubiquitous $VERSION say that all files should be versioned so downstream programmers can test that they have the exact revision of the code that they need.

If you agree that $VERSION numbers should be everywhere and, further, that they should be the same in all .pm files, then it’s nice to have a test to ensure that uniformity. The following homebrew test checks all Perl files in the blib/ subdirectory for $VERSION numbers. It would be really nice if someone were to write a Test::* module to accomplish this (hint, hint).

Here is 00_local_versionsync.t. Please note that its detection of $VERSION works for my coding style, but is far from universal. If you get $VERSION from $REVISION$, for example, this is a bad solution for you. Nonetheless, it’s a good example of a private test.

use warnings;
use strict;
use File::Find;
use File::Slurp;
use Test::More qw(no_plan);

my $last_version = undef;
find({wanted => \&check_version, no_chdir => 1}, 'blib');
if (! defined $last_version) {
    fail('Failed to find any files with $VERSION');
}

sub check_version {
    # $_ is the full path to the file
    return if (! m{blib/script/}xms && ! m{\.pm \z}xms);

    my $content = read_file($_);

    # only look at perl scripts, not sh scripts
    return if (m{blib/script/}xms && $content !~ m/\A \#![^\r\n]+?perl/xms);

    my @version_lines = $content =~ m/ ( [^\n]* \$VERSION [^\n]* ) /gxms;
    if (@version_lines == 0) {
        fail($_);
    }
    for my $line (@version_lines) {
        if (!defined $last_version) {
            # The first $VERSION line found becomes the reference for all later lines
            $last_version = $line;
            pass($_);
        }
        else {
            is($line, $last_version, $_);
        }
    }
}

This test should be private because if it passes on the author’s computer, it should always pass, and the consequences of a failure are minor compared to an algorithmic error.

Example 5: Copyright statements

Licensing is a big deal. As much as people wish they didn’t have to worry about it, they do. Unfortunately, a small subset of CPAN authors have been careless about licensing their code and have omitted any declaration of license or copyright. The lack of a license statement often means that the code cannot be redistributed. So, forget using apt-get or rpm or fink or even use PAR qw(...) to get the software in those cases. Minimally, all CPAN modules should have a LICENSE file or a license statement included in README.

Since Perl files can be installed in widely varying places in end users’ computers, and because README files usually are not installed, I assert that license statements should be present in all .pm and .pl files that are included in a distribution. This way, if a user wants to check if he can give a .pm file to a friend or include it in a .par file, he can just check the .pm instead of having to go back to CPAN (or perhaps even BackPAN).

The following 00_local_copyright.t doesn’t do all that yet, unfortunately. It simply checks README and all files in the blib/ subdirectory for a recognizable “Copyright YYYY” or “Copyright YYYY-YYYY” statement and ensures that the final “YYYY” is the current year. It further ensures that at least one copyright statement is found. A better solution would be to ensure that the machine-readable META.yml file has a license: ... field, but the copyright year is important too.

#!perl -w

use warnings;
use strict;
use File::Find;
use File::Slurp;
use Test::More qw(no_plan);

my $this_year = [localtime]->[5]+1900;
my $copyrights_found = 0;
find({wanted => \&check_file, no_chdir => 1}, 'blib');
for (grep {/^readme/i} read_dir('.')) {
    check_file();
}
ok($copyrights_found != 0, 'found a copyright statement');

sub check_file {
    # $_ is the path to a filename, relative to the root of the
    # distribution

    # Only test plain files
    return if (! -f $_);

    # Filter the list of filenames
    return if (! m,^(?: README.*         # docs
                     |  .*/scripts/[^/]+ # programs
                     |  .*/script/[^/]+  # programs
                     |  .*/bin/[^/]+     # programs
                     |  .*\.(?: pl       # program ext
                             |  pm       # module ext
                             |  html     # doc ext
                             |  3pm      # doc ext
                             |  3        # doc ext
                             |  1        # doc ext
                            )
                    )$,xms);

    my $content = read_file($_);
    my @copyright_years = $content =~ m/
                                       (?: copyright | \(c\) )
                                       \s+
                                       (?: \d{4} \- )?
                                       (\d{4})
                                       /gixms;
    if (0 < grep {$_ ne $this_year} @copyright_years) {
        fail("$_ copyrights: @copyright_years");
    }
    elsif (0 == @copyright_years) {
        pass("$_, no copyright found");
    }
    else {
        pass($_);
    }
    $copyrights_found += @copyright_years;
}
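The META.yml check mentioned above could start from something like the following. This is a minimal sketch, not part of the test above: has_license_field() is a hypothetical helper, the sample contents are made up for illustration, and the check uses a plain regular expression rather than a real YAML parser, so it only recognizes a top-level license: line with a non-empty value.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: true if the text has a top-level "license: value" line.
sub has_license_field {
    my ($meta_text) = @_;
    return $meta_text =~ m/^ license: [ \t]* \S /xms ? 1 : 0;
}

# Hypothetical sample contents, for illustration only.
my $with_license    = "name: My-Module\nversion: 0.01\nlicense: perl\n";
my $without_license = "name: My-Module\nversion: 0.01\n";

ok( has_license_field($with_license),    'license field is detected');
ok(!has_license_field($without_license), 'missing license field is noticed');
```

In a real private test, you would read META.yml from the root of the distribution (with read_file() from File::Slurp, as in the script above) and pass its contents to the helper.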

Example 6: Test::Perl::Critic

The final example I will present is also the most comprehensive. The Perl::Critic project, along with the Test::Perl::Critic wrapper, is a framework for enforcing a Perl coding style. Inspired by Damian Conway’s “Perl Best Practices” book, this package will parse all of your Perl files in the blib/ subdirectory and evaluate them against a collection of “Policy” modules that you specify. For example, the TestingAndDebugging::RequirePackageStricture policy insists that all files must include use strict;. On the opposite extreme of acceptedness, Miscellanea::RequireRcsKeywords insists that you have $Revision:...$ somewhere in your documentation, as filled in by CVS, SVN or the like. The project has grown to include optional policies beyond what Conway has recommended.

This package is incredibly useful for enforcing style guidelines, whether they be personal or corporate. The policies, being stricter than Perl itself, have even helped me find some subtle bugs in my code (notably via the Subroutines::RequireFinalReturn policy).

Like many of the other tests mentioned in this article, this is clearly an author-time test. The style guidelines are usually not critical to functional code, but instead emphasize readability and maintainability.

Furthermore, the Perl::Critic results may not be repeatable from machine to machine. At three months old as of this writing, the project is still moving very quickly. New policies are being added frequently so code that passed, say, Perl::Critic v0.12 may likely fail under v0.13. Furthermore, the current code uses Module::Pluggable so third party policies may be added at runtime. This makes Perl::Critic a very flexible and useful tool for the author, but a highly unpredictable target for an arbitrary end user.

Here is my 00_local_perlcritic.t:

use warnings;
use strict;

our @pcargs;
BEGIN
{
   my $rc = 't/perlcriticrc';
   @pcargs = -f $rc ? (-profile => $rc) : ();
}
use Test::Perl::Critic (@pcargs);
all_critic_ok();

The only unusual feature of this file is the BEGIN block. Because use statements are processed at compile time, the arguments for Test::Perl::Critic must be computed in a BEGIN block that runs before the use line. The block checks for the presence of a t/perlcriticrc configuration file and, if present, it tells Test::Perl::Critic to use that configuration. Lacking that file, Perl::Critic will use its default policy. While it is carefully selected, that default policy is unlikely to be perfect for most authors.

My habit is to enable most of the Perl::Critic policies and then selectively disable problematic ones. For example, my CAM::PDF module uses camelCaseSubroutineNames, which is forbidden by NamingConventions::ProhibitMixedCaseSubs. It would be prohibitive to rename the subroutines since I have lots of deployed code that uses that module. So, in my t/perlcriticrc file, I disable that policy like so:

[-NamingConventions::ProhibitMixedCaseSubs]

Conclusion

I have presented several examples of Perl regression tests that are of primary use to the software author, not to the author’s end users. These private tests share a set of common principles that differ from the public tests that are to be distributed with the rest of the code:

  1. They can be strict and unforgiving because test failures have a small price for the author
  2. They can employ any supporting modules or programs that the author has at his disposal
  3. They should only test aspects of the software that are non-critical or will not vary from computer to computer (like code style or documentation)

My hope is that a dialogue begins in the Perl community to support and encourage the creation and use of private tests.

One possibility is a common framework for private tests that asserts which tests have succeeded in a YAML format. For example, if I have a private Pod::Coverage test which adds a line to a tests.yml file like pod_coverage: ok, then I could omit the pod-coverage.t file from my distribution and still get credit for having high “Kwalitee” in my work. Furthermore, this can extend to even more author-dependent tests like code coverage via Devel::Cover. Wouldn’t it be great if CPAN authors could assert devel_cover: 100% in a machine-readable way?
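As a rough illustration of that idea, the sketch below formats a set of results as simple key: value lines. Everything here is hypothetical: no such framework or file format exists yet, format_results() is an invented name, and the two entries are just the examples from the paragraph above.

```perl
use strict;
use warnings;

# Hypothetical format: one "name: result" line per private test, sorted by name.
sub format_results {
    my (%result_for) = @_;
    return join q{},
        map { sprintf "%s: %s\n", $_, $result_for{$_} }
        sort keys %result_for;
}

# The two assertions mentioned in the text above.
my %result_for = (
    pod_coverage => 'ok',
    devel_cover  => '100%',
);

print format_results(%result_for);
```

A private test could write this text to tests.yml at the end of a successful run, so the file ships with the distribution while the test itself does not.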

Perhaps there could even be a public author.t file that declares these successes in a human-readable way like so:

t/author.t..........1..15
ok 1 - Test::Pod
ok 2 - Test::Pod::Coverage
ok 3 - Devel::Cover 100%
ok 4 - Test::Spelling
ok 5 - Test::Perl::Critic, 57 policies in effect
ok 6 - Test::Distribution
ok 7 - Version numbers all match
ok 8 - Copyright date is 2005
ok 9 - Works on darwin
ok 10 - Works on linux
not ok 11 - Works on win32 # TODO needs some Windows lovin'!
ok 12 - Works on Perl 5.005
ok 13 - Works on Perl 5.6.1
ok 14 - Works on Perl 5.8.7
ok 15 - Works under mod_perl 1.x

11 Responses to “Private Regression Tests”

  1. Chris says:

    There is another option: public but disabled. Catalyst includes stress and memleak tests. By default they are skipped unless the user (or author) enables them in ENV: make test TEST_STRESS=1, etc.

    Personally, I think the author should ship all tests that the user could possibly be able to run. Let the user decide, and set some sane defaults.

    Now, as for Test::Spelling…I can’t believe I’ve missed that one. I’m a horrible speller/typist (other than code) and my dists need some serious spell checking. Your example will definitely make my life easier in that respect.

  2. Chris Dolan says:

    Chris, re: public but disabled

    Your opinion regarding shipping the author tests, but disabled, is intriguing. One problem is that sometimes you have tests that are unlikely to work on machines other than the author’s machine. For example, if you have a personal dictionary for aspell, you could mistakenly omit a stopword and not notice.

    An option that I’ve been musing but did not mention in the article is a bunch of t/*.ta files, where “ta” stands for “test, author”. Then add a Module::Build feature like Build authortests or simply add t/*.ta to the list of tests run by Build disttest.

  3. Chris says:

    This argument on perl-qa usually ends up with two camps: those for shipping all tests for the sake of Kwalitee, and those with author tests that would never work anywhere else. I tend to lean towards the former, but I do understand the latter.

    Sometimes, shipping the tests [disabled] is just a precautionary measure. If you are trying to debug a problem on someone else’s machine and they have the CPAN dist, but no SVN access, then the only easy place for them to get the private tests is from the dist itself. Like all good arguments on this topic, every situation is different and can’t be shoehorned into either camp cleanly.

    In Handel, I ship a few tests I probably shouldn’t, like all of the TT and AxKit output tests. They’re not too time consuming, but most people never need to test them. I should probably disable them by default. Then again, when they do run somewhere else out in the world, it’s the only chance I get for other real world checking of the TT and AxKit installs aside from my own.

    Sometimes, the only chance authors get to test their code on other platforms and versions are the very tests that probably should be private.

  4. Chris says:

    Oh yeah. Nice article. :-)

  5. Hi Chris,

    First of all, thanks for publicizing Test::Spelling! ;-) (I am the author.)

    I agree that there are tests that should only be run by the author. However, as you mentioned in one of your comments, I think the ideal solution would be to have a standard way for authors to include those tests in the distribution in such a way that they are only run if a user really wants to (and definitely not run by cpantesters automatically!). This way, users who want to modify the code will have access to the same tests that the author uses, which will facilitate a more open development process.

    I’ve played with ENV variables to control optional tests in the past, but I think it would be much better if it were standardized as part of MakeMaker, Module::Build, etc.

  6. Peter Erwin says:

    Interesting article — even if I skipped lots of the code because I don’t use Perl ;-).

    In fact, I’m on the verge of releasing a small Python package to do searching of telescope archives, so your post is more relevant to me than it might otherwise be….

    Out of curiosity — what’s your feeling about unit testing? (Or is that an upcoming article? ;-)

  7. Chris Dolan says:

    Peter, re: unit testing

    The public tests I mention would be the unit tests. Definitely, unit tests are the most important — sometimes more important than the code itself. In the article I attempt to go a bit beyond unit tests for core algorithms and delve into unit tests for more human topics, like spelling.

  8. A very well written article. And the concept of private regression tests is so good that I’m surprised no one has thought of it before. (Well, at the very least, it’s news to me.)

    One specific comment: You write:

    “…members of the … cpants.perl.org community believe that tests like Test::Pod and Test::Pod::Coverage should be public and included in the distribution. Those advocates say that rewarding authors with “Kwalitee” points when they include POD tests in their CPAN packages will encourage them to write better POD, which is better for everyone.”

    I write good POD. When you print out the POD from some of my modules, it runs to 40+ pages. But I don’t always use the same “head1 some_sub()” structure picked up by Test::Pod::Coverage. Result: These modules get a big 0 on this aspect of “kwalitee.” But am I going to restructure that documentation just to satisfy the kwalitee fanatics? Hell, no! This is a case where the test should be private. Let the user read the documentation and judge it on its own merits!

    Jim Keenan

  9. Chris says:

    Crappity crap n crap. What version of aspell are you using? My aspell -l yields “You must specify a parameter for "-l"”. My version is 0.60 and the -l is for setting the language. :-(

    My aspell on win32 just hangs at the IPC::Open2::open call. Not getting far on spelling this evening. :-)

  10. Chris says:

    Sorry for the comment spam. At some version of aspell, it’s ‘aspell list’ instead of ‘aspell -l’

  11. Peter Erwin says:

    “The public tests I mention would be the unit tests. Definitely, unit tests are the most important — sometimes more important than the code itself. In the article I attempt to go a bit beyond unit tests for core algorithms and delve into unit tests for more human topics, like spelling.”

    OK, I see what you mean. My question sprang partly from a perception that some people use “regression tests” and “unit tests” to mean two different things (that is, two different types of tests). So I partly interpreted your use of the term “regression tests” as meaning (among other things) “not unit tests.”