Category Archives: Languages


CAM::PDF v1.54 fixes appendPDF bug

I maintain the open source CAM::PDF Perl library in my free time. The library, which I originally wrote at Clotho Advanced Media starting in 2002, is a high-performance, low-level PDF editing tool. It doesn’t support sophisticated authoring tasks (see PDF::API2 for that!), but it is good for utility work like concatenating two PDFs, deleting pages from a document, or encrypting a PDF.
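For context, a typical concatenation with the module looks like this. This is a minimal sketch using the documented API; the file names are placeholders:

#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;

# Append one document to another and write the combined result.
my $doc   = CAM::PDF->new('first.pdf')  || die "$CAM::PDF::errstr\n";
my $extra = CAM::PDF->new('second.pdf') || die "$CAM::PDF::errstr\n";
$doc->appendPDF($extra);
$doc->cleanoutput('combined.pdf');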

I just fixed a bug where appending a big PDF to a small one (“big” in terms of number of internal objects, which correlates with page count or byte count but is subtly different) often went wrong because I generated object ID numbers by simply incrementing a counter in the small PDF. Sometimes that counter collided with IDs already present in the bigger PDF; since those IDs must be unique within a document, things went badly. The fix is simple: take the maximum object ID across both documents as the new counter value before incrementing to mint each new ID.
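In sketch form, the fix looks something like this. The field and method names below are invented for illustration; the real CAM::PDF internals differ:

# NOTE: invented names, for illustration only; not the actual source.
sub _fresh_object_id {
    my ($self, $other) = @_;

    # Seed the counter with the largest ID in EITHER document, so a
    # newly minted ID can never collide with an appended object.
    $self->{maxid} = $other->{maxid}
        if $other->{maxid} > $self->{maxid};

    return ++$self->{maxid};
}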

I never stumbled on the bug in my own work because I always appended (or prepended) the smaller doc to the larger one for performance reasons. CAM::PDF v1.54 is on its way to CPAN as I type this.

I’m grateful to Charlie Katz of the Harvard-Smithsonian Center for Astrophysics for providing me with a simple test case that exhibits the problem!

Static typing, no workarounds

A simultaneous benefit and curse of loosely-typed languages (Perl, Ruby, JavaScript, etc.) is that a programmer can do just about anything with a third-party library. The ability to reach into other people’s code running in the same process and “fix” it is sometimes called “monkeypatching”.
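Here is a minimal Perl illustration of the idea; the module and subroutine names are invented:

#!/usr/bin/perl
use strict;
use warnings;

# Pretend this is a third-party module with a bug we cannot fix upstream.
package Some::Library;
sub greet { return 'Helo, world' }

# Monkeypatch from our own code: overwrite the sub in the library's
# namespace at runtime via a typeglob assignment.
package main;
{
    no warnings 'redefine';
    *Some::Library::greet = sub { return 'Hello, world' };
}

print Some::Library::greet(), "\n";   # prints "Hello, world"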

Stricter languages, like Java, make it dramatically harder to accomplish the same goals. This is intentional, because Java also allows sandboxing of untrusted code in the same code space (pretty much impossible in existing dynamic languages despite efforts).

But what do you do if there is a problem with the third-party code and you can’t change it? In Java, you may just be screwed. For example, the Apache River distributed computing SDK uses “throws java.rmi.RemoteException” as a hint about which server methods may be invoked from the client. But, oops, Android’s Dalvik VM omits all of the java.rmi.* classes to save resources, so trying to load a class with that “throws” declaration causes a NoClassDefFoundError. Because there’s no way to inject a third-party class into the “java.*” package space, and the Android developers have not been willing to add this one simple exception class to Dalvik, this effectively kills any reasonable hope of using River on Android. The River folks even discussed extreme hacks like rewriting the classes at load time, but that’s too impractical. The answer from Android fans in scenarios like this seems to be “Why do you want to use that library anyway?”

A dynamic language would have just created a stub for that exception if it didn’t exist. Is that better or worse?
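For comparison, here is a hedged Perl sketch of that behavior, with a purely illustrative class name: if a package fails to load, conjure a minimal stand-in at runtime so code that merely references the class keeps working.

#!/usr/bin/perl
use strict;
use warnings;

my $class = 'RMI::RemoteException';   # illustrative name only
if (!eval "require $class; 1") {
    # The class is missing; install a minimal stub in its place.
    no strict 'refs';
    *{"${class}::new"} = sub { my $pkg = shift; return bless { @_ }, $pkg };
}

my $stub = RMI::RemoteException->new(message => 'stubbed');
print ref $stub, "\n";   # RMI::RemoteException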

kqueues

FreeBSD (and thus Mac OS X) has a filesystem notification API called kqueue that lets you detect changes in watched files and folders. Apple uses it behind the scenes for Spotlight indexing. It’s really fast: if I modify a file in one window via “touch /tmp/foo.txt”, the watcher program prints a message in the other window before the touch command even exits. I’m using the Obj-C UKKQueue wrapper to simplify access to the kqueue system call.
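The watcher loop itself is tiny. Here is a rough Perl equivalent of what my watcher does, using the CPAN IO::KQueue module instead of the Obj-C wrapper; the file name and event mask are illustrative:

#!/usr/bin/perl
use strict;
use warnings;
use IO::KQueue;   # CPAN wrapper around the kqueue(2)/kevent(2) calls

# Watch one existing file for writes, deletes and renames.
my $file = '/tmp/foo.txt';
open my $fh, '<', $file or die "Cannot open $file: $!";

my $kq = IO::KQueue->new();
$kq->EV_SET(fileno($fh), EVFILT_VNODE, EV_ADD | EV_CLEAR,
            NOTE_WRITE | NOTE_DELETE | NOTE_RENAME, 0, $file);

while (my @events = $kq->kevent()) {   # blocks until something happens
    print "change detected on $_->[KQ_UDATA]\n" for @events;
}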

I’ve found a few cases where the watcher fails to trigger when I rapidly delete and recreate the file… I’m not sure whether it’s a bug in the system or in my app (probably the latter; maybe I’m not processing the events fast enough).

[UPDATE: oh, I just learned that kqueue can only watch a file that exists. If I wanted to know when a file was created, I’d have to watch the parent directory and then start watching the file when/if it was created. So that explains my problem with recreates]

In OS X 10.5, Apple added a new FSEvents API, which uses a daemon to run a system-wide kqueue for the whole filesystem and distribute the notifications to listeners in user space. I expect this is to reduce the kernel load of a pile of apps each requesting notifications for the same files.

My current project is targeting 10.4, so I’m avoiding the 10.5-specific FSEvents API.

FLV to SWF and back

During the past couple of weeks, I’ve been working on a library that can convert SWF files to FLV files and vice versa. I’ve finally succeeded and released v0.10 tonight! If you are a media expert (or amateur!), I’d love some help testing this. So far, I’ve only tested on Mac with FLV and SWF files generated by the On2 Flix Exporter for QuickTime. Media from other sources and testing on other platforms would be greatly appreciated.

The SWF and FLV file formats are media containers invented by Macromedia (now Adobe) for playback in Flash. Each file format has its own purpose, but with regard to just video and audio they are quite similar under the hood:

Both are tag-oriented container formats. FLV supports three types of tags (video, audio and meta), while SWF supports over 60 (video, audio, bytecode, shapes, sprites, fonts, etc.) and is still growing. FLV represents a static media stream, like many of the QuickTime and Windows Media audio/video formats. SWF, on the other hand, is animated vector art with code, like SVG. Macromedia has been quite good about documenting the file formats, although they typically delay release of documentation by a generation (e.g. the Flash 6 documentation was published when Flash 7 came out) and require a license agreement.

The two tag types that SWF and FLV have in common (video and audio) share the same binary formats for the media encodings, such as MP3, H.263, VP6, ADPCM, ScreenVideo, Nellymoser, etc. Consequently, it is feasible to convert audio and video data between the two file formats without time-consuming transcoding or patented media encoding algorithms.

There are several complications involved in the conversion, however, that I had to overcome. For example, SWF requires at most one audio packet per video frame and infers time from the video framerate, while FLV uses absolute time codes on every packet, whether video or audio.
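To make that concrete, here is a hedged sketch of the timing arithmetic (illustrative only, not the library’s actual code): every absolute FLV timestamp has to be quantized onto SWF’s frame grid.

# Map an FLV tag's absolute timestamp (milliseconds) to a SWF frame.
# The frame rate here is an assumed value read from the SWF header.
my $framerate = 12;   # frames per second

sub flv_ms_to_swf_frame {
    my ($timestamp_ms) = @_;
    return int($timestamp_ms * $framerate / 1000);
}

print flv_ms_to_swf_frame(2500), "\n";   # 2.5s lands on frame 30 at 12fps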

While I was at it, I threw in an FLV to MP3 converter too since that’s just a subset of the FLV to SWF converter, and was handy for debugging.
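In concept the subset really is small. Below is a hedged, stripped-down sketch (not the shipped converter) that walks an FLV’s tags and concatenates the audio tag payloads, which for an MP3-encoded track yields a playable MP3 stream:

#!/usr/bin/perl
use strict;
use warnings;

my ($in, $out) = @ARGV;
open my $flv, '<:raw', $in  or die "$in: $!";
open my $mp3, '>:raw', $out or die "$out: $!";

# FLV starts with a 9-byte header: "FLV", version, flags, header size.
read($flv, my $header, 9) == 9 && substr($header, 0, 3) eq 'FLV'
    or die "$in is not an FLV file\n";

# Then: 4-byte previous-tag-size, 11-byte tag header, tag body, repeat.
while (read($flv, my $tag_head, 4 + 11) == 15) {
    my ($type, $size24) = unpack 'x4 C a3', $tag_head;
    my $size = unpack 'N', "\0" . $size24;      # 24-bit big-endian length
    read($flv, my $body, $size) == $size or last;
    next unless $type == 8;                     # tag type 8 = audio
    # The first body byte describes the codec; the rest is raw MP3 data.
    print {$mp3} substr($body, 1);
}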

I certainly have to give credit to the Ruby FLVTool2 library for helping me get v0.01 of my library (the initial FLV parsing part) out the door. That library helped me discover one place where Macromedia’s documentation was erroneous and helped me understand the undocumented FLV meta tag. Additionally, I must credit the SWF::File Perl library, which handles the SWF read/write portions of my transcoder. Yay open source!

Improving online presentations: Larry Wall at OSDC

The short version

I’ve posted an iPod-compatible video representation (68 MB, mirror, original; or the original flickery QuickTime, 55 MB, mirror, original) of Larry Wall’s Perl6 presentation at the OSDC::Israel::2006 conference in February. Enjoy.

The long version

Yesterday, an announcement appeared on usePerl.org offering an electronic representation of Larry Wall’s Perl6 presentation at the OSDC::Israel::2006 conference. That representation consisted of three parts: an MP3 recording of Larry speaking, Larry’s original PowerPoint slides, and a text transcript of the MP3.

One of the first comments to appear said that it was difficult to follow along with the slides since the MP3 had little indication of when Larry switched slides. After listening to the audio for a little while, I wholeheartedly agreed and realized that I could help make the experience better with some hard work and some of the tools I’ve built for my company’s MediaLandscape project. What follows are the steps I took to create the movie linked above.

Creating the source presentation

The MediaLandscape tools construct a rich presentation from a collection of media files plus metadata describing them. Usually the media files consist of a single video or audio file plus a collection of slides, as either image files, a slide show (PPT, Keynote, etc.), or a screen capture. The critical metadata is the date and title of the presentation, the name and optional photo of the speaker (or speakers), the duration of the primary media, and the times at which the slides were displayed. That metadata should be collected in one of several supported XML formats.

Of that data, the only thing I was missing was the slide timing. I figured I could reconstruct that one missing piece. With it, I would have a complete source presentation that I could then convert into any of a wide variety of formats.

My first step was to export the PowerPoint slides as PNG files. Fortunately, PowerPoint has a simple “Save As…” option that exports the slides as a directory full of bitmap images. I chose the default of 72 dpi for rasterizing the slides.

Saving PowerPoint slides as PNG files

Next, I moved all of the files into a single folder. I added a mug shot of Larry from the OSDC::IL site and started creating the presentation.xml file (view the final version). I got the presentation date and time from the OSDC schedule, and googled for the time zone information (I hope I got that right). I added Larry’s info and then attacked the slides.

All of the slide information except the timing would be straightforward, so I cobbled together a short Perl program, shown below, to type my XML for me. I put in placeholder times in milliseconds for each slide, just so they’d be in the right order.

#!/usr/bin/perl -w
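# Print a <Slide> element for each of the 164 slides, using a
# placeholder PresentationTime so the entries sort in order.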
for my $i (1..164) {
   print <<"EOF";
    <Slide>
      <SlideNumber>$i</SlideNumber>
      <PresentationTime>${i}000</PresentationTime>
      <SlideFull>
        <FileName>Slide$i.png</FileName>
        <Dimensions>
          <Width>720</Width>
          <Height>540</Height>
        </Dimensions>
      </SlideFull>
    </Slide>
EOF
}

Finding the slide times

This next step was the most tedious. I opened all of the slides in Preview.app via open *.png, loaded the presentation.xml file in Emacs and started playing the MP3. Whenever I thought the slide had changed, I paused the MP3 and typed the time into the XML file. About 25% of the time, I could hear when Larry clicked his keyboard to advance the slide. Another 50% of the time, I could tell by Larry’s pauses when he switched slides. The last 25% came from the context of the talk and required me to occasionally rewind the audio a bit to figure out when the slide transition actually happened. In retrospect, the most tiresome part was doing mental math to transform the hh:mm:ss in iTunes to milliseconds for the XML. I probably should have transcribed the hh:mm:ss directly and used code to do the math.
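In hindsight, that math is a couple of lines of Perl (a hedged sketch; the subroutine is mine, not part of the published tools):

# Convert an "hh:mm:ss" (or "mm:ss") time, as shown in iTunes, into
# the milliseconds that presentation.xml expects.
sub hms_to_ms {
    my ($hms) = @_;
    my ($h, $m, $s) = (0, split /:/, $hms)[-3, -2, -1];
    return (($h * 60 + $m) * 60 + $s) * 1000;
}

print hms_to_ms('1:12:00'), "\n";   # 4320000 ms, the talk's length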

This process took a bit over two hours to index the one-hour, twelve-minute talk. But then I was done! The rest was just running programs.

Creating the output

I then ran the MediaLandscape Publish program on the source presentation, targeting a video output. At the heart of the process is a command line utility I wrote called qtcomposite, which takes the XML slide manifest and the media files and outputs a QuickTime reference movie with the slides placed on the movie’s timeline at the appropriate points. The program takes only 1-2 seconds to run, and the output is playable directly in QuickTime (apart from a few stutters at slide transitions while the PNGs load). This let me check and tweak some of the slide times.

Finally, I used QuickTime Pro and the built-in iPod (H.264) codec to export the reference movie as a monolithic, compressed video file. That export took about 10 minutes. I previously had tried using the DivX codec, but that codec mangled the slide timing when erroneously trying to normalize the framerate.

Screenshot of the iPod export dialog in QuickTime Pro

Lastly, I uploaded the video to my webserver and wrote this summary.

Conclusion

The most time-consuming step, by far, was creating the slide timing. But I would have spent over an hour listening to the presentation anyway, since I like hearing Larry speak, so it wasn’t wasted time.

It would have been much more pleasant and accurate if I had computer-recorded timing of the slide transitions from Larry’s original talk. There are several presentation capture solutions that can do just that, and I plan to talk about them in a future post.

Also, this output did not incorporate the transcript. Our engine supports subtitle tracks (either as XML or burned into the video), but I was a little too burned out from doing the slide timing manually to also partition the transcript into time-tagged subtitles. Any volunteers? 🙂 We can accept any of the subtitle formats that the Subtitles.pm module supports.

Editorial Notes

After publication, I edited this post a couple of times because I belatedly discovered that my DivX version of the video had the slide timings all wrong in the second half of the presentation. The iPod export codec got everything right. Apologies to readers who don’t have easy access to an H.264 decoder…