Advanced Installer

One of my work projects is a Windows-only media conversion application. As happens often in the software world, I had to hurry up and learn about installers. So, in late summer 2004 I googled and found Advanced Installer to be highly ranked. I tried the demo and looked into a couple of other installer builders. I was fairly impressed, so we ended up buying the Pro version of Advanced Installer and we've been using it ever since.

While the company is based in California, the programmers are all in Romania. Nonetheless, I've had excellent responses to the couple of support requests I've submitted. In both cases, I reported bugs and they were fixed within 3 days and 1 day, respectively. Sweet.

The project description file is all XML, so it’s possible to hand edit, but it’s kind of crufty, so I don’t recommend that. The most basic activities are supported via the command line (updating the version number and building an installer), and they promise that more command line actions will be available in the future.
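
For illustration, here is roughly how those two command-line actions could be scripted. This is a sketch rather than our actual build script: the AdvancedInstaller.com switches shown (/edit ... /SetVersion and /build) come from later documentation and the install path is a guess, so check both against your own version.

    import subprocess
    import sys

    # Assumed install location; adjust for your machine.
    AI_EXE = r"C:\Program Files\Caphyon\Advanced Installer\AdvancedInstaller.com"

    def build_installer(project_file, version):
        # Stamp the new version number into the .aip project file.
        subprocess.run([AI_EXE, "/edit", project_file, "/SetVersion", version], check=True)
        # Build the installer from the updated project.
        subprocess.run([AI_EXE, "/build", project_file], check=True)

    if __name__ == "__main__":
        build_installer(sys.argv[1], sys.argv[2])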

Aggregation

Then

About ten years ago, Yahoo appeared on the net. It was so very cool because it was a central place where you could find anything you were looking for. By being popular itself, Yahoo decided what on the web was cool and what was not. Aside from the original NCSA list of all web pages (!), it was the first portal site I ever saw. Since then, portals have risen and fallen (e.g. Lycos, Netscape, Altavista). Today, Google released a beta of its new personalized portal. It's nice that they have plans to let you uber-customize it, but right now it's quite ho-hum.

The problem with portals is that they decide what's important and what's not. If your interests coincide with the portal authors', then you are well served. If you are not a perfect match, you will have to sift through much chaff to reach the kernel of information you seek. Some portals have tried to offer some degree of personalization. For example, Slashdot offers pre-defined portals for broad niches (e.g. apple.slashdot.org) as well as the ability to filter out unwanted topics and authors (e.g. JonKatz). That goes a long way towards satisfying the needs of the tech geeks, but it still relies on the much-maligned Slashdot editors to make the initial content selections from which readers may filter.

To get a full serving of content, the typical avid reader would hit his or her 10+ sites on a regular basis to keep up on the latest.

Now

Having to hit more than 10 sites just to see if they have anything new is tedious. Tabbed browsing and bookmarked tab groups do help cut down the overhead, but it's still overhead. Why not put the computer to work to do some of this search work for us?

Enter aggregation.

Aggregation tools build your own personal portal for you by periodically querying the sources that interest you and pulling the prime content into a digestible form. The basic tools to permit this have been around for a while, but only in the last year has it really come together. The important bit was that the content producers out there had to standardize on a computer-readable structure for their offerings. Thus, we now see RSS, RDF, Atom, and OPML links all over the place, collectively called feeds. These are XML-based file formats that collect abstracts of web-based content. The small files are fast to download and easy to process, so it's simple to write software to read them. This simplicity created an explosion of parsers, which then snowballed into a further explosion of feed offerings.
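
Just to show how little code the reading side takes, here is a minimal sketch using the third-party Python feedparser library (my choice for illustration; the URL is a placeholder):

    import feedparser  # third-party library: pip install feedparser

    # Pull the abstracts out of a single feed. The URL is just a placeholder.
    feed = feedparser.parse("https://example.com/index.rss")

    print(feed.feed.get("title", "(untitled feed)"))
    for entry in feed.entries:
        # Each entry carries the title/link/summary metadata described above.
        print(entry.get("title", ""), "->", entry.get("link", ""))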

The next step is to accumulate all of these feeds (and perhaps sort and filter them along the way) into a body of information that the user can absorb. This aggregation process is the real energy saver. Popular tools include client-side solutions like NetNewsWire, Sage (a Firefox extension) and now Safari RSS. On the server side, aggregators that produce concise HTML include PlanetPlanet (my personal favorite), Bloglines and Radio Userland.
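
Conceptually, that aggregation step is just fetch, merge, and sort. A toy version (again using feedparser, with made-up feed URLs) might look like this; real aggregators add caching, filtering, templated HTML output, and so on:

    import time
    import feedparser  # pip install feedparser

    FEEDS = [  # placeholder URLs; substitute your own subscriptions
        "https://example.com/a.rss",
        "https://example.org/b.atom",
    ]

    merged = []
    for url in FEEDS:
        parsed = feedparser.parse(url)
        source = parsed.feed.get("title", url)
        for entry in parsed.entries:
            # published_parsed/updated_parsed are time.struct_time values when present.
            stamp = entry.get("published_parsed") or entry.get("updated_parsed")
            if stamp:
                merged.append((stamp, source, entry.get("title", "")))

    merged.sort(reverse=True)  # newest stories first, across all sources
    for stamp, source, title in merged[:50]:
        print(time.strftime("%Y-%m-%d %H:%M", stamp), "|", source, "|", title)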

In my personal experience, aggregation has let me absorb about four times as many news stories as before in about half the time. Much of this time saving arises from:

  1. not having to visit all of those pages,
  2. skipping the ads (and the associated download time),
  3. some content filtering, and
  4. knowing when to stop.

This last point is critical for me. All I have to remember is the newest story I read last session. When I see that story again, I know that I'm all caught up. Without aggregation, I have to remember the last story I read on every site. That's a lot of mental energy. Aggregation makes it more like an email inbox — one stream of information — and that's cool.

What are our computers for if not to act as agents for our interests?
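
For instance, the "knowing when to stop" item above is exactly the kind of chore worth delegating. Here is a sketch of that idea in Python (the state file name is made up, and this is not how any particular client stores its state):

    import feedparser  # pip install feedparser

    STATE_FILE = "last_seen.txt"  # hypothetical one-line state file

    def unread_entries(url):
        """Return entries newer than the one remembered from last session."""
        try:
            with open(STATE_FILE) as f:
                last_seen = f.read().strip()
        except FileNotFoundError:
            last_seen = None

        fresh = []
        for entry in feedparser.parse(url).entries:  # most feeds list newest first
            marker = entry.get("id") or entry.get("link")
            if marker == last_seen:
                break  # everything older has already been read
            fresh.append(entry)

        if fresh:
            newest = fresh[0].get("id") or fresh[0].get("link") or ""
            with open(STATE_FILE, "w") as f:
                f.write(newest)
        return fresh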

Future

Aggregation will only become more significant in the next couple of years. Google is aggregating newspaper sites all over the world. Many open-source developer groups rely on Planet feeds, like PlanetMozilla, to get a quick read on what's current. As more sources provide feeds, as the feed parsers get their remaining bugs worked out (e.g. encoding differences, XHTML vs. HTML, etc.), and as authors get better at self-filtering (I do not want to see vacation photos from the Mozilla developers!), aggregates can come into their prime. By leveraging primary sources, the news can be even fresher than from portals.

The major remaining obstacles are discovery and trust. Discovering new sources is time consuming, because you generally read a lot of material that is of low interest. But relying on others to discover sources for you just leads back to the portal days where you lack control. Like a financial portfolio, I posit that diversity is the key to a good aggregate today. My personal reading list is a mix of primary sources (like Tom Tongue), intermediate aggregates (the Planet sites) and rehashed news moderated by an editor (like MozillaZine). Primary sources are good because you get the news soonest and have the greatest control over what you read. Edited content is good because someone who (presumably) has a brain has read every item before you did and culled the worst of the trolls out. Intermediate aggregates are somewhere in between: they include authors that are usually interesting.

Social networking can help with both the discovery and trust issues. A promising future direction for aggregation tools is the sharing of moderation between friends. For example, I’ve had some success with DaringFireball (recommended to me by Peter Erwin, IIRC), but not enough to add it to my daily rotation. I would be thrilled to have an automated way for friends to send me a “Best of…” for that feed. Future aggregation clients will allow the user to flag the best/worst stories and republish that list for their friends to see. That re-publication would be subject to normal aggregate filtering, of course, so you could just cherry-pick from common interests with friends.

Conclusion

Aggregates are the new portals, offering users more control than ever before over the way they take in information. I wonder what comes next?

Unicode development under Apache

One of my current projects is to port an application to Japanese. The first port is always the hardest1, so I’ve learned a few things in the process. I’m going to accumulate a few of my successes in this blog category. The first and most significant is that the way encodings work in HTTP/HTML is weird!

Take a peek at this slide from a talk by Sam Ruby, which shows an example HTML page with conflicting metadata. When there are conflicting directives indicating which encoding to use for the document, can you guess which one wins? You may be surprised to learn that the encoding specified in the HTTP Content-Type has precedence over the encoding declared in the HTML file! That is to say, if your HTML document claims

    <meta http-equiv="Content-Type"
              content="text/html; charset=Shift-JIS" />

and Apache says

    Content-Type: text/html; charset=ISO-8859-1

then Apache wins and your Japanese page will be rendered as Latin-1 in the browser and will likely be garbled. Apache's out-of-the-box configuration often includes a default encoding2, which may or may not be right.
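
A quick way to see which side of that conflict your server is on is to compare the live Content-Type header with the charset declared inside the page. A rough check in Python (point it at one of your own pages; the URL below is a placeholder) could look like this:

    import re
    import urllib.request

    url = "http://localhost/index.html"  # substitute one of your own pages

    with urllib.request.urlopen(url) as resp:
        header_charset = resp.headers.get_content_charset()  # from the HTTP header
        body = resp.read()

    # Crude scrape of the charset declared inside the document itself.
    match = re.search(rb"charset=([-\w]+)", body, re.IGNORECASE)
    meta_charset = match.group(1).decode("ascii") if match else None

    print("HTTP header charset:", header_charset)
    print("meta tag charset:   ", meta_charset)
    if header_charset and meta_charset and header_charset.lower() != meta_charset.lower():
        print("Mismatch: the header value is the one the browser will use.")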

There are two solutions to this problem:

  1. Make Apache ignore encoding
  2. Use exactly one encoding everywhere and always

The latter is good practice, but the former is easier. To make Apache ignore encoding, search your httpd.conf file for any AddDefaultCharset lines and remove them.

In our project, we chose the other route, making the obvious choice to use UTF-8 everywhere. We added this line to Apache:

    AddDefaultCharset UTF-8

and these lines to all HTML and XHTML files, respectively:

    <meta http-equiv="Content-Type"
           content="text/html; charset=utf-8" >
    <meta http-equiv="Content-Type" 
           content="application/xhtml+xml; charset=utf-8" />

Then, the major remaining hurdle was to ensure that all of our development tools actually read and write UTF-8. That will be the subject of a future post.


1 I’ve found this to be universally true for language, hardware, OS, API and other types of porting.

2 Two data points:

  1. Our main webserver runs RedHat, whose Apache had AddDefaultCharset ISO-8859-1
  2. The default Apache configuration under Mac OS X does not include a default character set. Good job Apple!