Aggregation

Then

About ten years ago, Yahoo appeared on the net. It was so very cool because it was a central place where you could find anything you were looking for. By being popular itself, Yahoo decided what on the web was cool and what was not. Aside from the original NCSA list of all web pages (!), it was the first portal site I ever saw. Since then, portals have risen and fallen (e.g. Lycos, Netscape, AltaVista). Today, Google released a beta of its new portalized home page. It’s nice that they have plans to let you uber-customize it, but right now it’s quite ho-hum.

The problem with portals is that they decide what’s important and what’s not. If your interests coincide with those of the portal’s authors, then you are well served. If you are not a perfect match, you will have to sift through much chaff to reach the kernel of information you seek. Some portals have tried to offer some degree of personalization. For example, Slashdot offers pre-defined portals for broad niches (e.g. apple.slashdot.org) as well as the ability to filter out unwanted topics and authors (e.g. JonKatz). That goes a long way towards satisfying the needs of the tech geeks, but it still relies on the much-maligned Slashdot editors to make the initial content selections from which readers may filter.

To get a full serving of content, the typical avid reader would hit his or her 10+ sites on a regular basis to keep up on the latest.

Now

Having to hit more than 10 sites just to see if they have anything new is tedious. Tabbed browsing and bookmarked tab groups do help cut down the overhead, but it’s still overhead. Why not put the computer to work to do some of this search work for us?

Enter aggregation.

Aggregation tools build your own personal portal for you by periodically querying the sources that interest you and pulling the prime content into a digestible form. The basic tools to permit this have been around for a while, but only in the last year has it really come together. The important bit was that the content producers out there had to standardize on a computer-readable structure for their offerings. Thus, we now see RSS, RDF, Atom, and OPML links all over the place, collectively called feeds. These are XML-based file formats to collect abstracts of web-based content. The small files are fast to download and easy to process, so it’s simple to write software to read them. This simplicity created an explosion of parsers, which then snowballed into a further explosion of feed offerings.
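To give a feel for how little work a basic parser takes, here is a minimal sketch in Python using only the standard library. The sample feed, its URLs, and the function name parse_rss are made up for illustration, and it only handles plain RSS 2.0 items. A real parser also has to cope with Atom, RDF, namespaces, and encoding quirks.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (hypothetical content and URLs).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Short abstract of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Short abstract of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item>, with its title, link, and summary."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            # findtext returns the default when the child element is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_rss(SAMPLE_RSS):
        print(entry["title"], "->", entry["link"])
```

The abstracts live in the description element, which is why a feed is so much cheaper to fetch and scan than the full page.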

The next step is to accumulate all of these feeds (and perhaps sort and filter them along the way) into a body of information that the user can absorb. This aggregation process is the real energy saver. Popular tools include client-side solutions like NetNewsWire, Sage (a Firefox extension) and now Safari RSS. On the server side, aggregators that produce concise HTML include PlanetPlanet (my personal favorite), Bloglines and Radio Userland.
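The accumulate, filter, and sort step can be sketched in a few lines of Python. This is not how any of the tools above actually work. It assumes each feed has already been parsed into dicts with hypothetical "title", "author", and "published" keys, and the function name aggregate is my own.

```python
from datetime import datetime


def aggregate(feeds, blocked_authors=(), blocked_words=()):
    """Merge parsed feeds into one newest-first stream.

    feeds maps a source name to a list of entry dicts. Each entry is
    assumed to carry "title" and "published" (a datetime), and
    optionally "author".
    """
    blocked_authors = {a.lower() for a in blocked_authors}
    blocked_words = [w.lower() for w in blocked_words]
    merged = []
    for source, entries in feeds.items():
        for entry in entries:
            # Drop entries written by authors the reader has filtered out.
            if entry.get("author", "").lower() in blocked_authors:
                continue
            # Drop entries whose title mentions an unwanted topic.
            title = entry["title"].lower()
            if any(word in title for word in blocked_words):
                continue
            # Remember where the entry came from.
            merged.append({**entry, "source": source})
    # Newest first, like an inbox.
    merged.sort(key=lambda e: e["published"], reverse=True)
    return merged


if __name__ == "__main__":
    demo = {
        "Planet": [{"title": "Build notes", "author": "alice",
                    "published": datetime(2005, 5, 1)}],
        "Blog": [{"title": "Vacation photos", "author": "bob",
                  "published": datetime(2005, 5, 2)}],
    }
    for e in aggregate(demo, blocked_words=["vacation"]):
        print(e["published"].date(), e["source"], e["title"])
```

Filtering before sorting keeps the sort small, and tagging each entry with its source preserves the one thing a merged stream would otherwise lose.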

In my personal experience, aggregation has let me absorb about four times as many news stories as before in about half the time. Much of this time saving arises from:

  1. not having to visit all of those pages,
  2. skipping the ads (and the associated download time),
  3. some content filtering, and
  4. knowing when to stop.

This last point is critical for me. All I have to remember is what was the youngest story I read last session. When I see that story again, I know that I’m all caught up. Without aggregation, I have to remember what was the last story I read for every site. That’s a lot of mental energy. Aggregation makes it more like an email inbox — one stream of information — and that’s cool.
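The "knowing when to stop" trick is easy to express in code. Here is a rough sketch, assuming the aggregated stream is sorted newest-first and that each entry has a unique "link" to serve as the bookmark. The function name unread_entries and the sample links are hypothetical.

```python
def unread_entries(stream, last_seen_link):
    """Return the entries that are newer than the one read last session.

    stream must be ordered newest-first. If last_seen_link is None or
    no longer appears in the stream, everything counts as unread.
    """
    fresh = []
    for entry in stream:
        if entry["link"] == last_seen_link:
            # Reached the newest story from last session: all caught up.
            break
        fresh.append(entry)
    return fresh


if __name__ == "__main__":
    stream = [
        {"title": "Newest", "link": "http://example.com/3"},
        {"title": "Middle", "link": "http://example.com/2"},
        {"title": "Oldest", "link": "http://example.com/1"},
    ]
    for entry in unread_entries(stream, "http://example.com/2"):
        print(entry["title"])
```

One bookmark covers every source at once, which is exactly the saving over remembering a last-read story per site.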

What are our computers for if not to act as agents for our interests?

Future

Aggregation will only become more significant in the next couple of years. Google is aggregating newspaper sites all over the world. Many open-source developer groups rely on Planet feeds, like PlanetMozilla, to get a quick read on what’s current. Aggregates can come into their prime as more sources provide feeds, as the feed parsers get their remaining bugs worked out (e.g. encoding differences, XHTML vs. HTML), and as authors get better at self-filtering (I do not want to see vacation photos from the Mozilla developers!). By leveraging primary sources, the news can be even fresher than from portals.

The major remaining obstacles are discovery and trust. Discovering new sources is time-consuming, because you generally read a lot of material that is of low interest. But relying on others to discover sources for you just leads back to the portal days where you lack control. Like a financial portfolio, I posit that diversity is the key to a good aggregate today. My personal reading list is a mix of primary sources (like Tom Tongue), intermediate aggregates (the Planet sites) and rehashed news moderated by an editor (like MozillaZine). Primary sources are good because you get the news soonest and have the greatest control over what you read. Edited content is good because someone who (presumably) has a brain has read every item before you did and culled out the worst of the trolls. Intermediate aggregates are somewhere in between: they include authors that are usually interesting.

Social networking can help with both the discovery and trust issues. A promising future direction for aggregation tools is the sharing of moderation between friends. For example, I’ve had some success with DaringFireball (recommended to me by Peter Erwin, IIRC), but not enough to add it to my daily rotation. I would be thrilled to have an automated way for friends to send me a “Best of…” for that feed. Future aggregation clients will allow the user to flag the best/worst stories and republish that list for their friends to see. That re-publication would be subject to normal aggregate filtering, of course, so you could just cherry-pick from common interests with friends.
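No client does this today as far as I know, so the following is purely a thought experiment in Python. It assumes each friend republishes the stories they flagged as a set of (feed, link) pairs. The data layout, the function name best_of, and the min_votes threshold are all invented for illustration.

```python
from collections import Counter


def best_of(friend_flags, feed, min_votes=1):
    """Build a "Best of" list for one feed from friends' flagged stories.

    friend_flags maps a friend's name to a set of (feed, link) pairs
    they marked as good. Links are returned most-recommended first,
    keeping only those with at least min_votes recommendations.
    """
    votes = Counter()
    for flags in friend_flags.values():
        for flagged_feed, link in flags:
            # A set per friend means each friend votes at most once per story.
            if flagged_feed == feed:
                votes[link] += 1
    return [link for link, count in votes.most_common() if count >= min_votes]


if __name__ == "__main__":
    flags = {
        "peter": {("DaringFireball", "http://example.com/a")},
        "tom": {("DaringFireball", "http://example.com/a"),
                ("DaringFireball", "http://example.com/b")},
    }
    print(best_of(flags, "DaringFireball", min_votes=2))
```

The resulting list is just another feed, so it could pass through the same filters as everything else in the aggregate.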

Conclusion

Aggregates are the new portals, offering users more control than ever before over the way they take in information. I wonder what comes next?
