A Faster Feed Apart

Posted on April 30th, 2010.

An Event Apart is a conference for web developers and designers that happens a few times a year in various cities. A Feed Apart is a site that aggregates tweets during each conference and displays them in a live stream so attendees can follow them during the conference, and people not attending can see what the attendees are talking about.

A Feed Apart was originally written by Nick Sergeant and Pete Karl of Lion Burger during one of the conferences. Since then a lot of people have used it and love it.

The current A Feed Apart site is not without its problems. It was written in a single night so it's not perfect. During the last conference it went down several times and lost some tweets along the way.

I work at Dumbwaiter Design with Nick and one day he mentioned that it would be cool if we rewrote A Feed Apart from the ground up. He's learned a lot about how people use the site and what the big problems are, so we'd have a better idea of what we need to accomplish.

Nick, myself and Ali Ali have risen to the challenge and started rewriting A Feed Apart. Ali is a designer and is taking care of the design of the new site. Nick is handling all the frontend HTML, CSS and Javascript. My job is the backend.

Ali posted a blog entry about the design of the new version so I figured I'd write about the backend to show how I'm trying to improve it. If you have any advice I'd love to hear it — the next An Event Apart conference is about three weeks away so there's still time to add improvements.

  1. How AFA is Used
    1. People Attending the Conference
    2. People That Wish They Were Attending the Conference
    3. Speakers and Organizers
  2. Goals
  3. Staying Sane
  4. Being Real Time
    1. Retrieving Updates
    2. Storing Updates
    3. Sending Updates to Users
  5. Organizing the Conversation
  6. Staying Consistent
  7. Getting Bigger
  8. A Work in Progress

How AFA is Used

You can't hope to improve on a site unless you know how people are going to use it. AFA has been around for a while and Nick has learned a lot about what people want to get out of it.

There are three main kinds of users of AFA:

People Attending the Conference

People that attend AEA are web developers and designers. They're up-to-date on the latest technology and almost all of them use Twitter.

During the conference you'll see a sea of laptops in the audience. Attendees will tweet about whatever is happening right now. There's a huge amount of conversation that happens between attendees that AFA is trying to collect and present.

A simple example is one that Nick mentioned to me: if a presenter mentions a website during her presentation someone will tweet the URL so other people can have a link to easily click on. Attendees will also tweet their agreement or disagreement with what the presenter is saying as they're speaking.

The most important aspect of AFA for this group of people is the real time conversation. If AFA doesn't show tweets until a minute after they've been posted it's useless to these people.

Another important part of AFA for attendees is that it's a one-stop-shop for the conversation behind AEA. Switching between windows/tabs/applications to read and contribute is annoying.

People That Wish They Were Attending the Conference

The next group of users is those that can't attend the conference for a variety of reasons but are still interested in what's going on.

For this group the real time nature of AFA is unimportant. They're probably not going to be following the conversation 24/7 so if a tweet takes a few minutes to display it's not a big deal.

What these people do care about is the integrity of the stream. They don't want to miss any part of the conversation. If AFA loses tweets they're better off doing a simple search on Twitter because they'll get the whole story.

Speakers and Organizers

The last main group of users is the speakers and organizers of AEA.

Any speaker worth their salt will want to know what people are said about their presentation. Obviously they can't be reading AFA while they're presenting, but afterword they'll certainly want to go back and see what people were saying during their specific presentation.

Likewise, organizers want to know which presentations people liked and which ones people didn't enjoy. This could easily influence who they choose to invite to future conferences.

For this group the most important aspect of AFA is the organization of the conversation into chunks, each of which applies to a single presentation. By reading through each chunk of conversation they can get an idea of the general response to each presentation.

Goals

Once we identified the main users of AFA we were able to come up with the goals for the new version of the site:

Staying Sane

If the three of us tried to create the site with nothing more than a couple of laptops and a few chats in person we'd go crazy. We use a few tools to help us manage the development of the site.

To create wireframes of the design we're using a free account at Hot Gloo. Hot Gloo is a great tool that lets us quickly sketch out ideas and comment on them.

To share design comps we're using Dropbox. It's simple to set up a shared Dropbox folder and Nick and I can get real time updates when Ali makes changes to the design.

To work together on the code Nick and I use Mercurial repositories. Mercurial lets us work on the same code bases simultaneously and we almost never have to worry about merging. We use Codebase for hosting and issue tracking.

Being Real Time

The previous version of AFA wasn't truly real time. When you went to the site your browser would ask AFA for the newest updates every 10 seconds.

There are two main problems with this approach. The first is that it's not really real time. I've noticed this being an issue in my own experience.

When I watch any of the various "live streams" of Apple press conferences I'm usually at work where other people are also watching. We very rarely load the pages of the streaming sites like Gizmodo at the exact same second, so our browsers will be out of sync with each other. Nick might get an update that I would have to wait 8 more seconds to see.

When I glance over at his screen and see an update that I don't have I instinctively refresh the page to get it. This defeats the purpose of the "live updating" code that the developers of these sites worked on. They may as well have just made a static page and told me to refresh.

The second problem is that querying every 10 seconds can be taxing on the site's database. We're doing this as a side project so we don't have unlimited funding for a hefty database server. If 1,000 people are querying for updates every 10 seconds that's 100 requests per second to the database. This means we need to have some kind of in-memory cacheing if we want the site to feel snappy on modest hardware.

We want the new AFA to be truly real time. To do this we need to use long polling by users' browsers to wait for updates and return them as soon as they come in. We also need to retrieve the updates as fast as we possibly can, and they need to be stored in memory to avoid hitting the database constantly.

Retrieving Updates

The bulk of the conversation about AEA conferences comes from Twitter. Tweets are the most important items that we need to display on the site, so we're using Twitter's streaming API to pull them in.

Since we don't want to tie up an entire server to pull in tweets I've decided to create a Diesel-based application called The Nozzletron to parse the streaming API. Diesel is a Python framework that takes an elegant approach to asynchronous communication with clients and servers.

Twitter's streaming API accepts HTTP requests and returns "chunked" responses, each of which is a tweet. Unfortunately the Diesel's built-in HTTP client doesn't handle chunked HTTP responses so I had to write some code to handle them myself.

The Nozzletron will connect to Twitter's streaming API and wait for data to come in. If there's no data to process it will relinquish the server's processor so it can do other things. Processing a single tweet that comes in doesn't take much time, so the server is free to do other things most of the time.

I've also created another application called The Flickrtron to pull in Flickr photos. Unfortunately Flickr doesn't have a streaming API like Twitter so I have to resort to polling Flickr's API every few minutes for new photos. Flickr is much less of a real time medium than Twitter though, so I don't think this is a very big problem.

I'm using a Python library called Beej's Flickr API to talk to Flickr. It is horrible. It calls itself an "API" but is really just a thin wrapper around calls to the Flickr API. The objects it returns for API calls are Elementtree objects representing the XML of the response.

I wish I could do something like:

thumb_url = photo.thumbnail_url

Instead I have to use this monstrosity:

thumb_url = "http://farm%s.static.flickr.com/%s/%s_%s_s.jpg" % (
    photo.get('farm'), photo.get('server'), photo.get('id'),
    photo.get('secret'),
)

I wish there were a better Python Flickr API out there but there doesn't seem to be one. If I've missed it please let me know!

Storing Updates

We need a fast way to store and retrieve updates so AFA can provide a real time view of the conversation happening at AEA. With this in mind I've chosen Redis to store the updates of the currently-happening event.

Redis is an easy-to-set-up, disgustingly-fast, in-memory data store. It's similar to memcached but has more intelligent data structures that make my life easier for this project.

As updates (tweets and flickr photos) are scraped by The Nozzletron and The Flickrtron they're placed at the tail end of a Redis list. That means I can quickly and easily get all the items since item N when a user's browser requests them with a single LRANGE {list-key} N -1 call.

I'm also keeping a few other pieces of information in Redis. For example, there's a set of photo IDs held in the {item-key}:flickrtron:grabbed_photos key that keeps track of all of the photos we've already seen.

This makes it easy to tell if we've already seen a photo (and therefore don't need to query Flickr for more information) — it's a simple SISMEMBER {item-key}:flickrtron:grabbed_photos {photo-id} call.

I'm also using Redis to store statistics about the site like:

This kind of information will be extremely valuable in the future when we're planning improvements to the site. Redis makes it fast and safe to update this information using the INCRBY and DECR commands.

There's one more component to The Nozzletron and The Flickrton that I haven't mentioned. Both use Redis' PUBLISH command to push new updates out to users' browsers as they arrive. I'll talk more about that in the next section.

Sending Updates to Users

As I mentioned before we want to send updates to users as soon as they're received. To do this I've created another Diesel-based application called Halley to handle this Comet-style communication.

Halley has a few components. The first uses Diesel's Redis API to subscribe to a Redis channel like live:{event-id}:items and fire off messages whenever something new comes in. As soon as a new update comes in from The Nozzletron or The Flickrtron all of Halley's clients will get it.

When I started working on AFA Diesel's Redis client didn't support the very new PUBLISH/SUBSCRIBE/UNSUBSCRIBE commands. I implemented them, put my changes on BitBucket, sent a pull request, and started talking to the Diesel crew on IRC.

One of the maintainers pulled my patches and made them even better. Now Diesel's Redis client has full PUBLISH/SUBSCRIBE/UNSUBSCRIBE support. It's a great example of how open source projects can produce awesome results.

The other main component of Halley is an HTTP server that listens for connections from browsers. The Javascript on the site will call Halley and say "I need updates since number X", where X is the length of the Redis list of updates at the last time it spoke to the server.

Halley uses Diesel's HTTP Server to manage these requests.

If a client is asking for everything since X and X is a smaller number than the current number of items it will return the updates that have happened since then. This might happen if we return an update and then two more updates happen before the browser gets around to sending another request.

If a client is asking for everything since X and X is equal to the current number of items Halley will wait for a new message to be fired from the Loop that's watching the Redis channel. While Halley waits she relinquishes the processor to the server so other requests can be handled.

There's a bit of code to prevent DoS attacks that request every item in the queue over and over again, of course.

Organizing the Conversation

The live stream is an important component of AFA, but it's not the only one. We also need to organize updates into logical chunks by event and presentation, and provide archives of old events so people can see what happened.

The main AFA site is built with Django and served with Gunicorn and Nginx. It uses a Postgresql database to store data that's not "live". Because queries for live data are handled with Diesel and Redis we don't need to send those request through the full Django/Postgresql stack. Django and Postgresql are only involved when you load a fresh page, and they're more than capable of handling the amount of traffic that AFA gets for those kind of requests.

I've created an application called The Strainer to copy data from the live stream to the Postgresql database. The Strainer looks at the list of live items in Redis, parses those items into Django models and saves them to the Postgresql database.

Using two different types of stores (Redis and Postgresql) means we can get the best of both worlds for AFA:

Django's models and managers make it very easy to separate updates out into the various sessions and presentations that happen at the conferences.

Sessions and presentations are much less in-demand than the live stream so we can take advantage of Django's abstractions without worrying about the extra memory/CPU usage we incur by doing so.

Staying Consistent

Another problem AFA has faced in the past is losing tweets. No code is perfect (and mine certainly is not) so we need to anticipate that some of the applications we're using will crash at some point.

I'm using Supervisord to monitor the various processes of AFA on the server. If a process crashes for some reason it will be restarted automatically.

Supervisord also has a wonderful Python API, so I've created a simple Dashboard view in the Django site that lets us stop/start/restart each individual process with a simple web interface. The dashboard also shows us the current memory usage of the server and some other statistics so we can monitor how things are working through a web browser (instead of SSH'ing into the server, which is a pain on a phone).

Getting Bigger

An Event Apart is a large event, but it's not a huge event. Despite this I've been trying to build the backend in a way that can be easily scaled.

Right now it's hosted on a single server, but each of the individual components could be moved to a separate server with less than an hour of work each:

Moving Django/Gunicorn, Nginx, Redis, Halley, and Postgresql to dedicated servers would increase the performance of the site immensely. I can't imagine an event that would provide enough traffic to require more than that.

Even if there were an event that needed that kind of throughput, we could easily split the single Halley and Redis servers into multiple load-balanced servers.

A Work in Progress

This new version of A Feed Apart is still being built. I'm learning new things every time I work on it, and I'm sure there's still room for improvement.

If you have any questions, advice, or want me to go more in-depth about a specific aspect of the site's backend please let me know!