Sunday, September 20, 2009

Facebook's Tornado

Today I got my first chance to play with Tornado, the Python-based web server and framework that Facebook recently open-sourced. The framework is single-threaded and asynchronous, similar to how the Twisted framework operates. Tornado was originally developed at FriendFeed, which Facebook acquired, and it powers the FriendFeed service.

Creating and maintaining Python threads carries relatively high overhead, so asynchronous solutions can offer better performance and scalability than traditional threaded server implementations. This is especially true for servers with many concurrent connections, and for long-polling and streaming connections that spend most of their time waiting for a message to be published. Tornado's benchmarks look impressive against threaded Python web servers, but I wonder why they didn't include Twisted, Tornado's closest competitor.
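
To make the scalability argument concrete, here is a minimal sketch of many long-poll clients waiting inside a single thread. It uses Python's standard asyncio module as a stand-in for Tornado's IOLoop, so none of these names are Tornado's API, and the client count of 1,000 is arbitrary. Each waiting client costs a small suspended coroutine rather than an OS thread.

```python
import asyncio
import threading


async def long_poll_client(client_id, published, inbox):
    # Each waiting client is a coroutine suspended on the event, not an OS thread.
    await published.wait()
    inbox.append(client_id)


async def main(num_clients=1000):
    threads_before = threading.active_count()
    published = asyncio.Event()
    inbox = []
    tasks = [asyncio.create_task(long_poll_client(i, published, inbox))
             for i in range(num_clients)]
    await asyncio.sleep(0)  # give every client a chance to start waiting
    threads_while_waiting = threading.active_count()
    published.set()  # "publish": wake every waiting client at once
    await asyncio.gather(*tasks)
    return len(inbox), threads_before, threads_while_waiting


if __name__ == "__main__":
    delivered, before, during = asyncio.run(main())
    print(f"delivered to {delivered} clients; threads before={before}, while waiting={during}")
```

The thread count does not grow while the clients wait, and setting the event resumes all of them on the same thread, which is the single-threaded event-loop idea Tornado is built on.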

My first impression is that Tornado is much easier to use than Twisted, but also much less powerful. Twisted is a monolithic package that supports many different networking protocols, while Tornado focuses only on HTTP. Tornado provides built-in support for templating, authentication, and signed cookies. One really cool feature is semi-automatic XSRF protection, which should save time for developers who would otherwise need to implement countermeasures by hand.
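
Tornado switches this protection on with the `xsrf_cookies` application setting: it keeps a random token in an `_xsrf` cookie and rejects POST requests whose `_xsrf` argument does not match it. A cross-site page can make the browser send the cookie but cannot read it, so it cannot copy the token into a forged form. The sketch below illustrates that double-submit check with the standard library only; the function names, and the plain dicts standing in for the cookie jar and the submitted form, are hypothetical and are not Tornado's code.

```python
import hmac
import secrets

XSRF_NAME = "_xsrf"  # cookie and form-field name used in this sketch


def issue_xsrf_token(cookies):
    """Store a random token in the (hypothetical) cookie jar if absent; return it for forms."""
    token = cookies.get(XSRF_NAME)
    if token is None:
        token = secrets.token_hex(16)
        cookies[XSRF_NAME] = token
    return token


def check_xsrf(cookies, form):
    """Accept a POST only if the form echoes the token stored in the cookie."""
    cookie_token = cookies.get(XSRF_NAME)
    form_token = form.get(XSRF_NAME)
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_token, form_token)
```

A page rendered by the application embeds the value from `issue_xsrf_token()` in a hidden field, so legitimate submissions pass `check_xsrf()` while a forged form, lacking the token, fails it.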

The documentation for Tornado is sparse. For example, I couldn't find any information about how to interact directly with Tornado's main event loop (IOLoop). Fortunately, Tornado's code base is small and readable, so reading the source will get you up to speed quickly.

While implementing a Tornado channel for the AmFast project, I ran into a problem that I have also encountered with Twisted. Both Tornado and Twisted use callbacks to complete a request. For example, when a long-poll client is waiting for a message, the handler leaves the request open, and you call RequestHandler.finish() to send a response back to the client once a message is published.
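
In Tornado of this era, a handler decorated with `@tornado.web.asynchronous` keeps the connection open after `get()` returns, and the response goes out only when something later calls `self.finish()`. The sketch below models that callback pattern without Tornado: `FakeHandler` is a hypothetical stand-in for a pending RequestHandler, and `InProcessBroker` is a hypothetical broker that keeps waiting handlers in memory.

```python
from collections import defaultdict


class FakeHandler:
    """Hypothetical stand-in for a request handler whose long-poll is still pending."""

    def __init__(self):
        self.response = None
        self.finished = False

    def finish(self, body):
        # Mirrors the role of RequestHandler.finish(): send the body, end the request.
        self.response = body
        self.finished = True


class InProcessBroker:
    """Hypothetical broker that holds waiting handlers in this process's memory."""

    def __init__(self):
        self._waiting = defaultdict(list)  # client_id -> pending handlers

    def subscribe(self, client_id, handler):
        self._waiting[client_id].append(handler)

    def publish(self, client_id, message):
        # Complete every pending long-poll for this client via its callback.
        handlers = self._waiting.pop(client_id, [])
        for handler in handlers:
            handler.finish(message)
        return len(handlers)
```

Because `_waiting` lives in a single process's memory, a `publish()` call made in a different server process never reaches these handlers, which is the multi-server problem discussed in the next paragraph.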

This approach works well for a single server instance, but what about multiple servers behind a proxy? A server process may publish a message for a client that is connected to an entirely different server process, and the publishing process has no built-in way to tell the subscribing process that a message is waiting for that client.

This problem can be solved by polling a database table or a file that all server processes can access, but polling consumes system resources, especially when you have many clients and a short polling interval. One of my future goals is to find a more elegant way for the publishing server process to notify the subscribing server process when a message arrives for one of its clients.
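
The database-polling workaround might look like this minimal sketch. It uses SQLite from the standard library; the table layout and the function names are hypothetical. Two connections to the same database file stand in for the publishing and subscribing server processes.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    client_id TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def open_store(path):
    """Each server process opens its own connection to the shared database file."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def publish(conn, client_id, body):
    """Run by whichever process produces the message."""
    conn.execute("INSERT INTO messages (client_id, body) VALUES (?, ?)",
                 (client_id, body))
    conn.commit()


def poll_once(conn, client_id, last_seen_id):
    """Run on a timer by the process holding the client's open connection.
    Returns (new message bodies, updated last_seen_id); costs a query even when empty."""
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE client_id = ? AND id > ? ORDER BY id",
        (client_id, last_seen_id),
    ).fetchall()
    if rows:
        last_seen_id = rows[-1][0]
    return [body for _, body in rows], last_seen_id


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "messages.db")
    publisher = open_store(path)   # stands in for server process A
    subscriber = open_store(path)  # stands in for server process B
    publish(publisher, "client-42", "hello")
    messages, last_id = poll_once(subscriber, "client-42", 0)
    print(messages, last_id)  # ['hello'] 1
```

In a real server, the subscribing process would call `poll_once()` on a timer for each waiting client and call `finish()` when messages come back. Every poll is a query even when nothing has been published, which is where the resource cost comes from: halving the polling interval doubles the number of queries.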

5 comments:

  1. Have you made any progress on implementing a Tornado channel? I am interested in how to integrate Flex with Tornado for a message broker system. AmFast looks as though it could reduce how much Python I need to know.

  2. There is a beta implementation of a Tornado channel in the SVN trunk (http://code.google.com/p/amfast/source/checkout).

    The implementation supports the standard Flex messaging API, so it will work as a message broker out-of-the-box.

    Check here: http://code.google.com/p/amfast/wiki/ProducerConsumer and here: http://code.google.com/p/amfast/wiki/RealTimeMessaging for examples.

  3. Does Tornado work with PostgreSQL?

  4. Tornado works with PostgreSQL or any other database that has a Python adapter. Note, however, that database queries are usually synchronous: a slow query blocks Tornado's single-threaded event loop, so queries need to be wrapped somehow to run asynchronously. The Tornado team recommends offloading SQL queries to a separate HTTP server to prevent blocking. You could also use the asynchronous features of psycopg2 (the PostgreSQL adapter for Python) to implement asynchronous database queries.

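
The reply above names two ways to keep queries from blocking the event loop. A third common technique, swapped in here and not one of the two named, is to run the blocking call on a worker thread while the event loop keeps serving other connections. This minimal sketch uses asyncio's `run_in_executor` with SQLite as stand-ins for Tornado and PostgreSQL; `blocking_query`, `heartbeat`, and the sleep durations are hypothetical, with the sleep simulating a slow query.

```python
import asyncio
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(path, seconds):
    """A synchronous database call; the sleep simulates a slow query."""
    conn = sqlite3.connect(path)
    try:
        time.sleep(seconds)
        return conn.execute("SELECT 1 + 1").fetchone()[0]
    finally:
        conn.close()


async def timed_query(loop, pool):
    # The query runs on a pool thread, so the event loop thread stays free.
    result = await loop.run_in_executor(pool, blocking_query, ":memory:", 0.3)
    return result, time.monotonic()


async def heartbeat(ticks, interval):
    # Stands in for other clients being served while the query runs.
    for _ in range(ticks):
        await asyncio.sleep(interval)
    return time.monotonic()


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        (result, query_done), heartbeat_done = await asyncio.gather(
            timed_query(loop, pool), heartbeat(5, 0.01))
    # True when the heartbeat finished while the query was still running.
    return result, heartbeat_done < query_done


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The pool size caps how many queries run at once, and the event loop thread keeps handling other work in the meantime; calling `blocking_query()` directly from a coroutine would instead stall every connection for the full duration of the query.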