Sunday, October 23, 2011

Tornado HTTPClient Chunked Downloads

Tornado is an open source web server written in Python. It was originally developed to power FriendFeed, and it excels at non-blocking operations for real-time web services.

Tornado includes an HTTP client as well, to fetch files from other servers. I found a number of examples of how to use it, but all of them would fetch the entire item and return it in a callback. I plan to fetch some rather large multi-megabyte files, and don't see a reason to hold them entirely in memory. Here is an example of how to get partial updates as the download progresses: pass in a streaming_callback to the HTTPRequest().

The streaming_callback will be called for each chunk of data from the server. 4 KBytes is a common chunk size. The async_callback will be called when the file has been fully fetched; the body of the response it receives will be empty, since the data has already been handed to the streaming_callback.


import os
import tempfile
import tornado.httpclient
import tornado.ioloop

class HttpDownload(object):
  def __init__(self, url, ioloop):
    self.ioloop = ioloop
    self.tempfile = tempfile.NamedTemporaryFile(delete=False)
    req = tornado.httpclient.HTTPRequest(
        url = url,
        streaming_callback = self.streaming_callback)
    http_client = tornado.httpclient.AsyncHTTPClient()
    http_client.fetch(req, self.async_callback)

  def streaming_callback(self, data):
    # Write each chunk out to the temp file as it arrives, so the whole
    # download never has to sit in memory.
    self.tempfile.write(data)

  def async_callback(self, response):
    self.tempfile.close()
    if response.error:
      print("Failed")
      os.unlink(self.tempfile.name)
    else:
      print("Success: %s" % self.tempfile.name)
    self.ioloop.stop()

def main():
  ioloop = tornado.ioloop.IOLoop.instance()
  # The URL to fetch was left blank in the original; fill in your own.
  dl = HttpDownload("", ioloop)
  ioloop.start()

if __name__ == '__main__':
  main()
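
If you just want to see the chunk sizes the streaming_callback actually receives, a stripped-down fetch like the one below will print them. This is a minimal sketch along the same lines as the class above; the URL is a placeholder that you would need to replace with something real.


import tornado.httpclient
import tornado.ioloop

def print_chunk(data):
  # Called once per chunk; len(data) is typically around 4 KBytes.
  print("got a chunk of %d bytes" % len(data))

def done(response):
  print("finished, error=%s" % response.error)
  tornado.ioloop.IOLoop.instance().stop()

def main():
  # Placeholder URL, not a real file.
  req = tornado.httpclient.HTTPRequest(
      url = "http://example.com/some/large/file",
      streaming_callback = print_chunk)
  tornado.httpclient.AsyncHTTPClient().fetch(req, done)
  tornado.ioloop.IOLoop.instance().start()

if __name__ == '__main__':
  main()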

I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.