Sunday, October 23, 2011

Tornado HTTPClient Chunked Downloads

Tornado is an open source web server in Python. It was originally developed to power friendfeed.com, and excels at non-blocking operations for real-time web services.

Tornado includes an HTTP client as well, to fetch files from other servers. I found a number of examples of how to use it, but all of them would fetch the entire item and return it in a callback. I plan to fetch some rather large multi-megabyte files, and don't see a reason to hold them entirely in memory. Here is an example of how to get partial updates as the download progresses: pass in a streaming_callback to the HTTPRequest().

The streaming_callback will be called for each chunk of data from the server. 4 KBytes is a common chunk size. The async_callback will be called when the file has been fully fetched; the response.data will be empty

#!/usr/bin/python

import os
import tempfile
import tornado.httpclient
import tornado.ioloop

class HttpDownload(object):
  def __init__(self, url, ioloop):
    self.ioloop = ioloop
    self.tempfile = tempfile.NamedTemporaryFile(delete=False)
    req = tornado.httpclient.HTTPRequest(
        url = url,
        streaming_callback = self.streaming_callback)
    http_client = tornado.httpclient.AsyncHTTPClient()
    http_client.fetch(req, self.async_callback)

  def streaming_callback(self, data):
    self.tempfile.write(data)

  def async_callback(self, response):
    self.tempfile.flush()
    self.tempfile.close()
    if response.error:
      print "Failed"
      os.unlink(self.tempfile.name)
    else:
      print("Success: %s" % self.tempfile.name)
      self.ioloop.stop()

def main():
  ioloop = tornado.ioloop.IOLoop.instance()
  dl = HttpDownload("http://codingrelic.geekhold.com/", ioloop)
  ioloop.start()

if __name__ == '__main__':
  main()

I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.