Coding Relic: Tornado HTTPClient Chunked Downloads

Sunday, October 23, 2011

Tornado is an open source web server in Python. It was originally developed to power friendfeed.com, and excels at non-blocking operations for real-time web services.

Tornado includes an HTTP client as well, to fetch files from other servers. I found a number of examples of how to use it, but all of them would fetch the entire item and return it in a callback. I plan to fetch some rather large multi-megabyte files, and don't see a reason to hold them entirely in memory. Here is an example of how to get partial updates as the download progresses: pass in a streaming_callback to the HTTPRequest().

The streaming_callback will be called for each chunk of data from the server; 4 KBytes is a common chunk size. The async_callback will be called when the file has been fully fetched. At that point response.body will be empty, because the data has already been delivered through streaming_callback.

#!/usr/bin/python

import os
import tempfile
import tornado.httpclient
import tornado.ioloop

class HttpDownload(object):
  def __init__(self, url, ioloop):
    self.ioloop = ioloop
    self.tempfile = tempfile.NamedTemporaryFile(delete=False)
    req = tornado.httpclient.HTTPRequest(
        url = url,
        streaming_callback = self.streaming_callback)
    http_client = tornado.httpclient.AsyncHTTPClient()
    http_client.fetch(req, self.async_callback)

  def streaming_callback(self, data):
    self.tempfile.write(data)

  def async_callback(self, response):
    self.tempfile.flush()
    self.tempfile.close()
    if response.error:
      print("Failed")
      os.unlink(self.tempfile.name)
    else:
      print("Success: %s" % self.tempfile.name)
    # Stop the loop on failure too, otherwise the script never exits.
    self.ioloop.stop()

def main():
  ioloop = tornado.ioloop.IOLoop.instance()
  dl = HttpDownload("http://codingrelic.geekhold.com/", ioloop)
  ioloop.start()

if __name__ == '__main__':
  main()
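
The listing above passes a completion callback to fetch(), which is how the client worked when this was written. Tornado 6.0 removed that callback argument, and fetch() now returns an awaitable instead. Here is a rough sketch of the same idea in that style. ChunkSink and download are names I made up for illustration, not part of Tornado, and the tornado import sits inside the function only so that the file-writing half can be tried without Tornado installed.

```python
import os
import tempfile


class ChunkSink(object):
  """Hypothetical helper: appends each chunk to a temporary file."""

  def __init__(self):
    self.tmp = tempfile.NamedTemporaryFile(delete=False)
    self.bytes_written = 0

  def write(self, chunk):
    # Called once per chunk; only the current chunk is held in memory.
    self.tmp.write(chunk)
    self.bytes_written += len(chunk)

  def close(self, keep):
    # Close the file; return its path if kept, otherwise delete it.
    self.tmp.close()
    if keep:
      return self.tmp.name
    os.unlink(self.tmp.name)
    return None


async def download(url):
  # Assumption: Tornado 6 or later, where fetch() is awaited directly.
  import tornado.httpclient

  sink = ChunkSink()
  req = tornado.httpclient.HTTPRequest(
      url=url,
      streaming_callback=sink.write)
  try:
    # By default fetch() raises on network errors and non-2xx responses.
    await tornado.httpclient.AsyncHTTPClient().fetch(req)
  except Exception:
    sink.close(keep=False)
    raise
  return sink.close(keep=True), sink.bytes_written

# To run it: asyncio.run(download("http://codingrelic.geekhold.com/"))
```

As before, the body of the response is not collected anywhere else, so whatever the sink wrote to disk is the only copy of the download.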

I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.