Tornado is an open source web server written in Python. It was originally developed to power friendfeed.com, and it excels at non-blocking operations for real-time web services.
Tornado also includes an HTTP client for fetching files from other servers. I found a number of examples of how to use it, but all of them fetch the entire item and return it in a callback. I plan to fetch some rather large multi-megabyte files, and see no reason to hold them entirely in memory. Here is an example of how to get partial updates as the download progresses: pass a streaming_callback in to the HTTPRequest().
The streaming_callback will be called for each chunk of data received from the server; 4 KBytes is a common chunk size. The async_callback will be called once the file has been fully fetched, and because the data was already handed to the streaming_callback, the response body will be empty.
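The streaming_callback itself is just a callable that takes one bytes argument per chunk, so it is easy to test in isolation. As a minimal sketch, here is a hypothetical ProgressTracker (not part of the example below) that counts bytes as they arrive; the total size is assumed to be known from elsewhere, for instance a Content-Length header:

```python
class ProgressTracker(object):
    """Accumulate byte counts from streaming_callback chunks."""

    def __init__(self, total_bytes):
        # total_bytes is assumed known up front, e.g. from Content-Length.
        self.total_bytes = total_bytes
        self.received = 0

    def streaming_callback(self, chunk):
        # Called once per chunk; only the running count is kept in memory.
        self.received += len(chunk)
        pct = 100.0 * self.received / self.total_bytes
        print("%d/%d bytes (%.0f%%)" % (self.received, self.total_bytes, pct))
```

An instance's streaming_callback method can then be passed to HTTPRequest() in place of the file-writing callback shown below.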
#!/usr/bin/python

import os
import tempfile

import tornado.httpclient
import tornado.ioloop


class HttpDownload(object):
    def __init__(self, url, ioloop):
        self.ioloop = ioloop
        self.tempfile = tempfile.NamedTemporaryFile(delete=False)
        req = tornado.httpclient.HTTPRequest(
            url=url,
            streaming_callback=self.streaming_callback)
        http_client = tornado.httpclient.AsyncHTTPClient()
        http_client.fetch(req, self.async_callback)

    def streaming_callback(self, data):
        # Called once per chunk; write it straight to disk.
        self.tempfile.write(data)

    def async_callback(self, response):
        self.tempfile.flush()
        self.tempfile.close()
        if response.error:
            print("Failed")
            os.unlink(self.tempfile.name)
        else:
            print("Success: %s" % self.tempfile.name)
        self.ioloop.stop()


def main():
    ioloop = tornado.ioloop.IOLoop.instance()
    dl = HttpDownload("http://codingrelic.geekhold.com/", ioloop)
    ioloop.start()

if __name__ == '__main__':
    main()
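One nice consequence of processing chunks as they arrive is that you can do incremental work on the data, not just write it out. As a sketch (the HashingDownload class and path argument are my own invention, not from the example above), a callback could update a SHA-256 digest while streaming to disk, again without ever holding the whole file in memory:

```python
import hashlib


class HashingDownload(object):
    """Write chunks to a file while updating a SHA-256 digest incrementally."""

    def __init__(self, path):
        self.digest = hashlib.sha256()
        self.outfile = open(path, "wb")

    def streaming_callback(self, chunk):
        self.digest.update(chunk)   # hash the chunk...
        self.outfile.write(chunk)   # ...and write it out, then forget it

    def hexdigest(self):
        # Call from async_callback once the fetch is complete.
        self.outfile.close()
        return self.digest.hexdigest()
```

The streaming_callback method would be passed to HTTPRequest() exactly as in the example above, and hexdigest() called from the completion callback.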
I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.