Tornado is an open-source web server written in Python. It was originally developed to power friendfeed.com, and it excels at non-blocking operations for real-time web services.
Tornado also includes an HTTP client for fetching files from other servers. I found a number of examples of how to use it, but all of them fetch the entire item and return it in a callback. I plan to fetch some rather large multi-megabyte files, and I see no reason to hold them entirely in memory. Here is an example of how to get partial updates as the download progresses: pass a streaming_callback to the HTTPRequest().
The streaming_callback will be called for each chunk of data received from the server; 4 Kbytes is a common chunk size. The callback passed to fetch() (async_callback below) will be called once the file has been fully fetched. Because the body was already delivered chunk by chunk to streaming_callback, response.body will be empty.
#!/usr/bin/python
import os
import tempfile

import tornado.httpclient
import tornado.ioloop

class HttpDownload(object):
    def __init__(self, url, ioloop):
        self.ioloop = ioloop
        # Stream the download into a temp file instead of holding it in memory.
        self.tempfile = tempfile.NamedTemporaryFile(delete=False)
        req = tornado.httpclient.HTTPRequest(
            url=url,
            streaming_callback=self.streaming_callback)
        http_client = tornado.httpclient.AsyncHTTPClient()
        http_client.fetch(req, self.async_callback)

    def streaming_callback(self, data):
        # Called once for each chunk received from the server.
        self.tempfile.write(data)

    def async_callback(self, response):
        # Called once, after the entire response has been received.
        self.tempfile.flush()
        self.tempfile.close()
        if response.error:
            print("Failed")
            os.unlink(self.tempfile.name)
        else:
            print("Success: %s" % self.tempfile.name)
        self.ioloop.stop()

def main():
    ioloop = tornado.ioloop.IOLoop.instance()
    dl = HttpDownload("http://codingrelic.geekhold.com/", ioloop)
    ioloop.start()

if __name__ == '__main__':
    main()
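Since streaming_callback sees every chunk as it arrives, it can also do bookkeeping such as reporting download progress. Here is a minimal sketch of that idea, separate from Tornado so it can run standalone; the ProgressTracker class and its total_bytes parameter are my own invention for illustration, not part of the Tornado API.

```python
class ProgressTracker(object):
    """Hypothetical helper: counts bytes as streaming chunks arrive."""

    def __init__(self, total_bytes=None):
        self.total_bytes = total_bytes  # e.g. from the Content-Length header
        self.received = 0

    def streaming_callback(self, data):
        # Same shape as Tornado's streaming_callback: one chunk per call.
        self.received += len(data)
        if self.total_bytes:
            pct = 100.0 * self.received / self.total_bytes
            print("%d/%d bytes (%.0f%%)" % (self.received, self.total_bytes, pct))

# Simulate four 4-Kbyte chunks, the common chunk size mentioned above.
tracker = ProgressTracker(total_bytes=16384)
for _ in range(4):
    tracker.streaming_callback(b"\0" * 4096)
```

To use it with the downloader above, you would pass tracker.streaming_callback (or a wrapper that both writes and counts) as the streaming_callback argument.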
I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.