Monday, December 28, 2009

Crowdsourcing Backup

Jeff Atwood recently suffered a catastrophic loss of data on his long-running blog Coding Horror. The site was running on a virtual machine, and apparently VM backups at the hosting provider had been routinely failing for years without anybody noticing. Jeff maintained his own backups... within the VM itself, so they were lost when the VM was lost. Jeff's story has a happy ending, as one of his readers, Carmine Paolino, had a complete archive.

Obviously the happenstance of somebody on the Internet having a complete copy of data important to us does not constitute a practical backup strategy, but it got me thinking about the idea of crowdsourcing backups. Everybody should have offsite backups, but practically nobody does. Could a system be designed where each participant wanting to back up their most important data would in return offer a chunk of local disk space for storing other people's data?

With terabyte drives becoming common, it seems like many systems have an abundance of disk space which could be put to better use. Perhaps the data you want backed up could be broken into chunks and stored in the free space of a number of other backup users, while your drive simultaneously stores their data.

  • Your data would have to be encrypted, as it will be stored on media controlled by random and potentially untrustworthy people.
  • A large amount of redundancy would have to be baked in, as people could drop out of the system at any time and take a chunk of stored information away. Many copies of each chunk would be stored in multiple places.
  • Forward Error Correction would also be good, to further improve survivability in the face of missing data. Recovering most of the chunks would be sufficient to reconstruct the rest.
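
To make the chunking, redundancy and error correction ideas a little more concrete, here is a minimal Python sketch. Everything in it is my own illustrative assumption rather than a real design: the function names (split_chunks, parity_chunk, recover_missing, place_replicas), the round-robin placement and the tiny chunk size are all hypothetical. It uses a single XOR parity chunk as the simplest possible stand-in for Forward Error Correction, which can rebuild only one missing chunk; a real system would more likely use something like Reed-Solomon coding. Encryption is left out entirely, so the data is assumed to have been encrypted before it gets here.

```python
from functools import reduce
from typing import Dict, List, Optional


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks, zero-padding the last one."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    if chunks and len(chunks[-1]) < size:
        chunks[-1] = chunks[-1].ljust(size, b"\0")
    return chunks


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_chunk(chunks: List[bytes]) -> bytes:
    """XOR of all chunks; enough to rebuild any ONE missing chunk."""
    return reduce(xor_bytes, chunks)


def recover_missing(chunks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing chunk (marked as None) from the parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one missing chunk")
    restored = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # parity XOR every surviving chunk leaves exactly the lost chunk
        restored[missing[0]] = reduce(xor_bytes, present, parity)
    return restored


def place_replicas(num_chunks: int, peers: List[str], copies: int) -> Dict[int, List[str]]:
    """Assign each chunk to `copies` distinct peers, round-robin (hypothetical policy)."""
    if copies > len(peers):
        raise ValueError("not enough peers for the requested number of copies")
    return {
        i: [peers[(i + k) % len(peers)] for k in range(copies)]
        for i in range(num_chunks)
    }


if __name__ == "__main__":
    data = b"my most important data"  # assumed to be encrypted already
    chunks = split_chunks(data, 8)
    parity = parity_chunk(chunks)

    # Spread each chunk over two of four made-up peers.
    print(place_replicas(len(chunks), ["alice", "bob", "carol", "dave"], 2))

    # Simulate every peer holding chunk 1 dropping out, then rebuild it.
    damaged = [chunks[0], None, chunks[2]]
    restored = recover_missing(damaged, parity)
    print(b"".join(restored)[:len(data)] == data)
```

Note that the original length has to be remembered somewhere so the zero padding on the last chunk can be stripped after recovery. Replication and parity complement each other here: the extra copies cover the common case of a single peer vanishing, and the parity chunk only comes into play when every copy of one chunk is gone.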

The practicality of the details aside, with Amazon, Rackspace and others offering cloud storage options, would it even be worthwhile to construct such a crowdsourced system? In 2010, I'm not sure that it is. I suspect this is an idea whose time has come... and gone.