Monday, December 28, 2009

Crowdsourcing Backup

Jeff Atwood recently suffered a catastrophic data loss on his long-running blog Coding Horror. The site was running on a virtual machine, and apparently VM backups at the hosting provider had been routinely failing for years without anybody noticing. Jeff maintained his own backups... within the VM itself, which were lost when the VM was lost. Jeff's story has a happy ending as one of his readers, Carmine Paolino, had a complete archive.

Obviously the happenstance of somebody on the Internet having a complete copy of data important to us does not constitute a practical backup strategy, but it got me to thinking about the idea of crowdsourcing backups. Everybody should have offsite backups, but practically nobody does it. Could a system be designed where each participant wanting to back up their most important data would in return offer a chunk of local disk space to use for storing data for other people?

With terabyte drives becoming common, it seems like many systems have an abundance of disk space which could be better taken advantage of. Perhaps the data you want to be backed up can be broken into chunks and stored in the free space of a number of other backup users, while your drive simultaneously stores their data.

  • Your data would have to be encrypted, as it will be stored on media controlled by random and potentially untrustworthy people.
  • A large amount of redundancy would have to be baked in, as people could drop out of the system at any time and take a chunk of stored information away. Many copies of each chunk would be stored in multiple places.
  • Forward Error Correction would also be good, to further improve survivability in the face of missing data. Recovering most of the chunks would be sufficient to reconstruct the rest.
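
A minimal sketch of the chunk-plus-redundancy idea, assuming the simplest possible erasure code: one XOR parity chunk that can rebuild any single lost chunk. The function names are hypothetical, encryption is left out entirely, and a real system would want a stronger code (Reed-Solomon, for example) so that several chunks could disappear at once.

```python
from functools import reduce
from typing import List, Optional


def split_chunks(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-sized chunks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: List[bytes]) -> bytes:
    """The parity chunk is the XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def recover(chunks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing chunk (None) from the survivors and parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost chunk")
    repaired = list(chunks)
    if missing:
        survivors = [c for c in chunks if c is not None]
        # XOR of parity with every surviving chunk leaves the lost chunk.
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


data = b"my most important data"
chunks = split_chunks(data, 4)
parity = make_parity(chunks)

# Pretend the peer holding chunk 1 dropped out of the system.
damaged = [chunks[0], None, chunks[2], chunks[3]]
restored = b"".join(recover(damaged, parity))[:len(data)]
print(restored == data)
```

Each of the five pieces (four data, one parity) would go to a different participant. Losing two of them is fatal with this scheme, which is why the multiple-copies bullet above still matters.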

The practicality of the details aside, with Amazon, RackSpace and others offering cloud storage options, would it even be worthwhile to construct such a crowdsourced system? In 2010, I'm not sure that it is. I suspect this is an idea whose time has come... and gone.

Wednesday, December 23, 2009

Satellites Should Respond to My Whims

There was a pretty spectacular accident in Jamaica last night, where a 737 skidded off the runway and broke into pieces. Check the picture in the linked story, I'll wait. Amazingly only two passengers were injured.

Surely I'm not the only person who immediately checked satellite imagery, on the off chance that maybe, just maybe the periodic flyover happened to be this morning. Alas, no.


Monday, December 21, 2009

North Pole Compression Algorithm

Santa floppy disk ornament

Note the lack of the "HD" logo on the dust cover? Santa must have remarkable compression technology to fit the entire 1998 naughty/nice list on an 800k disk. The prevalence of popular baby names from year to year probably helps: there is a lot of duplication.

Wednesday, December 16, 2009


Slashdenfreude [slash-den-froi-duh] (noun) : To take joy in the slashdotting of others.

Monday, December 14, 2009

A Coroutine, Thread, and Semaphore Walk into a Bar...

This article about multicore programming techniques is pure comedic gold.

In particular, threads suffer badly from 'race conditions'. The race of despised worker threads is made to do boring, low status, 'background' tasks. Meanwhile, the high privilege 'system' threads get to party with the hardware. It's the same the whole world over.

It is a great read with that peculiar British humour which The Register is so good at. It is also a good technical overview of techniques for taking advantage of multiple cores.

Thursday, December 10, 2009

Untouchable Code

Behold: the Blaupunkt CD50.

Perhaps "behold" is too pretentious for a basic car stereo, but it is the topic of today's screed so I feel a dramatic introduction is called for. Let me call your attention to four buttons on the left side: RDS, AM, FM, and CD-C. They do what you might expect:

  • RDS - enable decode of station and song identification from an FM signal. I'm not sure why you'd ever disable this.
  • FM - switch to FM radio.
  • AM - switch to AM radio.
  • CD-C - switch to CD Changer mode. Once in CD-C mode, the RDS/FM/AM buttons have no effect until you push CD-C to get back to Radio mode.

This makes perfect sense, right? I mean really, once I'm in the CD player mode I wouldn't expect the buttons from Radio mode to do anything, would I? Yes, this is sarcasm. On the web. Dangerous, I know.

I'd speculate that the fine engineers at Blaupunkt did not actually want the user interface to be this way, and that they would have preferred the FM button to always switch to FM radio. I suspect they were presented with an existing AM/FM radio design which, for whatever reason, they could not modify. Perhaps the CD50 project had a very tight budget, or a narrow market window which didn't allow time to tweak the radio components. Less charitably, perhaps the radio design had degenerated into an unmaintainable mess and any change risked breaking the whole thing. The path of least resistance to get the product out is a mux: you're either in our new mode where we add all the shiny new goodness, or the crufty old mode where we haven't touched anything from the existing design.

The situation of an unmaintainable portion of a system should be familiar to any software developer tasked with working on a large codebase. I suspect the natural entropic state of software is unmaintainability, requiring constant infusions of energy to stave it off a while longer.

So what can we do to ensure systems remain maintainable? Unit testing is frequently suggested as an answer, though I've never been a fan of extensive unit testing. If the target platform is very different from the build system, structuring the code to be able to run unit tests is a non-trivial amount of extra work. However I've recently started working in an environment where development testing is strongly encouraged, and I have to admit it does help in keeping code maintainable as developers come and go. The lowest level unit tests are not terribly useful in this regard; even code in a complete mess will have unit tests. On the other hand, a functional testbench for a module where the interfaces to the rest of the system are mocked out is very helpful. You have a much firmer grasp of how changes you make in the module are going to impact the rest of the system. You also have more hope of being able to reimplement the module, as its interfaces are described by the mock framework. If other portions of the system reach in to the internals of the module without using the interfaces... then the cancer has already metastasized and you're probably doomed.
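
As a rough illustration of a functional testbench with mocked interfaces, here is a sketch using Python's standard unittest.mock. Everything in it is hypothetical: the PresetModule class, its tuner and display collaborators, and the 87.5 to 108.0 MHz band limits are stand-ins chosen for the example, not code from any real product.

```python
from unittest import mock


class PresetModule:
    """Hypothetical module under test. It talks to the rest of the system
    only through the two interfaces handed to its constructor."""

    def __init__(self, tuner, display):
        self.tuner = tuner
        self.display = display

    def select_fm(self, mhz):
        if not 87.5 <= mhz <= 108.0:  # assumed band limits, for illustration
            raise ValueError("outside the FM broadcast band")
        self.tuner.set_band("FM")
        self.tuner.set_frequency(mhz)
        self.display.show(f"FM {mhz:.1f}")


def test_select_fm():
    # The mocks stand in for the rest of the system and record every call.
    tuner, display = mock.Mock(), mock.Mock()
    PresetModule(tuner, display).select_fm(101.5)
    tuner.set_band.assert_called_once_with("FM")
    tuner.set_frequency.assert_called_once_with(101.5)
    display.show.assert_called_once_with("FM 101.5")


def test_rejects_out_of_band():
    tuner, display = mock.Mock(), mock.Mock()
    try:
        PresetModule(tuner, display).select_fm(50.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # A rejected request must not touch the hardware interface at all.
    tuner.set_band.assert_not_called()


test_select_fm()
test_rejects_out_of_band()
```

The assertions on the mocks double as a description of the module's contract with its neighbors, which is the part that helps when the module is later reimplemented.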

People say that refactoring early and often will keep code maintainable, but that requires agreement from management to spend development time paying off technical debt without an increase in marketable features. In a product environment I rarely win that argument. However, I've now worked on a complete re-implementation of two different systems where the bug load had simply become impossible due to indecipherable engineering. I suppose that is an extreme form of refactoring: extract the best bits of the old system, and throw out the rest.

Do you have tips for keeping code maintainable across multiple generations of products?

Monday, December 7, 2009

Ruminations on Nickels and Dimes

In 1st grade I could not see how a dime could possibly be worth more than a nickel. The nickel was bigger, after all; it should obviously be worth more.

By 5th grade I realized the dime was more valuable because it was made of a more valuable metal. So even though it was smaller, its total worth was greater than the nickel.

It took until high school to figure out that both the dime and nickel are made of completely worthless metals. The dime is worth more because the US Treasury says it is worth more.

Thursday, December 3, 2009

Memory Matters

I once worked on a system where one module was developed outside using Linux/x86 systems, brought in-house, and compiled for Linux/PowerPC. We thought we had been careful in the specifications: avoid endianness assumptions, limit memory footprint, and assume a hefty derating for the slower PowerPC used in the real system. Things looked good in initial testing, but when we started internal dogfooding the PowerPC performance dropped off the proverbial cliff. An operation that took 100 msec on the x86 development system and 300 msec during initial PowerPC testing regressed to an astonishing 45 seconds in the dogfood deployment.

The cause of this disparity was the data cache. For reasons unclear this code iterated through its configuration many, many times. On x86 the various levels of D$ comprise several megabytes, but the PowerPC had only 16K. As the dogfooding progressed and the config grew, it no longer fit in that cache, resulting in unbelievable cache thrashing and a performance drop of roughly 2.5 orders of magnitude.
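
The cliff-like shape of that regression can be reproduced with a toy model. The sketch below assumes a fully associative LRU cache and 32-byte lines, neither of which is known for the actual PowerPC part (real caches are set-associative and their replacement policies vary), and simply counts hits while sweeping the same data over and over.

```python
from collections import OrderedDict


def hit_rate(working_set_lines: int, cache_lines: int, passes: int = 10) -> float:
    """Fraction of accesses that hit when the same lines are swept repeatedly
    through a fully associative LRU cache (a simplifying assumption)."""
    cache = OrderedDict()
    hits = total = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            total += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


# 16K of cache with an assumed 32-byte line size gives 512 lines.
CACHE_LINES = 16 * 1024 // 32

for lines in (256, 512, 513, 1024):
    print(lines, hit_rate(lines, CACHE_LINES))
```

In this model a working set of up to 512 lines misses only on the first pass, while a working set just one line larger misses on every access: each line is evicted shortly before the sweep comes back around to it. A config that grows slowly can therefore look fine in early testing and then fall over all at once.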

Several years ago Ulrich Drepper wrote an excellent paper about all things related to memory in modern system architectures, especially x86 but relevant everywhere. It is a long read, but very worthwhile. The complete paper is available as a PDF from his site, and it was also serialized in articles on LWN.

  1. Introduction
  2. CPU caches
  3. Virtual memory
  4. NUMA systems - local versus remote references
  5. What programmers can do - cache optimization
  6. What programmers can do - multi-threaded optimizations
  7. Memory performance tools
  8. Future technologies
  9. Appendices and bibliography

I downloaded the PDF and read it over the course of a few weeks. I strongly recommend this paper; the information content is very high.

Tuesday, December 1, 2009

USPS and Red Tape

I recently mailed a package using a Pitney Bowes postage meter. After calculating the postage there is a screen listing items which cannot be mailed through the US postal system. I suspect most people just click through without reading it, which is a shame. It is a fascinating read, and is reproduced here for your edification and bemusement.

Harmful matter includes, but is not limited to:
a. All types and classes of poisons, including controlled substances.
b. All poisonous animals except scorpions mailed for medical research purposes or for the manufacture of antivenom; all poisonous insects; all poisonous reptiles; and all types of snakes, turtles, and spiders.
c. All disease germs or scabs.
d. All explosives, flammable material, infernal machines, and mechanical, chemical, or other devices or compositions that may ignite or explode.
Hazardous items includes materials such as caustic poisons (acids and alkalies), oxidizers, or highly flammable liquids, gases, or solids; or materials that are likely, under conditions incident to transportation, to cause fires through friction, absorption of moisture, or spontaneous chemical changes or from retained heat from manufacturing or processing, including explosives or containers previously used for shipping high explosives with a liquid ingredient (such as dynamite), ammunition, fireworks, radioactive materials, matches, or articles emitting obnoxious odors.

This is great stuff. I have several observations.

Note the "infernal machines" phrase in section (d). What is an infernal machine? The Pitney Bowes restrictions appear to come directly from the US Postal Service Domestic Mail Manual C021, but the phrase is not subsequently defined there. Is it something like the puzzle boxes from Hellraiser? I can see why we might not want those to be spread around...

I had no idea the scorpion industry lobby was so powerful. In fact, I had no idea that there was a scorpion industry nor that they had a lobbyist, but they scored their own exception in section (b). Poisonous snakes need not apply, only scorpions can be mailed for medical research purposes.

In the final paragraph it explains that you're not allowed to ship explosives under any circumstances. Also, you need to ensure that containers which you previously used to ship dynamite have been cleaned of any residue. I'm not sure how you could possibly have containers which were previously used to ship dynamite, if you're never allowed to ship dynamite.

Finally, in addition to the rules against shipping materials which could kill or maim, note that you're not allowed to ship anything smelly or stinky. That would be gross.