Monday, December 28, 2009

Crowdsourcing Backup

Jeff Atwood recently suffered a catastrophic loss of data on his long-running blog, Coding Horror. The site was running on a virtual machine, and apparently VM backups at the hosting provider had been routinely failing for years without anybody noticing. Jeff maintained his own backups... within the VM itself, which were lost when the VM was lost. Jeff's story has a happy ending, as one of his readers, Carmine Paolino, had a complete archive.


Obviously the happenstance of somebody on the Internet having a complete copy of data important to us does not constitute a practical backup strategy, but it got me to thinking about the idea of crowdsourcing backups. Everybody should have offsite backups, but practically nobody does it. Could a system be designed where each participant wanting to back up their most important data would in return offer a chunk of local disk space to use for storing data for other people?


With terabyte drives becoming common, it seems like many systems have an abundance of disk space which could be better taken advantage of. Perhaps the data you want to be backed up can be broken into chunks and stored in the free space of a number of other backup users, while your drive simultaneously stores their data.


  • Your data would have to be encrypted, as it will be stored on media controlled by random and potentially untrustworthy people.
  • A large amount of redundancy would have to be baked in, as people could drop out of the system at any time and take a chunk of stored information away. Many copies of each chunk would be stored in multiple places.
  • Forward Error Correction would also be good, to further improve survivability in the face of missing data. Recovering most of the chunks would be sufficient to reconstruct the rest.
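
As a rough illustration of the redundancy idea, here is a minimal sketch using a single XOR parity chunk, which is about the simplest possible form of forward error correction: any one missing chunk can be rebuilt from the others. The function names are hypothetical, a real system would want a stronger code that survives several losses, and the data is assumed to have been encrypted before it gets here.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data, k):
    """Split data into k equal chunks (zero padded) plus one XOR parity chunk."""
    size = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks, original_len):
    """Rebuild the data when at most one chunk (marked None) has gone missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can only repair one loss")
    chunks = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks equals the missing one.
        chunks[missing[0]] = reduce(xor_bytes, present)
    return b"".join(chunks[:-1])[:original_len]


data = b"my most important files"
pieces = split_with_parity(data, 4)
pieces[2] = None  # one participant dropped out of the system
assert recover(pieces, len(data)) == data
```

Each of the five pieces would go to a different participant; storing several copies of each piece on top of this covers the case where more than one host disappears.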

The practicality of the details aside, with Amazon, RackSpace and others offering cloud storage options, would it even be worthwhile to construct such a crowdsourced system? In 2010, I'm not sure that it is. I suspect this is an idea whose time has come... and gone.

Wednesday, December 23, 2009

Satellites Should Respond to My Whims

There was a pretty spectacular accident in Jamaica last night, where a 737 skidded off the runway and broke into pieces. Check the picture in the linked story, I'll wait. Amazingly only two passengers were injured.


Surely I'm not the only person who immediately checked satellite imagery, on the off chance that maybe, just maybe the periodic flyover happened to be this morning. Alas, no.




Monday, December 21, 2009

North Pole Compression Algorithm

Santa floppy disk ornament

Note the lack of the "HD" logo on the dust cover? Santa must have remarkable compression technology to fit the entire 1998 naughty/nice list on an 800k disk. The prevalence of popular baby names from year to year probably helps: there is a lot of duplication.
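
The effect of duplication is easy to see with an off-the-shelf compressor. This is only a sketch: the short list of names is illustrative rather than actual 1998 data, and zlib stands in for whatever Santa really uses.

```python
import random
import zlib


def compressed_size(names):
    """Bytes needed to store the list, one name per line, after zlib compression."""
    return len(zlib.compress("\n".join(names).encode("utf-8"), 9))


rng = random.Random(1998)  # fixed seed so the comparison is repeatable

# Illustrative names only, not real popularity data.
popular = ["Michael", "Jacob", "Matthew", "Emily", "Hannah", "Samantha"]
duplicated = [rng.choice(popular) for _ in range(10_000)]

# The same number of entries, but every name is random seven-letter noise.
letters = "abcdefghijklmnopqrstuvwxyz"
unique = ["".join(rng.choice(letters) for _ in range(7)) for _ in range(10_000)]

print(compressed_size(duplicated), "bytes with heavy duplication")
print(compressed_size(unique), "bytes with no duplication")
```

The list drawn from a handful of repeated names compresses to a small fraction of the size of the random one.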

Wednesday, December 16, 2009

Slashdenfreude

Slashdenfreude [slash-den-froi-duh] (noun) : To take joy in the slashdotting of others.

Monday, December 14, 2009

A Coroutine, Thread, and Semaphore Walk into a Bar...

This article about multicore programming techniques is pure comedic gold.

In particular, threads suffer badly from 'race conditions'. The race of despised worker threads is made to do boring, low status, 'background' tasks. Meanwhile, the high privilege 'system' threads get to party with the hardware. It's the same the whole world over.

It is a great read with that peculiar British humour which The Register is so good at. It is also a good technical overview of techniques for taking advantage of multiple cores.

Thursday, December 10, 2009

Untouchable Code

Behold: the Blaupunkt CD50.

Perhaps "behold" is too pretentious for a basic car stereo, but it is the topic of today's screed so I feel a dramatic introduction is called for. Let me call your attention to four buttons on the left side: RDS, AM, FM, and CD-C. They do what you might expect:

  • RDS - enable decode of station and song identification from an FM signal. I'm not sure why you'd ever disable this.
  • FM - switch to FM radio.
  • AM - switch to AM radio.
  • CD-C - switch to CD Changer mode. Once in CD-C mode, the RDS/FM/AM buttons have no effect until you push CD-C to get back to Radio mode.

This makes perfect sense, right? I mean really, once I'm in the CD player mode I wouldn't expect the buttons from Radio mode to do anything, would I? Yes, this is sarcasm. On the web. Dangerous, I know.

I'd speculate that the fine engineers at Blaupunkt did not actually want the user interface to be this way, and that they would have preferred the FM button to always switch to FM radio. I suspect they were presented with an existing AM/FM radio design which, for whatever reason, they could not modify. Perhaps the CD50 project had a very tight budget, or a narrow market window which didn't allow time to tweak the radio components. Less charitably, perhaps the radio design had degenerated into an unmaintainable mess and any change risked breaking the whole thing. The path of least resistance to get the product out is a mux: you're either in our new mode where we add all the shiny new goodness, or the crufty old mode where we haven't touched anything from the existing design.

The situation of an unmaintainable portion of a system should be familiar to any software developer tasked with working on a large codebase. I suspect the natural entropic state of software is unmaintainability, requiring constant infusions of energy to stave it off a while longer.

So what can we do to ensure systems remain maintainable? Unit testing is frequently suggested as an answer, though I've never been a fan of extensive unit testing. If the target platform is very different from the build system, structuring the code to be able to run unit tests is a non-trivial amount of extra work. However, I've recently started working in an environment where development testing is strongly encouraged, and I have to admit it does help in keeping code maintainable as developers come and go. The lowest level unit tests are not terribly useful in this regard; even code in a complete mess will have unit tests. On the other hand, a functional testbench for a module where the interfaces to the rest of the system are mocked out is very helpful. You have a much firmer grasp of how changes you make in the module are going to impact the rest of the system. You also have more hope of being able to reimplement the module, as its interfaces are described by the mock framework. If other portions of the system reach into the internals of the module without using the interfaces... then the cancer has already metastasized and you're probably doomed.
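
To make the testbench idea concrete, here is a minimal sketch using Python's standard unittest.mock. The Tuner module and its radio and display collaborators are hypothetical, invented purely for illustration; the point is that the mocks spell out exactly which interfaces the module is allowed to use.

```python
import unittest
from unittest.mock import Mock


class Tuner:
    """Hypothetical module under test. It reaches the rest of the
    system only through the radio and display objects it is given."""

    def __init__(self, radio, display):
        self.radio = radio
        self.display = display

    def select_band(self, band):
        if band not in ("AM", "FM"):
            raise ValueError(band)
        self.radio.set_band(band)
        self.display.show(f"{band} {self.radio.current_frequency()}")


class TunerTestbench(unittest.TestCase):
    def setUp(self):
        # spec= restricts each mock to the agreed interface; touching
        # anything else raises AttributeError.
        self.radio = Mock(spec=["set_band", "current_frequency"])
        self.radio.current_frequency.return_value = "101.3"
        self.display = Mock(spec=["show"])
        self.tuner = Tuner(self.radio, self.display)

    def test_select_fm(self):
        self.tuner.select_band("FM")
        self.radio.set_band.assert_called_once_with("FM")
        self.display.show.assert_called_once_with("FM 101.3")

    def test_unknown_band_touches_nothing(self):
        with self.assertRaises(ValueError):
            self.tuner.select_band("CD-C")
        self.radio.set_band.assert_not_called()
```

Saved to a file, this runs with `python -m unittest`. A reimplementation of Tuner that passes the same testbench is at least honoring the same interfaces.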

People say that refactoring early and often will keep code maintainable, but that requires agreement from management to spend development time paying off technical debt without an increase in marketable features. In a product environment I rarely win that argument. However, I've now worked on a complete re-implementation of two different systems where the bug load had simply become impossible due to indecipherable engineering. I suppose that is an extreme form of refactoring: extract the best bits of the old system, and throw out the rest.

Do you have tips for keeping code maintainable across multiple generations of products? This site uses Disqus for comments. You can comment anonymously if you wish, or use an existing identity like Twitter, Facebook, or any OpenID provider.

Monday, December 7, 2009

Ruminations on Nickels and Dimes

In 1st grade I could not see how a dime could possibly be worth more than a nickel. The nickel was bigger, after all; it should obviously be worth more.

By 5th grade I realized the dime was more valuable because it was made of a more valuable metal. So even though it was smaller, its total worth was greater than the nickel.

It took until high school to figure out that both the dime and nickel are made of completely worthless metals. The dime is worth more because the US Treasury says it is worth more.

Thursday, December 3, 2009

Memory Matters

I once worked on a system where one module was developed outside using Linux/x86 systems, brought in-house, and compiled for Linux/PowerPC. We thought we had been careful in the specifications: avoid endianness assumptions, limit memory footprint, and assume a hefty derating for the slower PowerPC used in the real system. Things looked good in initial testing, but when we started internal dogfooding the PowerPC performance dropped off the proverbial cliff. An operation that took 100 msec on the x86 development system and 300 msec during initial PowerPC testing regressed to an astonishing 45 seconds in the dogfood deployment.

The cause of this disparity was the data cache. For reasons unclear, this code iterated through its configuration many, many times. On x86 the various levels of D$ comprise several megabytes, but the PowerPC had only 16K. As the dogfooding progressed and the config grew, the result was unbelievable cache thrashing and a performance drop of about 2.5 orders of magnitude.
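
The cliff can be reproduced with a toy model. This sketch simulates repeated passes over a block of config data through a 16K cache, assuming 32-byte lines and a fully associative LRU policy; both are simplifying assumptions on my part, not the details of the actual PowerPC part.

```python
from collections import OrderedDict


def hit_rate(working_set_bytes, cache_bytes=16 * 1024, line_bytes=32, passes=10):
    """Fraction of accesses that hit when scanning the working set repeatedly."""
    capacity = cache_bytes // line_bytes
    lines = working_set_bytes // line_bytes
    cache = OrderedDict()  # keys are line numbers, ordered oldest first
    hits = accesses = 0
    for _ in range(passes):
        for line in range(lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses


for kib in (8, 16, 17, 32):
    print(f"{kib:2d}K working set: hit rate {hit_rate(kib * 1024):.2f}")
```

While the config fits in the cache, every pass after the first hits. Once it grows even slightly past 16K, each pass evicts the lines the next pass is about to need, and the hit rate in this model falls to zero.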

Several years ago Ulrich Drepper wrote an excellent paper about all things related to memory in modern system architectures, especially x86 but relevant everywhere. It is a long read, but very worthwhile. The complete paper is available as a PDF from his site, and it was also serialized in articles on LWN.

  1. Introduction
  2. CPU caches
  3. Virtual memory
  4. NUMA systems - local versus remote references
  5. What programmers can do - cache optimization
  6. What programmers can do - multi-threaded optimizations
  7. Memory performance tools
  8. Future technologies
  9. Appendices and bibliography

I downloaded the PDF and read it over the course of a few weeks. I strongly recommend this paper; the information content is very high.

Tuesday, December 1, 2009

USPS and Red Tape

I recently mailed a package using a Pitney Bowes postage meter. After calculating the postage there is a screen listing items which cannot be mailed through the US postal system. I suspect most people just click through without reading it, which is a shame. It is a fascinating read, and is reproduced here for your edification and bemusement.

Harmful matter includes, but is not limited to:
a. All types and classes of poisons, including controlled substances.
b. All poisonous animals except scorpions mailed for medical research purposes or for the manufacture of antivenom; all poisonous insects; all poisonous reptiles; and all types of snakes, turtles, and spiders.
c. All disease germs or scabs.
d. All explosives, flammable material, infernal machines, and mechanical, chemical, or other devices or compositions that may ignite or explode.
Hazardous items includes materials such as caustic poisons (acids and alkalies), oxidizers, or highly flammable liquids, gases, or solids; or materials that are likely, under conditions incident to transportation, to cause fires through friction, absorption of moisture, or spontaneous chemical changes or from retained heat from manufacturing or processing, including explosives or containers previously used for shipping high explosives with a liquid ingredient (such as dynamite), ammunition, fireworks, radioactive materials, matches, or articles emitting obnoxious odors.

This is great stuff. I have several observations.

Note the "infernal machines" phrase in section (d). What is an infernal machine? The Pitney Bowes restrictions appear to come directly from the US Postal Service Domestic Mail Manual C021, but the phrase is not subsequently defined there. Is it something like the puzzle boxes from Hellraiser? I can see why we might not want those to be spread around...

I had no idea the scorpion industry lobby was so powerful. In fact, I had no idea that there was a scorpion industry nor that they had a lobbyist, but they scored their own exception in section (b). Poisonous snakes need not apply, only scorpions can be mailed for medical research purposes.

In the final paragraph it explains that you're not allowed to ship explosives under any circumstances. Also, you need to ensure that containers which you previously used to ship dynamite have been cleaned of any residue. I'm not sure how you could possibly have containers which were previously used to ship dynamite, if you're never allowed to ship dynamite.

Finally, in addition to the rules against shipping materials which could kill or maim, note that you're not allowed to ship anything smelly or stinky. That would be gross.

Wednesday, November 25, 2009

The Kindle Firmware Hero

Amazon's Kindle software version 2.3 increased battery life from 4 days to 7 days; quite an improvement. Only the Kindle 2(*) model using HSDPA saw this improvement; the Kindle DX uses an EVDO radio and still lists a 4 day battery life.


It seems likely that the Kindle 2 shipped with incomplete radio power management to meet its shipment deadline, and this update represents the completed work. Nonetheless, it's fun to instead contemplate the moment when some firmware engineer poring over register settings utters a prodigious "WTF!?!" upon finding something completely bogus. A few keystrokes later and voila, huge battery life improvements...


Tuesday, November 24, 2009

Mayor For Life on Foursquare

foursquare is one of the early entrants in a coming wave of location-based web services. Foursquare catalogs a huge list of venues in 100 cities around the globe: restaurants, movie theaters, museums, bars, etc. You check in with the service as you visit these places, and the system tells you tips that other foursquare users have suggested about that location. It also (optionally) broadcasts your checkin to your friends, so you can arrange meetups or just learn about new spots by watching their activities. Currently you set up your friend lists on the foursquare web site, though it does provide a way to check whether any of your Twitter, Facebook, or Gmail contacts are using foursquare.


An interesting aspect of foursquare is the gaming angle. Badges are awarded for a huge range of activities; for example, four checkins in one day earns the "crunked" badge. It looks like a drunk happy face, though in my case no alcohol was involved: Children's Discovery Museum, a local park, Fry's Electronics, and a restaurant. As with Stack Overflow, badges provide a way for the developers to reward proper use of the site which doesn't cost them any money.

Finally, there is Mayorship. The person who has checked in to a venue the most in the last 60 days is declared to be the Mayor. You can steal the Mayorship away from its current holder by visiting more often, which gives the site a competitive feeling. Apparently the competition for Mayorship of hot nightspots is intense, complete with accusations of cheating. An old saying about academia springs to mind: "On foursquare, tempers run high because the stakes are so small." Nonetheless, the Children's Discovery Museum Mayorship is mine. Don't even think about trying to take it.
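
The Mayorship rule as described is simple enough to sketch. This is my own guess at the logic, not foursquare's actual implementation: the function name, the tuple layout of the checkin log, and the handling of ties (whoever appears first in the log wins) are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def mayor(checkins, venue, today, window_days=60):
    """Return the user with the most checkins at venue in the last window_days.

    checkins is an iterable of (user, venue, day) tuples. Returns None
    if nobody has checked in during the window.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        user
        for user, where, day in checkins
        if where == venue and cutoff < day <= today
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


log = [
    ("me", "Children's Discovery Museum", date(2009, 11, 1)),
    ("me", "Children's Discovery Museum", date(2009, 11, 15)),
    ("rival", "Children's Discovery Museum", date(2009, 8, 1)),
    ("rival", "Children's Discovery Museum", date(2009, 8, 2)),
    ("rival", "Children's Discovery Museum", date(2009, 11, 20)),
]
print(mayor(log, "Children's Discovery Museum", today=date(2009, 11, 24)))
```

The rival has more checkins overall, but two of them fall outside the 60 day window, which is what makes the title stealable by anyone willing to visit more often.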

foursquare mayor of the Childrens Discovery Museum

A small number of business owners offer rewards to their foursquare mayor, typically on the order of a free drink. This hints at a route foursquare can take to monetize the site, by allowing businesses to reach out to patrons. The challenge will be to do this in a way that isn't creepy: a leaderboard to see how close I am to becoming Mayor would be fine, actively bugging me to visit more often would not be.


About SMS...

The best experience using the service is with a GPS-enabled smartphone. There are free apps available for iPhone and Android, and there is a mobile-optimized website for phones with a reasonable browser. Finally, there is SMS. As I still use an ancient DumbPhone, I use SMS. One of these years, I'll buy a new phone.

foursquare is clearly aimed at people with better phones. You have to type the venue name exactly; there is no fuzzy matching. If your checkin is not recognized, there is no way to correct it after the fact on the foursquare website. This can be very frustrating. Fred Wilson wrote about the importance of including SMS support in mobile apps, both to allow someone to try the service without having to install an app and to have an answer for the entire market. Certainly in my case, I wouldn't otherwise be able to use it.

Monday, November 23, 2009

Quackulum-310

Scientists today announced the creation of a new isotope in the "island of stability" beyond Bismuth in the periodic table. It has been christened Quackulum, owing to the somewhat odd arrangement of its nucleus.

Rubber duck surrounded by electron paths

Tuesday, November 17, 2009

24.855134809027 Days

There have been issues with the autofocus on the Motorola Droid phone, which suddenly resolved themselves this morning and led to speculation of a stealth update. There is a fascinating comment in the Engadget forums by Dan Morrill (and noted in a tweet from Matt Cutts):

There's a rounding-error bug in the camera driver's autofocus routine (which uses a timestamp) that causes autofocus to behave poorly on a 24.5-day cycle. That is, it'll work for 24.5 days, then have poor performance for 24.5 days, then work again.

I suspect it is exactly 24 days, 20 hours, 31 minutes, 23 seconds, and 647 milliseconds, the amount of time for a millisecond quantity to overflow a signed 32-bit integer. This is a relatively common programming error, and one which can slip through a compressed QA schedule. In the case of the Droid, the camera was working fine while the QA team tested it and then stopped working slightly after the product shipped.
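
The arithmetic is quick to check. This sketch breaks the largest signed 32-bit value, taken as a count of milliseconds, into days, hours, minutes, and seconds, then shows what one more tick does. The helper names are mine, and the wraparound assumes ordinary two's-complement behavior, which is my guess at how the driver's counter behaves.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def breakdown(ms):
    """Split a millisecond count into (days, hours, minutes, seconds, ms)."""
    days, rem = divmod(ms, 86_400_000)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return days, hours, minutes, seconds, millis


def to_int32(value):
    """Wrap an integer the way a two's-complement 32-bit counter would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


print(breakdown(INT32_MAX))     # (24, 20, 31, 23, 647)
print(to_int32(INT32_MAX + 1))  # -2147483648
```

Once the counter goes negative, any comparison that assumes timestamps only increase will misbehave until the value climbs back past zero, which fits the alternating good and bad periods described in the comment.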

Motorola Droid

Monday, November 16, 2009

The Point of the Exercise

Spam email with no attachment
Setting up a phishing site: $25
Hiring a botnet to deliver spam: $0.0008/recipient
Forgetting to attach the malware: priceless

Thursday, November 12, 2009

Cavium Buys Montavista

A bit of news got buried by other massive acquisitions this week: Cavium Networks acquired MontaVista Software for $50 million. The offer comprised $16 million in cash plus $34 million in stock. It has been reported that MontaVista raised somewhere between $90 million and over $100 million from investors, but browsing the SEC Edgar Database shows $68 million. As I have no idea what I'm doing, it's possible I simply missed another $20-30 million in fundraising which isn't so easily discoverable via Edgar. In particular a $3 million C round is awfully small, but that is what the paperwork shows.

  • A round: $31 million from USVP, Alloy Ventures, and James Ready (the founder) closed 5/2002. From the amendments it looks like Alloy put in $5 million of that.
  • B round: $9 million from existing investors, closed 4/2004
  • C round: $3 million from existing investors, closed 1/2005
  • D round: $21 million closed 12/2006, with Siemens Venture Capital joining as a new investor
  • also $2.7 million in 8/2009 and another $1 million in 10/2009, presumably lifeline funding leading up to the Cavium acquisition.

Why would investors agree to sell the company for $50 million? Presumably, they're just accepting reality. Software support businesses rarely attract venture capital, but Linux was a major buzzword for investors earlier in the decade. The trouble with support as a business model is that expenses grow linearly with revenue: as you add customers, you have to grow headcount to handle them. Expenses for a product company grow at a far slower rate; one can increase sales by 2x while increasing expenses by less than 2x.

So far as I can tell adoption of Linux in the embedded space is still growing robustly, displacing commercial RTOSes. The economic benefit of avoiding a per-unit software royalty is compelling. The expertise to bring up Linux on a new board is quite common now, so companies can beef up their own teams rather than pay for support from MontaVista or Wind River.


Update: In the comments teich points out Business Review Online shows a somewhat different funding schedule:

MontaVista   9.0   Series A
MontaVista  23.0   Series B
MontaVista  28.0   Series C
MontaVista  12.0   Series D
MontaVista   3.0   individual investment
MontaVista  21.0   Series E

After the $21 million round, MontaVista appears to have taken in another $3.7 million. Altogether this matches the $100 million quotes elsewhere, though I've no idea why some of these funding events are not in Edgar.
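
For anyone who wants to check my sums, here are the two funding schedules added up, in millions of dollars. The dictionary keys are just labels I made up for this sketch; the amounts are the ones listed in this post.

```python
# Amounts in millions of dollars, as found in the Edgar filings.
edgar = {
    "A round": 31.0,
    "B round": 9.0,
    "C round": 3.0,
    "D round": 21.0,
    "8/2009": 2.7,
    "10/2009": 1.0,
}

# Amounts in millions of dollars, as listed by Business Review Online.
business_review = {
    "Series A": 9.0,
    "Series B": 23.0,
    "Series C": 28.0,
    "Series D": 12.0,
    "individual investment": 3.0,
    "Series E": 21.0,
}

# The two 2009 lifeline investments made after the $21 million round.
late_funding = edgar["8/2009"] + edgar["10/2009"]

edgar_total = round(sum(edgar.values()), 1)
review_total = round(sum(business_review.values()) + late_funding, 1)

print("Edgar total:", edgar_total)
print("Business Review Online total plus late funding:", review_total)
print("Gap between the two:", round(review_total - edgar_total, 1))
```

The Edgar figures come to a little under $68 million and the other schedule to just under $100 million, so the gap is about $32 million.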