Coding Relic: July 2009

Wednesday, July 29, 2009

Embedded Linux Market Share

Last week Wind River, now a subsidiary of Intel, announced that it had taken the lead in the share of embedded Linux revenue.

ALAMEDA, CA - July 22, 2009 - Wind River, a wholly owned subsidiary of Intel Corporation, today announced it has been named the embedded Linux market leader by VDC Research Group. Released today in VDC's 2009 Linux in the Embedded Systems Market report, Wind River achieved the market share lead in 2008 with greater than 30 percent of total market revenue, more than seven percentage points over the next closest competitor. Wind River entered the Linux business in 2004 to complement its market-leading, proprietary operating system, VxWorks.

The unnamed "next closest competitor" is MontaVista Software, but the real competition is not between the different embedded software vendors. The real competition is between any commercial Linux vendor versus rolling your own distribution from source. The various kernel distributions for PowerPC and MIPS are easy to download and cross-compile. Assembling a filesystem is not difficult, as discussed in an earlier article on this site. Add busybox and either glibc or uClibc, and you are most of the way to a bootable system.

There are a few areas where embedded Linux vendors provide value, and why I generally advocate purchasing support from them for a Linux development project:

The compiler toolchain: Maintaining a cross-compiling toolchain is quite a bit of work. Most importantly, one has to stay on top of CPU bugs. All CPUs ship with bugs, even x86, but a dirty little secret of the RISC SoC business is that they can go to market with more significant problems than Intel or AMD could get away with. So long as the issue can be worked around in the compiler or assembler - by not emitting the problematic sequence of instructions - the chip will ship anyway and rectify problems in later spins. The bugs will be documented in the Errata, but the descriptions are made to sound quite innocuous. CPU developers make sure that the major commercial embedded Linux vendors have the needed workarounds in their toolchain.
The kernel development tax: if you look at the version control logs for Linux/MIPS or Linux/PowerPC, the people doing the heavy lifting are often employed at one of the Linux vendors. Those companies have an economic reason to pay for that development. Unfortunately this leads to a variation of the Prisoner's Dilemma: somebody has to fund it. One can either pay for a support contract in order to fund Linux development, or not pay but hope that enough other people do.
Proprietary tools: The package management tools and customized Eclipse IDE supplied by these vendors are generally not useful to me, but some of their supplementary tools for profiling or shared library size reduction are quite interesting.

It is sometimes galling to be paying for support for embedded Linux, particularly because the technical support for specific problems has never actually resolved anything for me. Nonetheless I do advocate having at least a minimal contract, using the supplied compiler toolchain and investigating the other tools they provide. It is worth spending some resources for.

(Original press release via Linux For Devices)

Monday, July 20, 2009

Microsoft Releases Linux Paravirtualization Driver Source

From the Microsoft press release:

REDMOND, Wash., July 20, 2009 - Today, in a break from the ordinary, Microsoft released 20,000 lines of device driver code to the Linux community. The code, which includes three Linux device drivers, has been submitted to the Linux kernel community for inclusion in the Linux tree. The drivers will be available to the Linux community and customers alike, and will enhance the performance of the Linux operating system when virtualized on Windows Server 2008 Hyper-V or Windows Server 2008 R2 Hyper-V.

Microsoft wants Hyper-V to compete with VMWare in all markets, and to do this it needs to have good support for virtualizing Linux. Microsoft very pragmatically decided that closed source paravirtualization drivers for Linux had no chance of success. ~~They'd get a press release out of such a move, but no significant adoption without Red Hat/Canonical/etc pulling the drivers in. Opening the source is in their best interests.~~

The last two posts have been an experiment of sorts. Prior posts had been written entirely from scratch on a technical topic. They take a long, long time to write. I wanted to try posting more frequently by adding a few thoughts to a relevant news item, but thus far I haven't been happy with the results. In this article I said opening the source would allow a Linux platform vendor to include the Microsoft paravirtualization drivers, but on further reflection it seems unlikely that they would actually do so. Red Hat has their own virtualization strategy which doesn't include Microsoft, and there is no reason to believe Canonical would be interested in being an enabler for sales of Microsoft Hyper-V.

Thursday, July 16, 2009

Courgette binary patch compression

Recently on the Chromium blog Google announced an improved binary compression algorithm called Courgette. In the example cited Courgette produced a patch that was only 11% of the size of that produced by bsdiff. The design overview has more details on its operation:

Courgette uses a primitive disassembler to find the internal pointers. The disassembler splits the program into three parts: a list of the internal pointer's target addresses, all the other bytes, and an 'instruction' sequence that determines how the plain bytes and the pointers need to be interleaved and adjusted to get back the original input. We call this an 'assembly language' because we can run an 'assembler' to process the instructions and emit a sequence of bytes to recover the original file.

The non-pointer part is about 80% of the size of the original program, and because it does not have any pointers mixed in, it tends to be well behaved, having a diff size that is in line with the changes in the source code. Simply converting the program into the assembly language form makes the diff produced by bsdiff about 30% smaller.

I haven't checked, but I suspect its disassembler supports x86 only. Chromium runs on Windows, MacOS X, and Linux, which all run primarily on x86 systems.

Courgette is of course aimed at updates to Google's Chrome browser, which is installed in very large numbers and frequently updated. Reducing the size of the updates results in a better user experience. Nifty.

Incremental Patching and Embedded Systems

When first posted, this article launched directly from Courgette into a discussion of incremental patching in embedded systems. In the comments Wayne Scott pointed out that this really wasn't fair: Courgette is purely a way to make binary patches smaller. In fact because Courgette requires the complete original binary in order to generate its diffs, it cannot be used to generate independent incremental patches at all. After clarifying my thinking, I've updated the post and added a bit of segue text.

Let us now turn to the more general subject of patching of embedded systems. Whenever there is a problem in the field, there is a strong temptation to push out a fix as rapidly as possible. Whether called a point patch or a hotfix, the basic idea is to patch just the portion of the software causing issues for that customer. Larger, periodic maintenance releases collect all existing hotfixes (plus additional ongoing maintenance work) into a single release suitable for all deployments. For embedded systems, on general principles I don't favor the use of hotfixes. Though it reduces the bandwidth required for updates, I feel the disadvantages outweigh the advantages:

perhaps obviously, you need management software to apply, remove, and list installed patches
tech support has a larger number of variations to deal with
test load increases rather dramatically. If you have 5 independent patches you may need to test the combinations, up to 2^5=32 variations to test, not just 5.
Frequent updates are not a good thing for most embedded systems. Customers want the gear to fade into the background and just work, making them update and reboot too often becomes a distinct negative.
As described in an earlier article I favor storing the boot images in a raw flash partition, not any sort of filesystem, which would make installation of an incremental patch more complex.

I recommend not trying to maintain the most recent maintenance release plus an ever-growing collection of hotfixes. I suggest instead to revise the maintenance release whenever there is a cutomer problem. If other customers are not experiencing the problem then they need not deploy the new release right away. The main benefit is to avoid having 2^N possible combinations of patches in the field, instead having only N minor maintenance releases. Revving the maintenance release also tends to be treated with more care than a simple hotfix is; rushing the process is rarely beneficial.

Tuesday, July 14, 2009

DRY and the DMV

The Pragmatic Programmer is one of the best books available concerning the development of quality software. It is structured as a series of tips, with illustrative examples and the occasional horror story. One of the first tips is the DRY principle:

DRY - Don't Repeat Yourself
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

DRY is often misinterpreted to mean simply that code should not be duplicated, but it is somewhat more subtle: don't duplicate state. If you have multiple different places in the code which keep state about an aspect of the system, and all places have to have the same content at all times for the system to work properly, then you have made maintenance of the system harder than it needs to be. You'll have to debug cases where the representations fall out of sync, and all such places must be updated at the same time when code changes are made. The Pragmatic Programmers extend the DRY principle outside of the code itself to include database schemes, documentation, and build systems. Everything should have one authoritative source.

This brings us to the Department of Motor Vehicles, though the Gentle Reader might at first not see the connection. I received a form in the mail to renew my driver's license, which I promptly signed and sent back. The new license arrived in due course, and things were fine until a few weeks later when I noticed the address was incorrect. The old license is correct, the new one is wrong.

I've no idea whether the address was correct on the renewal form, I did not check it. Apparently I should have, but I didn't bother - it hadn't changed. At some stage of the renewal process, a single digit was altered in a subtle way.

Why was it even possible for the address to be changed in the renewal process? Here we can only speculate. The DMV does need a procedure to update an address as part of a license renewal, because sometimes people supply a new one on the form. I'll speculate that the DMV, either via OCR or manual typing, re-enters the address in all cases and not just if the form supplied a change. This procedure depends on the original address to be faithfully reproduced in cases where it wasn't supposed to change. In my case, either due to OCR glitch or typing error, a digit changed resulting in the new license being printed with an incorrect address.

I believe this is an example of the consequences of a violation of the DRY principle. The same state - my address - exists in two places: in the DMV database and on the form. Those two pieces of state are supposed to be the same, indeed must be the same for the process to work correctly, but errors can easily occur which allow their contents to get out of sync.

A corollary lesson in this situation: if state isn't supposed to change, don't change it. If the form does not indicate a change of address, the authoritative state is in the database and the form contents should be ignored.

Aftereffects

I've already received a jury summons at the incorrect address, which the post office helpfully delivered to me anyway. Even after correcting the address I suspect I will receive a summons twice as often from now on. That will form the basis of a future blog post to illustrate the importance of duplicate suppression in databases, I suppose.

I can change my address back by submitting a form to the DMV, but issuing a new license with the correct address will be at my own cost. This is part of the price of modern life, I suppose.