
Tuesday, January 31, 2012

certdata.txt

When building the software for a new device there are a ton of things which need to go into the filesystem: binaries, libraries, device nodes, a bunch of configuration files, etc. One of the chunks of essential data is a list of trusted certificate authorities for libssl, commonly stored in /etc/ssl/ca_certificates.crt. It's common practice to grab a list of certificates from mozilla.org, and there are various Perl and Python scripts floating around in the search engines to assemble this list into a PEM file suitable for libssl.

2011 was not a good year for certificate authorities. DigiNotar was seized by the Dutch government after it became clear the CA had been thoroughly breached and fraudulent certificates had been issued for many large domains. Several Comodo resellers were similarly compromised and generated bogus certs for some of the same sites. Browser makers responded by encoding a list of toxic certificates into the browser, to reject anything signed by the compromised authorities.

Encoding a list of toxic certificates is the key phrase in that paragraph. As of 2011, Mozilla's certdata.txt contains both trusted CAs and absolutely untrustworthy, revoked CAs. There is metadata in each entry describing how it should be treated, but several of the scripts floating around grab everything listed in certdata.txt and put it in the PEM file. This is disastrous.

The code to search for is CKT_NSS_NOT_TRUSTED. The utility you are using should check for that type, and skip it. If there is no handling for CKT_NSS_NOT_TRUSTED, then the utility you are using is absolutely broken. Don't use it. I know of at least two which handle this correctly:
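
Whichever utility you settle on, it is cheap to sanity-check it as part of the build. A rough sketch follows; the converter path is purely a placeholder for whatever script your build actually runs, and this only verifies that the converter mentions the distrust marker at all, it is no substitute for auditing the generated PEM:

# Hypothetical converter path; substitute whatever your build actually runs.
if ! grep -q CKT_NSS_NOT_TRUSTED tools/make-ca-bundle.pl; then
    echo "WARNING: CA bundle converter has no CKT_NSS_NOT_TRUSTED handling" >&2
    exit 1
fi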

Thursday, July 16, 2009

Courgette binary patch compression

Recently on the Chromium blog Google announced an improved binary compression algorithm called Courgette. In the example cited Courgette produced a patch that was only 11% of the size of that produced by bsdiff. The design overview has more details on its operation:

Courgette uses a primitive disassembler to find the internal pointers. The disassembler splits the program into three parts: a list of the internal pointer's target addresses, all the other bytes, and an 'instruction' sequence that determines how the plain bytes and the pointers need to be interleaved and adjusted to get back the original input. We call this an 'assembly language' because we can run an 'assembler' to process the instructions and emit a sequence of bytes to recover the original file.

The non-pointer part is about 80% of the size of the original program, and because it does not have any pointers mixed in, it tends to be well behaved, having a diff size that is in line with the changes in the source code. Simply converting the program into the assembly language form makes the diff produced by bsdiff about 30% smaller.

I haven't checked, but I suspect its disassembler supports x86 only. Chromium runs on Windows, MacOS X, and Linux, which all run primarily on x86 systems.

Courgette is of course aimed at updates to Google's Chrome browser, which is installed in very large numbers and frequently updated. Reducing the size of the updates results in a better user experience. Nifty.


 
Incremental Patching and Embedded Systems

When first posted, this article launched directly from Courgette into a discussion of incremental patching in embedded systems. In the comments Wayne Scott pointed out that this really wasn't fair: Courgette is purely a way to make binary patches smaller. In fact because Courgette requires the complete original binary in order to generate its diffs, it cannot be used to generate independent incremental patches at all. After clarifying my thinking, I've updated the post and added a bit of segue text.

Let us now turn to the more general subject of patching embedded systems. Whenever there is a problem in the field, there is a strong temptation to push out a fix as rapidly as possible. Whether called a point patch or a hotfix, the basic idea is to patch just the portion of the software causing issues for that customer. Larger, periodic maintenance releases collect all existing hotfixes (plus additional ongoing maintenance work) into a single release suitable for all deployments. For embedded systems, on general principles I don't favor the use of hotfixes. Though they reduce the bandwidth required for updates, I feel the disadvantages outweigh the advantages:

  1. perhaps obviously, you need management software to apply, remove, and list installed patches
  2. tech support has a larger number of variations to deal with
  3. test load increases rather dramatically. If you have 5 independent patches you may need to test the combinations, up to 2^5=32 variations to test, not just 5.
  4. Frequent updates are not a good thing for most embedded systems. Customers want the gear to fade into the background and just work; making them update and reboot too often becomes a distinct negative.
  5. As described in an earlier article, I favor storing the boot images in a raw flash partition rather than in any sort of filesystem, which makes installation of an incremental patch more complex.

I recommend not trying to maintain the most recent maintenance release plus an ever-growing collection of hotfixes. I suggest instead revising the maintenance release whenever there is a customer problem. If other customers are not experiencing the problem then they need not deploy the new release right away. The main benefit is to avoid having 2^N possible combinations of patches in the field, instead having only N minor maintenance releases. Revving the maintenance release also tends to be treated with more care than a simple hotfix is; rushing the process is rarely beneficial.

Wednesday, June 25, 2008

More Random Musings on Embedded Filesystems

The first set of random musings ended up being far longer than planned, and rather a lot of material ended up on the cutting room floor. Some of it has been swept into this article.


Incremental Patches

The Gentle Reader may have noticed that using a read-only file system like squashfs has an important repercussion: it is impossible to copy in a modified file as an incremental patch. No matter how small the change, delivering a bug fix means delivering a complete new filesystem. Realistically, this means a complete new software image.

Personally I don't consider this to be a drawback for an embedded system. Microsoft, Apple, and most Linux distributions offer incremental patches because it suits their needs: when your software image is gigabytes in size and you have gazillions of users, it is important to deliver fixes in a bandwidth-efficient way. Embedded software is less affected by this problem, as it tends to be much smaller and have fewer downloads (either because the product sells in low volume, or has a relatively low fraction of users who will ever apply the updates).

Additionally, incremental patches come with their own set of costs:

  1. perhaps obviously, you need management software to apply, remove, and list installed patches
  2. tech support has a larger number of variations to deal with
  3. test load increases rather dramatically: if you have 5 independent patches you may need to test the combinations, up to 2^5=32 variations to test, not just 5.

Again, for companies the size of Microsoft/Apple/RedHat these issues can be handled, but for a small company it can greatly add to the support and QA load for little real benefit.

So personally, I favor not delivering incremental patches. For each major release of the product, maintain a patch branch containing all accumulated fixes to date. When a new customer issue comes in, the fix is added to the tip of that branch and a complete new build released with an incremented version number. In exceptional cases (an issue for one specific customer which may be deleterious for other customers) a new branch can be made and a private image delivered, but for your own sanity you want to make this an infrequent occurrence.
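
As a concrete sketch of that branch discipline (the post doesn't prescribe a particular version control tool; the version numbers, branch name, and commit reference below are purely illustrative), with git it might look like:

# Maintain one branch per major release; every customer fix lands there
# and produces a complete new build with a bumped version number.
git checkout -b 3.2-maint v3.2     # branch at the 3.2 release tag
git cherry-pick <fix-commit>       # bring in the fix for the customer issue
git tag v3.2.1                     # cut the next minor maintenance release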

General purpose OS vendors also use incremental patches to deliver fixes for specific problems without annoying their customers with superfluous changes or breakage. This is an issue for embedded vendors as well: there is no customer quite so pissed off as one where a software patch made things worse. However in the end I think this boils down to being disciplined about what changes are allowed in a maintenance release, and what needs to be pushed out to the next major release.


 
How to Implement Incremental Patches

If you really want to deliver incremental patches, it can still be done using the filesystem choices recommended here. The key observation is that even though there is a file in the filesystem, you don't have to use it. If a newer binary is present elsewhere, you can use it instead.

To implement this scheme it is easiest to segregate the locally developed applications into their own directory, for example /mycompany/bin instead of directly in /bin. For this example I'll assume that incremental patch binaries will be installed in a flash filesystem in /flash/mycompany. When assembling the squashfs/cramfs/etc for a release, place the application binaries in /ro/mycompany/bin and make /mycompany a softlink to the writeable ramdisk:

# ls -l /
drwxr-xr-x    1 root     root          472 Jun  6 11:55 bin
drwxr-xr-x    1 root     root          836 Jun  6 11:55 dev
lrwxrwxrwx    1 root     root            7 Jun  6 11:55 etc -> /rw/etc
drwxr-xr-x    1 root     root          555 Jun  6 11:55 lib
lrwxrwxrwx    1 root     root            9 Jun  6 11:55 mycompany -> /rw/mycompany
drwxr-xr-x    1 root     root           12 Jun  6 11:55 ro
drwxrwxrwt    9 root     root          180 Jun  6 12:17 rw
lrwxrwxrwx    1 root     root            7 Jun  6 11:55 tmp -> /rw/tmp
lrwxrwxrwx    1 root     root            7 Jun  6 11:55 var -> /rw/var

At boot, you would use a script to populate softlinks in the /mycompany directory structure to point to each binary in /ro/mycompany. If an updated version of the application binary is available in /flash, the softlink will be updated to point to it instead.

# Note: the writeable ramdisk must already be mounted on /rw, with
# /rw/mycompany created, so the /mycompany softlink resolves before
# we populate it.
# First populate links to the release binaries
cd /ro
for dir in `find mycompany -type d -noleaf`; do
    mkdir -p /$dir
done
for file in `find mycompany -not -type d -noleaf`; do
    ln -s -f /ro/$file /$file
done
        
# Now replace the softlinks for binaries which we've incrementally patched
cd /flash
for dir in `find mycompany -type d -noleaf`; do
    mkdir -p /$dir
done
for file in `find mycompany -not -type d -noleaf`; do
    ln -s -f /flash/$file /$file
done

What you'll end up with is a series of softlinks:

# ls -l /mycompany/bin/
lrwxrwxrwx 1 root  root 22 Jun 24 16:49 appNum1 -> /ro/mycompany/bin/appNum1
lrwxrwxrwx 1 root  root 22 Jun 24 16:49 appNum2 -> /ro/mycompany/bin/appNum2
lrwxrwxrwx 1 root  root 25 Jun 24 16:49 appNum3 -> /flash/mycompany/bin/appNum3

 
Flash filesystems

These two articles have focussed on in-memory filesystems. If the total size of the libraries and binaries is reasonably small an embedded system can get by with just the in-memory filesystem, but at some point it makes more sense to store the application binaries in a flash filesystem to be paged into memory as needed. The Gentle Reader might be interested in an overview of flash filesystem technology at the IBM developerWorks site, written by M. Tim Jones.

For now, I'll leave it at that.

Saturday, May 31, 2008

Random Musings on Embedded Filesystems

Getting the kernel to successfully boot on a new board is a wonderful feeling, which lasts about 5 seconds before the application developers begin clamoring for time on the box. For application work to make progress the application binaries have to be accessible in a filesystem somewhere. During platform bringup it is common to use NFS because it will "just work" almost immediately after you get the platform to boot.

I'd advise not relying on NFS for very long: using a server with gigabytes of disk means you're completely insulated from size constraints. You may find that the application footprint grows astronomically simply because nobody notices the pain. Adding library dependencies which pull in 20 Megs of DLLs is invisible on an NFS server, but a crisis when you discover very late in the process that it won't fit in flash.

Let's start by listing a few assumptions about the target:

  1. it does not have a hard disk (i.e. a spinning, magnetic disk)
  2. it has at least as much DRAM as flash, probably more.
  3. it uses a 32 or 64 bit processor with an MMU (PowerPC, MIPS, ARM, x86, etc)

If your platform is significantly different from these assumptions, the recommendations in this article may not be appropriate. The biggest one is the hard disk: if your system has one, don't use these recommendations - your filesystem choices should revolve around what the disk can do for you. If your system does not have an MMU some of these recommendations may apply, but you need to look at using uClinux and its associated best practices.


 
A Note about the Buffer Cache

The Linux I/O system is designed for a general purpose computer, with a disk connected to an I/O interface. Recently accessed files are stored in memory in the buffer cache; if the files are accessed again they can be provided from cache instead of hitting the disk a second time. You may also come across references to a page cache; in current kernel versions the page cache and buffer cache have been merged into a single pool of memory.

Embedded systems generally don't have a disk; their filesystems reside in RAM. Nonetheless Linux still relies on the buffer cache. So even though a file resides in a RAMdisk, it will be read into the buffer cache and then copied to the process which requested it. The buffer cache doesn't add as much value in this setup (re-reading the filesystem would be quick), but it is essential to support Linux APIs like mmap(). The effect of the buffer cache will be described in a bit more detail later in the article.
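
A quick way to see this on a running target is to compare /proc/meminfo before and after reading a file; a minimal sketch (the path is purely illustrative; pick a reasonably large file that is not already in active use):

# The Cached figure grows by roughly the size of the file just read,
# showing the extra copy held in the buffer cache.
grep -E '^(Buffers|Cached)' /proc/meminfo
cat /usr/lib/libsomething.so > /dev/null
grep -E '^(Buffers|Cached)' /proc/meminfo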


 
Option #1: Squashfs and tmpfs

SquashFS is a compressed, read-only filesystem. A squashfs filesystem is constructed offline, compressing a directory structure into a binary blob which can be loaded into memory and mounted. As blocks are read from the squashfs they are decompressed into the buffer cache; if memory becomes tight the cache can toss pages out and re-read them from the squashfs as needed. Thus the memory footprint is fairly well minimized: only files in active use will be decompressed.
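
Constructing the image is a single step on the build host; a minimal sketch, with illustrative directory and image names:

# Pack the staged root directory into a compressed, read-only image.
# -noappend overwrites any existing image rather than appending to it.
mksquashfs target-rootfs/ rootfs.sqsh -noappend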

Squashfs is inherently read-only. It is not merely mounted read-only in a way that you could remount read-write: the squashfs filesystem is assembled at compile time, and once created it is immutable. This is generally acceptable for filesystems which shouldn't change anyway, like /bin or /lib, but is clearly not suitable for everything. There are configuration files like /etc/resolv.conf and /etc/timezone which probably need to be occasionally written, and numerous scratch files reside in /tmp. Therefore a second filesystem is needed, a writeable ramdisk.

tmpfs is a ramdisk, and is a good way to provide the writeable portions of the filesystem. Daniel Robbins wrote an excellent article about tmpfs, published at IBM DeveloperWorks a few years ago. Squashfs would be used for the root filesystem mounted on /, while the tmpfs would be mounted on a subdirectory like /rw. /etc, /var and /tmp are most easily provided as soft links to the tmpfs. For example:

# ls  -l /
dr-xr-xr-x    1 root     472 May 16 10:10 bin
dr-xr-xr-x    1 root     836 May 16 10:10 dev
lrwxrwxrwx    1 root       7 May 16 10:10 etc -> /rw/etc
dr-xr-xr-x    1 root     555 May 16 10:10 lib
dr-xr-xr-x   97 root       0 Dec 31  1969 proc
lrwxrwxrwx    1 root       8 May 16 10:10 root -> /rw/root
drwxrwxrwt    9 root     180 May 16 10:12 rw
dr-xr-xr-x    1 root     151 May 16 10:10 sbin
lrwxrwxrwx    1 root       7 May 16 10:10 tmp -> /rw/tmp
dr-xr-xr-x    1 root      23 May 16 10:09 usr
lrwxrwxrwx    1 root       7 May 16 10:10 var -> /rw/var

At boot, the init script would mount the tmpfs filesystem and create the directories which the softlinks point to. You can specify a maximum size when mounting the tmpfs, 8 megabytes in this example:

if mount -t tmpfs -o size=8m tmpfs /rw >/dev/null 2>&1; then
    mkdir /rw/tmp /rw/var /rw/etc /rw/root
else
    # Halt and catch fire
    echo "failed to mount tmpfs on /rw" >&2
    exit 1
fi

Squashfs has not been accepted into the mainline kernel, so you will need to download patches from the project website. There is also a LZMA-enhanced version of Squashfs, though I have not personally used this version. LZMA appears to obtain better compression ratios than the zlib which Squashfs normally uses.

For the root directory ("/"), the kernel has to be able to recognize and mount the filesystem without relying on any external mount program. This is a chicken and egg problem: there is no way to run a mount program until there is a filesystem mounted. To use squashfs for the root filesystem, linux/init/do_mounts.c must be modified to recognize its magic number. A patch is available, if your kernel source does not already have this handling. The patch also handles a cramfs root filesystem.


 
Option #2: cramfs instead of squashfs

cramfs is an older compressed, read-only filesystem for Linux, developed by Linus Torvalds. Cramfs works quite well and is included in the kernel sources. However squashfs typically achieves better compression, because it feeds larger blocks to zlib. In one project I worked on, a 20 Meg cramfs filesystem became about 18 Megs with squashfs.
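
The image is likewise constructed offline on the build host; a minimal sketch (directory and image names are illustrative), using mkfs.cramfs, which some distributions ship as mkcramfs:

# Pack the staged root directory into a cramfs image.
mkfs.cramfs target-rootfs/ rootfs.cramfs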

Like squashfs, cramfs must be paired with a second, writable ramdisk to be useful.


 
Option #3: ext2 instead of tmpfs

ext2 can be used to provide a writeable ramdisk, though there are relatively few reasons to use it for this purpose as opposed to tmpfs. Older tmpfs releases did not support certain filesystem features like sendfile(), but this is not an issue for most applications.

Nonetheless if there is a reason to do so, an ext2 filesystem can be created in a /dev/ram# device and mounted as a ramdisk. ext2 has to be formatted before it can be used, which would normally mean bundling mke2fs into your target image. However there is another way to provide the ext2 whose footprint is far smaller than a mke2fs binary: empty, zero-filled ext2 filesystems compress extremely well using bzip2. You can create a filesystem of the appropriate size while compiling the target image, by running commands on your build system:

dd if=/dev/zero of=/tmp/e2file bs=1024 count=8192
sudo losetup /dev/loop7 /tmp/e2file
sudo mke2fs -b 4096 -m 1 -i 16384 /dev/loop7
sudo tune2fs -c -1 /dev/loop7
sudo losetup -d /dev/loop7
bzip2 -9 /tmp/e2file

The bs and count arguments to dd specify the size of the file to create, filled with zeros. We use a /dev/loop device to create a new ext2 filesystem in this file, and then compress it. The result should be a couple hundred bytes in size, far smaller than a mke2fs binary would be. The e2file.bz2 is copied into the target filesystem, and mounted by the boot scripts like so (the kernel's ramdisk size, set with the ramdisk_size= boot parameter, must be at least as large as the filesystem created above):

bunzip2 -c /e2file.bz2 >/dev/ram2 2>/dev/null
if mount -t ext2 /dev/ram2 /rw >/dev/null 2>&1; then
    mkdir /rw/tmp /rw/var /rw/etc /rw/root
else
    # Halt and catch fire
    echo "failed to mount ext2 ramdisk on /rw" >&2
    exit 1
fi

 
Option #4: JFFS2 instead of tmpfs

In the discussion of the previous alternatives the read-only portion of the filesystem would be compressed in memory using squashfs or cramfs, but the writable portion would be stored uncompressed. If your system requires a large amount of ramdisk space but your memory is constrained, the natural solution is to look for a way to compress it.

If the applications generating this data can be modified to use zlib and compress their own output, that is probably the best way to proceed. If you cannot modify the apps, there are a couple ways to get compression at the filesystem layer. The only natively compressed, writable filesystem for Linux I know of is JFFS2, which is not designed to be a ramdisk but can be pressed into service if necessary using the mtdram.o module (which exists to ease debugging of MTD applications). The vast majority of the complexity in JFFS2, for handling erase blocks and wear leveling and all of the other nuances of flash chips, is wasted when used as a ramdisk, but it does provide compression. The boot scripts would work like so:

/sbin/insmod -q /lib/modules/mtdram.o total_size=8192 erase_size=16
if mount -n -t jffs2 -o umask=0666 /dev/mtdblocka /rw >/dev/null 2>&1; then
    mkdir /rw/tmp /rw/var /rw/etc /rw/root
else
    # Halt and catch fire
    echo "failed to mount jffs2 ramdisk on /rw" >&2
    exit 1
fi

JFFS2 consumes a great deal more CPU time than other filesystems, due to the compression. The throughput to a compressed ramdisk will be relatively low. On one platform I worked on, a tmpfs ramdisk could handle writes at about 40 MBytes/sec while the JFFS2 ramdisk managed only 1 MByte/sec.

Note that this technique does not save as much memory as you might think. Whenever blocks are accessed they are decompressed by JFFS2 into the buffer cache, then copied up to the application. Likewise written blocks are held uncompressed in the buffer cache. If your system touches a large amount of data in the ramdisk, the footprint of the buffer cache will become more significant than the backing filesystem in JFFS2. This is another advantage of modifying applications to use zlib: the data remains compressed in the buffer cache, and is only decompressed within the application itself.
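
If an application cannot be modified to link against zlib directly, even something as crude as piping its output through gzip on the way to the ramdisk captures much of the same benefit, since the bytes which land in the file (and therefore in the buffer cache) are already compressed. A rough sketch, with purely illustrative program and path names:

# Hypothetical example: compress a chatty daemon's log as it is written,
# so only compressed data occupies the ramdisk and the buffer cache.
mydaemon 2>&1 | gzip -c > /rw/var/log/mydaemon.log.gz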

There are other ways to implement a compressed, writable filesystem, but I have not used them myself and can't add anything pithy about them. Some links:

  • cloop is a compressed block loopback device, allowing any filesystem to be mounted atop it
  • e2compr patches compression support into ext2/ext3. Development was moribund for a long time; it is not clear to me how stable these patches are.

 
Option #5: ext2 for everything

In large measure ext2 and the related ext3 are the "default" filesystem for Linux. When building a Linux desktop or server ext[23] is a reasonable solution, and the one generally chosen barring good reason to do otherwise. This notion of ext2 as the default often carries over into embedded systems as well, though there is far less advantage to doing so. It is certainly possible to create a large ext2 ramdisk (using the /dev/loop technique shown above) and use it to store both the application binaries as well as provide scratch space for /tmp et al. This does have the appeal of requiring only one filesystem, rather than the combinations recommended earlier.

The memory footprint of an ext2 ramdisk is always going to be larger than the options described above. It is common practice to create the ext2 filesystem in a file, gzip it, and compile the gzipped binary into the target image. This reduces the image size, but at boot the kernel will decompress the entire filesystem into memory. So if you have a 40 Meg ext2 which gzips down to 10 Megs, it will only add 10 Megs to the image size but expand to 40 Megs of RAM on the target. Compare this to squashfs, where a 10 Meg filesystem adds 10 Megs to the image size and also consumes 10 Megs of RAM on the target. The buffer cache does not perturb the result: when using either ext2 or squashfs, any files in active use will be present in the buffer cache and the footprint would be the same in both cases.


 
Postscript

This turned into an exceedingly long post. Early drafts were even longer, and much material got left on the cutting room floor. I'll collect more random musings on embedded filesystems into a future post.


 
Updates

A comment from Lance pointed to a discussion on LKML about extending cramfs to be writable. Changes made to files would be stored in the page cache, and as far as I can tell would be stored in uncompressed form. The same technique could presumably be applied to squashfs. This would make things a bit simpler for those who need a small amount of writable ramdisk.

Tom Ziomek notes that SquashFS was accepted into the mainline kernel in 2.6.29.