Wednesday, July 29, 2009

Embedded Linux Market Share

Last week Wind River, now a subsidiary of Intel, announced that it had taken the lead in the share of embedded Linux revenue.

ALAMEDA, CA - July 22, 2009 - Wind River, a wholly owned subsidiary of Intel Corporation, today announced it has been named the embedded Linux market leader by VDC Research Group. Released today in VDC's 2009 Linux in the Embedded Systems Market report, Wind River achieved the market share lead in 2008 with greater than 30 percent of total market revenue, more than seven percentage points over the next closest competitor. Wind River entered the Linux business in 2004 to complement its market-leading, proprietary operating system, VxWorks.

The unnamed "next closest competitor" is MontaVista Software, but the real competition is not between the different embedded software vendors. The real competition is between any commercial Linux vendor versus rolling your own distribution from source. The various kernel distributions for PowerPC and MIPS are easy to download and cross-compile. Assembling a filesystem is not difficult, as discussed in an earlier article on this site. Add busybox and either glibc or uClibc, and you are most of the way to a bootable system.

There are a few areas where embedded Linux vendors provide value, and why I generally advocate purchasing support from them for a Linux development project:

  • The compiler toolchain: Maintaining a cross-compiling toolchain is quite a bit of work. Most importantly, one has to stay on top of CPU bugs. All CPUs ship with bugs, even x86, but a dirty little secret of the RISC SoC business is that they can go to market with more significant problems than Intel or AMD could get away with. So long as the issue can be worked around in the compiler or assembler - by not emitting the problematic sequence of instructions - the chip will ship anyway, with the problems rectified in later spins. The bugs will be documented in the Errata, but the descriptions are made to sound quite innocuous. CPU developers make sure that the major commercial embedded Linux vendors have the needed workarounds in their toolchain.
  • The kernel development tax: if you look at the version control logs for Linux/MIPS or Linux/PowerPC, the people doing the heavy lifting are often employed at one of the Linux vendors. Those companies have an economic reason to pay for that development. Unfortunately this leads to a variation of the Prisoner's Dilemma: somebody has to fund it. One can either pay for a support contract in order to fund Linux development, or not pay but hope that enough other people do.
  • Proprietary tools: The package management tools and customized Eclipse IDE supplied by these vendors are generally not useful to me, but some of their supplementary tools for profiling or shared library size reduction are quite interesting.

It is sometimes galling to be paying for support for embedded Linux, particularly because the technical support for specific problems has never actually resolved anything for me. Nonetheless I do advocate having at least a minimal contract, using the supplied compiler toolchain and investigating the other tools they provide. It is worth spending some resources on.

(Original press release via Linux For Devices)

Monday, July 20, 2009

Microsoft Releases Linux Paravirtualization Driver Source

From the Microsoft press release:

REDMOND, Wash., July 20, 2009 - Today, in a break from the ordinary, Microsoft released 20,000 lines of device driver code to the Linux community. The code, which includes three Linux device drivers, has been submitted to the Linux kernel community for inclusion in the Linux tree. The drivers will be available to the Linux community and customers alike, and will enhance the performance of the Linux operating system when virtualized on Windows Server 2008 Hyper-V or Windows Server 2008 R2 Hyper-V.

Microsoft wants Hyper-V to compete with VMware in all markets, and to do this it needs to have good support for virtualizing Linux. Microsoft very pragmatically decided that closed source paravirtualization drivers for Linux had no chance of success. They'd get a press release out of such a move, but no significant adoption without Red Hat/Canonical/etc pulling the drivers in. Opening the source is in their best interests.

The last two posts have been an experiment of sorts. Prior posts had been written entirely from scratch on a technical topic. They take a long, long time to write. I wanted to try posting more frequently by adding a few thoughts to a relevant news item, but thus far I haven't been happy with the results. In this article I said opening the source would allow a Linux platform vendor to include the Microsoft paravirtualization drivers, but on further reflection it seems unlikely that they would actually do so. Red Hat has their own virtualization strategy which doesn't include Microsoft, and there is no reason to believe Canonical would be interested in being an enabler for sales of Microsoft Hyper-V.

Thursday, July 16, 2009

Courgette binary patch compression

Recently on the Chromium blog Google announced an improved binary compression algorithm called Courgette. In the example cited Courgette produced a patch that was only 11% of the size of that produced by bsdiff. The design overview has more details on its operation:

Courgette uses a primitive disassembler to find the internal pointers. The disassembler splits the program into three parts: a list of the internal pointer's target addresses, all the other bytes, and an 'instruction' sequence that determines how the plain bytes and the pointers need to be interleaved and adjusted to get back the original input. We call this an 'assembly language' because we can run an 'assembler' to process the instructions and emit a sequence of bytes to recover the original file.

The non-pointer part is about 80% of the size of the original program, and because it does not have any pointers mixed in, it tends to be well behaved, having a diff size that is in line with the changes in the source code. Simply converting the program into the assembly language form makes the diff produced by bsdiff about 30% smaller.

I haven't checked, but I suspect its disassembler supports x86 only. Chromium runs on Windows, MacOS X, and Linux, which all run primarily on x86 systems.

Courgette is of course aimed at updates to Google's Chrome browser, which is installed in very large numbers and frequently updated. Reducing the size of the updates results in a better user experience. Nifty.


 
Incremental Patching and Embedded Systems

When first posted, this article launched directly from Courgette into a discussion of incremental patching in embedded systems. In the comments Wayne Scott pointed out that this really wasn't fair: Courgette is purely a way to make binary patches smaller. In fact because Courgette requires the complete original binary in order to generate its diffs, it cannot be used to generate independent incremental patches at all. After clarifying my thinking, I've updated the post and added a bit of segue text.

Let us now turn to the more general subject of patching of embedded systems. Whenever there is a problem in the field, there is a strong temptation to push out a fix as rapidly as possible. Whether called a point patch or a hotfix, the basic idea is to patch just the portion of the software causing issues for that customer. Larger, periodic maintenance releases collect all existing hotfixes (plus additional ongoing maintenance work) into a single release suitable for all deployments. For embedded systems, on general principles I don't favor the use of hotfixes. Though it reduces the bandwidth required for updates, I feel the disadvantages outweigh the advantages:

  1. Perhaps obviously, you need management software to apply, remove, and list installed patches.
  2. Tech support has a larger number of variations to deal with.
  3. Test load increases rather dramatically. If you have 5 independent patches you may need to test their combinations: up to 2^5=32 variations, not just 5.
  4. Frequent updates are not a good thing for most embedded systems. Customers want the gear to fade into the background and just work, making them update and reboot too often becomes a distinct negative.
  5. As described in an earlier article I favor storing the boot images in a raw flash partition, not any sort of filesystem, which would make installation of an incremental patch more complex.

I recommend not trying to maintain the most recent maintenance release plus an ever-growing collection of hotfixes. I suggest instead revising the maintenance release whenever there is a customer problem. If other customers are not experiencing the problem then they need not deploy the new release right away. The main benefit is to avoid having 2^N possible combinations of patches in the field, instead having only N minor maintenance releases. Revving the maintenance release also tends to be treated with more care than a simple hotfix is; rushing the process is rarely beneficial.

Tuesday, July 14, 2009

DRY and the DMV

The Pragmatic Programmer is one of the best books available concerning the development of quality software. It is structured as a series of tips, with illustrative examples and the occasional horror story. One of the first tips is the DRY principle:

DRY - Don't Repeat Yourself
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

DRY is often misinterpreted to mean simply that code should not be duplicated, but it is somewhat more subtle: don't duplicate state. If you have multiple different places in the code which keep state about an aspect of the system, and all places have to have the same content at all times for the system to work properly, then you have made maintenance of the system harder than it needs to be. You'll have to debug cases where the representations fall out of sync, and all such places must be updated at the same time when code changes are made. The Pragmatic Programmers extend the DRY principle outside of the code itself to include database schemes, documentation, and build systems. Everything should have one authoritative source.
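
A contrived C sketch of the distinction (the names here are hypothetical, for illustration only): a structure that stores both a file descriptor and a separate "is it open?" flag keeps two representations of one fact, and they can drift apart. Deriving the answer from the single authoritative field removes the hazard.

struct conn {
    int fd;        /* -1 when closed: the authoritative state */
    int is_open;   /* duplicates what fd already encodes, and must be
                      updated in lockstep at every open and close */
};

/* better: keep one representation, derive other views from it */
static int conn_is_open(const struct conn *c)
{
    return (c->fd >= 0);
}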

This brings us to the Department of Motor Vehicles, though the Gentle Reader might at first not see the connection. I received a form in the mail to renew my driver's license, which I promptly signed and sent back. The new license arrived in due course, and things were fine until a few weeks later when I noticed the address was incorrect. The old license is correct, the new one is wrong.

sample CA drivers license

I've no idea whether the address was correct on the renewal form, I did not check it. Apparently I should have, but I didn't bother - it hadn't changed. At some stage of the renewal process, a single digit was altered in a subtle way.

Why was it even possible for the address to be changed in the renewal process? Here we can only speculate. The DMV does need a procedure to update an address as part of a license renewal, because sometimes people supply a new one on the form. I'll speculate that the DMV, either via OCR or manual typing, re-enters the address in all cases and not just if the form supplied a change. This procedure depends on the original address to be faithfully reproduced in cases where it wasn't supposed to change. In my case, either due to OCR glitch or typing error, a digit changed resulting in the new license being printed with an incorrect address.

I believe this is an example of the consequences of a violation of the DRY principle. The same state - my address - exists in two places: in the DMV database and on the form. Those two pieces of state are supposed to be the same, indeed must be the same for the process to work correctly, but errors can easily occur which allow their contents to get out of sync.

A corollary lesson in this situation: if state isn't supposed to change, don't change it. If the form does not indicate a change of address, the authoritative state is in the database and the form contents should be ignored.


 
Aftereffects

I've already received a jury summons at the incorrect address, which the post office helpfully delivered to me anyway. Even after correcting the address I suspect I will receive a summons twice as often from now on. That will form the basis of a future blog post to illustrate the importance of duplicate suppression in databases, I suppose.

I can change my address back by submitting a form to the DMV, but issuing a new license with the correct address will be at my own cost. This is part of the price of modern life, I suppose.

Tuesday, June 30, 2009

Post-mortem debugging: core files

When one has a core file, one runs gdb. It's simply the way things were Meant To Be, right? Yet gdb isn't the right tool for the job in all cases. If you're dealing with a corrupted heap, gdb is not very helpful. You can see the portion of the heap which caused the process to fault (most likely in malloc or free), but identifying the junk in memory is an exercise in puzzling it out and looking for patterns. It is often useful in such cases to search the rest of the process address space for pointers into the corrupted area of the heap. Though gdb can be used to search for patterns in memory, it isn't very good at it. For example consider the following macro:

define searchmem
    set $start = (char *) $arg0
    set $end = (char *) $arg1
    set $pattern = (unsigned int) $arg2
    set $p = $start
    while $p < $end
        if (*(unsigned int *) $p) == $pattern
            printf "pattern 0x%x found at 0x%x\n", $pattern, $p
        end
        set $p++
    end
end

document searchmem
    search between $argv0 and $argv1 for pattern $argv2
end

This macro can look only for 32 bit numbers, not any sort of regular expression, and it is very, very slow. We'd really like to run grep, but if gdb provides a way to run grep over the core contents I haven't found it. Instead, Gentle Reader, we'll write a utility to output the core file to text so that grep or any other Unix tool can be used. This would be a job for od or hexdump except for two things:

  1. We'd like to see the addresses of the data being dumped.
  2. The core might be in the wrong endianness, such as a MIPS-BE core file on an x86 host.

Instead of using a generic binary dump like od we'll construct a tool specifically for core files, but first we need to understand their contents.


 
Process Address Space

Linux and modern Unix-ish operating systems dynamically map shared libraries into the process address space. These libraries are not packed tightly up against one another. For alignment and page protection reasons, there are gaps between them.

Each library generally consists of multiple segments:

  • instructions, called the TEXT segment.
  • uninitialized data to be zero filled, called BSS
  • initialized data (i.e. variables initialized to a non-zero value), called DATA

You can see the memory segments for a running process in /proc/<pid>/maps. Here is an example:

# cat /proc/1261/maps
0fc73000-0fc80000 r-xp 00000000 01:00 713        /lib/libA.so.1
0fc80000-0fc83000 ---p 0000d000 01:00 713        /lib/libA.so.1
0fc83000-0fc92000 rwxp 00000000 01:00 713        /lib/libA.so.1
...deleted...
10000000-1000c000 r-xp 00000000 01:00 1190       /bin/myapp
1001b000-1001c000 rwxp 0000b000 01:00 1190       /bin/myapp
...deleted...

libA's callable functions are in the TEXT segment, mapped at addresses 0x0fc73000 through 0x0fc80000. Note the permission bits on these pages: read-only plus executable. Write permission is denied, making it safe to share the same physical pages of RAM amongst multiple processes.

The libA BSS segment extends from 0x0fc80000 through 0x0fc83000. These pages are mapped with no permissions at all, even a read access will trigger a page fault. The kernel will supply a zero-filled page on the first fault, and mark its permissions as read+write.

The libA DATA segment is at 0x0fc83000 through 0x0fc92000. This region is populated with the initialized data, so it has read+write permissions already. No page fault will be triggered on access to these pages.

When the process dies, the core file needs to preserve these scattered memory areas. Core files from a Linux process are written in ELF format, which I will stubbornly continue to refer to as the Extensible Linking Format. ELF defines data sections with associated virtual addresses, allowing it to describe data scattered across a process address space. Let's examine the output of "readelf -l" on the core from this process:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz   Flg Align
  NOTE           0x0003f4 0x00000000 0x00000000 0x006f4 0x00000      0
  LOAD           0x001000 0x0fc73000 0x00000000 0x00000 0x0d000  R E 0x1000
  LOAD           0x001000 0x0fc80000 0x00000000 0x00000 0x03000      0x1000
  LOAD           0x001000 0x0fc83000 0x00000000 0x0f000 0x0f000  RWE 0x1000
  ..etc...
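
Each NOTE and LOAD row in that listing is an ELF program header. For a 32-bit target the structure, from <elf.h>, looks like the following; the readelf columns map directly onto its fields:

#include <stdint.h>

typedef struct {
    uint32_t p_type;    /* Type:     PT_NOTE, PT_LOAD, ... */
    uint32_t p_offset;  /* Offset:   where the data sits in the core file */
    uint32_t p_vaddr;   /* VirtAddr: address in the process address space */
    uint32_t p_paddr;   /* PhysAddr: unused for core files */
    uint32_t p_filesz;  /* FileSiz:  bytes actually stored in the file */
    uint32_t p_memsz;   /* MemSiz:   size of the region in memory */
    uint32_t p_flags;   /* Flg:      PF_R | PF_W | PF_X permission bits */
    uint32_t p_align;   /* Align */
} Elf32_Phdr;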

The NOTE section contains various global information about the dead process, including its name and the contents of the CPU registers at the time it died. If you use "objdump -h" to list the core sections you'll see two additional sections: reg0/2 and reg0. These sections don't actually exist in the file, objdump breaks out the register contents from the NOTE section into their own pseudo-sections.
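
On Linux the register contents are carried in an NT_PRSTATUS note, whose payload is struct elf_prstatus. A minimal sketch to poke at it, assuming a Linux/glibc host (field layout varies by architecture):

#include <stdio.h>
#include <sys/procfs.h>   /* struct elf_prstatus, elf_gregset_t */

/* pr_pid is the pid of the dead process; pr_reg holds its general
   purpose registers, which objdump presents as pseudo-sections */
int main(void)
{
    struct elf_prstatus prs;
    printf("elf_prstatus is %u bytes, registers take %u bytes\n",
           (unsigned) sizeof(prs), (unsigned) sizeof(prs.pr_reg));
    return 0;
}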

The LOAD sections contain the data from the process address space. The first one starts at a virtual address of 0xfc73000. That is the TEXT segment for libA, which we looked at earlier. Note the FileSiz is 0: to save space, the kernel skips dumping the contents of non-writable pages. gdb would fetch these sections from the executable file instead. The second LOAD section contains the libA BSS, and similarly its FileSiz is zero: this process died before it ever referenced anything in the BSS of libA. None of the pages had been faulted in and therefore none were writable.

The third LOAD section is the interesting one. This is the initialized DATA section. Because these pages were writable, they were saved to the core file and so the FileSiz is 0x0f000 (which matches the size of the segment from the /proc/<pid>/maps file, above).


 
BFD

libbfd is one of several libraries available for working with ELF files. We'll use libbfd to process the core file, dumping it as hex words to a text file.

According to the history I can find, libbfd was created at Cygnus Support to deal with the myriad binary formats that sprang up in the 1980s and 90s. In response to how difficult it would be to encapsulate the various formats in a single library David Henkel-Wallace reportedly responded "big f---ing deal," and thus libbfd was christened. The name has since been clarified as "binary file descriptor."


    #include <bfd.h>

    bfd        *abfd;
    asection   *sect;
    char       *corefilename;
    enum bfd_endian endian;

    bfd_init();   /* initialize libbfd; required before other bfd calls */

    if ((abfd = bfd_openr(corefilename, NULL)) == NULL) {
        /* ... error handling ... */
    }

First we initialize the library with bfd_init() and open the file using bfd_openr(). abfd is the handle used to access the file; all other libbfd APIs take it as an argument.


    if (!bfd_check_format (abfd, bfd_core)) {
        printf("%s does not appear to be a core file.\n", corefilename);
        /* ... error handling ... */
    }

We check that the file is actually a core and not some other type of file. bfd_core is part of an enumerated type; other types which libbfd can check for include bfd_object for ELF programs or .o files, and bfd_archive for ar-style archives.


    endian = abfd->xvec->byteorder;

libbfd handles the endianness of fields in the program and section headers, returning the result in the CPU's native byte order regardless of the endianness of the ELF file. It doesn't do anything about the data within those sections. Since we're going to dump the data in the core file, we need to know the endianness.
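
We can reduce that to a single flag up front. A small sketch, with need_swap() being my own helper rather than anything libbfd provides:

#include <bfd.h>

/* true when the core's byte order differs from the host's,
   i.e. when the dump loop will need to swap each word */
static int need_swap(enum bfd_endian core_endian)
{
    union { unsigned int word; unsigned char byte[4]; } probe;

    probe.word = 1;
    /* on a big endian host, the 1 lands in the last byte */
    return (probe.byte[0] == 0) != (core_endian == BFD_ENDIAN_BIG);
}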


    for (sect = abfd->sections; sect != NULL; sect = sect->next) {
        bfd_vma        vma  = bfd_section_vma (abfd, sect);
        bfd_size_type  size = bfd_section_size(abfd, sect);
        const char    *name = bfd_section_name(abfd, sect);

We loop over each ELF section, pulling out the virtual address and size of the data it holds. The size will be zero in many cases, as shown above for the libA TEXT and BSS sections.


        /* buf must point to at least 'size' bytes, e.g. from malloc() */
        if (!bfd_get_section_contents(abfd, sect, buf, (file_ptr)0, size)) {
            fprintf(stderr, "Could not read section %s\n", name);
            exit(5);
        }

We fetch the contents of each section, ready to print them as hex.


With a modicum of string manipulation we have a hex dump of the core contents:

0fc84710:  615f6669  6e616c69  7a65005f  4a765f52    a_finalize._Jv_R
0fc84720:  65676973  74657243  6c617373  6573006c    egisterClasses.l
0fc84730:  6962632e  736f2e36  0061646c  65723332    ibc.so.6.adler32
0fc84740:  00636f6d  70726573  73320064  65666c61    .compress2.defla

 
libbfd licensing

libbfd is GPL. It is not LGPL, which allows dynamic linking to proprietary code, but the full GPL. Any use of libbfd encumbers the rest of the software linked to it with the GPL and obligates you to provide the source code to anyone to whom you provide a binary. It's important to consider what this means: if you're developing tools for internal use by developers, the GPL is not onerous. Indeed, the source code of the utility will likely be checked into the version control system that all developers can access anyway.


 
More later...

If you need to incorporate ELF file support into a proprietary software tool, then libbfd is not usable. I intended to provide the same core2hex example using FreeBSD's libelf, but this article is already quite long so I'm going to defer it until next time.

Friday, May 15, 2009

Pre-mortem Backtracing

A backtrace is often the first step in debugging a problem. Generating a backtrace is generally thought of as a function of the debugger, on a core file after a process has died. However it is sometimes quite useful to generate a live backtrace while a process runs. For example, crashing the process in the field may not be acceptable if a problem is survivable. Logging a backtrace and other information can provide enough to locate the root cause, without having to trigger any customer downtime.


 
gcc backtrace support

The simplest way to get a crude backtrace is the __builtin_return_address() builtin function supplied by gcc. You provide the frame number you want to retrieve, and get the return address for that stack frame:

void do_backtrace2()
{
    void *pc0 = __builtin_return_address(0);
    void *pc1 = __builtin_return_address(1);
    void *pc2 = __builtin_return_address(2);
    void *pc3 = __builtin_return_address(3);

    printf("Frame 0: PC=%p\n", pc0);
    printf("Frame 1: PC=%p\n", pc1);
    printf("Frame 2: PC=%p\n", pc2);
    printf("Frame 3: PC=%p\n", pc3);
}

This code will produce the following output:

Frame 0: PC=0x80483ca
Frame 1: PC=0x80483e1
Frame 2: PC=0x62079d
Frame 3: PC=0x80482b9

__builtin_return_address() has significant limitations. It constructs code at compile time to walk back through the stack frames. That means you cannot use a variable in a loop, you can only use integer constants like the 0,1,2,3 shown above. Also on some architectures, including my beloved MIPS, only __builtin_return_address(0) works. MIPS has no frame pointer, making it difficult to walk back up the stack. Frame 0 can use the return address register directly.


 
glibc's backtrace()

glibc contains a simple backtrace function, which is somewhat more powerful than __builtin_return_address(). The backtrace() call populates an array with the program counter of each calling function, while a separate backtrace_symbols() call can look up the symbolic names for each address:

#include <execinfo.h>
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* free */

#define BACKTRACE_SIZ   64
void do_backtrace()
{
    void    *array[BACKTRACE_SIZ];
    size_t   size, i;
    char   **strings;

    size = backtrace(array, BACKTRACE_SIZ);
    strings = backtrace_symbols(array, size);
    if (strings == NULL) {
        return;   /* backtrace_symbols() mallocs, and can fail */
    }

    for (i = 0; i < size; i++) {
        printf("%p : %s\n", array[i], strings[i]);
    }

    free(strings);  // malloced by backtrace_symbols
}

The output shows the backtrace with the address of each function call site:

# gcc -g -o backtrace ./backtrace.c
# ./backtrace 
0x8048422 : ./backtrace(backtrace_symbols+0xd6) [0x8048422]
0x80484be : ./backtrace(backtrace_symbols+0x172) [0x80484be]
0x80484d5 : ./backtrace(backtrace_symbols+0x189) [0x80484d5]
0x071479d : /lib/tls/libc.so.6(__libc_start_main+0xed) [0x71479d]
0x804837d : ./backtrace(backtrace_symbols+0x31) [0x804837d]

To get useful symbolic names, the -rdynamic option must be passed to the linker:

# gcc -g -rdynamic -o backtrace ./backtrace.c
# ./backtrace 
0x804874a : ./backtrace(do_backtrace+0x1a) [0x804874a]
0x80487e6 : ./backtrace(foo1+0xb) [0x80487e6]
0x80487fd : ./backtrace(main+0x15) [0x80487fd]
0x012679d : /lib/tls/libc.so.6(__libc_start_main+0xed) [0x12679d]
0x80486a5 : ./backtrace(backtrace_symbols+0x31) [0x80486a5]

There is also a backtrace_symbols_fd() function, which nicely prints the output to a file descriptor without having to malloc a table of strings. If that's all you're trying to do, it is a simpler API.

As an aside: notice how the address of __libc_start_main varies in the examples above, 0x62079d versus 0x71479d versus 0x12679d? That is address space randomization in action. libc is mapped at a randomized base address every time a binary is started. The offset of __libc_start_main within the page is a constant 0x79d, but the upper bits of the address will vary from one run to the next. This helps prevent buffer overflow and other code injection attacks, by randomizing the address of library routines.


 
libunwind

libunwind is a library for extracting call chain information. It supports many different CPU architectures. Here is an example of using libunwind to accomplish a similar result as glibc's backtrace() function:

#define UNW_LOCAL_ONLY   /* we only unwind within this process */
#include <libunwind.h>
#include <stdio.h>

void do_backtrace2()
{
    unw_cursor_t    cursor;
    unw_context_t   context;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    while (unw_step(&cursor) > 0) {
        unw_word_t  offset, pc;
        char        fname[64];

        unw_get_reg(&cursor, UNW_REG_IP, &pc);

        fname[0] = '\0';
        (void) unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);

        printf ("%p : (%s+0x%x) [%p]\n", pc, fname, offset, pc);
    }
}

The output:

0x80486b3 : (foo+0xb) [0x80486b3]
0x80486ca : (main+0x15) [0x80486ca]
0x016379d : (__libc_start_main+0xed) [0x16379d]
0x80484c9 : (_start+0x21) [0x80484c9]

That is quite a bit more code to get a simple backtrace, but libunwind offers more capability to examine the program state at each frame. For example, one can print the saved register values:

#define UNW_LOCAL_ONLY   /* we only unwind within this process */
#include <libunwind.h>
#include <stdio.h>

void do_backtrace2()
{
    unw_cursor_t    cursor;
    unw_context_t   context;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);
    while (unw_step(&cursor) > 0) {
        unw_word_t  offset;
        unw_word_t  pc, eax, ebx, ecx, edx;
        char        fname[64];

        unw_get_reg(&cursor, UNW_REG_IP,  &pc);
        unw_get_reg(&cursor, UNW_X86_EAX, &eax);
        unw_get_reg(&cursor, UNW_X86_EDX, &edx);
        unw_get_reg(&cursor, UNW_X86_ECX, &ecx);
        unw_get_reg(&cursor, UNW_X86_EBX, &ebx);

        fname[0] = '\0';
        unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);
        printf ("%p : (%s+0x%x) [%p]\n", pc, fname, offset, pc);
        printf("\tEAX=0x%08x EDX=0x%08x ECX=0x%08x EBX=0x%08x\n",
                eax, edx, ecx, ebx);
    }
}

The output:

0x80486b3 : (foo1+0xb) [0x80486b3]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000
0x80486ca : (main+0x15) [0x80486ca]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000
0x044879d : (__libc_start_main+0xed) [0x44879d]
 EAX=0x00000000 EDX=0x003368b0 ECX=0x00000000 EBX=0x00000000
0x80484c9 : (_start+0x21) [0x80484c9]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000

When would this be useful? Given the relative costs, it is not unusual to have an embedded CPU with considerably more RAM than flash. In the event of a crash there may not be enough flash to save a full core file. Having the process deliberately catch SIGSEGV and dump its own backtrace with register values means you'd at least have something to work with even if there is no core file.
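
A sketch of that last idea, assuming glibc: catch the fatal signal, dump the backtrace with backtrace_symbols_fd() (which does not call malloc, a property worth having inside a signal handler), then re-raise the signal with the default disposition so the process still dies with the original signal.

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define BACKTRACE_SIZ   64

static void crash_handler(int signum)
{
    void *array[BACKTRACE_SIZ];
    int   depth = backtrace(array, BACKTRACE_SIZ);

    backtrace_symbols_fd(array, depth, STDERR_FILENO);

    /* restore default handling and re-raise, so the exit status
       still reflects the original signal */
    signal(signum, SIG_DFL);
    raise(signum);
}

int main(void)
{
    signal(SIGSEGV, crash_handler);
    /* ... the rest of the application ... */
    return 0;
}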


 
Conclusion

Over time I think I have used __builtin_return_address(0) more often than any of the other techniques. Whether constructing simple performance instrumentation or logging problems from the field, knowing the caller has often been sufficient. For more extensive backtrace functionality I end up using libunwind. The backtrace() function in glibc always seems to be too heavy for the simple stuff yet not sufficient for the complex stuff.
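
For the simple instrumentation case, something like this hypothetical macro has usually been all I needed; it records the immediate caller of a hot function without walking the stack:

#include <stdio.h>

/* log the caller of whichever function this macro appears in;
   frame 0 works even on architectures without a frame pointer */
#define LOG_CALLER(tag) \
    fprintf(stderr, "%s: called from %p\n", (tag), \
            __builtin_return_address(0))

void hot_function(void)
{
    LOG_CALLER("hot_function");
    /* ... real work ... */
}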

Friday, May 1, 2009

Desperate Bugs Demand Desperate Measures

After chasing a pernicious bug long enough, there is a strong temptation to find a way to make it be someone else's problem for a while.

if (critical_operation() == FAILED) {
    /* Uh oh. Quick, we need a distraction! */
    const char *blame;

    switch((rand() % 4)) {
        case 0: blame = "killall -SEGV kick_watchdog";     break;
        case 1: blame = "killall -ABRT cli_process";       break;
        case 2: blame = "cat /dev/urandom >/proc/kmem";    break;
        case 3: blame = "killall -FPE  hardware_manager";  break;
    }
    system(blame);
    system("rm /var/log/*.log"); /* Destroy the evidence */
    sleep(10); /* Wait for the other process to die first */
    return;
}

Thursday, April 9, 2009

More Google App Engine - Feedflares

Today's article is a discussion about some of the infrastructure for running this site. So far as I can tell from the logs and analytics, nearly everyone reading this blog does so via RSS. The RSS feed for this site is provided by FeedBurner, now part of Google. FeedBurner supports FeedFlares, small widgets appended after the content which can supply additional information or link to other services. I currently use several FeedFlares in the RSS feed, for del.icio.us and friendfeed. The friendfeed flare is new, and is the topic of this writeup.


FeedFlare example

 

friendfeed (friendfeed.com) is a social media aggregation service, collecting updates from services like Digg, Flickr, and various blog platforms into a single stream of updates. Many good articles about friendfeed can be found on louisgray.com.

The RSS feed for this blog is imported into friendfeed where people can see it, mark it as something they liked, or leave comments. At the time I started working on this project there was not a FeedFlare for friendfeed. There is now, but I decided to finish my version anyway and post it here. As Google App Engine is my favorite new toy, the FeedFlare is a GAE application.


 

We'll go straight to the code which gathers information from friendfeed to create the FeedFlare. I'm going to skip the boilerplate code for an application on the Google App Engine. It can be found in an earlier article about the App Engine, if needed. The complete source for this feedflare is also available for download.

class FriendfeedFlare(webapp.RequestHandler):
  def get(self):
    self.response.headers['Content-Type'] = "text/plain"

    scheme, host, path, param, query, frag = urlparse.urlparse(self.request.url)
    args = cgi.parse_qs(query)

    url      = self.parseArg(args, "url")
    nickname = self.parseArg(args, "nickname")
    api_key  = self.parseArg(args, "api_key")

    if (url == None):
        self.response.out.write("<FeedFlare><Text>No URL specified!</Text></FeedFlare>\n")
        return

    subscribed = 1 if (nickname != None and api_key != None) else 0

Three arguments are accepted, using the standard CGI convention of http://host/path?arg1=value&arg2=value

  • url - the url of the RSS item this FeedFlare should reference. This argument is required.
  • nickname - the friendfeed account to authenticate as. If provided, the search will be restricted to friends of this nickname. If not provided, we search all entries on friendfeed.com.
  • api_key - the API Key to authenticate us. If nickname is provided the api_key must also be provided.

Note: At the time of this writing (4/9/2009) the nickname functionality to restrict results to subscribers is not working. It was working a couple days ago, but seemed to break just as I posted this article. I'll update the post if I get it working again, for now the feedflare is useable when searching all entries on friendfeed.com.

urlparse.urlparse() is employed to break the URL into its main components, and then cgi.parse_qs() pulls out the individual parameters. parse_qs() returns each argument as a list, because it allows multiple instances of an argument. In this case only one makes sense, so we get back a list with one member. self.parseArg() is a small helper routine to return None if the argument is not present, or the first element in the list returned from cgi.parse_qs().

    try:
        ffsession = friendfeed.FriendFeed(nickname, api_key);
        entries   = ffsession.fetch_url_feed(url, subscribed);
    except IOError:
        self.error(503);
        return

Friendfeed supplies Python wrapper functions for their API. The wrapper functions are used here to connect to friendfeed.com, using the authorization credentials (if present). If friendfeed is not responding, a 503 response is sent to feedburner.com. This Service Unavailable result tells feedburner to continue to use its cached information and to try again later.

fetch_url_feed() is a function added to the friendfeed API, to support their /api/feed/url API. It fetches all entries which reference the given url.

    totalshares   = 0
    totalcomments = 0
    likers        = set()
    linkurl       = "http://friendfeed.com/"
    linkcomments  = -1
    
    for entry in entries["entries"]:
        totalshares   += 1
        numcomments    = len(entry["comments"])
        totalcomments += numcomments
        if (numcomments > linkcomments):
            linkurl = "http://friendfeed.com/e/" + entry["id"]
            linkcomments = numcomments
        for like in entry["likes"]:
            liker = like["user"]
            likers.add(liker["name"])

    totallikes = len(likers)

The friendfeed API returns entries in JSON format, which is parsed by their API and returned as nested Python lists. To count the number of likes and comments, one needs to iterate over each entry.

likers is a Python set, a datatype I learned about while working on this project. A set is a group of objects which will contain no duplicates. If you add an item to the set which is already present the set will contain only one instance of the item, not two. This is used to avoid overcounting likes: if the URL we are looking for was shared multiple times in friendfeed and the same user marked every one of them as liked, we only want to count that as one like not many.

The linkurl is a compromise. I'd really like to direct the link to a page containing all of the results for this URL. Unfortunately only the friendfeed JSON API includes URL search functionality; the web search page does not. So far as I can tell there is no way to link back to friendfeed for more than one entry ID. So here we link to the entry with the most comments.

    self.response.out.write("<FeedFlare>\n")
    if (totalshares == 0):
        self.response.out.write("  <Text>On Friendfeed: 0 shares</Text>\n")
    else:
        self.response.out.write("  <Text>On Friendfeed: " +                      \
                                self.fmtTotal(totalshares,   "Share")   + ", " + \
                                self.fmtTotal(totallikes,    "Like")    + ", " + \
                                self.fmtTotal(totalcomments, "Comment") +        \
                                "</Text>\n");
    self.response.out.write("  <Link href=\"" + linkurl + "\"/>\n");
    self.response.out.write("</FeedFlare>")
    return

Generate the XML output. self.fmtTotal() is another little helper routine to pluralize the output correctly, "1 Comment" versus "2 Comments". The result of all this processing is a simple bit of XML:

<FeedFlare>
  <Text>On Friendfeed: 5 Shares, 1 Like, 2 Comments</Text>
  <Link href="http://friendfeed.com/e/1b0141a1-f6fa-1be2-e775-e5d36959e04c"/>
</FeedFlare>

This is all feedburner needs to create the FeedFlare. All formatting, including the font size and the blue text coloring, is hard-coded by feedburner. The FeedFlare does not get to supply any formatting, just some text and an optional link.

  def fmtTotal(self, count, descr):
    suffix = "" if (count == 1) else "s"
    return str(count) + " " + descr + suffix

  def parseArg(self, args, argname):
    try:
        ret = args[argname][0]
    except (KeyError, IndexError):
        ret = None
    return ret
The aforementioned helper routines.


That's it, or rather that's the interesting part. The complete source can be downloaded.

The next question is, what is missing? What does it not do, that perhaps it should?

  • There is no caching of the result. Every request for the FeedFlare results in another API request to friendfeed.com. I believe this is acceptable because FeedBurner limits the rate of FeedFlare requests to about one per two hours.
  • The link in the generated FeedFlare points to the friendfeed entry with the most comments. This is a compromise. I'd rather link to a search results page with all of the entries regarding the given URL, but cannot find a good way to do it. I'd have to make the FeedFlare dynamically construct a page populated with all of the links, showing all of the likes and comments... and that is too much work for this little project. I hope that someday, friendfeed.com will provide a way to supply multiple entry IDs to appear on a single page.

 
Using the FeedFlare

If you are interested in using this FeedFlare on your own blog, please feel free. You have a few options:

  1. To use it without a specific nickname (so the results will include Everyone on friendfeed whether they follow you or not) you can use this link as the Flare Unit URL in the Feedburner -> Optimize -> FeedFlares page for your feed.
  2. To configure it to only include people who subscribe to you on friendfeed, download friendfeeduser.xml from http://feedflare.geekhold.com/feedflareunit/friendfeeduser.xml. Replace MY_NICKNAME with your friendfeed account name, and MY_API_KEY with your Remote API Key, and put the modified file somewhere on your own site to be used to configure FeedBurner.
    The functionality to restrict the results to your subscribers is not working right now. Please stay tuned. I'll post an update on friendfeed.com if I get it working again.
  3. If you don't like something about the way this code works, you're free to modify it. You can download the source code, set up your own Google App Engine application, and modify it as desired.

Wednesday, April 1, 2009

The Target Market is Not Obvious

munchkin bottle warmer

Exhibit A: a bottle warmer by munchkin, a company focused mainly on baby products. You pour water into the base, put the bottle in the holder, and press the button. A heating element boils the water to steam, which warms the bottle.


 

piezoelectric buzzer

Exhibit B: a piezoelectric buzzer. When the bottle is done heating, the lighted button on the front changes from red to green and the product emits a series of four shrill beeps to announce its successful completion. It is obviously very proud of itself.


 

The issue, and point of today's rant? When using the bottle warmer you are likely holding a squirming, ravenously hungry baby who really isn't interested in the details of the food preparation equipment. You don't need an audible alarm, as you'll be staring desperately at the warmer ready to snatch the bottle out the instant it has made it from "cold" to "tepid." So at best, the buzzer is annoying. With twins this feature is actively harmful: you really, really don't want to wake the second baby until the first is done eating. Trust me on this one.


PCB showing removed buzzer

Exhibit C: the empty space on the PCB where the buzzer used to be.

I have to say, I found their user interface to disable the buzzer feature somewhat difficult: a Torx screwdriver and soldering iron.


 

Who was this thing designed for anyway, and why did they feel a buzzer was necessary? One can only speculate about the product requirements document and user story which led to this feature...

This is Evilyn, a new parent.

Closeup of evil face

When the baby is hungry she puts a bottle in the warmer, turns on the vacuum cleaner to drown out the crying, and goes to watch a little TV. When the bottle is heated there needs to be an indication noticeable despite both the vacuum cleaner and Oprah.

I'm kidding of course, I don't believe there was ever any such product plan. I think the problem is somewhat more subtle: the bottle warmer wasn't really designed for the parent at all — and no, it wasn't designed for the baby either.


 
Winning the Business

The munchkin corporation might well design some of its products, but as with any large company they would face the "build versus buy" decision for each niche product filling out their lineup. Their main concern is marketing and brand management. So the bottle warmer was quite possibly designed and manufactured by another company, and that other company does not sell directly to parents. Though the product obviously needs to perform its function, their customer is not the end user of the bottle warmer. Their customer is the buyer at the munchkin corporation. They might be designing it under contract or, quite possibly, designing it speculatively and hoping it will be picked up by one or more baby product companies. They have to make a good pitch, or be the most impressive offering at the trade show, to get the buyer to pick them instead of some other supplier.

In that sort of competition, a longer feature list can be the key to winning the business. The munchkin bottle warmer has several extra features: it can sterilize pacifiers and bottle nipples using steam, it has a lighted button which changes from red to green during operation, and it has the aforementioned infernal buzzer.


 
Designed By <insert company name here>

So there you have it. When you are vaguely dissatisfied with something you bought, if it just doesn't seem to have been completely thought through, this might be the reason. The designer's target was not you. It was targeted to catch the attention of the buyer for another company. This is also a primary reason why Apple has been so successful in consumer electronics: though they might buy components and software from outside, they never pick up a complete product to slap their logo onto. They retain control of the product design.

Designed by Apple in California

Thursday, March 19, 2009

Exploring Google App Engine

OpenBSD

I registered my domain in 1996, when Network Solutions was the only registrar and their domain registration forms were faxed in. Back then email and web service providers were rather expensive for a small private domain, so I set up the appropriate services on an OpenBSD system at my house. We had a cable modem from @Home, with a static IP address because DHCP was new and unproven and PPPoE did not exist. We bought DNS service from Illuminati Online, and pointed it at the @Home static IP address.

In 2009, running servers on a system in one's home is a far less entertaining pursuit than it was in 1996. Botnets launch constant automated sweeps looking for vulnerable machines, and the deluge of spam is ever-increasing. So I started looking for alternatives, and fortunately the intervening years have radically changed the ISP market. Lots of hosting providers are available for a small fee. However the trickle of traffic to our current web site comes entirely from family, friends, and botnets, and I'd prefer to avoid paying for such a small installation. Google Sites is free of charge, but does not allow subdirectories. We have a modest collection of pages built up over a decade, and I'm not interested in redoing all of the links to flatten the hierarchy.

Google App Engine

There is another free option now: Google App Engine. Intended to run Python applications, it does handle static files and allows subdirectories in its file hierarchy. I started with the most straightforward approach for serving entirely static files, relying on Charles Engelke's writeup. You register for Google App Engine, put your HTML files in a subdirectory (which I've named "static"), and create an app.yaml like so:


application: my-static-webpages
version: 1
runtime: python
api_version: 1

handlers:
- url: (.*)/
  static_files: static\1/index.html
  upload: static/index.html

- url: /.*
  static_dir: static

 

Python

This works, albeit with drawbacks, the most serious being the handling of inexact links. There are external links to our pages which lack the trailing slash, pointing to "http://www.example.com/foo" where foo is a directory. The existing Apache server would kindly send a redirect to "http://www.example.com/foo/" but the App Engine static handler returns a 404 error. I want to be forgiving, and not break existing links. Charles Engelke wrote a followup article describing how to perform basic HTTP redirects, so I decided to tackle something similar. Rather than use the static handler, we'll use Python. We configure app.yaml to run the Python handler:


application: my-static-webpages
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: forgivedirectories.py

Though I bought my first Python book many years ago, I've done relatively little work in the language. This was quite a learning experience.


forgivedirectories.py:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import os

class ForgiveDirectories(webapp.RequestHandler):
  def get(self):    
    fullPath = 'static/' + self.request.path
    if not os.path.exists(fullPath):
      self.redirect('/404.html')
      return
If it really is a bad link, we send them to an error page.
    if os.path.isdir(fullPath):
      if fullPath[-1] != '/':
        self.redirect(self.request.path + '/')
        return
      else:
        fullPath += '/index.html'
This is the "forgive" part of forgivedirectories.py. If the requested path is a directory but does not end in a slash, send them a redirect to the proper path.
Note that it's important to send a redirect, and not just send the index.html page. Relative links like "../bar/index.html" will only work if the browser's path ends in a slash.
    # if Google App Engine adds a
    # sendfile equivalent, we should
    # use it here.
    fh = open(fullPath, 'r')
    while 1:
      block = fh.read(16384)
      if not block:
        break
      self.response.out.write(block)
    fh.close()
If we make it here, we have a page to send. To limit the memory footprint we read and send 16 KBytes at a time; slurping the entire file into memory is neither necessary nor beneficial.
application = webapp.WSGIApplication(
                [('.*', ForgiveDirectories)],
                debug=True)

def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
Boilerplate code to initialize the web services gateway and call into our function. It's possible to match regular expressions to dispatch to multiple different classes, but here we match everything and send it to ForgiveDirectories.

 

This works, sort of. You can see the web pages now. One problem is that the Content-Type is not being set, so browsers have to assume a default. Amusingly enough most browsers default to text/html, so at least the pages render even if images are broken. Google Chrome defaults to text/plain, so it shows the HTML source of the pages. So we obviously need to set the Content-Type, which we will do by examining the file's extension. Being new to Python, it took a few tries to get something reasonable.


 
  def contentTypeFromExt(self, extension):
    if extension=='.html':
      return 'text/html'
    elif extension=='.jpg':
      return 'image/jpeg'
    elif extension=='.gif':
      return 'image/gif'
    elif extension=='.png':
      return 'image/png'
    elif extension=='.js':
      return 'application/x-javascript'
    elif extension=='.css':
      return 'text/css'
    else:
      return 'text/plain'
The first attempt. It's basically C code, translated into Python.
  def contentTypeFromExt(self, ext):
    contenttype = {
      ext == '.html': 'text/html',
      ext == '.jpg' : 'image/jpeg',
      ext == '.gif' : 'image/gif',
      ext == '.png' : 'image/png',
      ext == '.js'  : 'application/x-javascript',
      ext == '.css' : 'text/css'}[1]
    return contenttype
The second attempt. Python does not have a switch statement, but some Google searching showed how to construct something that looks like one if you squint at it just right.
Unfortunately this version doesn't supply 'text/plain' as a default, and it's just... weird. Trying to torture the code to vaguely resemble a switch statement is not very pleasing.
  contentExtensions = {
      '.html'  : 'text/html',
      '.jpg'   : 'image/jpeg',
      '.gif'   : 'image/gif',
      '.png'   : 'image/png',
      '.js'    : 'application/x-javascript',
      '.css'   : 'text/css'}

  def contentTypeFromExt(self, ext):
    try:
      return self.contentExtensions[ext]
    except KeyError:
      return 'text/plain'
The third attempt, and the first real effort to do it in a Pythonic way. Python has hash tables as a fundamental type in the language, so let's use them! If the key is not found in the dictionary an exception will be thrown... so I guess we're supposed to catch it? I dunno. Obviously, not finding the key in the dictionary is expected to be an unusual occurrence. There must be a better way to do it.
  contentExtensions = {
      '.html'  : 'text/html',
      '.jpg'   : 'image/jpeg',
      '.gif'   : 'image/gif',
      '.png'   : 'image/png',
      '.js'    : 'application/x-javascript',
      '.css'   : 'text/css'}

  def contentTypeFromExt(self, ext):
    return self.contentExtensions.get(ext, \
                               'text/plain')
The current version, which I'm pretty happy with. By calling the get method directly we can supply a default value to be returned, instead of having it throw an exception.

 

Now we need to actually set the Content-Type header. I'm using os.path.splitext() to pull out the file extension, which is probably wrong: on Windows I think that method expects backslashes, whereas a URL will always use forward slashes. It is fine when used with the App Engine (which is not hosted on Windows), so I'll rely on it until I can figure out a more appropriate method to use.


  def contentTypeFromPath(self, path):
    (basename, extension) = os.path.splitext(path)
    return self.contentTypeFromExt(extension)
  
  # The following goes in the get() method
  # immediately before we open the fullPath file:
  self.response.headers['Content-Type'] = self.contentTypeFromPath(fullPath);

 
Left To Be Implemented...

The App Engine is a lot of fun. It's a way to experiment with web services without paying for hosting. The forgivedirectories.py script is now in service, handling our web site. There are a few things left to do, which I hope to cover in future updates:

  • After reorganizing the web site some time ago, the old Apache server was configured to send 302 redirects for the old hierarchy. The new AppEngine handler should too, just in case there are any links remaining to the old URLs.
  • For efficiency we should allow the browser to cache the pages, by implementing If-Modified-Since support. The ipsojobs blog has a writeup about this which I intend to leverage.

Monday, March 9, 2009

When the Universe Conspires to Prevent Backups

You know you're having a bad day when your backup software requires 83 Terabytes of free space in order to proceed.

Time Machine

As it turns out, the filesystem on this disk needed repair. Disk Utility found an 82.95 terabyte file which appeared to be incorrect.

Thursday, February 26, 2009

Inadvisable Externing

The Principle of the Conservation of Software Quality postulates that for every best practice there is an equal and opposite worst practice which will be easier to implement. We're here today to talk about one of those balancing factors: declaration of externs within the C code.

Assume for a moment that foo() is defined somewhere within the codebase and, for reasons unclear, there is no header file declaring foo(). Perhaps foo() was not deemed important enough, or had been static when originally written and only later opened up to the rest of the module. However we got here, if one has adopted the best practice of using -Wall -Werror when compiling, then calling foo() will require a declaration or result in a compilation error. One is then faced with several choices:

  1. Add foo() to an existing header file
  2. Make a new header file for foo() and any similar routines
  3. Declare foo() as an extern in the C file where it will be called.
The Scream

Perhaps the Gentle Reader is unfamiliar with that last option. If so, I salute you: your unfamiliarity with it speaks well of you. To summarize: "extern" means the function is not provided within this compilation unit and will be supplied later when linking. It is normally used in header files, but is also valid within C code:

do {
    extern int foo(int a, int b);
    c = foo(a, b);
} while (c != 1);

The benefit of a function declaration in a header file is that it can be included in multiple places, in the file which implements the function and any file which wants to use the function. If the implementation of the function changes such that it no longer matches the declaration, an error will result. If the header changes such that the callers no longer match, an error will also result. Declaring an extern within a C file accomplishes none of these things, because the compiler does not get to see both the declaration and the implementation of foo at the same time.
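
For contrast, a minimal sketch of the header file discipline: one authoritative declaration, included by the implementation and every caller, so the compiler can catch any mismatch.

/* foo.h - the single authoritative declaration */
#ifndef FOO_H
#define FOO_H
int foo(int a, int b);
#endif

/* foo.c - the definition is checked against foo.h */
#include "foo.h"
int foo(int a, int b)
{
    return (a + b);
}

/* caller.c - every call is checked against foo.h */
#include "foo.h"
int main(void)
{
    return foo(1, 2);
}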

Let's examine what will happen if a third argument is added to foo() but the caller blissfully relies on an existing extern declaration with two arguments. We'll use one of my favorite techniques, disassembling a MIPS binary to see how it works, with a ridiculously simple example for clarity:

int foo(int a, int b, int c)
{
    return (a + b + c);
}

<foo>:
 addu v0,a0,a1  tmp = a + b
 addu v0,v0,a2  tmp = tmp + c
 jr ra  return to caller

foo() expects arguments in registers a0, a1, and a2, and sums them together. Now let's look at the code generated by an extern declaration with only two arguments:

extern int foo(int a, int b); /* Worst Practice */
int main()
{
    return foo(1, 2);
}

<main>:
 li a0,1  load 1 into the first arg register
 li a1,2  load 2 into the second arg register
 lw t9,0(gp) load address of foo()
 jalr t9  jump to foo()

Not surprisingly, only registers a0 and a1 are loaded with values. The important point to note is that register a2 is not touched at all. One might have intuitively assumed that the third argument would be zero, but this is not the case. The third argument will be whatever garbage happens to be in register a2.

The third argument could be most anything, and might vary depending on the call chain or the data being processed. We end up with a sporadic and difficult to diagnose bug, hidden in a way that the compiler cannot help, and all because we didn't want to bother with a header file. This happened at a previous employer of mine, where a particular process would reliably crash at just one customer site because their particular environment ended up with a non-NULL value in the argument register. It was, of course, a crucial customer.

The moral of this article is that header files are your friend.


 
In Other News

Our family became larger in January, and the resulting lack of free time means postings will be less frequent. Previously I'd aimed for two postings per month, but now I think I'll be lucky to manage just one. I wrote several articles in advance, thinking that would be sufficient, but alas it was not enough. The Gentle Reader might have noticed one of those prepared postings appear twice. I accidentally marked the Blogroll article for 1/2008, and blogger.com happily sent it out immediately with a publication date one year prior. It also went out with the corrected date. I seem unable to expunge the mistaken early posting; that article still shows up twice in Google Reader. Oh well.

Wednesday, January 21, 2009

Blogroll, January 2009

Programming languages are a voyeuristic pursuit for me, as my daily work consists entirely of C and assembly. Nonetheless it is quite clear that the future belongs to high-level languages. Garbage collection avoids a huge class of bugs which have vexed developers for decades. Exceptions likewise make the handling of errors simpler and more robust, and having associative arrays as a built-in capability makes them the obvious choice for many needs.

Criticism is leveled at high-level languages over their performance, but languages based on a virtual machine will eventually achieve higher performance than C++. Profile-driven optimization is a well known technique to improve software performance by optimizing for the code paths which are actually used. Achieving this with C/C++ compilers is possible by running a first-pass binary and feeding its profile data back into a second pass. This is such a giant pain in the ass that it is practically never done. Virtual machines, by their very nature, constantly collect profile data while running the interpreter. The Just-In-Time compilation in a VM can take advantage of this information, resulting in straight-line, tightly scheduled code with no branches. It is a beautiful thing... or rather, it will be once it all works.
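
For the curious, the two-pass flow with gcc looks roughly like this (the file name and training input are hypothetical):

gcc -O2 -fprofile-generate bench.c -o bench
./bench < training-input     # run with representative data; writes .gcda profile files
gcc -O2 -fprofile-use bench.c -o bench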

So without further ado, let's talk about blogs that talk about programming languages.


The Blogroll: January 2009
 
1) armstrong on software, by Joe Armstrong

Joe writes mostly about Erlang, a language which has been around since the mid 1980s, though for part of that time it was hidden within the bowels of the Ericsson corporation. Erlang is very different from other programming languages. For example, there really are no variables in the normal sense: you can give a name to a value, like 'x', but you cannot change the value of x later. The language is single-assignment only. Similarly there are no loops (you can't have a loop if you cannot have a loop variable), only recursion and list comprehensions. There is really no mutable state in an Erlang program: once created, a data element is guaranteed not to change until it is garbage collected.

Why does Erlang make these choices? Erlang's design centers on making message passing very cheap and efficient. The complete lack of mutable state is one example: there is no need to copy the current values of variables into a message, as there is no chance the values will change, so data can be freely passed by reference. The messaging infrastructure drives Erlang's reliability features, where Erlang nodes monitor other Erlang nodes and take action if they stop responding. The messaging capabilities also allow excellent scalability across multiple CPUs, which has driven Erlang's recent resurgence (with high-visibility deployments in Facebook chat and Amazon SimpleDB).


 
2) Headius, by Charles Nutter

Ruby is an interesting language, driven to popularity by the excellent Ruby on Rails web application framework. Ruby's primary implementation is MRI, which only recently gained a bytecoded virtual machine. Earlier MRI versions were entirely interpreted, and this led to the development of a number of alternate implementations on various VMs. JRuby is the most interesting to me personally: it implements the Ruby language atop the Java Virtual Machine, a very mature and well-performing technology. Charles Nutter is driving the JRuby development effort, and joined the staff at Sun Microsystems to work on it full time.


 
3) Code Commit, by Daniel Spiewak

Erlang and Ruby are both dynamically typed languages: functions accept objects as arguments, and dynamically determine their type at runtime. This is certainly powerful, in that a single routine can handle a wide variety of inputs, but it also means that type errors will only be detected at runtime. Unit testing becomes essential in this environment... and I have trouble accepting the premise that I'm supposed to love doing manually that which the toolchain used to be able to do.

Daniel Spiewak writes mainly about Scala. C is a weakly typed language, where it is trivial to cast one type to another. Scala is an example of a more strongly typed language, allowing the toolchain to make more guarantees about what the valid inputs are. Scala is also interesting in that it is another language implemented atop the Java VM.

Wednesday, January 7, 2009

Variable Scoping with gcc

Has this ever happened to you? You allocate some sort of resource which needs to be released before returning from the function. You can put the release at the end of the routine for the normal case, but there are a number of error checks which can cause it to return early. You consider calling the reclaim routine inside each error handler, but you're concerned that someone maintaining the code in the future will forget to do so. Instead you put all of the cleanup code at the end of the function, and have each error check use a goto.
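
A sketch of that pattern, with hypothetical error1() and error2() checks:

int process()
{
    int ret = -1;
    int fd = open("/path/to/file", O_RDONLY);

    if (fd < 0) {
        return -1;
    }

    if (error1()) {
        goto out;   /* every early exit funnels through out: */
    }
    if (error2()) {
        goto out;
    }

    ret = 0;
out:
    close(fd);      /* the cleanup code lives in exactly one place */
    return ret;
}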

Then of course someone argues very strenuously that goto is evil incarnate and must never be used, one thing leads to another, and then you have to find somewhere to hide the body ... wait, never mind that last bit.

Consider this instead:

int foo()
{
    int fd LOCAL_SCOPE_FD = open("/path/to/file", O_RDONLY);

    if (error1()) {
        return -1;
    }

    return 0;
}

Though it might appear that I'm suggesting the file descriptor be leaked in order to simplify the code, that is not the case. The magic happens in LOCAL_SCOPE_FD:

#define LOCAL_SCOPE_FD __attribute__((cleanup(local_fd_close)))

void local_fd_close(int *fd)
{
    if (*fd >= 0) close(*fd);
}

__attribute__((cleanup)) is a gcc extension. When an automatic variable goes out of scope, the function indicated by the cleanup attribute will be called. If the scope is exited abnormally, such as via longjmp() or by calling exit(), the cleanup function does not get called, but normal return statements and falling off the end of the block both work.

It also works within inner blocks. For example:

int foo()
{
    if (do_something) {
        int fd LOCAL_SCOPE_FD = open("/path/to/file", O_RDONLY);
    }
    /* local_fd_close() is called here, as fd goes out of scope at the closing brace. */

    ... more code ...
}

Cleanup can only be applied to automatic variables, i.e., variables declared on the stack within a function. It cannot be used with global or static variables. The cleanup attribute can be used with any variable type, not just integers. The cleanup function receives a pointer to the automatic variable being cleaned up.
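
For example, here is a minimal sketch applying the same trick to heap memory (the macro and function names are mine, not a standard API):

#include <stdlib.h>

#define LOCAL_SCOPE_MEM __attribute__((cleanup(local_mem_free)))

void local_mem_free(void *p)
{
    /* gcc passes a pointer to the variable itself, here effectively a char **. */
    free(*(void **)p);
}

int bar()
{
    char *buf LOCAL_SCOPE_MEM = malloc(128);

    if (buf == NULL) {
        return -1;  /* free(NULL) is harmless, so the cleanup is safe even here */
    }

    /* ... use buf ... */
    return 0;       /* local_mem_free(&buf) runs on every return */
}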

I assume that if the Gentle Reader is reading this article, there are good reasons to use C in your problem space. Needless to say, if you want automatic resource reclamation, a language other than C would provide more capabilities. Garbage collection and object finalizers are powerful constructs, given a problem space where they are appropriate.


 
Acknowledgements

Many thanks to Matt Peters for pointing out the __attribute__((cleanup)) capability to me.

Ian Lance Taylor recently wrote on a similar topic, about support for destructors and exceptions in C.


 
Updates

This article was picked up on reddit, with a few comments. One pertinent comment from erikd:

His local_fd_close() function has a bug: it needs to check the return value of close(), because close() can return an error of EINTR, which means the process received an interrupt and should retry the close operation.
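
A sketch of the suggested fix (noting that whether close() should be retried after EINTR is itself debated; on Linux the descriptor is released even when EINTR is returned):

#include <errno.h>
#include <unistd.h>

void local_fd_close(int *fd)
{
    if (*fd >= 0) {
        while (close(*fd) == -1 && errno == EINTR)
            ;   /* retry the close, per erikd's suggestion */
    }
}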

Friday, December 19, 2008

printf-acular

Ahh, printf. What would we do without printf, sprintf, etc.? It makes it easy to format integers, floats, strings, MAC addresses... Oh wait, there isn't a format specifier for MAC addresses, is there? Well, let's rectify this obvious omission.

glibc (and uClibc) implement a facility to add new format codes to the printf family of functions. It is even possible to override the built-in implementations of the C standard formats, if the Gentle Reader has an irrational hatred for some aspect of the output. The GNU documentation for how to do this is adequate, but a bit obscure. So let's dive into an example by adding a %M specifier to handle MAC addresses:

uint8_t mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

if (register_printf_function ('M',
        printf_output_M, printf_arginfo_M)) {
    ... error handling ...
}

printf("%M\n", mac);

Output:
00:11:22:33:44:55

Huzzah! Now we'll dig into the implementation.


 
printf_arginfo_M

The first routine to be implemented tells glibc the arguments required by the new specifier:

#include <stdio.h>
#include <printf.h>

static int
printf_arginfo_M(const struct printf_info *info,
                 size_t      n,
                 int        *argtypes)
{
    /* "%M" always takes one argument, a pointer to uint8_t[6]. */
    if (n > 0) {
        argtypes[0] = PA_POINTER;
    }

    return 1;
} /* printf_arginfo_M */

The fields of the printf_info structure are described in the GNU manual. For the arginfo() implementation the only important field is spec, the character code of the specifier for which the information is sought. This allows a single arginfo() routine to implement support for multiple new specifiers.

glibc also passes in an array argtypes[] with n members, to be filled in with the types of the arguments the specifier expects; the available types are documented in the GNU manual. The arginfo() routine should populate argtypes[] after checking that there is room, and return the number of arguments it expects. If a format specifier takes more arguments than argtypes[] can hold, arginfo() returns the larger count and glibc will call it again with a larger argtypes[] array.

For %M there is only one argument, the MAC address to be formatted. A specifier which consumes more than one argument is potentially useful, but also confusing, as none of the standard format specifiers do so.


 
printf_output_M

Next we need to implement the output routine, to generate the formatted output for the new specifier. Output always goes to a FILE *, whether called from printf(), sprintf(), vfprintf(), etc.

You are free to use the stdio.h routines like fprintf(), so long as you do not use your new specifier in the format string. If you do, fprintf() will call back into your output routine, which will call fprintf() again, which will call your output routine again... and hijinks ensue.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    int len;

    mac = *(unsigned char **)(args[0]);

    len = fprintf(stream, "%02x:%02x:%02x:%02x:%02x:%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    return len;
} /* printf_output_M */

We only take one argument for %M, so we only look at args[0]. There is no need to check for NULL: glibc validates the format before calling your output routine, so you'll never get a NULL pointer. We call fprintf() to do most of the work, and return the number of bytes output.

printf("%30M\n", mac);

00:11:22:33:44:55

Voila! I hope you've enjoyed this article... oh wait, that wasn't very good, was it? The "30" prepended to the M specifier was meant to right-justify the output within a 30-character region. The output did not reflect this. Oops.


 
printf_output_M, with modifiers

glibc parses out the modifiers prepended to the specifier, and passes them to the output routine via the printf_info structure. Many of the standard modifiers don't make sense for MAC addresses. For example, I have no idea what "%llM" would mean. A few modifiers do make sense, such as a field width and the '-' argument to left-justify.

We certainly could write a bunch of code to insert the proper number of spaces before or after the MAC address, but we'll use a slightly different approach. We call fprintf() to generate the output anyway, so we'll generate a format string for fprintf() to make it handle the width and justification.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    char macstr[18]; /* six hex bytes, with colon separators and a trailing NUL */
    char fmtstr[32];
    int len;

    mac = *(unsigned char **)(args[0]);
    snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
            mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    /* We handle some conversion specifier options as appropriate:
     * '-' and field width */
    if (info->width > 0) {
        snprintf(fmtstr, sizeof(fmtstr), "%%%s%ds",
                (info->left ? "-" : ""), info->width);
    } else {
        snprintf(fmtstr, sizeof(fmtstr), "%%s");
    }

    len = fprintf(stream, fmtstr, macstr);

    return len;
} /* printf_output_M */

We generate the formatted MAC address as a string buffer on the stack. The code examining the modifiers in printf_info then generates a format string in fmtstr[]. For example, if the argument to printf() were "%-30M", fmtstr[] will contain "%-30s".

Now the output is much more pleasing:

printf("%30M\n", mac);

               00:11:22:33:44:55

 
printf_output_M, Now with More Cisco

Which looks more natural to the Gentle Reader?

00:11:22:33:44:55

0011.2233.4455

If you chose the first one, you likely come from a desktop background with operating systems such as Windows, MacOS, Linux, Solaris, or the various other Unixes, which all use a colon-separated MAC address.

If you chose the second one, you likely come from a Cisco networking background. Cisco uses a MAC address format which takes up three fewer columns on the screen, important if one's user interface is a text CLI.

Could we accommodate this variation? Certainly it could be done with another conversion specifier, perhaps "%C" for a Cisco-style MAC address, but this seems inelegant. Coming to our rescue is a little-used format modifier in printf: the 'alternate representation' hash mark. This modifier implements optional formatting; for example, when used with the hex specifier as "%#x" it will prepend 0x to the output.

So we'll implement the Cisco MAC address format using "%#M" in printf_output_M, like so:

    if (info->alt) {
        /* Cisco style formatting */
        snprintf(macstr, sizeof(macstr), "%02x%02x.%02x%02x.%02x%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    } else {
        /* Unix/Linux/Windows/MacOS style formatting */
        snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    }

If invoked with "%#M" we'll take the first code path and produce a Cisco-style MAC address. Otherwise, we generate a colon-separated MAC address.
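
Putting it together:

printf("%M\n", mac);
printf("%#M\n", mac);

Output:
00:11:22:33:44:55
0011.2233.4455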


 
Whither -Wall?

With the -Wformat (or -Wall) argument, gcc will sanity check the arguments passed to the printf family of routines. It can ensure that the number of arguments matches the number of specifiers in the format string, for example. Unfortunately, it does not know about custom additions:

gcc -Wall testM.c
testM.c:57: warning: unknown conversion type character 'M' in format
testM.c:57: warning: too many arguments for format

I know of no way to teach gcc about the new specifiers, unfortunately. If the Gentle Reader knows of a way, please describe it in the comments.


 
GPL

I am not a lawyer, I do not play one on TV, and I do not want to play one on TV. So my opinions about GPL issues are really just quaint little notions of How The World Should Work.

An argument has been advanced in other contexts that if the only implementations of a particular API are licensed under the GPL, then any caller of that API should be considered a derivative work and be subject to the GPL. As the register_printf_function() API only exists in glibc and uClibc, both of which are licensed under the LGPL, it could be argued that code calling register_printf_function() is therefore encumbered by the LGPL. Opinions may differ, but this is one possible interpretation.

Personally, I choose to treat custom printf formatters as LGPL'd code.


 
Closing Thoughts

It is of course possible to implement routines to format a MAC address as a string, using "%s" in stdio.h routines for output. However, I think it is more elegant to teach stdio.h how to deal with common output requirements for your application.
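
For comparison, that conventional approach might look like this sketch (the helper name is mine):

#include <stdio.h>
#include <stdint.h>

const char *mac_to_str(const uint8_t *mac, char *buf, size_t len)
{
    snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return buf;
}

/* At the call site: */
char buf[18];
printf("%s\n", mac_to_str(mac, buf, sizeof(buf)));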

The C standard may claim additional lower-case specifiers for standard formats in the future. It is best to rely on upper-case codes for custom formats.