Friday, August 21, 2009

Smartbooks and Handheld PCs

Intel markets the Atom processor for netbooks. It trades lower processing power for very low power consumption, and is quite inexpensive in large quantities. These are product features on which ARM, PowerPC, and MIPS vendors have long focused, though they have aimed at non-PC form factor devices. Now a new round of small laptops is hitting the market using non-x86 processors with either Windows CE or a Linux software stack like Google's Android or, eventually, Chrome OS. Most notably, Dell appears to be on the verge of introducing such a device - the true measure of whether a product category has entered the mainstream. These devices are generally called Smartbooks, owing to Qualcomm's extensive marketing push for its ARM Snapdragon chips in such a role.

Vadem Clio CL-1050

What strikes me about these devices is that we've been down a similar road before. In the late 1990s there was a flurry of activity around Handheld PCs, which were relatively small and inexpensive compared to the laptops of the day. They typically used MIPS processors, ran Windows CE, and supplied basic word processing and communications software. Handheld PCs didn't last long on the market, and Microsoft ceased work on the software after just two years. The devices simply were not useful enough compared to a full laptop.

This is another example of how infrastructure and market can contribute more to the success of a new product than its design does. If a product is too early, it won't get enough traction to drag the rest of the environment along. In 2009 these devices can depend on a fast wireless infrastructure and plentiful cloud-hosted applications, neither of which existed in 1999. The world has changed.

Tuesday, August 18, 2009

Virtual Machines And Manual Transmissions

Stick Shift Knob

I've chosen a manual transmission for every vehicle I've purchased. It is a personal preference: I like the feeling of control over the engine and the ability to trade speed for torque. Driving a stick shift was also an advantage in school: practically nobody else knew how, so nobody could borrow my car.

For software development I code mostly in C, which is a rather thin layer on top of the machine. Even C++, while still considered a low-level language, nonetheless implements significantly more abstraction. Consider the following simple example of incrementing a variable in C and in C++:

C:
int val = 0;

void incr() {
    val++;
}
C++:
class exi {
    int val;

    public:
        exi() { val = 0; }

        void incr() {
            val++;
        }
};
Note that in neither case is incr() declared to take an argument. We'll use one of my favorite techniques, disassembling the binary to see how it works. This time we're looking at PowerPC opcodes.
C version:

_main:
 bl _incr            ; branch to incr()

_incr:
 mfspr r9,lr         ; get address base from link register
 lwz r2,0xbc(r9)     ; fetch val
 addi r2,r2,0x1      ; increment
 stw r2,0xbc(r9)     ; store new val
 blr                 ; return

C++ version:

_main:
 ... much C++ object init removed ...
 addi r3,r1,0x38     ; object addr in arg0
 bl __ZN6exi4incrEv  ; branch to exi::incr()

__ZN6exi4incrEv:
 lwz r2,0x0(r3)      ; fetch val from *arg0
 addi r2,r2,0x1      ; increment
 stw r2,0x0(r3)      ; store new val
 blr                 ; return

Though the C++ source code does not show an argument to exi::incr(), at the machine level there nonetheless is one: the object address is passed as the first argument. This is how C++ implements "this" - the method must have a pointer to the object it operates on.
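The same convention can be written out explicitly in C. This sketch (the struct and function names are mine, not from the disassembly) makes the hidden argument visible:

```c
#include <assert.h>

/* C rendition of the C++ class: the implicit "this" becomes an
 * explicit first argument, just as the disassembly shows the
 * object address being passed in r3. */
struct exi {
    int val;
};

void exi_init(struct exi *self) {
    self->val = 0;
}

void exi_incr(struct exi *self) {
    self->val++;    /* same fetch/increment/store as exi::incr() */
}
```

At the machine level the C and C++ versions are now essentially identical; the difference is only who writes the pointer-passing code, you or the compiler.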

In low level languages you can generally see the relationship from source to the resulting machine code, even when significant compiler optimization is done. As we move to higher level languages, the abstractions between the source and machine code grow ever larger. C++ is somewhat higher level than C, and at the machine level the mapping from instructions back to source code is less clear. More abstract languages like Java, Python, and C# compile to a virtual machine running on top of the real system. If one gathered an instruction trace of CPU execution, one would be hard-pressed to correlate these instructions back to the source code they implement.


 
Foreshadowing

One can see the day coming when manual transmissions will be unavailable in most car models. Continuously variable transmissions were an early indication of this trend, with an essentially infinite number of gearing ratios that could only be effectively controlled by an engine computer. Hybrid vehicles have a complex transmission which meshes the output of two motors, and again can only be controlled by computer. Future vehicles will likely have an entirely electric drivetrain, with no need for a conventional transmission at all. The simple fact is that the engine computer can do a far better job of optimizing the behavior of the drivetrain than I can.

I'm currently digging into the low-level aspects of virtual machines. Running compilation just-in-time as part of a virtual machine has several notable advantages over static compilation with gcc:

  • gcc's optimization improves if you compile with profiling, run the program, and then compile again. This is so annoying that it is hardly ever done. A virtual machine always has profiling data available, as it interprets the bytecodes for a while before running the JIT.
  • gcc's profile-guided optimization is done in advance, on a representative corpus of input data which the programmer supplies. If the program operates on inputs which differ substantially from this, its performance will not be optimal. The JIT optimization is always done with the real data as profiling input.
  • gcc can optimize for a specific CPU pipeline, such as Core2 vs NetBurst vs i486. One is trading off performance improvement on the favored CPU versus degradation on other CPUs. The JIT can know the specific type of CPU being used and can optimize accordingly.
  • gcc can do static constant propagation across subroutines. That is, if a constant NULL is passed to a function gcc can create a version of that function which will eliminate any unreachable code. The JIT can create optimal versions of functions tuned for specific arguments dynamically, whether they are constant or variable. It just has to validate that the arguments still match the expected, and it is free to jump back to the interpreted bytecode on a mismatch.
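A hand-written C analogue of that last bullet (the names are invented; a real JIT would emit machine code, but the guard-plus-specialized-body shape is the same):

```c
#include <stddef.h>
#include <string.h>

/* General version, as the programmer wrote it. */
size_t describe_len(const char *s) {
    return (s == NULL) ? 0 : strlen(s);
}

/* What a JIT might produce after observing s == NULL on every
 * call so far: a guard check, then the constant-folded body.
 * On a guard miss it falls back to the general path (a real VM
 * would jump back to the interpreted bytecode instead). */
size_t describe_len_specialized(const char *s) {
    if (s != NULL)              /* guard: argument no longer matches */
        return describe_len(s); /* deoptimize to the general version */
    return 0;                   /* the NULL branch, folded to a constant */
}
```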

This should be fun. Some initial thoughts:

  • modern CPUs have extensive branch prediction and speculative execution features, to keep them from spending all their time stalled waiting for the outcome of a branch decision. What happens when we have a lot more big loops with straight-line code, where the JIT has optimized all the conditionals up to sanity checks at the entry to the function?
  • Does widespread use of JIT mean that VLIW architectures become more viable? VLIW is particularly dependent on the compiler to match the code to available hardware resources, which a JIT is better positioned to tackle.

Tuesday, August 11, 2009

blog.8.11.2009 < /dev/random


A while ago I wrote a FeedFlare for FriendFeed, my second Google App Engine project. Monday morning, friendfeed.com announced that it had been acquired by Facebook. Anybody want to buy a slightly used FeedFlare?




A year ago I wrote about the importance of publishing the GPLd source used in commercial products, to avoid the public relations nightmare that comes of being accused of a GPL violation. One of the points in that article concerned the SFLC suing Extreme Networks. I'm happy to report that the suit was settled in October 2008. I'm happy to report this because I imported busybox into the source tree at Extreme, and I really didn't want to get dragged in for depositions. Extreme's engineering management was convinced that waiting until somebody asked for the source code would be fine.


Thursday, August 6, 2009

Toward a More Robust Twitter Infrastructure

This morning's twitter outage due to DDoS attack reminds me: I wrote a guest post for the most excellent Cranky Product Manager, concerning how to make twitter more robust. It ran on April 1, 2009.

The Cranky Engineer Responds to a PRD

Wednesday, July 29, 2009

Embedded Linux Market Share

Last week Wind River, now a subsidiary of Intel, announced that it had taken the lead in the share of embedded Linux revenue.

ALAMEDA, CA - July 22, 2009 - Wind River, a wholly owned subsidiary of Intel Corporation, today announced it has been named the embedded Linux market leader by VDC Research Group. Released today in VDC's 2009 Linux in the Embedded Systems Market report, Wind River achieved the market share lead in 2008 with greater than 30 percent of total market revenue, more than seven percentage points over the next closest competitor. Wind River entered the Linux business in 2004 to complement its market-leading, proprietary operating system, VxWorks.

The unnamed "next closest competitor" is MontaVista Software, but the real competition is not between the different embedded software vendors. The real competition is between any commercial Linux vendor versus rolling your own distribution from source. The various kernel distributions for PowerPC and MIPS are easy to download and cross-compile. Assembling a filesystem is not difficult, as discussed in an earlier article on this site. Add busybox and either glibc or uClibc, and you are most of the way to a bootable system.

There are a few areas where embedded Linux vendors provide value, and why I generally advocate purchasing support from them for a Linux development project:

  • The compiler toolchain: Maintaining a cross-compiling toolchain is quite a bit of work. Most importantly, one has to stay on top of CPU bugs. All CPUs ship with bugs, even x86, but a dirty little secret of the RISC SoC business is that they can go to market with more significant problems than Intel or AMD could get away with. So long as the issue can be worked around in the compiler or assembler - by not emitting the problematic sequence of instructions - the chip will ship anyway, with problems rectified in later spins. The bugs will be documented in the errata, but the descriptions are made to sound quite innocuous. CPU developers make sure that the major commercial embedded Linux vendors have the needed workarounds in their toolchain.
  • The kernel development tax: if you look at the version control logs for Linux/MIPS or Linux/PowerPC, the people doing the heavy lifting are often employed at one of the Linux vendors. Those companies have an economic reason to pay for that development. Unfortunately this leads to a variation of the Prisoner's Dilemma: somebody has to fund it. One can either pay for a support contract in order to fund Linux development, or not pay but hope that enough other people do.
  • Proprietary tools: The package management tools and customized Eclipse IDE supplied by these vendors are generally not useful to me, but some of their supplementary tools for profiling or shared library size reduction are quite interesting.

It is sometimes galling to pay for embedded Linux support, particularly because technical support for specific problems has never actually resolved anything for me. Nonetheless I do advocate having at least a minimal contract, using the supplied compiler toolchain, and investigating the other tools they provide. It is worth spending some resources on.

(Original press release via Linux For Devices)

Monday, July 20, 2009

Microsoft Releases Linux Paravirtualization Driver Source

From the Microsoft press release:

REDMOND, Wash., July 20, 2009 - Today, in a break from the ordinary, Microsoft released 20,000 lines of device driver code to the Linux community. The code, which includes three Linux device drivers, has been submitted to the Linux kernel community for inclusion in the Linux tree. The drivers will be available to the Linux community and customers alike, and will enhance the performance of the Linux operating system when virtualized on Windows Server 2008 Hyper-V or Windows Server 2008 R2 Hyper-V.

Microsoft wants Hyper-V to compete with VMWare in all markets, and to do this it needs to have good support for virtualizing Linux. Microsoft very pragmatically decided that closed source paravirtualization drivers for Linux had no chance of success. They'd get a press release out of such a move, but no significant adoption without Red Hat/Canonical/etc pulling the drivers in. Opening the source is in their best interests.

The last two posts have been an experiment of sorts. Prior posts had been written entirely from scratch on a technical topic. They take a long, long time to write. I wanted to try posting more frequently by adding a few thoughts to a relevant news item, but thus far I haven't been happy with the results. In this article I said opening the source would allow a Linux platform vendor to include the Microsoft paravirtualization drivers, but on further reflection it seems unlikely that they would actually do so. Red Hat has their own virtualization strategy which doesn't include Microsoft, and there is no reason to believe Canonical would be interested in being an enabler for sales of Microsoft Hyper-V.

Thursday, July 16, 2009

Courgette binary patch compression

Recently on the Chromium blog Google announced an improved binary compression algorithm called Courgette. In the example cited Courgette produced a patch that was only 11% of the size of that produced by bsdiff. The design overview has more details on its operation:

Courgette uses a primitive disassembler to find the internal pointers. The disassembler splits the program into three parts: a list of the internal pointer's target addresses, all the other bytes, and an 'instruction' sequence that determines how the plain bytes and the pointers need to be interleaved and adjusted to get back the original input. We call this an 'assembly language' because we can run an 'assembler' to process the instructions and emit a sequence of bytes to recover the original file.

The non-pointer part is about 80% of the size of the original program, and because it does not have any pointers mixed in, it tends to be well behaved, having a diff size that is in line with the changes in the source code. Simply converting the program into the assembly language form makes the diff produced by bsdiff about 30% smaller.

I haven't checked, but I suspect its disassembler supports x86 only. Chromium runs on Windows, MacOS X, and Linux, which all run primarily on x86 systems.

Courgette is of course aimed at updates to Google's Chrome browser, which is installed in very large numbers and frequently updated. Reducing the size of the updates results in a better user experience. Nifty.


 
Incremental Patching and Embedded Systems

When first posted, this article launched directly from Courgette into a discussion of incremental patching in embedded systems. In the comments Wayne Scott pointed out that this really wasn't fair: Courgette is purely a way to make binary patches smaller. In fact because Courgette requires the complete original binary in order to generate its diffs, it cannot be used to generate independent incremental patches at all. After clarifying my thinking, I've updated the post and added a bit of segue text.

Let us now turn to the more general subject of patching of embedded systems. Whenever there is a problem in the field, there is a strong temptation to push out a fix as rapidly as possible. Whether called a point patch or a hotfix, the basic idea is to patch just the portion of the software causing issues for that customer. Larger, periodic maintenance releases collect all existing hotfixes (plus additional ongoing maintenance work) into a single release suitable for all deployments. For embedded systems, on general principles I don't favor the use of hotfixes. Though it reduces the bandwidth required for updates, I feel the disadvantages outweigh the advantages:

  1. Perhaps obviously, you need management software to apply, remove, and list installed patches.
  2. Tech support has a larger number of variations to deal with.
  3. Test load increases rather dramatically: with 5 independent patches you may need to test the combinations, up to 2^5 = 32 variations, not just 5.
  4. Frequent updates are not a good thing for most embedded systems. Customers want the gear to fade into the background and just work; making them update and reboot too often becomes a distinct negative.
  5. As described in an earlier article, I favor storing the boot images in a raw flash partition, not any sort of filesystem, which makes installation of an incremental patch more complex.

I recommend not trying to maintain the most recent maintenance release plus an ever-growing collection of hotfixes. Instead, I suggest revising the maintenance release whenever there is a customer problem. If other customers are not experiencing the problem, they need not deploy the new release right away. The main benefit is to avoid having 2^N possible combinations of patches in the field, instead having only N minor maintenance releases. Revving the maintenance release also tends to be treated with more care than a simple hotfix is; rushing the process is rarely beneficial.

Tuesday, July 14, 2009

DRY and the DMV

The Pragmatic Programmer is one of the best books available concerning the development of quality software. It is structured as a series of tips, with illustrative examples and the occasional horror story. One of the first tips is the DRY principle:

DRY - Don't Repeat Yourself
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

DRY is often misinterpreted to mean simply that code should not be duplicated, but it is somewhat more subtle: don't duplicate state. If you have multiple places in the code which keep state about an aspect of the system, and all of them must have the same content at all times for the system to work properly, then you have made maintenance of the system harder than it needs to be. You'll have to debug cases where the representations fall out of sync, and all such places must be updated at the same time when code changes are made. The Pragmatic Programmers extend the DRY principle beyond the code itself to include database schemas, documentation, and build systems. Everything should have one authoritative source.
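As a minimal C illustration (my own example, not from the book): caching a string's length alongside the string creates two representations of one fact, and every mutation must update both or they drift apart.

```c
#include <string.h>

/* Two copies of one piece of knowledge: the text and a cached
 * length. A DRY violation: every writer must keep both in sync. */
struct msg {
    char   text[64];
    size_t len;          /* must always equal strlen(text) */
};

void msg_set(struct msg *m, const char *s) {
    strncpy(m->text, s, sizeof(m->text) - 1);
    m->text[sizeof(m->text) - 1] = '\0';
    m->len = strlen(m->text);  /* omit this line and the state desyncs */
}
```

The DRY fix is to drop the cached field and compute the length from the one authoritative representation whenever it is needed.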

This brings us to the Department of Motor Vehicles, though the Gentle Reader might at first not see the connection. I received a form in the mail to renew my driver's license, which I promptly signed and sent back. The new license arrived in due course, and things were fine until a few weeks later when I noticed the address was incorrect. The old license is correct, the new one is wrong.

sample CA drivers license

I've no idea whether the address was correct on the renewal form; I did not check it. Apparently I should have, but I didn't bother - it hadn't changed. At some stage of the renewal process, a single digit was altered in a subtle way.

Why was it even possible for the address to be changed in the renewal process? Here we can only speculate. The DMV does need a procedure to update an address as part of a license renewal, because sometimes people supply a new one on the form. I'll speculate that the DMV, either via OCR or manual typing, re-enters the address in all cases and not just when the form supplies a change. This procedure depends on the original address being faithfully reproduced in cases where it wasn't supposed to change. In my case, due either to an OCR glitch or a typing error, a digit changed, resulting in the new license being printed with an incorrect address.

I believe this is an example of the consequences of a violation of the DRY principle. The same state - my address - exists in two places: in the DMV database and on the form. Those two pieces of state are supposed to be the same, indeed must be the same for the process to work correctly, but errors can easily occur which allow their contents to get out of sync.

A corollary lesson in this situation: if state isn't supposed to change, don't change it. If the form does not indicate a change of address, the authoritative state is in the database and the form contents should be ignored.


 
Aftereffects

I've already received a jury summons at the incorrect address, which the post office helpfully delivered to me anyway. Even after correcting the address I suspect I will receive a summons twice as often from now on. That will form the basis of a future blog post to illustrate the importance of duplicate suppression in databases, I suppose.

I can change my address back by submitting a form to the DMV, but issuing a new license with the correct address will be at my own cost. Such is the price of modern life.

Tuesday, June 30, 2009

Post-mortem debugging: core files

When one has a core file, one runs gdb. It's simply the way things were Meant To Be, right? Yet gdb isn't the right tool for the job in all cases. If you're dealing with a corrupted heap, gdb is not very helpful. You can see the portion of the heap which caused the process to fault (most likely in malloc or free), but identifying the junk in memory is an exercise in puzzling it out and looking for patterns. It is often useful in such cases to search the rest of the process address space for pointers into the corrupted area of the heap. Though gdb can be used to search for patterns in memory, it isn't very good at it. For example, consider the following macro:

define searchmem
    set $start = (char *) $arg0
    set $end = (char *) $arg1
    set $pattern = (unsigned int) $arg2
    set $p = $start
    while $p < $end
        if (*(unsigned int *) $p) == $pattern
            printf "pattern 0x%x found at 0x%x\n", $pattern, $p
        end
        set $p++
    end
end

document searchmem
    search between $arg0 and $arg1 for pattern $arg2
end

This macro can look only for 32-bit numbers, not any sort of regular expression, and it is very, very slow. We'd really like to run grep, but if gdb provides a way to run grep over the core contents, I haven't found it. Instead, Gentle Reader, we'll write a utility to dump the core file to text so that grep or any other Unix tool can be used. This would be a job for od or hexdump except for two things:

  1. We'd like to see the addresses of the data being dumped.
  2. The core might be in the wrong endianness, such as a MIPS-BE core file on an x86 host.

Instead of using a generic binary dump like od we'll construct a tool specifically for core files, but first we need to understand their contents.


 
Process Address Space

Linux and modern Unix-ish operating systems dynamically map shared libraries into the process address space. These libraries are not packed tightly up against one another. For alignment and page protection reasons, there are gaps between them.

Each library generally consists of multiple segments:

  • instructions, called the TEXT segment.
  • uninitialized data to be zero filled, called BSS
  • initialized data (i.e. variables initialized to a non-zero value), called DATA

You can see the memory segments for a running process in /proc/<pid>/maps. Here is an example:

# cat /proc/1261/maps
0fc73000-0fc80000 r-xp 00000000 01:00 713        /lib/libA.so.1
0fc80000-0fc83000 ---p 0000d000 01:00 713        /lib/libA.so.1
0fc83000-0fc92000 rwxp 00000000 01:00 713        /lib/libA.so.1
...deleted...
10000000-1000c000 r-xp 00000000 01:00 1190       /bin/myapp
1001b000-1001c000 rwxp 0000b000 01:00 1190       /bin/myapp
...deleted...

libA's callable functions are in the TEXT segment, mapped at addresses 0x0fc73000 through 0x0fc80000. Note the permission bits on these pages: read-only plus executable. Write permission is denied, making it safe to share the same physical pages of RAM amongst multiple processes.

The libA BSS segment extends from 0x0fc80000 through 0x0fc83000. These pages are mapped with no permissions at all, even a read access will trigger a page fault. The kernel will supply a zero-filled page on the first fault, and mark its permissions as read+write.

The libA DATA segment is at 0x0fc83000 though 0x0fc92000. This region is populated with the initialized data, so it has read+write permissions already. No page fault will be triggered on access to these pages.

When the process dies, the core file needs to preserve these scattered memory areas. Core files from a Linux process are written in ELF format, which I will stubbornly continue to refer to as the Extensible Linking Format. ELF defines data sections with associated virtual addresses, allowing it to describe data scattered across a process address space. Let's examine the output of "readelf -l" on the core from this process:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz   Flg Align
  NOTE           0x0003f4 0x00000000 0x00000000 0x006f4 0x00000      0
  LOAD           0x001000 0x0fc73000 0x00000000 0x00000 0x0d000  R E 0x1000
  LOAD           0x001000 0x0fc80000 0x00000000 0x00000 0x03000      0x1000
  LOAD           0x001000 0x0fc83000 0x00000000 0x0f000 0x0f000  RWE 0x1000
  ..etc...

The NOTE section contains various global information about the dead process, including its name and the contents of the CPU registers at the time it died. If you use "objdump -h" to list the core sections you'll see two additional sections: reg0/2 and reg0. These sections don't actually exist in the file, objdump breaks out the register contents from the NOTE section into their own pseudo-sections.

The LOAD sections contain the data from the process address space. The first one starts at a virtual address of 0xfc73000. That is the TEXT segment for libA, which we looked at earlier. Note the FileSiz is 0: to save space, the kernel skips dumping the contents of non-writable pages. gdb would fetch these sections from the executable file instead. The second LOAD section contains the libA BSS, and similarly its FileSiz is zero: this process died before it ever referenced anything in the BSS of libA. None of the pages had been faulted in and therefore none were writable.

The third LOAD section is the interesting one. This is the initialized DATA section. Because these pages were writable, they were saved to the core file, and so the FileSiz is 0x0f000 (which matches the size of the segment from the /proc/<pid>/maps file, above).


 
BFD

libbfd is one of several libraries available for working with ELF files. We'll use libbfd to process the core file, dumping it as hex words to a text file.

According to the history I can find, libbfd was created at Cygnus Support to deal with the myriad binary formats that sprang up in the 1980s and 90s. In response to how difficult it would be to encapsulate the various formats in a single library David Henkel-Wallace reportedly responded "big f---ing deal," and thus libbfd was christened. The name has since been clarified as "binary file descriptor."


    #include <bfd.h>

    bfd        *abfd;
    asection   *sect;
    char       *corefilename;
    enum bfd_endian endian;

    if ((abfd = bfd_openr(corefilename, NULL)) == NULL) {
        /* ... error handling ... */
    }

First we open the file using bfd_openr(). *abfd is the handle used to access the file, all other libbfd APIs take it as an argument.


    if (!bfd_check_format (abfd, bfd_core)) {
        printf("%s does not appear to be a core file.\n", corefilename);
        /* ... error handling ... */
    }

We check that the file is actually a core and not some other type of file. bfd_core is part of an enumerated type; other types which libbfd can check for include bfd_object for ELF programs or .o files, and bfd_archive for ar-style archives.


    endian = abfd->xvec->byteorder;

libbfd handles the endianness of fields in the program and section headers, returning the result in the CPU's native byte order regardless of the endianness of the ELF file. It doesn't do anything about the data within those sections. Since we're going to dump the data in the core file, we need to know the endianness.
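When the core's endianness differs from the host's, each 32-bit word must be byte-swapped before printing. A minimal helper (my own; comparing the host's byte order against abfd->xvec->byteorder is left to the caller):

```c
#include <stdint.h>

/* Byte-swap a 32-bit word, e.g. when dumping a MIPS-BE core
 * on a little-endian x86 host. */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000ffu) << 24) |
           ((w & 0x0000ff00u) <<  8) |
           ((w & 0x00ff0000u) >>  8) |
           ((w & 0xff000000u) >> 24);
}
```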


    for (sect = abfd->sections; sect != NULL; sect = sect->next) {
        bfd_vma        vma  = bfd_section_vma (abfd, sect);
        bfd_size_type  size = bfd_section_size(abfd, sect);
        const char    *name = bfd_section_name(abfd, sect);

We loop over each ELF section, pulling out the virtual address and size of the data it holds. The size will be zero in many cases, as shown above for the libA TEXT and BSS sections.


        char *buf = malloc(size);

        if (buf == NULL ||
            !bfd_get_section_contents(abfd, sect, buf, (file_ptr)0, size)) {
            fprintf(stderr, "Could not read section %s\n", name);
            exit(5);
        }

We fetch the contents of each section, ready to print them to hex.
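The printing loop might look something like this sketch (my own code, approximating the sample dump format: virtual address, hex bytes grouped as 32-bit words, then an ASCII column; it returns the number of lines printed):

```c
#include <stdio.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Dump a section 16 bytes per line: address, hex words, ASCII.
 * Returns the number of lines printed. */
size_t dump_section(uint32_t vma, const unsigned char *buf, size_t size)
{
    size_t lines = 0;
    for (size_t off = 0; off < size; off += 16, lines++) {
        printf("%08x: ", (unsigned)(vma + (uint32_t)off));
        for (size_t i = 0; i < 16 && off + i < size; i++) {
            printf("%02x", buf[off + i]);
            if ((i & 3) == 3)
                printf("  ");            /* space between 32-bit words */
        }
        for (size_t i = 0; i < 16 && off + i < size; i++) {
            unsigned char c = buf[off + i];
            putchar(isprint(c) ? c : '.');  /* ASCII column */
        }
        putchar('\n');
    }
    return lines;
}
```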


With a modicum of string manipulation we have a hex dump of the core contents:

0fc84710:  615f6669  6e616c69  7a65005f  4a765f52    a_finalize._Jv_R
0fc84720:  65676973  74657243  6c617373  6573006c    egisterClasses.l
0fc84730:  6962632e  736f2e36  0061646c  65723332    ibc.so.6.adler32
0fc84740:  00636f6d  70726573  73320064  65666c61    .compress2.defla

 
libbfd licensing

libbfd is GPL. It is not LGPL, which allows dynamic linking to proprietary code, but the full GPL. Any use of libbfd encumbers the rest of the software linked to it with the GPL, and obligates you to provide the source code to anyone to whom you provide a binary. It's important to consider what this means: if you're developing tools for internal use by developers, the GPL is not onerous. Indeed, the source code of the utility will likely be checked into the version control system that all developers can access anyway.


 
More later...

If you need to incorporate ELF file support into a proprietary software tool, then libbfd is not usable. I intended to provide the same core2hex example using FreeBSD's libelf, but this article is already quite long, so I'm going to defer it until next time.

Friday, May 15, 2009

Pre-mortem Backtracing

A backtrace is often the first step in debugging a problem. Generating a backtrace is generally thought of as a function of the debugger, on a core file after a process has died. However it is sometimes quite useful to generate a live backtrace while a process runs. For example, crashing the process in the field may not be acceptable if a problem is survivable. Logging a backtrace and other information can provide enough to locate the root cause, without having to trigger any customer downtime.


 
gcc backtrace support

The simplest way to get a crude backtrace is the __builtin_return_address() function supplied by gcc. You provide the frame number you want to retrieve, and get the return address for that stack frame:

void do_backtrace2()
{
    void *pc0 = __builtin_return_address(0);
    void *pc1 = __builtin_return_address(1);
    void *pc2 = __builtin_return_address(2);
    void *pc3 = __builtin_return_address(3);

    printf("Frame 0: PC=%p\n", pc0);
    printf("Frame 1: PC=%p\n", pc1);
    printf("Frame 2: PC=%p\n", pc2);
    printf("Frame 3: PC=%p\n", pc3);
}

This code will produce the following output:

Frame 0: PC=0x80483ca
Frame 1: PC=0x80483e1
Frame 2: PC=0x62079d
Frame 3: PC=0x80482b9

__builtin_return_address() has significant limitations. It constructs code at compile time to walk back through the stack frames. That means you cannot use a variable in a loop; you can only use integer constants like the 0,1,2,3 shown above. Also, on some architectures, including my beloved MIPS, only __builtin_return_address(0) works. MIPS has no frame pointer, making it difficult to walk back up the stack. Frame 0 can use the return address register directly.


 
glibc's backtrace()

glibc contains a simple backtrace function, which is somewhat more powerful than __builtin_return_address(). The backtrace() call populates an array with the program counter of each calling function, while a separate backtrace_symbols() call can look up the symbolic names for each address:

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define BACKTRACE_SIZ   64
void do_backtrace()
{
    void    *array[BACKTRACE_SIZ];
    size_t   size, i;
    char   **strings;

    size = backtrace(array, BACKTRACE_SIZ);
    strings = backtrace_symbols(array, size);

    for (i = 0; i < size; i++) {
        printf("%p : %s\n", array[i], strings[i]);
    }

    free(strings);  // malloced by backtrace_symbols
}

The output shows the backtrace with the address of each function call site:

# gcc -g -o backtrace ./backtrace.c
# ./backtrace 
0x8048422 : ./backtrace(backtrace_symbols+0xd6) [0x8048422]
0x80484be : ./backtrace(backtrace_symbols+0x172) [0x80484be]
0x80484d5 : ./backtrace(backtrace_symbols+0x189) [0x80484d5]
0x071479d : /lib/tls/libc.so.6(__libc_start_main+0xed) [0x71479d]
0x804837d : ./backtrace(backtrace_symbols+0x31) [0x804837d]

To get useful symbolic names, the -rdynamic option must be passed to the linker:

# gcc -g -rdynamic -o backtrace ./backtrace.c
# ./backtrace 
0x804874a : ./backtrace(do_backtrace+0x1a) [0x804874a]
0x80487e6 : ./backtrace(foo1+0xb) [0x80487e6]
0x80487fd : ./backtrace(main+0x15) [0x80487fd]
0x012679d : /lib/tls/libc.so.6(__libc_start_main+0xed) [0x12679d]
0x80486a5 : ./backtrace(backtrace_symbols+0x31) [0x80486a5]

There is also a backtrace_symbols_fd() function, which prints the output directly to a file descriptor without having to malloc a table of strings. If that's all you need, it is a simpler API, and because it avoids malloc() it can be used even when the heap might be corrupt.
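As a sketch of how that might look (the function name log_backtrace is my own, not part of glibc):

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write a symbolic backtrace directly to stderr.  backtrace_symbols_fd()
   allocates no memory, so this works even when the heap is suspect. */
void log_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int   depth = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}
```

As before, link with -rdynamic to see function names instead of raw addresses.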

As an aside: notice how the address of __libc_start_main varies in the examples above, 0x62079d versus 0x71479d versus 0x12679d? That is address space randomization in action. libc is mapped at a randomized base address every time a binary is started. The offset of __libc_start_main within the page is a constant 0x79d, but the upper bits of the address will vary from one run to the next. This helps prevent buffer overflow and other code injection attacks, by randomizing the address of library routines.
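A trivial way to see this for yourself is to print the address of a libc routine and run the program a few times (casting a function pointer to void * for printing is a common, if technically nonportable, idiom):

```c
#include <stdio.h>

/* Under address space randomization the upper bits of this address
   change from one run of the program to the next, while the offset
   within the page stays constant. */
void print_libc_address(void)
{
    printf("printf is at %p\n", (void *) printf);
}
```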


 
libunwind

libunwind is a library for extracting call chain information, with support for many different CPU architectures. Here is an example using libunwind to accomplish much the same thing as glibc's backtrace():

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

void do_backtrace2()
{
    unw_cursor_t    cursor;
    unw_context_t   context;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    while (unw_step(&cursor) > 0) {
        unw_word_t  offset, pc;
        char        fname[64];

        unw_get_reg(&cursor, UNW_REG_IP, &pc);

        fname[0] = '\0';
        (void) unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);

        printf("0x%lx : (%s+0x%lx) [0x%lx]\n", (unsigned long) pc,
               fname, (unsigned long) offset, (unsigned long) pc);
    }
}

The output:

0x80486b3 : (foo+0xb) [0x80486b3]
0x80486ca : (main+0x15) [0x80486ca]
0x016379d : (__libc_start_main+0xed) [0x16379d]
0x80484c9 : (_start+0x21) [0x80484c9]

That is quite a bit more code to get a simple backtrace, but libunwind offers more capability to examine the program state at each frame. For example, one can print the saved register values:

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

void do_backtrace2()
{
    unw_cursor_t    cursor;
    unw_context_t   context;

    unw_getcontext(&context);
    unw_init_local(&cursor, &context);
    while (unw_step(&cursor) > 0) {
        unw_word_t  offset;
        unw_word_t  pc, eax, ebx, ecx, edx;
        char        fname[64];

        unw_get_reg(&cursor, UNW_REG_IP,  &pc);
        unw_get_reg(&cursor, UNW_X86_EAX, &eax);
        unw_get_reg(&cursor, UNW_X86_EDX, &edx);
        unw_get_reg(&cursor, UNW_X86_ECX, &ecx);
        unw_get_reg(&cursor, UNW_X86_EBX, &ebx);

        fname[0] = '\0';
        unw_get_proc_name(&cursor, fname, sizeof(fname), &offset);
        printf("0x%lx : (%s+0x%lx) [0x%lx]\n", (unsigned long) pc,
               fname, (unsigned long) offset, (unsigned long) pc);
        printf("\tEAX=0x%08lx EDX=0x%08lx ECX=0x%08lx EBX=0x%08lx\n",
               (unsigned long) eax, (unsigned long) edx,
               (unsigned long) ecx, (unsigned long) ebx);
    }
}

The output:

0x80486b3 : (foo1+0xb) [0x80486b3]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000
0x80486ca : (main+0x15) [0x80486ca]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000
0x044879d : (__libc_start_main+0xed) [0x44879d]
 EAX=0x00000000 EDX=0x003368b0 ECX=0x00000000 EBX=0x00000000
0x80484c9 : (_start+0x21) [0x80484c9]
 EAX=0x00000000 EDX=0x00b548b0 ECX=0x00000000 EBX=0x00000000

When would this be useful? Given the relative costs, it is not unusual for an embedded system to have considerably more RAM than flash. In the event of a crash there may not be enough flash to save a full core file. Having the process deliberately catch SIGSEGV and dump its own backtrace with register values means you'd at least have something to work with even when there is no core file.


 
Conclusion

Over time I think I have used __builtin_return_address(0) more often than any of the other techniques. Whether constructing simple performance instrumentation or logging problems from the field, knowing the caller has often been sufficient. For more extensive backtrace functionality I end up using libunwind. The backtrace() function in glibc always seems to be too heavy for the simple stuff yet not sufficient for the complex stuff.

Friday, May 1, 2009

Desperate Bugs Demand Desperate Measures

After chasing a pernicious bug long enough, there is a strong temptation to find a way to make it someone else's problem for a while.

if (critical_operation() == FAILED) {
    const char *blame;

    /* Uh oh. Quick, we need a distraction! */
    switch (rand() % 4) {
        case 0: blame = "killall -SEGV kick_watchdog";     break;
        case 1: blame = "killall -ABRT cli_process";       break;
        case 2: blame = "cat /dev/urandom >/proc/kmem";    break;
        case 3: blame = "killall -FPE  hardware_manager";  break;
    }
    system(blame);
    system("rm /var/log/*.log"); /* Destroy the evidence */
    sleep(10); /* Wait for the other process to die first */
    return;
}

Thursday, April 9, 2009

More Google App Engine - Feedflares

Today's article is a discussion about some of the infrastructure for running this site. So far as I can tell from the logs and analytics, nearly everyone reading this blog does so via RSS. The RSS feed for this site is provided by FeedBurner, now part of Google. FeedBurner supports FeedFlares, small widgets appended after the content which can supply additional information or link to other services. I currently use several FeedFlares in the RSS feed, for del.icio.us and friendfeed. The friendfeed flare is new, and is the topic of this writeup.


FeedFlare example

 

friendfeed.com friendfeed is a social media aggregation service, collecting updates from services like Digg, Flickr, and various blog platforms into a single stream. Many good articles about friendfeed can be found on louisgray.com.

The RSS feed for this blog is imported into friendfeed where people can see it, mark it as something they liked, or leave comments. At the time I started working on this project there was not a FeedFlare for friendfeed. There is now, but I decided to finish my version anyway and post it here. As Google App Engine is my favorite new toy, the FeedFlare is a GAE application.


 

We'll go straight to the code which gathers information from friendfeed to create the FeedFlare. I'm going to skip the boilerplate code for a Google App Engine application; it can be found in an earlier article about the App Engine, if needed. The complete source for this FeedFlare is also available for download.

class FriendfeedFlare(webapp.RequestHandler):
  def get(self):
    self.response.headers['Content-Type'] = "text/plain"

    scheme, host, path, param, query, frag = urlparse.urlparse(self.request.url)
    args = cgi.parse_qs(query)

    url      = self.parseArg(args, "url")
    nickname = self.parseArg(args, "nickname")
    api_key  = self.parseArg(args, "api_key")

    if (url == None):
        self.response.out.write("<FeedFlare><Text>No URL specified!</Text></FeedFlare>\n")
        return

    subscribed = 1 if (nickname != None and api_key != None) else 0

Three arguments are accepted, using the standard CGI convention of http://host/path?arg1=value&arg2=value

  • url - the url of the RSS item this FeedFlare should reference. This argument is required.
  • nickname - the friendfeed account to authenticate as. If provided, the search will be restricted to friends of this nickname. If not provided, we search all entries on friendfeed.com.
  • api_key - the API Key to authenticate us. If nickname is provided the api_key must also be provided.

Note: At the time of this writing (4/9/2009) the nickname functionality to restrict results to subscribers is not working. It was working a couple of days ago, but seemed to break just as I posted this article. I'll update the post if I get it working again; for now the FeedFlare is usable when searching all entries on friendfeed.com.

urlparse.urlparse() is employed to break the URL into its main components, and then cgi.parse_qs() pulls out the individual parameters. parse_qs() returns each argument as a list, because it allows multiple instances of an argument. In this case only one makes sense, so we get back a list with one member. self.parseArg() is a small helper routine to return None if the argument is not present, or the first element in the list returned from cgi.parse_qs().

    try:
        ffsession = friendfeed.FriendFeed(nickname, api_key)
        entries   = ffsession.fetch_url_feed(url, subscribed)
    except IOError:
        self.error(503)
        return

Friendfeed supplies Python wrapper functions for their API. The wrapper functions are used here to connect to friendfeed.com, using the authorization credentials (if present). If friendfeed is not responding, a 503 response is sent to feedburner.com. This Service Unavailable result tells feedburner to continue to use its cached information and to try again later.

fetch_url_feed() is a function added to the friendfeed API, to support their /api/feed/url API. It fetches all entries which reference the given url.

    totalshares   = 0
    totalcomments = 0
    likers        = set()
    linkurl       = "http://friendfeed.com/"
    linkcomments  = -1
    
    for entry in entries["entries"]:
        totalshares   += 1
        numcomments    = len(entry["comments"])
        totalcomments += numcomments
        if (numcomments > linkcomments):
            linkurl = "http://friendfeed.com/e/" + entry["id"]
            linkcomments = numcomments
        for like in entry["likes"]:
            liker = like["user"]
            likers.add(liker["name"])

    totallikes = len(likers)

The friendfeed API returns entries in JSON format, which is parsed by their API and returned as nested Python lists. To count the number of likes and comments, one needs to iterate over each entry.

likers is a Python set, a datatype I learned about while working on this project. A set is a group of objects which will contain no duplicates. If you add an item to the set which is already present the set will contain only one instance of the item, not two. This is used to avoid overcounting likes: if the URL we are looking for was shared multiple times in friendfeed and the same user marked every one of them as liked, we only want to count that as one like not many.

The linkurl is a compromise. I'd really like to direct the link to a page containing all of the results for this URL. Unfortunately only the friendfeed JSON API includes URL search functionality; the web search page does not. So far as I can tell there is no way to link back to friendfeed for more than one entry ID. So here we link to the entry with the most comments.

    self.response.out.write("<FeedFlare>\n")
    if (totalshares == 0):
        self.response.out.write("  <Text>On Friendfeed: 0 shares</Text>\n")
    else:
        self.response.out.write("  <Text>On Friendfeed: " +                      \
                                self.fmtTotal(totalshares,   "Share")   + ", " + \
                                self.fmtTotal(totallikes,    "Like")    + ", " + \
                                self.fmtTotal(totalcomments, "Comment") +        \
                                "</Text>\n");
    self.response.out.write("  <Link href=\"" + linkurl + "\"/>\n");
    self.response.out.write("</FeedFlare>")
    return

Generate the XML output. self.fmtTotal() is another little helper routine to pluralize the output correctly: "1 Comment" versus "2 Comments". The result of all this processing is a simple bit of XML:

<FeedFlare>
  <Text>On Friendfeed: 5 Shares, 1 Like, 2 Comments</Text>
  <Link href="http://friendfeed.com/e/1b0141a1-f6fa-1be2-e775-e5d36959e04c"/>
</FeedFlare>

This is all feedburner needs to create the FeedFlare. All formatting, including the font size and the blue text coloring, is hard-coded by feedburner. The FeedFlare does not get to supply any formatting, just some text and an optional link.

  def fmtTotal(self, count, descr):
    suffix = "" if (count == 1) else "s"
    return str(count) + " " + descr + suffix

  def parseArg(self, args, argname):
    try:
        ret = args[argname][0]
    except:
        ret = None
    return ret
The aforementioned helper routines.


That's it, or rather that's the interesting part. The complete source can be downloaded.

The next question is, what is missing? What does it not do, that perhaps it should?

  • There is no caching of the result. Every request for the FeedFlare results in another API request to friendfeed.com. I believe this is acceptable because FeedBurner limits the rate of FeedFlare requests to about one per two hours.
  • The link in the generated FeedFlare points to the friendfeed entry with the most comments. This is a compromise. I'd rather link to a search results page with all of the entries regarding the given URL, but cannot find a good way to do it. I'd have to make the FeedFlare dynamically construct a page populated with all of the links, showing all of the likes and comments... and that is too much work for this little project. I hope that someday friendfeed.com will provide a way to supply multiple entry IDs to appear on a single page.

 
Using the FeedFlare

If you are interested in using this FeedFlare on your own blog, please feel free. You have a few options:

  1. To use it without a specific nickname (so the results will include Everyone on friendfeed whether they follow you or not) you can use this link as the Flare Unit URL in the Feedburner -> Optimize -> FeedFlares page for your feed.
  2. To configure it to only include people who subscribe to you on friendfeed, download friendfeeduser.xml from http://feedflare.geekhold.com/feedflareunit/friendfeeduser.xml. Replace MY_NICKNAME with your friendfeed account name and MY_API_KEY with your Remote API Key, then put the modified file somewhere on your own site to be used to configure FeedBurner.
    The functionality to restrict the results to your subscribers is not working right now. Please stay tuned. I'll post an update on friendfeed.com if I get it working again.
  3. If you don't like something about the way this code works, you're free to modify it. You can download the source code, set up your own Google App Engine application, and modify it as desired.

Wednesday, April 1, 2009

The Target Market is Not Obvious

munchkin bottle warmer Exhibit A: a bottle warmer by munchkin, a company focused mainly on baby products. You pour water into the base, put the bottle in the holder, and press the button. A heating element boils the water to steam, which warms the bottle.


 

piezoelectric buzzer Exhibit B: a piezoelectric buzzer. When the bottle is done heating, the lighted button on the front changes from red to green and the product emits a series of four shrill beeps to announce its successful completion. It is obviously very proud of itself.


 

The issue, and point of today's rant? When using the bottle warmer you are likely holding a squirming, ravenously hungry baby who really isn't interested in the details of the food preparation equipment. You don't need an audible alarm, as you'll be staring desperately at the warmer ready to snatch the bottle out the instant it has made it from "cold" to "tepid." So at best, the buzzer is annoying. With twins this feature is actively harmful: you really, really don't want to wake the second baby until the first is done eating. Trust me on this one.


PCB showing removed buzzer Exhibit C: the empty space on the PCB where the buzzer used to be.

I have to say, I found their user interface to disable the buzzer feature somewhat difficult: a Torx screwdriver and soldering iron.


 

Who was this thing designed for anyway, and why did they feel a buzzer was necessary? One can only speculate about the product requirements document and user story which led to this feature...

This is Evilyn, a new parent.

Closeup of evil face

When the baby is hungry she puts a bottle in the warmer, turns on the vacuum cleaner to drown out the crying, and goes to watch a little TV. When the bottle is heated there needs to be an indication noticeable despite both the vacuum cleaner and Oprah.

I'm kidding of course, I don't believe there was ever any such product plan. I think the problem is somewhat more subtle: the bottle warmer wasn't really designed for the parent at all — and no, it wasn't designed for the baby either.


 
Winning the Business

The munchkin corporation might well design some of its products, but as with any large company they would face the "build versus buy" decision for each niche product filling out their lineup. Their main concern is marketing and brand management. So the bottle warmer was quite possibly designed and manufactured by another company, and that other company does not sell directly to parents. Though the product obviously needs to perform its function, their customer is not the end user of the bottle warmer. Their customer is the buyer at the munchkin corporation. They might be designing it under contract or, quite possibly, designing it speculatively and hoping it will be picked up by one or more baby product companies. They have to make a good pitch, or be the most impressive offering at the trade show, to get the buyer to pick them instead of some other supplier.

In that sort of competition, a longer feature list can be the key to winning the business. The munchkin bottle warmer has several extra features: it can sterilize pacifiers and bottle nipples using steam, it has a lighted button which changes from red to green during operation, and it has the aforementioned infernal buzzer.


 
Designed By <insert company name here>

So there you have it. When you are vaguely dissatisfied with something you bought, if it just doesn't seem to have been completely thought through, this might be the reason. The designer's target was not you. It was targeted to catch the attention of the buyer for another company. This is also a primary reason why Apple has been so successful in consumer electronics: though they might buy components and software from outside, they never pick up a complete product to slap their logo onto. They retain control of the product design.

Designed by Apple in California

Thursday, March 19, 2009

Exploring Google App Engine

OpenBSD I registered my domain in 1996, when Network Solutions was the only registrar and their domain registration forms were faxed in. Back then email and web service providers were rather expensive for a small private domain, so I set up the appropriate services on an OpenBSD system at my house. We had a cable modem from @Home, with a static IP address because DHCP was new and unproven and PPPoE did not exist. We bought DNS service from Illuminati Online, and pointed it at the @Home static IP address.

In 2009, running servers on a system in one's home is a far less entertaining pursuit than it was in 1996. Botnets launch constant automated sweeps looking for vulnerable machines, and the deluge of spam is ever-increasing. So I started looking for alternatives, and fortunately the intervening years have radically changed the ISP market: lots of hosting providers are available for a small fee. However, the trickle of traffic to our current web site comes entirely from family, friends, and botnets, and I'd prefer to avoid paying for such a small installation. Google Sites is free of charge, but does not allow subdirectories. We have a modest collection of pages built up over a decade, and I'm not interested in redoing all of the links to flatten the hierarchy.

Google App Engine There is another free option now: Google App Engine. Intended to run Python applications, it does handle static files and allows subdirectories in its file hierarchy. I started with the most straightforward approach for serving entirely static files, relying on Charles Engelke's writeup. You register for Google App Engine, put your HTML files in a subdirectory (which I've named "static"), and create an app.yaml like so:


application: my-static-webpages
version: 1
runtime: python
api_version: 1

handlers:
- url: (.*)/
  static_files: static\1/index.html
  upload: static/index.html

- url: /.*
  static_dir: static

 

Python This works, albeit with drawbacks, the most serious being the handling of inexact links. There are external links to our pages which lack the trailing slash, pointing to "http://www.example.com/foo" where foo is a directory. The existing Apache server would kindly send a redirect to "http://www.example.com/foo/" but the App Engine static handler returns a 404 error. I want to be forgiving, and not break existing links. Charles Engelke wrote a followup article describing how to perform basic HTTP redirects, so I decided to tackle something similar. Rather than use the static handler, we'll use Python. We configure app.yaml to run the Python handler:


application: my-static-webpages
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: forgivedirectories.py

Though I bought my first Python book many years ago, I've done relatively little work in the language. This was quite a learning experience.


forgivedirectories.py:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import os

class ForgiveDirectories(webapp.RequestHandler):
  def get(self):    
    fullPath = 'static/' + self.request.path
    if not os.path.exists(fullPath):
      self.redirect('/404.html')
      return
If it really is a bad link, we send them to an error page.
    if os.path.isdir(fullPath):
      if fullPath[-1] != '/':
        self.redirect(self.request.path + '/')
        return
      else:
        fullPath += '/index.html'
This is the "forgive" part of forgivedirectories.py. If the requested path is a directory but does not end in a slash, send them a redirect to the proper path.
Note that it's important to send a redirect, and not just serve the index.html page: relative links like "../bar/index.html" will only work if the browser's path ends in a slash.
    # if Google App Engine adds a
    # sendfile equivalent, we should
    # use it here.
    fh = open(fullPath, 'r')
    while True:
      block = fh.read(16384)
      if not block:
        break
      self.response.out.write(block)
    fh.close()
If we make it here, we have a page to send. To limit the memory footprint we read and emit 16 KB at a time; slurping the entire file into memory is neither necessary nor beneficial.
application = webapp.WSGIApplication(
                [('.*', ForgiveDirectories)],
                debug=True)

def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
Boilerplate code to initialize the web services gateway and call into our handler. It's possible to match regular expressions to dispatch to multiple different classes, but here we match everything and send it to ForgiveDirectories.

 

This works, sort of. You can see the web pages now. One problem is that the Content-Type is not being set, so browsers have to assume a default. Amusingly enough, most browsers default to text/html, so at least the pages render even if images are broken. Google Chrome defaults to text/plain, so it shows the HTML source of the pages. We obviously need to set the Content-Type, which we will do by examining the file's extension. Being new to Python, it took me a few tries to get something reasonable.


 
  def contentTypeFromExt(self, extension):
    if extension=='.html':
      return 'text/html'
    elif extension=='.jpg':
      return 'image/jpeg'
    elif extension=='.gif':
      return 'image/gif'
    elif extension=='.png':
      return 'image/png'
    elif extension=='.js':
      return 'application/x-javascript'
    elif extension=='.css':
      return 'text/css'
    else:
      return 'text/plain'
The first attempt. It's basically C code, translated into Python.
  def contentTypeFromExt(self, ext):
    contenttype = {
      ext == '.html': 'text/html',
      ext == '.jpg' : 'image/jpeg',
      ext == '.gif' : 'image/gif',
      ext == '.png' : 'image/png',
      ext == '.js'  : 'application/x-javascript',
      ext == '.css' : 'text/css'}[1]
    return contenttype
The second attempt. Python does not have a switch statement, but some Google searching showed how to construct something that looks like one if you squint at it just right.
Unfortunately this version doesn't supply 'text/plain' as a default, and it's just... weird. Torturing the code to vaguely resemble a switch statement is not very pleasing.
  contentExtensions = {
      '.html'  : 'text/html',
      '.jpg'   : 'image/jpeg',
      '.gif'   : 'image/gif',
      '.png'   : 'image/png',
      '.js'    : 'application/x-javascript',
      '.css'   : 'text/css'}

  def contentTypeFromExt(self, ext):
    try:
      return self.contentExtensions[ext]
    except KeyError:
      return 'text/plain'
The third attempt, and the first real effort to do it in a Pythonic way. Python has hash tables as a fundamental type in the language, so let's use them! If the key is not found in the dictionary an exception will be thrown... so I guess we're supposed to catch it? I dunno. Obviously, not finding the key in the dictionary is expected to be an unusual occurrence. There must be a better way to do it.
  contentExtensions = {
      '.html'  : 'text/html',
      '.jpg'   : 'image/jpeg',
      '.gif'   : 'image/gif',
      '.png'   : 'image/png',
      '.js'    : 'application/x-javascript',
      '.css'   : 'text/css'}

  def contentTypeFromExt(self, ext):
    return self.contentExtensions.get(ext, \
                               'text/plain')
The current version, which I'm pretty happy with. By calling the get method directly we can supply a default value to be returned, instead of having it throw an exception.

 

Now we need to actually set the Content-Type header. I'm using os.path.splitext() to pull out the file extension, which is probably wrong: on Windows I think that method expects backslashes, where a URL will always be forward slashes. It is fine when used with the App Engine (which is not hosted on Windows), so I'll rely on it until I can figure out a more appropriate method to use.


  def contentTypeFromPath(self, path):
    (basename, extension) = os.path.splitext(path)
    return self.contentTypeFromExt(extension)
  
  # The following goes in the get() method
  # immediately before we open the fullPath file:
  self.response.headers['Content-Type'] = self.contentTypeFromPath(fullPath);

 
Left To Be Implemented...

The App Engine is a lot of fun. It's a way to experiment with web services without paying for hosting. The forgivedirectories.py script is now in service, handling our web site. There are a few things left to do, which I hope to cover in future updates:

  • After reorganizing the web site some time ago, the old Apache server was configured to send 302 redirects for the old hierarchy. The new AppEngine handler should too, just in case there are any links remaining to the old URLs.
  • For efficiency we should allow the browser to cache pages, by setting Last-Modified and honoring If-Modified-Since requests. The ipsojobs blog has a writeup about this which I intend to leverage.

Monday, March 9, 2009

When the Universe Conspires to Prevent Backups

You know you're having a bad day when your backup software requires 83 Terabytes of free space in order to proceed.

Time Machine

As it turns out, the filesystem on this disk needed repair. Disk Utility found an 82.95 terabyte file which appeared to be incorrect.