Friday, December 19, 2008

printf-acular

Ahh, printf. What would we do without printf, sprintf, etc? It makes it easy to format integers, floats, strings, MAC addresses... Oh wait, there isn't a format specifier for MAC addresses is there? Well, lets rectify this obvious omission.

glibc (and uClibc) implement a facility to add new format codes to the printf family of functions. It is even possible to override the builtin implementations of the C standard formats, if the Gentle Reader has an irrational hatred for some aspect of the output. The GNU documentation for how to do this is adequate, but a bit obscure. So lets dive into an example by adding a %M specifier to handle MAC addresses:

uint8_t mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

if (register_printf_function ('M',
        printf_output_M, printf_arginfo_M)) {
    ... error handling ...
}

printf("%M\n", mac);

Output:
00:11:22:33:44:55

Huzzah! Now we'll dig into the implementation.


 
printf_arginfo_M

The first routine to be implemented tells glibc the arguments required by the new specifier:

#include <stdio.h>
#include <printf.h>

static int
printf_arginfo_M(const struct printf_info *info,
                 size_t      n,
                 int        *argtypes)
{
    /* "%M" always takes one argument, a pointer to uint8_t[6]. */
    if (n > 0) {
        argtypes[0] = PA_POINTER;
    }

    return 1;
} /* printf_arginfo_M */

The fields of the printf_info structure are described in the GNU manual. For the arginfo() implementation the only important field is spec, the character code of the specifier for which the information is sought. This allows a single arginfo() routine to implement support for multiple new specifiers.

glibc also passes in an array of argtypes[] with n members, which are to be filled in with the types of the arguments expected. The available types are documented in the GNU manual. The arginfo() routine must populate the argtypes[] with the expected arguments, after checking to ensure there is room. It should return the number of arguments it expects. If you define a format specifier which takes a large number of arguments, there might not be sufficient room in argtypes[] to store them all. If the arginfo() routine returns a larger value than n, glibc will call it again with a larger argtypes[] array.

For %M there is only one argument, the MAC address to be formatted. The ability to write a format specifier with more than one option is potentially useful, but also confusing as none of the standard format options do so.


 
printf_output_M

Next we need to implement the output routine, to generate the formatted output for the new specifier. Output always goes to a FILE *, no matter whether called from printf(), sprintf(), vfprintf(), etc.

One is free to use the stdio.h routines like fprintf(), so long as you do not use your new specifier in the format string. If you do, fprintf will call back into your output routine, which will call fprintf() again, which will call your output routine again... and hijinks ensue.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    int len;

    mac = *(unsigned char **)(args[0]);

    len = fprintf(stream, "%02x:%02x:%02x:%02x:%02x:%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    return len;
} /* printf_output_M */

We only take one argument for %M, so we only look at args[0]. There is no need to check for NULL: glibc validates the format before calling your output routine, you'll never get a NULL pointer. We call fprintf() to do most of the work, and return the number of bytes output.

printf("%30M\n", mac);

00:11:22:33:44:55

Voila! I hope you've enjoyed this article... oh wait, that wasn't very good was it? The "30" prepended to the M specifier was meant to right-justify the output within a 30 character region. The output did not reflect this. Oops.


 
printf_output_M, with modifiers

glibc parses out the modifiers prepended to the specifier, and passes them to the output routine via the printf_info structure. Many of the standard modifiers don't make sense for MAC addresses. For example, I have no idea what "%llM" would mean. A few modifiers do make sense, such as a field width and the '-' argument to left-justify.

We certainly could write a bunch of code to insert the proper number of spaces before or after the MAC address, but we'll use a slightly different approach. We call fprintf() to generate the output anyway, so we'll generate a format string for fprintf() to make it handle the width and justification.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    char macstr[18]; /* 6 hex bytes, with colon seperators and trailing NUL */
    char fmtstr[32];
    int len;

    mac = *(unsigned char **)(args[0]);
    snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
            mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    /* We handle some conversion specifier options as appropriate:
     * '-' and field width */
    if (info->width > 0) {
        snprintf(fmtstr, sizeof(fmtstr), "%%%s%ds",
                (info->left ? "-" : ""), info->width);
    } else {
        snprintf(fmtstr, sizeof(fmtstr), "%%s");
    }

    len = fprintf(stream, fmtstr, macstr);

    return len;
} /* printf_output_M */

We generate the formatted MAC address as a string buffer on the stack. The bolded code examines the modifiers in printf_info, and generates a format string in fmtstr[]. For example if the arguments to printf were ""%-30M", the fmtstr[] will be "%-30s"

Now the output is much more pleasing:

printf("%30M\n", mac);

               00:11:22:33:44:55

 
printf_output_M, Now with More Cisco

Which looks more natural to the Gentle Reader?

00:11:22:33:44:55

0011.2233.4455

If you chose the first one, you likely come from a desktop background with Operating systems such as Windows, MacOS, Linux, Solaris, or the various other Unixes which all use a colon-seperated MAC address.

If you chose the second one, you likely come from a Cisco networking background. Cisco uses a MAC address format which takes up three fewer columns on the screen, important if one's user interface is a text CLI.

Could we accommodate this variation? Certainly it could be done using another conversion specifier, perhaps using "%C" for a Cisco-style MAC address, but this seems inelegant. Coming to our rescue is a little used format modifier in printf: the 'alternate representation' hash mark. This modifier implements optional formatting, for example when used with the hex specifier as "%#x" it will prepend 0x to the output.

So we'll implement the Cisco MAC address format using "%#M" in printf_output_M, like so:

    if (info->alt) {
        /* Cisco style formatting */
        snprintf(macstr, sizeof(macstr), "%02x%02x.%02x%02x.%02x%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    } else {
        /* Unix/Linux/Windows/MacOS style formatting */
        snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    }

If invoked with "%#M" we'll take the first code path and produce a Cisco-style MAC address. Otherwise, we generate a colon separated MAC address.


 
Whither -Wall?

With the -Wformat (or -Wall) argument, gcc will sanity check the arguments passed to the printf family of routines. It can ensure that the number of arguments matches the number of specifiers in the format string, for example. Unfortunately, it does not know about custom additions:

gcc -Wall testM.c
testM.c:57: warning: unknown conversion type character 'M' in format
testM.c:57: warning: too many arguments for format

I know of no way to teach gcc about the new specifiers, unfortunately. If the Gentle Reader knows of a way, please describe it in the comments.


 
GPL

I am not a lawyer, I do not play one on TV, and I do not want to play one on TV. So my opinions about GPL issues are really just quaint little notions of How The World Should Work.

An argument has been advanced in other contexts that if the only implementations of a particular API are licensed under the GPL, then any caller of that API should be considered a derivative work and be subjected to the GPL. As the register_printf_function() API only exists in glibc and uClibc, both of which are licensed under the LGPL, it could be argued that code calling register_printf_function() is therefore encumbered by the LGPL. Opinions may differ, but this is one possible interpretation.

Personally, I choose to treat custom printf formatters as LGPLd code.


 
Closing Thoughts

It is of course possible to implement routines to format a MAC address as a string, using "%s" in stdio.h routines for output. However I think it is more elegant to teach stdio.h how to deal with common output requirements for your application.

The C standard may claim additional lower case specifiers for standard formats in the future. It is best to rely on upper-case codes for custom formats.

Sunday, November 30, 2008

Blogroll, November 2008

The half-life of engineering knowledge is depressingly short. The old school way to stay current is via books covering technical topics, such as those published by O'Reilly and Associates and more recently by The Pragmatic Programmers. Though the quality of the writing in published books is generally quite good owing to the involvement of an editor, the other drawbacks are legion. Books have very long lead times before publication, leading to dated material. The enormous burdens placed upon the author mean that some interesting topic areas will never have a book developed. Technical books carry a relatively high price tag, and will continue to do so as long as the volumes stay low (a run of ten thousand is considered a successful technical title).

RSS I recent years I have sought my continuing education online, as presumably does the Gentle Reader as well. Sometimes this is in the form of publication-ready papers, such as those Ulrich Drepper is fond of writing (for example, "What Every Programmer Should Know About Memory" is excellent). I also check the links on Reddit regularly; I have mixed feelings about Reddit, but it does draw links to quite a bit of interesting material. If I find a particularly good author I'll add their RSS feed to Google Reader.

One blog I follow regularly is that of Louis Gray, a prolific blogger about social media and web 2.0. Each month he highlights other blogs in that space. I really like that idea: each link makes the entire genre just a little bit more richly connected. I'm going to try to regularly suggest other blogs which I follow, though I will probably not make these lists as frequently as Louis Gray does.

Some starting assertions:

  • I'm going to focus on blogs relevant to software development. For example I also read parenting blogs, and though the Gentle Reader might also be a parent you'll need to go elsewhere for that material.
  • I'm not going to spend time on Jeff Atwood, Steve Yegge, or other high profile writers. I suspect the Gentle Reader has already decided whether to follow those authors or not.
  • Though I try to focus on embedded system articles and many amongst this first set of links are related to low-level systems programming, future blogrolls will run further afield. You have been warned.

The Blogroll:
 
1) Proper Fixation, by Yossi Kreinin

Yossi writes from the perspective of a senior individual contributor, a point of view I greatly appreciate. He's covered close-to-the-metal topics like CPU architecture and debugging, but also other topics such as organizational challenges.


2) The Cranky Product Manager, by the Cranky PM

Written from the perspective of a Product Manager at Dysfunctosoft, a highly exaggerated (one hopes) parody of the CrankyPM's employer. Entertaining, with a dash of truthiness.


3) Dadhacker, by Landon Dyer

Landon Dyer is another senior developer with a long history in the computer industry. In the 1980s he worked on video games at Atari, and regularly posts reminiscences about those days. He also writes about the experience of being a developer in the lower levels of the system, like writing software for broken hardware and various pearls of wisdom.


4) Gustavo Duarte, by the eponymous Gustavo Duarte

Gustavo is best known for his extraordinarily detailed descriptions of various facets of the x86 architecture, such as the privilege levels and MMU hardware. He's also written on a wide range of topics such as the GPL, certifications, and software business philosophy.


More to come

That is it for this installment, but I'd like to make the blogroll a regular topic.

Friday, November 28, 2008

The Six Million Dollar LibC

Android Today, Gentle Reader, we will examine the Bionic library, a slim libc developed by Google for use in the Android mobile software platform. Bionic is clearly tailored for supporting the Android system, but it is interesting to see what might be done with it in other embedded system contexts.

Google's stated goals for Bionic include:

  1. BSD license: Android uses a Linux kernel, but they wanted to keep the GPL and LGPL out of user space.
  2. Small size: glibc is very large, and though uClibC is considerably smaller it is encumbered by the LGPL.
  3. Speed: designed for CPUs at relatively low clock frequencies, Bionic needs to be fast. In practice this seems to drive the decisions of what to leave out, rather than any special Google pixie dust to make code go fast.

In this article we'll delve into the Bionic libc via source inspection, retrieved from the git repository in October 2008. The library is written to support ARM CPUs, though some x86 support is also present. There is no support for other CPU architectures, which makes it a bit inconvenient as all of my current systems are PowerPC or MIPS. Nonetheless I'll concede that for the mobile phone market which Bionic targets, ARM is the only architecture which matters.

As one might expect for a BSD-licensed libc, a significant amount of code is sourced from OpenBSD, NetBSD, and FreeBSD. Additional BSD-licensed bits come from Sun and public domain code like the time zone package. There is also a significant amount of new code written by Google, particularly in the pthread implementation.


 
C++ support

So what is different about the Bionic libc versus glibc? The most striking differences are in the C++ support, as detailed in the CAVEATS file:

  • The Bionic libc routines do not handle C++ exceptions. They neither throw exceptions themselves, nor will they pass exceptions from a called function back through to their caller. So for example, if the cmp() routine passed to qsort() throws an exception the caller of qsort() will not see it.
     
    Support for C++ exceptions adds significant overhead to function calls, even just to pass thrown exceptions back to the caller. As Android's primary programming language is Java, which handles exceptions entirely within the runtime package, the designers chose to omit the lower level exception support. C++ code can still use exceptions internally, so long as they do not cross a libc routine. In practice, it would be difficult to actually guarantee that exceptions never try to transit a library routine.

  • There is no C++ Standard Template Library included. Developers are free supply their own, such as the free SGI implementation.

Lack of exceptions is obviously a big deal for C++ programmers, but nonetheless we'll push on.


 
libpthread

The pthread implementation appears to be completely new and developed by Google specifically for Android. It is, quite deliberately, not a complete implementation of POSIX pthreads. It implements those features necessary to support threads in the Dalvik JVM, and only selectively thereafter.

In other embedded Linux environments, the pthread library is crucial. There are a large number of developers in this space from a vxWorks background, to whom threads are simply the way software should be written. So we'll spend a bit more time delving into libpthread.

  • Mutexes, rwlocks, condvars, etc are all implemented using kernel futexes, which makes the user space implementation impressively simple. It seems a little too simple actually, I intend to spend a bit more time studying the implementation and Ulrich Drepper's futex whitepaper.

  • There is no pthread_cancel(). Threads can exit, but can not be killed by another thread.

  • There is no pthread_atfork(). This routine is useful if you're going to fork from a threaded process, allowing cleanups of resources which should not be held in the child. I've mostly seen pthread_atfork() used to deal with mutex locking issues, and need to study how the use of futexes affects fork().

  • Thread local storage is implemented, with up to 64 keys handled. Android reserves several of these for its own use: the per-thread id and errno, as well as two variables related to OpenGL whose function I do not understand. Interestingly the ARM implementation places the TLS map at the magic address 0xffff0ff0 in all processes. This technique is presumably part of the Google performance enhancing pixie dust.

  • POSIX realtime thread extensions like pthread_attr_{set,get}inheritsched and pthread_attr_{set,get}scope are not implemented. Frankly I've never worked on a system which did implement these APIs and am completely unfamiliar with them, so I don't find their omission surprising.

I haven't drawn a final conclusion of the Bionic pthread implementation yet. It is pleasingly simple, but lack of pthread_atfork() is troublesome and use of a magic address for the TLS map may make porting to other architectures more difficult. I need to get this puppy running on a PowerPC system and see how well it works.


 
Miscellaneous notes

In the course of digging through the library I generated a number of other notes, which don't really clump into categories. So I'm simply going to dump it all upon the Gentle Reader, in hopes that some of it is useful.

  • The README says there is no libm, though the source for libm is present with a large number of math routines. I need to investigate further whether it really works, or whether the README is out of date.

  • There is no wchar_t and no LOCALE support. I think this is fine: wchar_t is an idea whose time has come... and gone. The world has moved on to Unicode with its various fixed and variable width encodings, which the wide character type is not particularly useful for.
    I've used ICU in recent projects for internationalization support, and this is also what Google suggests in the README for Bionic.

  • There is a shared memory region of configuration properties. For example, DNS settings are stored in shared memory and not /etc/resolv.conf. The Android API also makes this shared memory configuration store available to applications via property_get() and property_set().

  • As one might expect, the stdio/stdlib/string/unistd implementation comes from OpenBSD, NetBSD, and FreeBSD with minimal changes. The only change I noticed was to remove the LOCALE support from strtod() (i.e., is the decimal point a period or a comma? In the Bionic library it is always a period).

  • There is no openlog() or syslog() implementation. There is a __libc_android_log_print() routine, to support Android's own logging mechanism.

  • Bionic uses Doug Lea's malloc, dlmalloc. Bionic also provides a hash table to track allocations looking for leaks, in malloc_leak.c.

  • There is no pty support that I can find, and no openpty(). There are reports of people starting an SSH daemon on a jailbroken Android device, so presumably there is some pseudo-terminal implementation which I've missed.

  • There are no asynchronous AIO routines like aio_read() or aio_write().

  • Bionic contains an MD5 and SHA1 implementation, but no crypt(). Android uses OpenSSL for any cryptographic needs.

  • Android dispenses with most file-based Unix administration. Bionic does not implement getfsent, because there is no /etc/fstab. Somewhat incongruously there is a /var/run/utmp, and so getutent() is implemented.

  • Android implements its own account management, and does not use /etc/passwd. There is no getpwent(), and getpwnam()/getpwuid() are implemented as wrappers around an Android ID service. At present, the Android ID service consists of 25 hard-coded accounts in <android_filesystem_config.h>

  • Bionic isn't finished. getprotobyname(), for example, will simply print "FIX ME! implement getprotobyname() __FILE__:__LINE__"

  • There is no termios support (good riddance).

 
Conclusion

Bionic is certainly interesting, and pleasingly small. It also represents a philosophical outlook of keeping the GPL some distance away from the application code.

Bionic is a BSD-based libc with support for Linux system calls and interfaces. If the lack of C++ exceptions or other limitations prove untenable, the syscall and pthread mutex implementation could be repurposed into the heavier FreeBSD/NetBSD/OpenBSD libc, though handling thread cancellation using the Bionic mutexes could require additional work.


 
Postscript

If you don't understand the reference in the title of this article, don't fret: you have simply not watched enough bad 1970's American television.

Update: In the comments, Ahmed Darwish points out another Android-related article discussing the kernel and power management interfaces Google added.

Update2: Embedded Alley is working on a MIPS port of the Android software.

Update3: In the comments Shuhrat Dehkanov points out an interview with David Turner, who works at Google on the Bionic implementation. Shuhrat also notes that you might have to log in to Google Groups to see the attachment. "Here is an overview by David Turner (though non-official) which answers some of the questions/unclear parts in your article."

Friday, October 31, 2008

Ode to Enum

Let's just get this out in the open: I love enum. Enum is a friend of mine. Enum and I go way, way back.

I also detest enum's evil twin, the series of #defines.

typedef enum {
    OP_FOO,
    OP_BAR,
} operation_t;
#define OP_FOO   0
#define OP_BAR   1
GOOD BAD

 

Why this irrational hatred for poor #define? After all, it has had a rough life. It isn't even a real part of the C language, being completely a pre-processor construct. Allow me to explain how #define has wronged me, not just once but on multiple occasions.


When one has a series of #defines and one needs to add a new code point, the natural inclination is to grab the next unused value: Meanwhile on a different branch in the version control system, a colleague needs to add a new code point:
#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_FOO       0
#define OP_BAR       1
#define OP_BANZAI    2

If you are lucky, this will be flagged as a merge conflict and someone will notice the duplicate values. If you are not lucky, the entries will have been added in slightly different locations such that there is no conflict... or perhaps the poor schmuck tasked to resolve the conflicts only cares about making the darn thing superficially compile so they can get back to their real work of pounding out new operation types.

#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_BANZAI    2

... and much mayhem ensues.

Had we used enum instead, the bad outcome is prevented. The enumeration will assign unique values:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} operation_t;

From this, Gentle Reader, we can conclude that version control systems must love enums.


 
gcc loves enums

gcc -Wswitch-enum provides a very nice benefit for enums and switch statements. The compiler will require that all enumerated values be handled in the switch, and will complain if they are not. Consider this code:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
        default:
            return -3;
            break;
    }
 } /* test */

When new enumeration values are introduced, the compiler will flag the places where handling needs to be added:

./m.c:137: warning: enumeration value 'OP_BUCKAROO' not handled in switch
./m.c:138: warning: enumeration value 'OP_BANZAI' not handled in switch

Older gcc releases provided a weaker version of this checking in the form of -Wswitch, where a "default" case is considered to handle all enumerated values. So the presence of a default case renders -Wswitch useless. If you want to take advantage of -Wswitch but also want to practice defensive programming, unexpected values need to be handled outside of the switch:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
    }

    return -3;
 } /* test */

The -Wall argument to gcc enables -Wswitch. If you are using a recent gcc, explicitly adding -Wswitch-enum is a good idea.


 
gdb loves enums

Enums make debugging easier, with symbolic values:

(gdb) print mydefine
$1 = 3
(gdb) print myenum
$2 = OP_BANZAI

Isn't that better?


 
enums on a diet

There is one minor disadvantage of using an enum: by default, an enum is the size of an int. So on a 32 bit machine an enum takes up 4 bytes of memory, even with only a small set of enumerations. However this bloat is easily eliminated, at least when using gcc:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} __attribute__((packed)) operation_t;

A packed enum consumes the minimum footprint; in this example it is a single byte. If there are more than 256 enumerations, or if enumerations are explicitly assigned to a large absolute value, the footprint will grow to 2 or 4 bytes.


 
Do you love enums?

Enums have been very, very good to me.


 
Updates

In the comments, Rob asked:

all very good - but can you do in c as you do in GDB i.e. can you show me a function myReverseEnumFuction() which when called:
printf("enum string=%s", myReverseEnumFunction(2) );
would output "OP_BUCKAROO" ???

Unfortunately no, I do not think there is a good way to do this in the general case. gdb is able to print the symbolic names for enumerations by referencing debug sections in the binary. It is certainly possible to use the same debug sections to construct myReverseEnumFunction(), but there are two drawbacks:

  1. Production binaries are routinely stripped of debugging information, which would render myReverseEnumFunction() inoperable.
  2. There are a number of different (and incompatible) formats for debugging sections. Even within the Unix-ish world using the ELF binary format there is STABS and several variations of DWARF. myReverseEnumFunction() would not be very portable.

The DWARF format is an interesting topic, and I've put it on my list of things to research and write about. In the meantime I would like to mention the C pre-processor's stringify capability ("#(argument)"), which may come in handy when manually constructing a myReverseEnumFunction() function. The MAP() macro below uses stringify to populate a structure with the enumeration name.

#define MAP(x)  {x, #x}

struct enum_mapping {
    bb_t code;
    char *name;
} enum_map[] = {
    MAP(OP_BUCKAROO),
    MAP(OP_BANZAI)
};

char *myReverseEnumFuction(bb_t bb)
{
    int i, max = sizeof(enum_map) / sizeof(enum_map[0]);

    for (i = 0; i < max; i++) {
        struct enum_mapping *emap = &enum_map[i];
        if (emap->code == bb) {
            return emap->name;
        }
    }
}

Tuesday, October 14, 2008

Aliasing By Any Other Name

This time, Gentle Reader, we'll delve into one of the sticky areas of the C language: pointer aliases. Let's dive directly into code:

void
printval(uint32_t val32)
{
    printf("0x%08x\n", val32);
}

int
main()
{
    uint32_t val32 = 0;
    uint16_t *p16 = (uint16_t *)&val32;

    p16[1] = 0xa1fa;

    printval(val32);
}

We have a 32 bit integer on the stack, to which we also direct a pointer to a 16 bit integer. Writing to p[1] overwrites 16 bits of the word.

$ cc -O0 ./test.c
$ ./a.out
0x0000a1fa

I can hear the groans already: not another little-versus-big endianness discussion. No, thats not it! It is true that the result would be different on a little endian machine, but the far more interesting discussion concerns what happens when we compile -O2:

$ cc -O2 ./test.c
$ ./a.out
0x00000000

Why is the result different? The plot thickens...

As it is one of my favorite techniques, we'll look at the generated MIPS instructions to see why this happens.

00: li   v1,0xa1fa load 0xa1fa into register v1
04: move a0,zero load 0 into a0 (the 1st function argument)
08: sh   v1,26(sp) store 0xa1fa to the stack, where val32 is stored
0c: lw   t9,0(gp) load address of printval
10: jalr t9 jump to printval
14: nop

The second instruction loads zero into register a0, which will be passed as the first argument to printval(). val32 at that moment is zero. We then store 0xa1fa to 16 bits on the stack, overwriting part of val32. The zero in the argument register is untouched, so we end up passing zero to printval() even though val32 on the stack has changed to 0x0000a1fa. Surely this is a bug in gcc's optimizer, right?

No it isn't a flaw in gcc, but understanding this behavior will take us back in the history of the C language.


 
Set Wayback Machine to 1974 DEC PDP11/20

C has a reputation as a language for high performance code. Yet the language is over 30 years old, during which time computer architecture has changed considerably. Instruction execution and memory access performance have both increased phenomenally, but execution speed has improved far more rapidly than memory. Accesses to memory have become almost two orders of magnitude more expensive relative to the CPU speed compared to 30 years ago, and a great deal of silicon is now spent on data caches to avoid the cost of going to memory.

One difficulty with optimizing C code is the free-wheeling use of pointers. Two pointers are called aliases if they point to the same memory location, and in general the compiler cannot really know whether any two arbitrary pointers might be aliases. To be completely safe, whenever a write is done to a pointer the compiler would have to forget any values fetched using other pointers. Those values would be re-fetched from memory, just in case they had been changed by the aliasing pointer.

Unfortunately keeping values in registers is a key factor in good performance. Even if the CPU data cache still holds all of the data, the additional instructions to reload values into registers can easily bloat an inner loop and measurably slow its performance. For a long time C compilers were required to handle aliasing of arbitrary pointers. To get good performance they would "cheat" in specific cases, retaining values in registers when it was pretty certain there was no aliasing. Sometimes the compiler would be wrong, and you'd end up with incorrect behavior for a specific (and probably convoluted) segment of code with obscure aliasing somewhere. The subtle, inconsistent behavior of compiled code was becoming a problem.


 
Set Wayback Machine to 1999

To try to resolve the ambiguous nature of this handling, the C99 specification clarifies how compilers are allowed to optimize with respect to pointer aliasing:

  • void * can alias anything
  • char * can alias anything
  • any other two distinct pointer types are assumed not to be aliases

Two pointers of the same type must be treated as if they were aliases, reloading the value from memory whenever the other changes. Two pointers of different types are assumed not to alias: operations on a pointer will not invalidate the values of other pointers. In the vast majority of cases this assumption is true, and it helps the C compiler to generate faster code.

The problematic statement in our example code is:

uint16_t *p16 = (uint16_t *)&val32;

"&val32" is a pointer to uint32_t. p16 is a pointer to uint16_t. Because they are different types, they are treated as non-aliasing. The fact that they obviously are aliases due to the way they are assigned is not relevant. The point of specifying the aliasing rules in C99 was to get away from the hodgepodge of unclear assumptions and heuristics, to follow just one consistent set of rules. The pointers are different types, therefore they are assumed not to alias.


 
Solutions

Leaving aside the contrived example which kicked off the article, how does one resolve this problem? One possible solution is the use of void or char pointers, which the compiler must treat as a potential alias of any other pointer. For complex casts of one structure to another, passing a void pointer around may be worth considering. Unfortunately pervasive use of void sacrifices most of the type checks the compiler offers, so you can have argument bugs creep in which would otherwise have been easily caught.

Another, simpler solution is to have the compiler handle aliasing more leniently. Specifying C89 mode, for example, requires the compiler to operate by the earlier language rules where any pointer could potentially alias another. Also, gcc supports a -fno-strict-aliasing flag which disables only the aggressive alias optimizations without impacting the rest of the C99 handling.

A solution which does not work is passing the pointer around, for example if we passed a pointer to val32 to printval() in the example code. In theory this would require the called routine to fetch the value from memory (side-stepping any aliasing issues), but in practice the compiler could inline printval() and use the same registered value as caused the controversy in the first place. Even worse, the problem could suddenly appear years after the code is written when the compiler is updated or compilation flags are changed resulting in different inlining behavior.

In most cases where you have an existing codebase with aliasing problems, -fno-strict-aliasing is the best way to go. Aliasing bugs can be incredibly difficult to find. There is a slight performance hit to -fno-strict-aliasing in the form of additional references to memory, but in most cases it will be small as the data cache will satisfy all requests.


 

Update: In the comments, Lance talks about compiler warnings:

Also worth noting, compiling this bit of code with "-O2 -Wall" causes some versions of gcc to complain "warning: dereferencing type-punned pointer will break strict-aliasing rules", which can be useful when attempting to track down this type of problem. Interestingly, "-O0 -Wall" does not produce this warning because gcc only does the analysis required to detect this situation when optimizations requiring alias analysis are enabled.

Thursday, September 11, 2008

Random Musings on stackoverflow.com

stackoverflow.com

Stackoverflow.com is a site by programmers and for programmers, whose purpose is answering questions on topics relating to software development. At the time of this writing it has completed a closed beta test and is in a more extensive test phase, ramping up to open on September 15, 2008. Access to the beta site is open to anyone who wants it during the test period, see blog.stackoverflow.com for details.

Stackoverflow is the brainchild of Jeff Atwood and Joel Spolsky, long-time bloggers on software development topics. It is described by its creators as the intersection of a Wiki, blog, forum, and digg/reddit voting site. Users post questions and answers about programming topics, and the questions and answers are voted on by the community. The highest voted answers rise to the top.


 
Incentives, aka crackoverflow.com upvoting

The site has a series of incentives to keep people coming back. The biggest incentive is the reputation score, which goes up and down depending on how other users rate your questions and answers. Reputation encourages asking questions and providing answers, in order to improve ones score.

reputation Reputation is in many ways similar to the Karma awarded by reddit.com and Hacker News. Capabilities on the site are unlocked by having a high enough reputation. For example voting on posts requires a score of 15 or more, where new users start with a score of 1.

The other, more unique rewards system on stackoverflow are the badges. Badges are directly modeled after the achievements awarded in Xbox games, which encourage gameplay by granting an award when a level is completed or other objectives achieved. The achievements show up in a gamers profile on Xbox Live, where they are used to talk smack about opponents.

badges The stackoverflow badges encourage exploration of the features of the site. The first few badges area easy to obtain: fill out your profile to obtain the Autobiographer badge, vote up to get the Supporter badge, and gain the Teacher badge by asking a question which is voted up. Later badges get harder, requiring larger numbers of users to vote on your submissions.

Are these mechanisms effective? I think the reputation score is, certainly. As in a video game, there is a certain thrill to seeing ones score go up. The badges seem, well, hokey to me. I'll grant that I'm probably not the target demographic anyway, badges might have more resonance with those who play games regularly.


 
Content question votes

The questions during the stackoverflow beta have ranged widely, but by far the most common questions are about Microsoft topics like .Net and SQL Server. This is likely a reflection of the audience of Jeff and Joel's respective blogs being heavily invested in these technologies.

I've been pleasantly surprised by a number of the questions posed on the site. Where the programming sub-reddit is now completely dominated by Ruby, Python, and Haskell, stackoverflow is covering a much wider mix of technologies. There have even been a few embedded system and assembly language questions.


 
Technical solutions to social problems

The history of web communities dedicated to software developers has not been pretty. The comment section of the programming sub-reddit is now so filled with vitriol and casual slander as to be unpleasant to read, let alone participate in. Hacker News attempts to prevent this slide into toxicity by a combination of social pressure and technical measures like not allowing downvotes until a certain karma has been achieved.

Stackoverflow is relying heavily on technical solutions to social problems. For example:

  1. Downvotes are only allowed by those at higher reputations, and reduce the target reputation by two where an upvote adds ten. Additionally downvoting costs the voter one point of reputation, to discourage frequent use.
  2. To discourage trivial edits to bump a question to the top of the list, more than five edits to a post make it "community owned" and no longer grant reputation to its poster.
  3. Users with sufficiently high reputation can edit another users posts or comments, for example to remove mean-spirited remarks. It is hoped they will use these powers for Good and not Evil.
  4. Users can mark a question or answer as offensive. This does not merely signal a moderator; if enough users click offensive the question will be automatically removed.

 
Speed bumps coming

I really like the premise behind stackoverflow, of a social media site designed specifically for programmers. I think there is definitely a need which reddit, DZone, and other developer-focussed sites do not fill.

Editing another users post

However I think this initial iteration of stackoverflow has a psychological problem: it tries to simultaneously be a voting site like reddit and digg, and a community site like Wikipedia. What you end up with is postings associated with a particular users name, and which provide benefits in the form of reputation, but which can be edited by other users. The site attempts to ameliorate this by making a post "community owned" after five edits, so it no longer belongs to the posting user. This step function solution brings the opposite problem: the poster feels cheated out of the reputation from what has become a popular topic.

This mismatch between the digg and wikipedia models is not working smoothly yet, in my opinion. The digg model thrives on ownership, the wikipedia model on anonymity.

Some other thoughts about future issues:

  • On other sites even replying to someone else's comment can elicit a defensive reaction. I suspect that editing another user's question or answer on stackoverflow will lead to a series of edit-revert-edit-revert struggles.
  • There is an enormous incentive to answer quickly: the early answers accrue upvotes and gain significant inertia to accrue more upvotes. A number of users answer immediately by quoting Wikipedia or the first Google result, and gain significant reputation by doing so. stackoverflow will have difficulty succeeding if the highest rated answers are mostly Wikipedia rehashes.
  • The site is far more active during US business hours, and relatively idle during nights and weekends. When combined with the strong incentive to answer immediately, the site can become a productivity drain. We may see companies block stackoverflow in the same way that Facebook is often blocked.

 
Conclusion

I really like stackoverflow, and I want it to succeed. I think its business model is sound: focus on a well-defined niche, provide real value, and carefully select advertising to target that specific audience.

UPDATE: This article was picked up on reddit, with amusing commentary. Also Sara Chipps and Dan Dyer have written about their initial impressions of stackoverflow.

Wednesday, September 3, 2008

The Good, Bad, and Ugly of Gizmo Construction

Rather by definition, embedded software involves building a gizmo of some sort. Manufacturing the hardware portion of the gizmo turns out to be somewhat more complicated than writing a Makefile and starting the build... who knew?

Today, Gentle Reader, we'll discuss the realities of building hardware products by meandering through a few topics:

  • Contract Manufacturing
  • Distributors
  • Contract Engineering
  • DVT and Compliance

 
Contract Manufacturing Printed Circuit Board

Contract manufacturers build a product according to a completed design including the Bill of Materials, the layout of the printed circuit board, assembly instructions, system tests, etc. They can do as much or as little as desired, from a single board up to a completed and tested system directly shipped to your customer.

In the last 25 years contract manufacturing has essentially taken over the production of all electronic goods. The economies of scale from producing such large volumes are overwhelming, and there are very few companies who maintain their own production facilities now. The capital expenses for kitting out an assembly line are daunting, the CM can amortize the cost over a huge number of products. Some well known contract manufacturing firms are Foxconn (also known as Hon Hai), Flextronics (which owns Solectron, another large CM), Celestica, and Sanmina-SCI.

Contract Manufacturers thrive on volume. If building something in really small volumes it is worth looking into production shops aimed at hobbyists, such ExpressPCB.

One thing to keep firmly in mind: the value of a customer to a CM is measured entirely by the volume of future business they expect. If order volumes drop off, the CM will rapidly lose interest. If your market suffers a serious downturn you may find that the quality of the product drops precipitously as the CM rushes through the build in order to get to another, more profitable customer. Similarly if you decide to terminate business with a CM and move to a competing firm, expect astonishingly bad build quality on the final order.


 
Distributors Fistful of Dollars

Like it or not, distributors are absolutely necessary in the embedded market. There are tens of thousands of customers buying millions of components. The component manufacturers simply cannot afford to maintain individual relationships with each customer when a distributor can represent multiple manufacturers with the same effort and expense.

Many Distributors provide FAEs (Field Application Engineers) to assist their customers in selecting components. A good FAE is extremely valuable, due to the sheer number of product designs they have been involved in. They will often be able to suggest alternatives which you might not otherwise have found, and know which parts have been problematic in other designs.

Distributors and Registration

Distributors who perform a significant amount of technical support during the design of a product cannot allow themselves to be undercut by a cheaper alternative in production. Therefore the component manufacturers allow distributors to register as being responsible for the design win at a particular customer. Only that distributor will be allowed to offer advantageous pricing, other distributors will not be allowed to undercut them.

Distributors do a lot more business with the big Contract Manufacturers than any individual customer, and CMs value their relationship with the distributor more than any individual customer. It is not unknown for a Distie to legitimately claim the design win for one particular component, then obtain the Bill of Materials from the CM and register themselves for every chip in the design. Choosing a Distributor is much like getting married: be very certain that the relationship will work in the long term before signing on the dotted line.


 
Contract Engineering Schematic

For a niche market with well-defined needs, a product can sell for years with minimal changes. In these cases its better to contract out the design than recruit a team. The contract will include the functional spec to be designed, timelines for completion, etc. The engineering firm will supply the requisite hardware, software, and firmware expertise, and the resulting design will belong to the customer.

Alternately, Contract Engineering firms can fill in gaps for an in-house design team. For example after components have been chosen and the design competed, a layout for the printed circuit board must be created. A good layout of a complex PCB requires an experienced designer and expensive CAD tools. It makes no sense to keep such a person on staff if only a few designs are done each year, so it is often contracted out. Mechanical design of the chassis and other sheet metal is also often done outside, for the same reason.

Contracts in this area are essentially always on a time and materials basis. The upfront estimate of total cost is not guaranteed. Fixed price contracts are exceedingly rare, because they represent an enormous risk to the engineering firm if the design time goes over estimates. In most cases this will work out fine: the engineering firm will want to get the design done quickly in order to move on to the next customer. However if business slows, watch out for unjustified padding of billable hours.


 
Design Verification Test Eye pattern

Not every unit coming out of the factory will be identical. Each component in the design has a tolerance, an allowed deviation which is still considered within specification. Every system coming off the line will contain parts slightly above spec or below, and it is important to insure that the complete system will still function reliably even if it contains components at the extreme ends of the tolerance. Though the hardware design takes these tolerances into account, in the real world it is difficult to anticipate every possible interaction of components and PCBs.

This is where DVT, for Design Verification test, comes in. During DVT the hardware engineers measure the system to ensure it not only does what it is supposed to do, but does so with sufficient margin to handle variances in its components. DVT is time-consuming work, and changes are often made in the production system based on what is found in the prototypes.


 
Compliance Underwriters Laboratories

The ubiquitous Underwriters Laboratories logo is likely the most widely known example of a compliance certification, but it is not the only one. There are a huge number of standards for product safety or performance, some recognized around the globe and others specific to each geographic market.

Some industries place much more stringent requirements on their products than others. For example, the NEBS Level 3 guidelines for the telecommunications market specifies that there be no noxious chemicals released if the equipment catches fire.


 
Conclusions

One of the aims of this blog is to provide insight into fields of software development which don't get as much exposure as Ruby on Rails or web frameworks. I've tried to provide an overview of some of the gritty details of building products. Its really quite exciting when the first prototype of a new system comes back from the factory, and you try to boot the software for the first time. When it doesn't boot, reality hits.

I'd like to thank John Walsh for much feedback and good suggestions of topics to cover in this posting.

Thursday, August 21, 2008

[0123456789abcdef]

There are only 16 characters to work with, but programmers just love coming up with creative spellings using hex numbers. I suspect that leetspeak evolved out of these spellings, though personally I prefer the original form.

I assume that everyone working as a software developer in English will have seen 0xdeadbeef, and probably some other favorites as well.

deadbeefold standby #1
feedf00dold standby #2
feedfaceold standby #3
decafbaddevelopers love to complain about coffee
badcafedevelopers really love to complain about coffee
badc0ffeedevelopers really, really love to complain about coffee
badc0c0aMacOS X developers might find more meaning in this one.
c0c0abadPeople who hate MacOS X developers might find more meaning in this one.

A little sed scripting on /usr/share/dist/words can turn up a lot of interesting combinations. For the edification and bemusement of the Gentle Reader, allow me to present a few of them here. I rejected most of the results where '5' replaced an 'S' as being too ugly, but a few passed muster.

cat /usr/share/dict/words | sed \
    -e "s/nine/9/g" -e "s/eight/8/g" -e "s/seven/7/g" -e "s/six/6/g"   \
    -e "s/five/5/g" -e "s/four/4/g" -e "s/three/3/g" -e "s/two/2/g"    \
    -e "s/one/1/g"  -e "s/zero/0/g"                                    \
    -e "s/ated/8ed/g"  -e "s/[oO]/0/g" -e "s/[lL]/1/g" -e "s/[sS]/5/g" \
    | egrep -v "[^0123456789aAbBcCdDeEfF]"

The first few seem particularly suitable for memory fenceposts, either guard words before and after allocations or patterns to scribble over freed memory when looking for use-after-free bugs.

a110c8edThis memory is in use, buster!
5eef3712This is ~(0xa110c8ed). No, it doesn't spell anything nifty.
dea110c8Scribble over memory after free(), to catch dangling references.
defec8edto crap all over memory
defacedanother bit pattern to scribble over memory to catch use-after-free errors

To express one's true feelings about the quality of the code base there are really only two options:

  1. Profanity-laden comment blocks
  2. Clever use of constants
c0defadedIt is a well known fact that old code suffers bit rot. Refactor often!
badfacadeThere are times when bad code can only be papered over. This is one of those times.
effaceGood code doesn't make a spectacle of itself.
defaceBad code, on the other hand, gets drunk at its best friends wedding and hits on the bride.
decadeThis code base has been under development for a long time.
baddeedThe EULA for this product specifies the precise amount of bad karma accumulated by using it.
accededThe software has finally given in.
befa11As in "what has befallen yon dead process?"
c0dedbadself explanatory

Magic numbers are useful in all sorts of situations. Encoding one's birthday (0xMMDDYYYY) is clever, but obscure. Subtle jokes in hex also work well.

abbacadabbaUnfortunately 44 bits won't magically fit into a uint32_t.
abadabba
dabbadabba
abbadabbadabba
Said the monkey to the chimp. Real magic numbers are 128 bit.
d00beeDebugging probably qualifies as "medicinal purposes."
d0dec0deHow does one pronounce ioctl anyway? "eye oh cottle," or "eye oct all ?"
babe2bedThe kid's bedtime is 7pm sharp.
b0cceba11You know, I only discovered Bocce Ball in my 30s.
5ca1ab1eIgnore what you see elsewhere, the secret to scalability is in using good magic numbers.
0x1deWith the leading 0x it sortof looks like "oxide" ... I admit it, this one sucks.

Why should return codes be boring? {0, 1, 2, ...} is so dull. We can do better.

cab0byummy
fa1afe1even more yummy!
b1abbedprobably I/O related
beddedThe code went to sleep?
b0bbedThey call it "floating point" for a reason, bub.
beadedUm, yeah. I can't think of anything funny about this one.
bab00My sweet baboo!
10adedI bet it has an itchy trigger finger, too.
ba11adI structure all my code to iambic pentameter.
a100fMy code doesn't like me.
acc01adeProgrammers rarely, if ever, hear praise of their work.
affab1eRelatively approachable and friendly code, I guess.
babb1eWhy yes, my functions tend to be a bit on the longish side. Why?
baff1eWhy yes, my functions tend to be a bit on the complex side. Why?
babe1You can write FORTRAN in any language.
ba1b0aIts the Eye of the Tiger, baby!
ed1f1celarge, imposing blocks of code
5ecededThis module has declared its independence from the rest of the system.
5c01dedAt times, it is necessary to be stern with the codebase. Give it a time out.
5caff01dThis code was intended to be temporary. That was four years ago. Such is the way of things.
ad0beI bet they use this one in Photoshop.
ab0demy humble abode
d8edbabeIn college, yeah, sure you did.
0ddba11That is a strange one, alright.

Finally, here are some 16 bit numbers which are more interesting than "dead," "f00d" and "beef"

cacaA statement about software quality, I suppose.
deafWhen programming, no-one can hear you scream.
c0edIt is the 21st century, after all.
ba1dIf tires can go bald, why not programs?
a1faI couldn't find a reasonable approximation of beta.
f01dOrigami programming!
fa11If code falls in the forest, does it make a sound?
c01dSoftware is a dish best served cold.
ab1eOr ! 0xab1e, as the case may be.
cedeI give up, I'm done.

Do you have any additional hex numbers to share? The comments section is open for business.

Update: Lisa Simone wrote an article about teaching embedded systems and the use of hex words in an article on her site.