Friday, October 31, 2008

Ode to Enum

Let's just get this out in the open: I love enum. Enum is a friend of mine. Enum and I go way, way back.

I also detest enum's evil twin, the series of #defines.

typedef enum {
    OP_FOO,
    OP_BAR,
} operation_t;
#define OP_FOO   0
#define OP_BAR   1
GOOD BAD

 

Why this irrational hatred for poor #define? After all, it has had a rough life. It isn't even a real part of the C language, being completely a pre-processor construct. Allow me to explain how #define has wronged me, not just once but on multiple occasions.


When one has a series of #defines and one needs to add a new code point, the natural inclination is to grab the next unused value: Meanwhile on a different branch in the version control system, a colleague needs to add a new code point:
#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_FOO       0
#define OP_BAR       1
#define OP_BANZAI    2

If you are lucky, this will be flagged as a merge conflict and someone will notice the duplicate values. If you are not lucky, the entries will have been added in slightly different locations such that there is no conflict... or perhaps the poor schmuck tasked to resolve the conflicts only cares about making the darn thing superficially compile so they can get back to their real work of pounding out new operation types.

#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_BANZAI    2

... and much mayhem ensues.

Had we used enum instead, the bad outcome is prevented. The enumeration will assign unique values:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} operation_t;

From this, Gentle Reader, we can conclude that version control systems must love enums.


 
gcc loves enums

gcc -Wswitch-enum provides a very nice benefit for enums and switch statements. The compiler will require that all enumerated values be handled in the switch, and will complain if they are not. Consider this code:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
        default:
            return -3;
            break;
    }
 } /* test */

When new enumeration values are introduced, the compiler will flag the places where handling needs to be added:

./m.c:137: warning: enumeration value 'OP_BUCKAROO' not handled in switch
./m.c:138: warning: enumeration value 'OP_BANZAI' not handled in switch

Older gcc releases provided a weaker version of this checking in the form of -Wswitch, where a "default" case is considered to handle all enumerated values. So the presence of a default case renders -Wswitch useless. If you want to take advantage of -Wswitch but also want to practice defensive programming, unexpected values need to be handled outside of the switch:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
    }

    return -3;
 } /* test */

The -Wall argument to gcc enables -Wswitch. If you are using a recent gcc, explicitly adding -Wswitch-enum is a good idea.


 
gdb loves enums

Enums make debugging easier, with symbolic values:

(gdb) print mydefine
$1 = 3
(gdb) print myenum
$2 = OP_BANZAI

Isn't that better?


 
enums on a diet

There is one minor disadvantage of using an enum: by default, an enum is the size of an int. So on a 32 bit machine an enum takes up 4 bytes of memory, even with only a small set of enumerations. However this bloat is easily eliminated, at least when using gcc:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} __attribute__((packed)) operation_t;

A packed enum consumes the minimum footprint; in this example it is a single byte. If there are more than 256 enumerations, or if enumerations are explicitly assigned to a large absolute value, the footprint will grow to 2 or 4 bytes.


 
Do you love enums?

Enums have been very, very good to me.


 
Updates

In the comments, Rob asked:

all very good - but can you do in c as you do in GDB i.e. can you show me a function myReverseEnumFuction() which when called:
printf("enum string=%s", myReverseEnumFunction(2) );
would output "OP_BUCKAROO" ???

Unfortunately no, I do not think there is a good way to do this in the general case. gdb is able to print the symbolic names for enumerations by referencing debug sections in the binary. It is certainly possible to use the same debug sections to construct myReverseEnumFunction(), but there are two drawbacks:

  1. Production binaries are routinely stripped of debugging information, which would render myReverseEnumFunction() inoperable.
  2. There are a number of different (and incompatible) formats for debugging sections. Even within the Unix-ish world using the ELF binary format there is STABS and several variations of DWARF. myReverseEnumFunction() would not be very portable.

The DWARF format is an interesting topic, and I've put it on my list of things to research and write about. In the meantime I would like to mention the C pre-processor's stringify capability ("#(argument)"), which may come in handy when manually constructing a myReverseEnumFunction() function. The MAP() macro below uses stringify to populate a structure with the enumeration name.

#define MAP(x)  {x, #x}

struct enum_mapping {
    bb_t code;
    char *name;
} enum_map[] = {
    MAP(OP_BUCKAROO),
    MAP(OP_BANZAI)
};

char *myReverseEnumFuction(bb_t bb)
{
    int i, max = sizeof(enum_map) / sizeof(enum_map[0]);

    for (i = 0; i < max; i++) {
        struct enum_mapping *emap = &enum_map[i];
        if (emap->code == bb) {
            return emap->name;
        }
    }
}

Tuesday, October 14, 2008

Aliasing By Any Other Name

This time, Gentle Reader, we'll delve into one of the sticky areas of the C language: pointer aliases. Let's dive directly into code:

void
printval(uint32_t val32)
{
    printf("0x%08x\n", val32);
}

int
main()
{
    uint32_t val32 = 0;
    uint16_t *p16 = (uint16_t *)&val32;

    p16[1] = 0xa1fa;

    printval(val32);
}

We have a 32 bit integer on the stack, to which we also direct a pointer to a 16 bit integer. Writing to p[1] overwrites 16 bits of the word.

$ cc -O0 ./test.c
$ ./a.out
0x0000a1fa

I can hear the groans already: not another little-versus-big endianness discussion. No, thats not it! It is true that the result would be different on a little endian machine, but the far more interesting discussion concerns what happens when we compile -O2:

$ cc -O2 ./test.c
$ ./a.out
0x00000000

Why is the result different? The plot thickens...

As it is one of my favorite techniques, we'll look at the generated MIPS instructions to see why this happens.

00: li   v1,0xa1fa load 0xa1fa into register v1
04: move a0,zero load 0 into a0 (the 1st function argument)
08: sh   v1,26(sp) store 0xa1fa to the stack, where val32 is stored
0c: lw   t9,0(gp) load address of printval
10: jalr t9 jump to printval
14: nop

The second instruction loads zero into register a0, which will be passed as the first argument to printval(). val32 at that moment is zero. We then store 0xa1fa to 16 bits on the stack, overwriting part of val32. The zero in the argument register is untouched, so we end up passing zero to printval() even though val32 on the stack has changed to 0x0000a1fa. Surely this is a bug in gcc's optimizer, right?

No it isn't a flaw in gcc, but understanding this behavior will take us back in the history of the C language.


 
Set Wayback Machine to 1974 DEC PDP11/20

C has a reputation as a language for high performance code. Yet the language is over 30 years old, during which time computer architecture has changed considerably. Instruction execution and memory access performance have both increased phenomenally, but execution speed has improved far more rapidly than memory. Accesses to memory have become almost two orders of magnitude more expensive relative to the CPU speed compared to 30 years ago, and a great deal of silicon is now spent on data caches to avoid the cost of going to memory.

One difficulty with optimizing C code is the free-wheeling use of pointers. Two pointers are called aliases if they point to the same memory location, and in general the compiler cannot really know whether any two arbitrary pointers might be aliases. To be completely safe, whenever a write is done to a pointer the compiler would have to forget any values fetched using other pointers. Those values would be re-fetched from memory, just in case they had been changed by the aliasing pointer.

Unfortunately keeping values in registers is a key factor in good performance. Even if the CPU data cache still holds all of the data, the additional instructions to reload values into registers can easily bloat an inner loop and measurably slow its performance. For a long time C compilers were required to handle aliasing of arbitrary pointers. To get good performance they would "cheat" in specific cases, retaining values in registers when it was pretty certain there was no aliasing. Sometimes the compiler would be wrong, and you'd end up with incorrect behavior for a specific (and probably convoluted) segment of code with obscure aliasing somewhere. The subtle, inconsistent behavior of compiled code was becoming a problem.


 
Set Wayback Machine to 1999

To try to resolve the ambiguous nature of this handling, the C99 specification clarifies how compilers are allowed to optimize with respect to pointer aliasing:

  • void * can alias anything
  • char * can alias anything
  • any other two distinct pointer types are assumed not to be aliases

Two pointers of the same type must be treated as if they were aliases, reloading the value from memory whenever the other changes. Two pointers of different types are assumed not to alias: operations on a pointer will not invalidate the values of other pointers. In the vast majority of cases this assumption is true, and it helps the C compiler to generate faster code.

The problematic statement in our example code is:

uint16_t *p16 = (uint16_t *)&val32;

"&val32" is a pointer to uint32_t. p16 is a pointer to uint16_t. Because they are different types, they are treated as non-aliasing. The fact that they obviously are aliases due to the way they are assigned is not relevant. The point of specifying the aliasing rules in C99 was to get away from the hodgepodge of unclear assumptions and heuristics, to follow just one consistent set of rules. The pointers are different types, therefore they are assumed not to alias.


 
Solutions

Leaving aside the contrived example which kicked off the article, how does one resolve this problem? One possible solution is the use of void or char pointers, which the compiler must treat as a potential alias of any other pointer. For complex casts of one structure to another, passing a void pointer around may be worth considering. Unfortunately pervasive use of void sacrifices most of the type checks the compiler offers, so you can have argument bugs creep in which would otherwise have been easily caught.

Another, simpler solution is to have the compiler handle aliasing more leniently. Specifying C89 mode, for example, requires the compiler to operate by the earlier language rules where any pointer could potentially alias another. Also, gcc supports a -fno-strict-aliasing flag which disables only the aggressive alias optimizations without impacting the rest of the C99 handling.

A solution which does not work is passing the pointer around, for example if we passed a pointer to val32 to printval() in the example code. In theory this would require the called routine to fetch the value from memory (side-stepping any aliasing issues), but in practice the compiler could inline printval() and use the same registered value as caused the controversy in the first place. Even worse, the problem could suddenly appear years after the code is written when the compiler is updated or compilation flags are changed resulting in different inlining behavior.

In most cases where you have an existing codebase with aliasing problems, -fno-strict-aliasing is the best way to go. Aliasing bugs can be incredibly difficult to find. There is a slight performance hit to -fno-strict-aliasing in the form of additional references to memory, but in most cases it will be small as the data cache will satisfy all requests.


 

Update: In the comments, Lance talks about compiler warnings:

Also worth noting, compiling this bit of code with "-O2 -Wall" causes some versions of gcc to complain "warning: dereferencing type-punned pointer will break strict-aliasing rules", which can be useful when attempting to track down this type of problem. Interestingly, "-O0 -Wall" does not produce this warning because gcc only does the analysis required to detect this situation when optimizations requiring alias analysis are enabled.