Friday, October 31, 2008

Ode to Enum

Let's just get this out in the open: I love enum. Enum is a friend of mine. Enum and I go way, way back.

I also detest enum's evil twin, the series of #defines.

typedef enum {
    OP_FOO,
    OP_BAR,
} operation_t;
#define OP_FOO   0
#define OP_BAR   1
GOOD BAD

 

Why this irrational hatred for poor #define? After all, it has had a rough life. It isn't even a real part of the C language, being completely a pre-processor construct. Allow me to explain how #define has wronged me, not just once but on multiple occasions.


When one has a series of #defines and one needs to add a new code point, the natural inclination is to grab the next unused value: Meanwhile on a different branch in the version control system, a colleague needs to add a new code point:
#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_FOO       0
#define OP_BAR       1
#define OP_BANZAI    2

If you are lucky, this will be flagged as a merge conflict and someone will notice the duplicate values. If you are not lucky, the entries will have been added in slightly different locations such that there is no conflict... or perhaps the poor schmuck tasked to resolve the conflicts only cares about making the darn thing superficially compile so they can get back to their real work of pounding out new operation types.

#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_BANZAI    2

... and much mayhem ensues.

Had we used enum instead, the bad outcome is prevented. The enumeration will assign unique values:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} operation_t;

From this, Gentle Reader, we can conclude that version control systems must love enums.


 
gcc loves enums

gcc -Wswitch-enum provides a very nice benefit for enums and switch statements. The compiler will require that all enumerated values be handled in the switch, and will complain if they are not. Consider this code:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
        default:
            return -3;
            break;
    }
 } /* test */

When new enumeration values are introduced, the compiler will flag the places where handling needs to be added:

./m.c:137: warning: enumeration value 'OP_BUCKAROO' not handled in switch
./m.c:138: warning: enumeration value 'OP_BANZAI' not handled in switch

Older gcc releases provided a weaker version of this checking in the form of -Wswitch, where a "default" case is considered to handle all enumerated values. So the presence of a default case renders -Wswitch useless. If you want to take advantage of -Wswitch but also want to practice defensive programming, unexpected values need to be handled outside of the switch:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
    }

    return -3;
 } /* test */

The -Wall argument to gcc enables -Wswitch. If you are using a recent gcc, explicitly adding -Wswitch-enum is a good idea.


 
gdb loves enums

Enums make debugging easier, with symbolic values:

(gdb) print mydefine
$1 = 3
(gdb) print myenum
$2 = OP_BANZAI

Isn't that better?


 
enums on a diet

There is one minor disadvantage of using an enum: by default, an enum is the size of an int. So on a 32 bit machine an enum takes up 4 bytes of memory, even with only a small set of enumerations. However this bloat is easily eliminated, at least when using gcc:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} __attribute__((packed)) operation_t;

A packed enum consumes the minimum footprint; in this example it is a single byte. If there are more than 256 enumerations, or if enumerations are explicitly assigned to a large absolute value, the footprint will grow to 2 or 4 bytes.


 
Do you love enums?

Enums have been very, very good to me.


 
Updates

In the comments, Rob asked:

all very good - but can you do in c as you do in GDB i.e. can you show me a function myReverseEnumFuction() which when called:
printf("enum string=%s", myReverseEnumFunction(2) );
would output "OP_BUCKAROO" ???

Unfortunately no, I do not think there is a good way to do this in the general case. gdb is able to print the symbolic names for enumerations by referencing debug sections in the binary. It is certainly possible to use the same debug sections to construct myReverseEnumFunction(), but there are two drawbacks:

  1. Production binaries are routinely stripped of debugging information, which would render myReverseEnumFunction() inoperable.
  2. There are a number of different (and incompatible) formats for debugging sections. Even within the Unix-ish world using the ELF binary format there is STABS and several variations of DWARF. myReverseEnumFunction() would not be very portable.

The DWARF format is an interesting topic, and I've put it on my list of things to research and write about. In the meantime I would like to mention the C pre-processor's stringify capability ("#(argument)"), which may come in handy when manually constructing a myReverseEnumFunction() function. The MAP() macro below uses stringify to populate a structure with the enumeration name.

#define MAP(x)  {x, #x}

struct enum_mapping {
    bb_t code;
    char *name;
} enum_map[] = {
    MAP(OP_BUCKAROO),
    MAP(OP_BANZAI)
};

char *myReverseEnumFuction(bb_t bb)
{
    int i, max = sizeof(enum_map) / sizeof(enum_map[0]);

    for (i = 0; i < max; i++) {
        struct enum_mapping *emap = &enum_map[i];
        if (emap->code == bb) {
            return emap->name;
        }
    }
}