This time, Gentle Reader, we'll delve into one of the sticky areas of the C language: pointer aliases. Let's dive directly into code:
void printval(uint32_t val32) { printf("0x%08x\n", val32); } int main() { uint32_t val32 = 0; uint16_t *p16 = (uint16_t *)&val32; p16[1] = 0xa1fa; printval(val32); }
We have a 32 bit integer on the stack, to which we also direct a pointer to a 16 bit integer. Writing to p[1] overwrites 16 bits of the word.
$ cc -O0 ./test.c $ ./a.out 0x0000a1fa
I can hear the groans already: not another little-versus-big endianness discussion. No, thats not it! It is true that the result would be different on a little endian machine, but the far more interesting discussion concerns what happens when we compile -O2:
$ cc -O2 ./test.c $ ./a.out 0x00000000
Why is the result different? The plot thickens...
As it is one of my favorite techniques, we'll look at the generated MIPS instructions to see why this happens.
00: | li v1,0xa1fa | load 0xa1fa into register v1 |
04: | move a0,zero | load 0 into a0 (the 1st function argument) |
08: | sh v1,26(sp) | store 0xa1fa to the stack, where val32 is stored |
0c: | lw t9,0(gp) | load address of printval |
10: | jalr t9 | jump to printval |
14: | nop |
The second instruction loads zero into register a0, which will be passed as the first argument to printval(). val32 at that moment is zero. We then store 0xa1fa to 16 bits on the stack, overwriting part of val32. The zero in the argument register is untouched, so we end up passing zero to printval() even though val32 on the stack has changed to 0x0000a1fa. Surely this is a bug in gcc's optimizer, right?
No it isn't a flaw in gcc, but understanding this behavior will take us back in the history of the C language.
Set Wayback Machine to 1974
C has a reputation as a language for high performance code. Yet the language is over 30 years old, during which time computer architecture has changed considerably. Instruction execution and memory access performance have both increased phenomenally, but execution speed has improved far more rapidly than memory. Accesses to memory have become almost two orders of magnitude more expensive relative to the CPU speed compared to 30 years ago, and a great deal of silicon is now spent on data caches to avoid the cost of going to memory.
One difficulty with optimizing C code is the free-wheeling use of pointers. Two pointers are called aliases if they point to the same memory location, and in general the compiler cannot really know whether any two arbitrary pointers might be aliases. To be completely safe, whenever a write is done to a pointer the compiler would have to forget any values fetched using other pointers. Those values would be re-fetched from memory, just in case they had been changed by the aliasing pointer.
Unfortunately keeping values in registers is a key factor in good performance. Even if the CPU data cache still holds all of the data, the additional instructions to reload values into registers can easily bloat an inner loop and measurably slow its performance. For a long time C compilers were required to handle aliasing of arbitrary pointers. To get good performance they would "cheat" in specific cases, retaining values in registers when it was pretty certain there was no aliasing. Sometimes the compiler would be wrong, and you'd end up with incorrect behavior for a specific (and probably convoluted) segment of code with obscure aliasing somewhere. The subtle, inconsistent behavior of compiled code was becoming a problem.
Set Wayback Machine to 1999
To try to resolve the ambiguous nature of this handling, the C99 specification clarifies how compilers are allowed to optimize with respect to pointer aliasing:
- void * can alias anything
- char * can alias anything
- any other two distinct pointer types are assumed not to be aliases
Two pointers of the same type must be treated as if they were aliases, reloading the value from memory whenever the other changes. Two pointers of different types are assumed not to alias: operations on a pointer will not invalidate the values of other pointers. In the vast majority of cases this assumption is true, and it helps the C compiler to generate faster code.
The problematic statement in our example code is:
uint16_t *p16 = (uint16_t *)&val32;
"&val32" is a pointer to uint32_t. p16 is a pointer to uint16_t. Because they are different types, they are treated as non-aliasing. The fact that they obviously are aliases due to the way they are assigned is not relevant. The point of specifying the aliasing rules in C99 was to get away from the hodgepodge of unclear assumptions and heuristics, to follow just one consistent set of rules. The pointers are different types, therefore they are assumed not to alias.
Solutions
Leaving aside the contrived example which kicked off the article, how does one resolve this problem? One possible solution is the use of void or char pointers, which the compiler must treat as a potential alias of any other pointer. For complex casts of one structure to another, passing a void pointer around may be worth considering. Unfortunately pervasive use of void sacrifices most of the type checks the compiler offers, so you can have argument bugs creep in which would otherwise have been easily caught.
Another, simpler solution is to have the compiler handle aliasing more leniently. Specifying C89 mode, for example, requires the compiler to operate by the earlier language rules where any pointer could potentially alias another. Also, gcc supports a -fno-strict-aliasing flag which disables only the aggressive alias optimizations without impacting the rest of the C99 handling.
A solution which does not work is passing the pointer around, for example if we passed a pointer to val32 to printval() in the example code. In theory this would require the called routine to fetch the value from memory (side-stepping any aliasing issues), but in practice the compiler could inline printval() and use the same registered value as caused the controversy in the first place. Even worse, the problem could suddenly appear years after the code is written when the compiler is updated or compilation flags are changed resulting in different inlining behavior.
In most cases where you have an existing codebase with aliasing problems, -fno-strict-aliasing is the best way to go. Aliasing bugs can be incredibly difficult to find. There is a slight performance hit to -fno-strict-aliasing in the form of additional references to memory, but in most cases it will be small as the data cache will satisfy all requests.
Update: In the comments, Lance talks about compiler warnings:
Also worth noting, compiling this bit of code with "-O2 -Wall" causes some versions of gcc to complain "warning: dereferencing type-punned pointer will break strict-aliasing rules", which can be useful when attempting to track down this type of problem. Interestingly, "-O0 -Wall" does not produce this warning because gcc only does the analysis required to detect this situation when optimizations requiring alias analysis are enabled.