A Google search for "premature optimization" turns up tens of thousands of hits, most of them quoting the maxim, popularized by Knuth and often attributed to Hoare, that premature optimization is the root of all evil.
However, I would argue there is one optimization which, if it is to be done at all, should be enabled at the very start of the project. It is the simplest optimization imaginable: invoke the C compiler with optimization enabled. It might seem obvious to do so, but I've worked on more than one project which made the deliberate decision to start with -O0 to ease debugging and speed development. The intention is always to switch on optimization when the project is approaching release, but it rarely works out that way.
To be sure, there are sound reasons to use -O0 when developing the system. Optimized code is harder to debug: the line numbers in the debugger will not always be correct, variables you want to examine may have been optimized away, etc. However, it turns out to be exceedingly difficult to turn on compiler optimization for a software project which has been under development for a while. Code compiled -O2 differs from code compiled -O0 in a number of ways, and there are several classes of bugs which will never be noticed until the day compiler optimization is enabled. I'll detail several of the more common such problems I've encountered.
Optimization Tidbit #1: Stack space
char buf[16];
int a, b, c;
a = 0;
b = a + 1;
c = b + 2;
... a and b never used again...
Stack frames will be larger when compiled -O0. All declared variables will be allocated stack space in which to store them. When compiled -O2 the compiler looks for opportunities to keep intermediate values in registers, and if a value never needs to be stored to the stack then no space will be allocated for it.
What impact does this have? Because a good portion of the stack frame consists of "useless" space in the unoptimized build, a stack-related bug in the code is far less likely to be noticed as it will not cause any visible failure. For example:
char buf[16];
int a, b, c;
strncpy(buf, some_string, 20);  /* may write up to 20 bytes into a 16-byte buffer */
Because the optimized stack frame only contains space for things which are actually used, corruption of the stack is almost guaranteed to cause a failure. I'm not arguing this is a bad thing: walking off the end of a buffer on the stack is an extraordinarily serious coding problem which should be fixed immediately. I'm arguing a somewhat more subtle point: enabling optimization late in the development process will suddenly expose bugs which have been present all along, unnoticed because they caused no visible failures.
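As a hedged illustration of how the overflow above might be fixed, here is a minimal sketch that bounds the copy by the destination's real size instead of a hand-typed constant. The helper name copy_bounded is hypothetical and is not from the original code base.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without ever writing more than
 * dst_size bytes, and always NUL-terminate the result.
 * Returns 1 if src fit completely, 0 if it had to be truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len, n;

    if (dst_size == 0)
        return 0;                 /* no room even for the terminator */

    len = strlen(src);
    n = (len < dst_size - 1) ? len : dst_size - 1;
    memcpy(dst, src, n);          /* at most dst_size - 1 bytes */
    dst[n] = '\0';
    return len < dst_size;
}
```

A caller would pass sizeof(buf) for dst_size, so the limit follows the declaration if the buffer is ever resized. Truncation is reported rather than hidden, which leaves the caller to decide whether a shortened string is acceptable.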
Optimization Tidbit #2: Stack initialization
When compiled -O0, all automatic variables are automatically initialized to zero. This does not come for free: the compiler emits instructions at the start of each function to laboriously zero it out. When compiled -O2 the stack is left uninitialized for performance reasons, and will contain whatever garbage happens to be there.
I wrote this article based on notes from debugging a large software product at a previous employer, where we transitioned to -O2 after shipping -O0 for several years. One of the routines I disassembled in that debugging effort contained a series of store word instructions, and I jumped to the conclusion that -O0 was deliberately zeroing the stack. I've had that in my head for several years now. However, as has been pointed out in the comments of this article, my conclusion was incorrect. The instructions I was poring over at that time must have been for something else; I'll never know exactly what.
Part of the reason for starting this blog was to try to learn more about my craft. I freely admit that the things I don't know can (and do) fill many college textbooks. I'll thank Wei Hu for gently setting me straight on this topic, in the comments for this article. I've deleted the rest of the incorrect details of this Tidbit #2; I'll leave the first paragraph as a reminder.
Example
I worked on a product which had fallen into this trap, and had shipped to customers for a year and a half (through three point releases) still compiled -O0. It became increasingly ridiculous to try to address performance weaknesses when we weren't even using compiler optimization, so we drew the proverbial line in the sand to make it happen. The most difficult problem to track down was in a module which, sometimes, would simply fail all tests.
int start_module() {
    int enable;    /* never explicitly initialized */
    ... other init code ...
    if (enable) activate_module();
}
It had a variable on the stack which controlled whether the module would enable itself. This variable was not explicitly initialized, but when compiled -O0 one of the routines the main function called was walking off the end of its stack and scribbling a constant, non-zero value over enable. Thus the module would enable itself, quite reliably.
When compiled -O2, that other routine ended up scribbling on something else and leaving enable alone. Thus, whether the module enabled itself depended entirely on the sequence of operations leading up to its activation and what precise garbage was on the stack. Most of the time the garbage would be non-zero, but occasionally, just due to traffic pattern and random chance, we'd be left with enable=0 and the module would fail all tests.
The real problem in this example is the routine which scribbled off its stack frame, but the effect of that bug was to make new regressions appear when -O2 was enabled. The longer a code base is developed, the more difficult it is to switch on optimization.
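One way to remove this class of bug is to give every automatic variable a defined value before its first read, and to let the compiler help find the ones that lack it. GCC's -Wuninitialized and -Wmaybe-uninitialized warnings (both enabled by -Wall) depend on data-flow analysis, so they tend to report more when optimization is on. The sketch below assumes a hypothetical module_config structure and uses a stub in place of the real activate_module(); neither comes from the original product.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration record; not from the original product. */
struct module_config {
    bool enable_requested;
};

static int activations;           /* counts calls to the stub below */

/* Stand-in for the real activate_module(). */
static void activate_module(void)
{
    activations++;
}

/* Returns 1 if the module was activated, 0 otherwise. */
int start_module(const struct module_config *cfg)
{
    bool enable = false;          /* defined value before any read */

    /* ... other init code ... */
    if (cfg != NULL && cfg->enable_requested)
        enable = true;

    if (enable)
        activate_module();
    return enable ? 1 : 0;
}
```

With the initializer in place, the outcome no longer depends on what a previous call happened to leave on the stack, so it is the same at -O0 and -O2.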
What does it mean?
Planning to do most of the development with -O0 and switch on optimization before release results in suddenly exposing bugs hidden throughout the code base, even in portions of the code which were considered stable and fully debugged. This will result in significant regressions, and in the crunch time approaching a release the most likely response will be to turn optimization back off and plan to deal with it "in the next release." Unfortunately the next release will also include new feature development, fixing customer escalations, etc., and going back to turn on optimization may have to wait even longer. The more code is written, the harder it will be.
My recommendation is to do exactly the opposite: you should enable -O2 (or -Os) from the very start of the project. If you have trouble debugging a particular problem, see if it can be reproduced on a test build compiled -O0, and debug it there. Unoptimized builds should be the exception, not the norm.
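To keep an unoptimized build from slipping through unnoticed, the image itself can report how it was compiled. This is a minimal sketch, assuming GCC or a compatible compiler: GCC predefines __OPTIMIZE__ whenever optimization is enabled, and additionally __OPTIMIZE_SIZE__ for -Os. The function names build_optimization_mode and log_build_mode are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: describe how this translation unit was compiled.
 * Relies on GCC-style predefined macros; a compiler that does not define
 * them will be reported as "unoptimized". */
const char *build_optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "unoptimized";
#endif
}

/* Log the mode at startup so an accidental -O0 image is easy to spot. */
void log_build_mode(FILE *out)
{
    fprintf(out, "build: %s\n", build_optimization_mode());
}
```

Calling log_build_mode(stderr) early in startup puts the answer in every boot log, which makes it harder for a debugging build compiled -O0 to be mistaken for the shipping configuration.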

Comments
- Anonymous, April 2, 2008 12:00 AM
- Filipe, April 2, 2008 1:23 AM
- Pranav, April 2, 2008 5:30 AM
- Anonymous, April 2, 2008 5:51 AM
- Denton Gentry, April 2, 2008 7:33 AM
- Denton Gentry, April 2, 2008 7:44 AM
- Wei Hu, April 2, 2008 6:47 PM
- MikeP, April 2, 2008 7:46 PM
- Anonymous, April 2, 2008 8:31 PM
I don't get it. Why can't you do both?
Develop with optimizations off, run unit tests with optimizations off during development and run unit tests with optimizations on before checking in.
And your testers should be getting code with optimizations on the whole time, while you fix the bugs they find with optimizations off.
I don't really understand what scenario it is where you have to pick one or the other.
There will also be bugs which you are more likely to find with optimizations off than on.
You forgot another thing.
gcc spits out more warnings with -O2 or -O3 than with -O0, even if you have -Wall -pedantic, etc.
It traverses the tree more times and it will catch more bugs statically.
Interesting read. This is a great explanation for why programming and debugging is a hairy task. I wish I had this with me every time I had a manager twiddling his thumbs behind me while asking "How much longer?"
And if only he would've understood this explanation.
A lot of desktop developers have moved to managed languages and gained benefits in doing so. The sort of bugs you mention go away. Of course, garbage collection, etc., may not be deterministic enough for your application - but is there not some middle road for embedded languages of the future?
Why hasn't Java taken over the embedded world?
Anonymous [Comment #1]: Most of the time, we're cross-compiling. x86 Linux machines for development, but the target is a MIPS or PowerPC. So though we do employ some unit tests (by extracting portions of the code into a scaffold and compiling it for x86), most of the testing is done on the target running the full system.
We could compile -O0, run the system tests, and then -O2 to run again, but the amount of time required means people would frequently skip it.
Anonymous [comment #4]: Why do we still mostly use C in embedded development? There are a few reasons I can think of:
- We spend more of our time manipulating hardware, using pointers to registers. Java isn't really aimed at that; we'd end up using native methods a lot.
- Image size is limited. The system I work on right now has 40 Megs of flash in which to store its complete system image. The kernel, the filesystem (including all system libraries), and some platform-specific stuff like FPGA images must all fit within 40 Megs. J2EE is a non-starter, J2SE would be difficult to fit, and though J2ME is small enough it is pretty limited.
- We're generally running on slow processors. My current work is on a 300 MHz PowerPC.
- The last reason is simply inertia. Most of the people you try to hire for embedded work won't have Java experience.
We do use scripting languages: Tcl and Lua are both very popular in this space.
I cannot believe that "When compiled -O0, all automatic variables are automatically initialized to zero". At least it is not true for my tests compiled by gcc-4.2 on x86.
Awesome article Denny.
This is a goldmine of information (that should go into the gcc man page in bold or have its own section).
In our case we couldn't move from -O0 to -O2 without lots of regressions. We had to settle for -Os instead.
One thing you might mention is -fstack-protector, which is easy to set up -- just implement __stack_chk_fail()
Are you seriously suggesting -O2 as a substitute for, say, valgrind? These tricks are cute but they're only just the beginning.