Sunday, February 24, 2008

Premature Optimization for Fun and Profit

A Google search for "premature optimization" turns up tens of thousands of hits, most of them quoting Hoare's maxim that premature optimization is the root of all evil.

However, I would argue there is one optimization which, if it is to be done at all, should be enabled at the very start of the project. It is the simplest optimization imaginable: invoke the C compiler with optimization enabled. It might seem obvious to do so, but I've worked on more than one project which made a deliberate decision to start with -O0 to ease debugging and speed development. The intention is always to switch on optimization when the project approaches release, but it rarely works out that way.

To be sure, there are sound reasons to use -O0 during development. Optimized code is harder to debug: line numbers in the debugger will not always be correct, variables you want to examine may have been optimized away, and so on. However, it turns out to be exceedingly difficult to turn on compiler optimization for a software project which has been under development for a while. Code compiled -O2 differs in a number of ways from code compiled -O0, and there are several classes of bugs which will never be noticed until the day compiler optimization is enabled. I'll detail a few of the more common problems I've encountered.


Optimization Tidbit #1: Stack space
char buf[16];
int a, b, c;

a = 0;
b = a + 1;
c = b + 2;

... a and b never used again...

Stack frames will be larger when compiled -O0. Every declared variable is allocated space on the stack. When compiled -O2 the compiler looks for opportunities to keep intermediate values in registers, and if a value never needs to be stored on the stack, no space is allocated for it.

What impact does this have? Because a good portion of the stack frame consists of "useless" space in the unoptimized build, a stack-related bug in the code is far less likely to be noticed as it will not cause any visible failure. For example:

char buf[16];
int a, b, c;

strncpy(buf, some_string, 20); /* writes up to 20 bytes into a 16-byte buffer */

Because the optimized stack frame only contains space for things which are actually used, corruption of the stack is almost guaranteed to cause a failure. I'm not arguing this is a bad thing: walking off the end of a buffer on the stack is an extraordinarily serious coding problem which should be fixed immediately. I'm arguing a somewhat more subtle point: enabling optimization late in the development process will suddenly expose bugs which have been present all along, unnoticed because they caused no visible failures.


Optimization Tidbit #2: Stack initialization

When compiled -O0, all automatic variables are automatically initialized to zero. This does not come for free: the compiler emits instructions at the start of each function to laboriously zero it out. When compiled -O2 the stack is left uninitialized for performance reasons, and will contain whatever garbage happens to be there.

I wrote this article based on notes from debugging a large software product at a previous employer, where we transitioned to -O2 after shipping -O0 for several years. One of the routines I disassembled in that debugging effort contained a series of store word instructions, and I jumped to the conclusion that -O0 was deliberately zeroing the stack. I've had that in my head for several years now. However as has been pointed out in the comments of this article, my conclusion was incorrect. The instructions I was poring over at that time must have been for something else; I'll never know exactly what.

Part of the reason for starting this blog was to try to learn more about my craft. I freely admit that the things I don't know can (and do) fill many college textbooks. I'll thank Wei Hu for gently setting me straight on this topic, in the comments for this article. I've deleted the rest of the incorrect details of this Tidbit #2; I'll leave the first paragraph as a reminder.


Example

I worked on a product which had fallen into this trap, and had shipped to customers for a year and a half (through three point releases) still compiled -O0. It became increasingly ridiculous to try to improve performance weaknesses when we weren't even using compiler optimization, so we drew the proverbial line in the sand to make it happen. The most difficult problem to track down was in a module which, sometimes, would simply fail all tests.

int start_module(void) {
    int enable;

    /* ... other init code ... */
    if (enable) activate_module();
    return 0;
}

It kept a variable on the stack indicating whether to enable itself. This variable was never explicitly initialized, but when compiled -O0 one of the routines called by the main function walked off the end of its stack frame and scribbled a constant, non-zero value over enable. Thus the module enabled itself, quite reliably.

When compiled -O2, that other routine ended up scribbling on something else and leaving enable alone. Thus, whether the module enabled itself depended entirely on the sequence of operations leading up to its activation and on whatever garbage happened to be on the stack. Most of the time the garbage was non-zero, but occasionally, purely due to traffic patterns and random chance, we'd be left with enable=0 and the module would fail all tests.

The real problem in this example is the routine which scribbled off its stack frame, but the effect of that bug was to make new regressions appear when -O2 was enabled. The longer a code base is developed, the more difficult it is to switch on optimization.


What does it mean?

Planning to do most of the development with -O0 and switch on optimization just before release suddenly exposes bugs hidden throughout the code base, even in portions which were considered stable and fully debugged. The result is significant regressions, and in the crunch approaching a release the most likely response is to turn optimization back off and plan to deal with it "in the next release." Unfortunately the next release will also include new feature development, customer escalations, etc., and going back to turn on optimization may have to wait even longer. The more code is written, the harder it becomes.

My recommendation is to do exactly the opposite: you should enable -O2 (or -Os) from the very start of the project. If you have trouble debugging a particular problem see if it can be reproduced on a test build compiled -O0, and debug it there. Unoptimized builds should be the exception, not the norm.
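In a make-based build, this amounts to making the optimized flags the default and treating the unoptimized build as the special case. A sketch, with the `debug` target name assumed rather than taken from any particular project:

```make
# Default: ship what you test. -g coexists with -O2, though the
# debugging experience is less precise than at -O0.
CFLAGS = -O2 -g -Wall

# Unoptimized build as the exception, for chasing a specific bug:
#   make debug
debug: CFLAGS = -O0 -g -Wall
debug: all
```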

Introduction

I've developed embedded systems software for a number of years, mostly networking products like switches and routers. This is not a common development environment among the software blogs I've seen, so I'm writing this to show how the other half lives.

Years ago these types of systems would run a realtime OS like VxWorks, if they used an OS at all. Nowadays they frequently run Linux. Programming for an embedded target, even one running Linux, is a bit different from writing a web service or desktop application:
  • The target is likely using an embedded processor like PowerPC, MIPS, or ARM. Everything is cross-compiled.
  • The target may have a reasonable amount of RAM, but it will probably not have any swap space. Memory footprint is an issue.
  • The target is likely using flash for its filesystem. Image size is an issue.
This leads to a different set of trade-offs for development. One has to ask questions before pulling in a library or framework: can it be cross-compiled? Is the binary small enough to fit?

In this blog I don't expect to say much about Ruby on Rails, web services, or functional languages of any description. This is a blog about the dirty business of building systems. I hope you enjoy it.