Friday, October 31, 2008

Ode to Enum

Let's just get this out in the open: I love enum. Enum is a friend of mine. Enum and I go way, way back.

I also detest enum's evil twin, the series of #defines.

GOOD:

typedef enum {
    OP_FOO,
    OP_BAR,
} operation_t;

BAD:

#define OP_FOO   0
#define OP_BAR   1

 

Why this irrational hatred for poor #define? After all, it has had a rough life. It isn't even a real part of the C language, being completely a pre-processor construct. Allow me to explain how #define has wronged me, not just once but on multiple occasions.


When one has a series of #defines and one needs to add a new code point, the natural inclination is to grab the next unused value:

#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2

Meanwhile on a different branch in the version control system, a colleague needs to add a new code point:

#define OP_FOO       0
#define OP_BAR       1
#define OP_BANZAI    2

If you are lucky, this will be flagged as a merge conflict and someone will notice the duplicate values. If you are not lucky, the entries will have been added in slightly different locations such that there is no conflict... or perhaps the poor schmuck tasked to resolve the conflicts only cares about making the darn thing superficially compile so they can get back to their real work of pounding out new operation types.

#define OP_FOO       0
#define OP_BAR       1
#define OP_BUCKAROO  2
#define OP_BANZAI    2

... and much mayhem ensues.

Had we used an enum instead, this bad outcome would have been prevented. The enumeration automatically assigns unique values:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} operation_t;

From this, Gentle Reader, we can conclude that version control systems must love enums.


 
gcc loves enums

gcc -Wswitch-enum provides a very nice benefit for enums and switch statements. The compiler will require that all enumerated values be handled in the switch, and will complain if they are not. Consider this code:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
        default:
            return -3;
            break;
    }
} /* test */

When new enumeration values are introduced, the compiler will flag the places where handling needs to be added:

./m.c:137: warning: enumeration value 'OP_BUCKAROO' not handled in switch
./m.c:138: warning: enumeration value 'OP_BANZAI' not handled in switch

Older gcc releases provided a weaker version of this checking in the form of -Wswitch, where a "default" case is considered to handle all enumerated values. So the presence of a default case renders -Wswitch useless. If you want to take advantage of -Wswitch but also want to practice defensive programming, unexpected values need to be handled outside of the switch:

int test(operation_t op)
{
    switch(op) {
        case OP_FOO:
        case OP_BAR:
            return -2;
            break;
    }

    return -3;
} /* test */

The -Wall argument to gcc enables -Wswitch. If you are using a recent gcc, explicitly adding -Wswitch-enum is a good idea.


 
gdb loves enums

Enums make debugging easier, with symbolic values:

(gdb) print mydefine
$1 = 3
(gdb) print myenum
$2 = OP_BANZAI

Isn't that better?


 
enums on a diet

There is one minor disadvantage of using an enum: by default, an enum is the size of an int. So on a 32 bit machine an enum takes up 4 bytes of memory, even with only a small set of enumerations. However this bloat is easily eliminated, at least when using gcc:

typedef enum {
    OP_FOO,
    OP_BAR,
    OP_BUCKAROO,
    OP_BANZAI,
} __attribute__((packed)) operation_t;

A packed enum consumes the minimum footprint; in this example it is a single byte. If there are more than 256 enumerations, or if enumerations are explicitly assigned to a large absolute value, the footprint will grow to 2 or 4 bytes.


 
Do you love enums?

Enums have been very, very good to me.


 
Updates

In the comments, Rob asked:

all very good - but can you do in c as you do in GDB i.e. can you show me a function myReverseEnumFuction() which when called:
printf("enum string=%s", myReverseEnumFunction(2) );
would output "OP_BUCKAROO" ???

Unfortunately no, I do not think there is a good way to do this in the general case. gdb is able to print the symbolic names for enumerations by referencing debug sections in the binary. It is certainly possible to use the same debug sections to construct myReverseEnumFunction(), but there are two drawbacks:

  1. Production binaries are routinely stripped of debugging information, which would render myReverseEnumFunction() inoperable.
  2. There are a number of different (and incompatible) formats for debugging sections. Even within the Unix-ish world using the ELF binary format there are STABS and several variations of DWARF. myReverseEnumFunction() would not be very portable.

The DWARF format is an interesting topic, and I've put it on my list of things to research and write about. In the meantime I would like to mention the C pre-processor's stringify capability (the "#" operator, as in "#argument"), which may come in handy when manually constructing a myReverseEnumFunction(). The MAP() macro below uses stringify to populate a structure with the enumeration name.

#define MAP(x)  {x, #x}

struct enum_mapping {
    operation_t code;
    char *name;
} enum_map[] = {
    MAP(OP_BUCKAROO),
    MAP(OP_BANZAI)
};

char *myReverseEnumFunction(operation_t op)
{
    int i, max = sizeof(enum_map) / sizeof(enum_map[0]);

    for (i = 0; i < max; i++) {
        struct enum_mapping *emap = &enum_map[i];
        if (emap->code == op) {
            return emap->name;
        }
    }

    return NULL; /* not present in enum_map */
}

Tuesday, October 14, 2008

Aliasing By Any Other Name

This time, Gentle Reader, we'll delve into one of the sticky areas of the C language: pointer aliases. Let's dive directly into code:

#include <stdio.h>
#include <stdint.h>

void
printval(uint32_t val32)
{
    printf("0x%08x\n", val32);
}

int
main()
{
    uint32_t val32 = 0;
    uint16_t *p16 = (uint16_t *)&val32;

    p16[1] = 0xa1fa;

    printval(val32);
}

We have a 32 bit integer on the stack, to which we also direct a pointer to a 16 bit integer. Writing to p16[1] overwrites 16 bits of the word.

$ cc -O0 ./test.c
$ ./a.out
0x0000a1fa

I can hear the groans already: not another little-versus-big endianness discussion. No, that's not it! It is true that the result would be different on a little endian machine, but the far more interesting discussion concerns what happens when we compile with -O2:

$ cc -O2 ./test.c
$ ./a.out
0x00000000

Why is the result different? The plot thickens...

As it is one of my favorite techniques, we'll look at the generated MIPS instructions to see why this happens.

00: li   v1,0xa1fa    # load 0xa1fa into register v1
04: move a0,zero      # load 0 into a0 (the 1st function argument)
08: sh   v1,26(sp)    # store 0xa1fa to the stack, where val32 is stored
0c: lw   t9,0(gp)     # load the address of printval
10: jalr t9           # jump to printval
14: nop

The second instruction loads zero into register a0, which will be passed as the first argument to printval(). val32 at that moment is zero. We then store 0xa1fa to 16 bits on the stack, overwriting part of val32. The zero in the argument register is untouched, so we end up passing zero to printval() even though val32 on the stack has changed to 0x0000a1fa. Surely this is a bug in gcc's optimizer, right?

No it isn't a flaw in gcc, but understanding this behavior will take us back in the history of the C language.


 
Set Wayback Machine to 1974 DEC PDP11/20

C has a reputation as a language for high performance code. Yet the language is over 30 years old, and computer architecture has changed considerably in that time. Both instruction execution and memory access have become phenomenally faster, but execution speed has improved far more rapidly than memory. Relative to CPU speed, a memory access now costs almost two orders of magnitude more than it did 30 years ago, and a great deal of silicon is now spent on data caches to avoid going to memory.

One difficulty with optimizing C code is the free-wheeling use of pointers. Two pointers are called aliases if they point to the same memory location, and in general the compiler cannot really know whether any two arbitrary pointers might be aliases. To be completely safe, whenever a write is done to a pointer the compiler would have to forget any values fetched using other pointers. Those values would be re-fetched from memory, just in case they had been changed by the aliasing pointer.

Unfortunately keeping values in registers is a key factor in good performance. Even if the CPU data cache still holds all of the data, the additional instructions to reload values into registers can easily bloat an inner loop and measurably slow its performance. For a long time C compilers were required to handle aliasing of arbitrary pointers. To get good performance they would "cheat" in specific cases, retaining values in registers when it was pretty certain there was no aliasing. Sometimes the compiler would be wrong, and you'd end up with incorrect behavior for a specific (and probably convoluted) segment of code with obscure aliasing somewhere. The subtle, inconsistent behavior of compiled code was becoming a problem.


 
Set Wayback Machine to 1999

To try to resolve the ambiguous nature of this handling, the C99 specification clarifies how compilers are allowed to optimize with respect to pointer aliasing:

  • void * can alias anything
  • char * can alias anything
  • any other two distinct pointer types are assumed not to be aliases

Two pointers of the same type must be treated as if they were aliases, reloading the value from memory whenever the other changes. Two pointers of different types are assumed not to alias: operations on a pointer will not invalidate the values of other pointers. In the vast majority of cases this assumption is true, and it helps the C compiler to generate faster code.

The problematic statement in our example code is:

uint16_t *p16 = (uint16_t *)&val32;

"&val32" is a pointer to uint32_t. p16 is a pointer to uint16_t. Because they are different types, they are treated as non-aliasing. The fact that they obviously are aliases due to the way they are assigned is not relevant. The point of specifying the aliasing rules in C99 was to get away from the hodgepodge of unclear assumptions and heuristics, to follow just one consistent set of rules. The pointers are different types, therefore they are assumed not to alias.


 
Solutions

Leaving aside the contrived example which kicked off the article, how does one resolve this problem? One possible solution is the use of void or char pointers, which the compiler must treat as a potential alias of any other pointer. For complex casts of one structure to another, passing a void pointer around may be worth considering. Unfortunately pervasive use of void pointers sacrifices most of the type checks the compiler offers, so you can have argument bugs creep in which would otherwise have been easily caught.

Another, simpler solution is to have the compiler handle aliasing more leniently. Specifying C89 mode, for example, requires the compiler to operate by the earlier language rules where any pointer could potentially alias another. Also, gcc supports a -fno-strict-aliasing flag which disables only the aggressive alias optimizations without impacting the rest of the C99 handling.

A solution which does not work is passing the pointer around, for example if we passed a pointer to val32 to printval() in the example code. In theory this would require the called routine to fetch the value from memory (side-stepping any aliasing issues), but in practice the compiler could inline printval() and use the same value cached in a register that caused the controversy in the first place. Even worse, the problem could suddenly appear years after the code is written when the compiler is updated or compilation flags are changed, resulting in different inlining behavior.

In most cases where you have an existing codebase with aliasing problems, -fno-strict-aliasing is the best way to go. Aliasing bugs can be incredibly difficult to find. There is a slight performance hit to -fno-strict-aliasing in the form of additional references to memory, but in most cases it will be small as the data cache will satisfy all requests.


 

Update: In the comments, Lance talks about compiler warnings:

Also worth noting, compiling this bit of code with "-O2 -Wall" causes some versions of gcc to complain "warning: dereferencing type-punned pointer will break strict-aliasing rules", which can be useful when attempting to track down this type of problem. Interestingly, "-O0 -Wall" does not produce this warning because gcc only does the analysis required to detect this situation when optimizations requiring alias analysis are enabled.

Thursday, September 11, 2008

Random Musings on stackoverflow.com

stackoverflow.com

Stackoverflow.com is a site by programmers and for programmers, whose purpose is answering questions on topics relating to software development. At the time of this writing it has completed a closed beta test and is in a more extensive test phase, ramping up to open on September 15, 2008. Access to the beta site is open to anyone who wants it during the test period; see blog.stackoverflow.com for details.

Stackoverflow is the brainchild of Jeff Atwood and Joel Spolsky, long-time bloggers on software development topics. It is described by its creators as the intersection of a Wiki, blog, forum, and digg/reddit voting site. Users post questions and answers about programming topics, and the questions and answers are voted on by the community. The highest voted answers rise to the top.


 
Incentives, aka crackoverflow.com upvoting

The site has a series of incentives to keep people coming back. The biggest incentive is the reputation score, which goes up and down depending on how other users rate your questions and answers. Reputation encourages asking questions and providing answers, in order to improve one's score.

Reputation is in many ways similar to the Karma awarded by reddit.com and Hacker News. Capabilities on the site are unlocked by having a high enough reputation. For example, voting on posts requires a score of 15 or more, where new users start with a score of 1.

The other, more distinctive rewards system on stackoverflow is the badges. Badges are directly modeled after the achievements awarded in Xbox games, which encourage gameplay by granting an award when a level is completed or other objectives achieved. The achievements show up in a gamer's profile on Xbox Live, where they are used to talk smack about opponents.

The stackoverflow badges encourage exploration of the features of the site. The first few badges are easy to obtain: fill out your profile to obtain the Autobiographer badge, vote up to get the Supporter badge, and gain the Teacher badge by asking a question which is voted up. Later badges get harder, requiring larger numbers of users to vote on your submissions.

Are these mechanisms effective? I think the reputation score is, certainly. As in a video game, there is a certain thrill to seeing one's score go up. The badges seem, well, hokey to me. I'll grant that I'm probably not the target demographic anyway; badges might have more resonance with those who play games regularly.


 
Content question votes

The questions during the stackoverflow beta have ranged widely, but by far the most common questions are about Microsoft topics like .Net and SQL Server. This is likely a reflection of the audience of Jeff and Joel's respective blogs being heavily invested in these technologies.

I've been pleasantly surprised by a number of the questions posed on the site. Where the programming sub-reddit is now completely dominated by Ruby, Python, and Haskell, stackoverflow is covering a much wider mix of technologies. There have even been a few embedded system and assembly language questions.


 
Technical solutions to social problems

The history of web communities dedicated to software developers has not been pretty. The comment section of the programming sub-reddit is now so filled with vitriol and casual slander as to be unpleasant to read, let alone participate in. Hacker News attempts to prevent this slide into toxicity by a combination of social pressure and technical measures like not allowing downvotes until a certain karma has been achieved.

Stackoverflow is relying heavily on technical solutions to social problems. For example:

  1. Downvotes are only allowed by those at higher reputations, and reduce the target reputation by two where an upvote adds ten. Additionally downvoting costs the voter one point of reputation, to discourage frequent use.
  2. To discourage trivial edits to bump a question to the top of the list, more than five edits to a post make it "community owned" and no longer grant reputation to its poster.
  3. Users with sufficiently high reputation can edit another user's posts or comments, for example to remove mean-spirited remarks. It is hoped they will use these powers for Good and not Evil.
  4. Users can mark a question or answer as offensive. This does not merely signal a moderator; if enough users click offensive the question will be automatically removed.

 
Speed bumps coming

I really like the premise behind stackoverflow, of a social media site designed specifically for programmers. I think there is definitely a need which reddit, DZone, and other developer-focused sites do not fill.

Editing another user's post

However I think this initial iteration of stackoverflow has a psychological problem: it tries to simultaneously be a voting site like reddit and digg, and a community site like Wikipedia. What you end up with is postings associated with a particular user's name, which provide benefits in the form of reputation, but which can be edited by other users. The site attempts to ameliorate this by making a post "community owned" after five edits, so it no longer belongs to the posting user. This step-function solution brings the opposite problem: the poster feels cheated out of the reputation from what has become a popular topic.

This mismatch between the digg and wikipedia models is not working smoothly yet, in my opinion. The digg model thrives on ownership, the wikipedia model on anonymity.

Some other thoughts about future issues:

  • On other sites even replying to someone else's comment can elicit a defensive reaction. I suspect that editing another user's question or answer on stackoverflow will lead to a series of edit-revert-edit-revert struggles.
  • There is an enormous incentive to answer quickly: the early answers accrue upvotes and gain significant inertia to accrue more upvotes. A number of users answer immediately by quoting Wikipedia or the first Google result, and gain significant reputation by doing so. stackoverflow will have difficulty succeeding if the highest rated answers are mostly Wikipedia rehashes.
  • The site is far more active during US business hours, and relatively idle during nights and weekends. When combined with the strong incentive to answer immediately, the site can become a productivity drain. We may see companies block stackoverflow in the same way that Facebook is often blocked.

 
Conclusion

I really like stackoverflow, and I want it to succeed. I think its business model is sound: focus on a well-defined niche, provide real value, and carefully select advertising to target that specific audience.

UPDATE: This article was picked up on reddit, with amusing commentary. Also Sara Chipps and Dan Dyer have written about their initial impressions of stackoverflow.

Wednesday, September 3, 2008

The Good, Bad, and Ugly of Gizmo Construction

Rather by definition, embedded software involves building a gizmo of some sort. Manufacturing the hardware portion of the gizmo turns out to be somewhat more complicated than writing a Makefile and starting the build... who knew?

Today, Gentle Reader, we'll discuss the realities of building hardware products by meandering through a few topics:

  • Contract Manufacturing
  • Distributors
  • Contract Engineering
  • DVT and Compliance

 
Contract Manufacturing

Contract manufacturers build a product according to a completed design including the Bill of Materials, the layout of the printed circuit board, assembly instructions, system tests, etc. They can do as much or as little as desired, from a single board up to a completed and tested system directly shipped to your customer.

In the last 25 years contract manufacturing has essentially taken over the production of all electronic goods. The economies of scale from producing such large volumes are overwhelming, and there are very few companies who maintain their own production facilities now. The capital expenses for kitting out an assembly line are daunting; a CM can amortize the cost over a huge number of products. Some well known contract manufacturing firms are Foxconn (also known as Hon Hai), Flextronics (which owns Solectron, another large CM), Celestica, and Sanmina-SCI.

Contract Manufacturers thrive on volume. If building something in really small volumes it is worth looking into production shops aimed at hobbyists, such as ExpressPCB.

One thing to keep firmly in mind: the value of a customer to a CM is measured entirely by the volume of future business they expect. If order volumes drop off, the CM will rapidly lose interest. If your market suffers a serious downturn you may find that the quality of the product drops precipitously as the CM rushes through the build in order to get to another, more profitable customer. Similarly if you decide to terminate business with a CM and move to a competing firm, expect astonishingly bad build quality on the final order.


 
Distributors

Like it or not, distributors are absolutely necessary in the embedded market. There are tens of thousands of customers buying millions of components. The component manufacturers simply cannot afford to maintain individual relationships with each customer when a distributor can represent multiple manufacturers with the same effort and expense.

Many Distributors provide FAEs (Field Application Engineers) to assist their customers in selecting components. A good FAE is extremely valuable, due to the sheer number of product designs they have been involved in. They will often be able to suggest alternatives which you might not otherwise have found, and know which parts have been problematic in other designs.

Distributors and Registration

Distributors who perform a significant amount of technical support during the design of a product cannot allow themselves to be undercut by a cheaper alternative in production. Therefore the component manufacturers allow distributors to register as being responsible for the design win at a particular customer. Only that distributor will be allowed to offer advantageous pricing; other distributors will not be allowed to undercut them.

Distributors do a lot more business with the big Contract Manufacturers than any individual customer, and CMs value their relationship with the distributor more than any individual customer. It is not unknown for a Distie to legitimately claim the design win for one particular component, then obtain the Bill of Materials from the CM and register themselves for every chip in the design. Choosing a Distributor is much like getting married: be very certain that the relationship will work in the long term before signing on the dotted line.


 
Contract Engineering

For a niche market with well-defined needs, a product can sell for years with minimal changes. In these cases it's better to contract out the design than recruit a team. The contract will include the functional spec to be designed, timelines for completion, etc. The engineering firm will supply the requisite hardware, software, and firmware expertise, and the resulting design will belong to the customer.

Alternately, Contract Engineering firms can fill in gaps for an in-house design team. For example after components have been chosen and the design completed, a layout for the printed circuit board must be created. A good layout of a complex PCB requires an experienced designer and expensive CAD tools. It makes no sense to keep such a person on staff if only a few designs are done each year, so it is often contracted out. Mechanical design of the chassis and other sheet metal is also often done outside, for the same reason.

Contracts in this area are essentially always on a time and materials basis. The upfront estimate of total cost is not guaranteed. Fixed price contracts are exceedingly rare, because they represent an enormous risk to the engineering firm if the design time goes over estimates. In most cases this will work out fine: the engineering firm will want to get the design done quickly in order to move on to the next customer. However if business slows, watch out for unjustified padding of billable hours.


 
Design Verification Test

Not every unit coming out of the factory will be identical. Each component in the design has a tolerance, an allowed deviation which is still considered within specification. Every system coming off the line will contain parts slightly above spec or below, and it is important to ensure that the complete system will still function reliably even if it contains components at the extreme ends of the tolerance. Though the hardware design takes these tolerances into account, in the real world it is difficult to anticipate every possible interaction of components and PCBs.

This is where DVT, for Design Verification test, comes in. During DVT the hardware engineers measure the system to ensure it not only does what it is supposed to do, but does so with sufficient margin to handle variances in its components. DVT is time-consuming work, and changes are often made in the production system based on what is found in the prototypes.


 
Compliance

The ubiquitous Underwriters Laboratories logo is likely the most widely known example of a compliance certification, but it is not the only one. There are a huge number of standards for product safety or performance, some recognized around the globe and others specific to each geographic market.

Some industries place much more stringent requirements on their products than others. For example, the NEBS Level 3 guidelines for the telecommunications market specify that there be no noxious chemicals released if the equipment catches fire.


 
Conclusions

One of the aims of this blog is to provide insight into fields of software development which don't get as much exposure as Ruby on Rails or web frameworks. I've tried to provide an overview of some of the gritty details of building products. It's really quite exciting when the first prototype of a new system comes back from the factory, and you try to boot the software for the first time. When it doesn't boot, reality hits.

I'd like to thank John Walsh for much feedback and good suggestions of topics to cover in this posting.

Thursday, August 21, 2008

[0123456789abcdef]

There are only 16 characters to work with, but programmers just love coming up with creative spellings using hex numbers. I suspect that leetspeak evolved out of these spellings, though personally I prefer the original form.

I assume that everyone working as a software developer in English will have seen 0xdeadbeef, and probably some other favorites as well.

deadbeef    old standby #1
feedf00d    old standby #2
feedface    old standby #3
decafbad    developers love to complain about coffee
badcafe     developers really love to complain about coffee
badc0ffee   developers really, really love to complain about coffee
badc0c0a    MacOS X developers might find more meaning in this one.
c0c0abad    People who hate MacOS X developers might find more meaning in this one.

A little sed scripting on /usr/share/dict/words can turn up a lot of interesting combinations. For the edification and bemusement of the Gentle Reader, allow me to present a few of them here. I rejected most of the results where '5' replaced an 'S' as being too ugly, but a few passed muster.

cat /usr/share/dict/words | sed \
    -e "s/nine/9/g" -e "s/eight/8/g" -e "s/seven/7/g" -e "s/six/6/g"   \
    -e "s/five/5/g" -e "s/four/4/g" -e "s/three/3/g" -e "s/two/2/g"    \
    -e "s/one/1/g"  -e "s/zero/0/g"                                    \
    -e "s/ated/8ed/g"  -e "s/[oO]/0/g" -e "s/[lL]/1/g" -e "s/[sS]/5/g" \
    | egrep -v "[^0123456789aAbBcCdDeEfF]"

The first few seem particularly suitable for memory fenceposts, either guard words before and after allocations or patterns to scribble over freed memory when looking for use-after-free bugs.

a110c8ed    This memory is in use, buster!
5eef3712    This is ~(0xa110c8ed). No, it doesn't spell anything nifty.
dea110c8    Scribble over memory after free(), to catch dangling references.
defec8ed    to crap all over memory
defaced     another bit pattern to scribble over memory to catch use-after-free errors
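The fencepost idea can be sketched in a few lines of C. The guarded_alloc/guarded_free names are made up for illustration: a guard word in front of each allocation makes double frees and dangling references easier to spot in a core dump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GUARD_LIVE  0xa110c8edu  /* "allocated": this memory is in use */
#define GUARD_DEAD  0xdea110c8u  /* scribbled over the guard at free() time */

/* Prepend a guard word to each allocation, returning a pointer
   to the usable memory just past it. */
static void *guarded_alloc(size_t len)
{
    uint32_t *p = malloc(sizeof(uint32_t) + len);
    if (p == NULL)
        return NULL;
    *p = GUARD_LIVE;
    return p + 1;
}

/* Verify the guard, then deface it so a double free or a
   dangling reference trips the assert instead of corrupting
   memory silently. */
static void guarded_free(void *mem)
{
    uint32_t *p = (uint32_t *)mem - 1;
    assert(*p == GUARD_LIVE);
    *p = GUARD_DEAD;
    free(p);
}
```

A real implementation would also place a guard word after the allocation to catch overruns, but that requires remembering the length; this sketch keeps only the prefix word.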

To express one's true feelings about the quality of the code base there are really only two options:

  1. Profanity-laden comment blocks
  2. Clever use of constants
c0defaded   It is a well known fact that old code suffers bit rot. Refactor often!
badfacade   There are times when bad code can only be papered over. This is one of those times.
efface      Good code doesn't make a spectacle of itself.
deface      Bad code, on the other hand, gets drunk at its best friend's wedding and hits on the bride.
decade      This code base has been under development for a long time.
baddeed     The EULA for this product specifies the precise amount of bad karma accumulated by using it.
acceded     The software has finally given in.
befa11      As in "what has befallen yon dead process?"
c0dedbad    self explanatory

Magic numbers are useful in all sorts of situations. Encoding one's birthday (0xMMDDYYYY) is clever, but obscure. Subtle jokes in hex also work well.

abbacadabba Unfortunately 44 bits won't magically fit into a uint32_t.
abadabba, dabbadabba, abbadabbadabba
            Said the monkey to the chimp. Real magic numbers are 128 bit.
d00bee      Debugging probably qualifies as "medicinal purposes."
d0dec0de    How does one pronounce ioctl anyway? "eye oh cottle" or "eye oct all"?
babe2bed    The kid's bedtime is 7pm sharp.
b0cceba11   You know, I only discovered Bocce Ball in my 30s.
5ca1ab1e    Ignore what you see elsewhere, the secret to scalability is in using good magic numbers.
0x1de       With the leading 0x it sort of looks like "oxide" ... I admit it, this one sucks.

Why should return codes be boring? {0, 1, 2, ...} is so dull. We can do better.

cab0b       yummy
fa1afe1     even more yummy!
b1abbed     probably I/O related
bedded      The code went to sleep?
b0bbed      They call it "floating point" for a reason, bub.
beaded      Um, yeah. I can't think of anything funny about this one.
bab00       My sweet baboo!
10aded      I bet it has an itchy trigger finger, too.
ba11ad      I structure all my code to iambic pentameter.
a100f       My code doesn't like me.
acc01ade    Programmers rarely, if ever, hear praise of their work.
affab1e     Relatively approachable and friendly code, I guess.
babb1e      Why yes, my functions tend to be a bit on the longish side. Why?
baff1e      Why yes, my functions tend to be a bit on the complex side. Why?
babe1       You can write FORTRAN in any language.
ba1b0a      It's the Eye of the Tiger, baby!
ed1f1ce     large, imposing blocks of code
5eceded     This module has declared its independence from the rest of the system.
5c01ded     At times, it is necessary to be stern with the codebase. Give it a time out.
5caff01d    This code was intended to be temporary. That was four years ago. Such is the way of things.
ad0be       I bet they use this one in Photoshop.
ab0de       my humble abode
d8edbabe    In college, yeah, sure you did.
0ddba11     That is a strange one, alright.

Finally, here are some 16 bit numbers which are more interesting than "dead," "f00d" and "beef"

caca: A statement about software quality, I suppose.
deaf: When programming, no-one can hear you scream.
c0ed: It is the 21st century, after all.
ba1d: If tires can go bald, why not programs?
a1fa: I couldn't find a reasonable approximation of beta.
f01d: Origami programming!
fa11: If code falls in the forest, does it make a sound?
c01d: Software is a dish best served cold.
ab1e: Or !0xab1e, as the case may be.
cede: I give up, I'm done.

Do you have any additional hex numbers to share? The comments section is open for business.

Update: Lisa Simone wrote an article on her site about teaching embedded systems and the use of hex words.

Thursday, August 7, 2008

opensource.mycompany.com

Using GPL software imposes the requirement to redistribute the source code, but this requirement is routinely ignored in commercial products. That is a shame: even if one doesn't care about the goals of the free software movement, simple pragmatism would still favor providing the source code. Violating the GPL can cause Bad Things™  to happen, and compliance isn't very difficult. It is quite common for products to incorporate an almost unmodified busybox, glibc, and Linux kernel. Providing the source code for these cases is straightforward, and doesn't risk inadvertently giving away intellectual property.

Section 3 of version 2 of the GNU General Public License concerns the responsibility to distribute source code along with a binary:

  3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

 
GNU logo

I am not a lawyer, though I think it might be fun to play one on TV. There is a lot of detail in the GPL about the requirements for distribution of source code, and maybe I'm dense but I don't understand what half of it means. However I would contend that if you get to the point of needing to argue over the precise definition of the terms in a legal context, you've already failed.

The problem with violating the GPL is not that you'll get sued. Of course, it is quite possible you'll be sued for violating the GPL...

... but getting sued is not the real problem. The real problem is when a posting about misappropriation of GPL software shows up on Slashdot and LWN. The real problem is when every public-facing phone number and email address for your company becomes swamped by legions of Linux fans demanding to know when you will provide the source code. The real problem persists for years after the event, when Google searches for the name of your products turn up links about GPL violations coupled with ill-informed but damaging rants.

So we want to avoid that outcome. If you read the legal complaints filed by the Software Freedom Law Center, they follow a similar pattern:

  1. Someone discovers a product which incorporates GPL code such as busybox, but cannot find the source code on the company web site (probably because the company hasn't posted it).
  2. This person sends a request for the source code to an address they find on that website, possibly support@mycompany.com.
  3. This request is completely ignored or receives an unsatisfactory response.
  4. The person contacts SFLC, who sends a letter to the legal department of the infringing company demanding compliance with the license and that steps be taken to ensure no future infringements take place.
  5. SFLC also demands compensation for their legal expenses; that's how they fund their operation.
  6. The corporate legal team, misreading the complaint as a shakedown attempt, stonewalls the whole thing or offers some steps but refuses to pay legal costs.
  7. Lawsuit is filed, and the PR nightmare begins in earnest.

 
Keeping Bad Things From Happening

There are two points in that progression where the bad result could be averted, in steps #2 and #4. Unfortunately it is not likely you can influence either one:

  • In step #2 you have no idea where that initial request for the source code will go. They might send email to the sales department, or tech support. They might call the main corporate number and chat with the answering service. The request will very likely be filtered out before it makes it to someone who would realize its significance.
  • By the time the lawyers get involved in step #4, you're already toast. Corporations, particularly medium to large corporations, are routinely targeted to extract money for licensing intellectual property, business partnerships, or any number of reasons. The GPL claim will look like all the rest, and be treated in the same way.

This is a case where it is best to be proactive. One can't realistically wait until the first time someone requests the source code: too many things can go wrong and lead to the PR nightmare. Instead, Gentle Reader, it is best to post the GPL code somewhere that it can be found with little difficulty by someone looking for it, but that otherwise draws little attention to itself.

Tapes

When the GPL was created, software was delivered via some physical medium (magnetic tapes, later supplanted by floppy disks, CDs, DVDs, etc). One was expected to include the source code on the same medium, or at least be willing to provide another tape containing the source. Nowadays many embedded systems are delivered with the software pre-installed and updates delivered via the Internet, so adding a CD of source code would add to the Cost of Goods Sold. Anything which adds COGS is probably a non-starter, so we'll move on.

It is certainly an option to tar up all of the GPL packages from the source tree and try to get it linked from the corporate website, likely controlled by someone in the marketing department. That conversation may not go the way you want:

"Tell me again why we need to do this?"

"We're not an opensource company, we build widgets."

"Isn't Montavista supposed to take care of this for us?"

"Our market messaging revolves around the power of our brand and the strength of our secret sauce, not opensource code. End of discussion, you commie punk."

The (hypothetical) marketing person is not being unreasonable. Ok, the last one would be unreasonable, but I thought it would be funny. Nonetheless putting GPL source code right up on the corporate website implies it is a primary focus of the corporation, when in reality it probably is just one of many tools you use in building a product. Rather than find a place on the corporate website, I advise a separate site specifically for opensource code. It needs to be something which people can easily find if they are motivated to look for it, but otherwise not draw much attention to itself. opensource.<mycompany>.com or gpl.<mycompany>.com are reasonable conventions.

Next you need a web server. Your company may already work with a web host, otherwise Google Sites is a reasonable (and free) choice. You'll need IT to set up a DNS CNAME directing opensource.<mycompany>.com to point to the new web site. If you're using Google Sites there is a Help Topic on how to do this.
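For the DNS piece, the record in the company's zone might look something like the following sketch. The ghs.google.com target is the name Google's Sites documentation listed at the time; substitute whatever your web host specifies:

```
; map opensource.mycompany.com onto the hosting provider
opensource  IN  CNAME  ghs.google.com.
```

Once the CNAME propagates, the hosted site answers for opensource.mycompany.com without any further involvement from corporate IT.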

The goal here is to avoid the bad result (GPL violation being posted to slashdot), not draw attention. You shouldn't spend time putting together a snazzy web site, a simple background with links to tarballs is fine. Ideally nobody will ever look at these pages.


 
Documentation

Let's talk about documentation. There are a number of other open source software licenses besides the GPL. Many of them carry an "advertising clause," a requirement that "all advertising materials mentioning features or use of this software must display" an acknowledgement of the code used. The use of this clause derives from Berkeley's original license for BSD Unix, and though Berkeley has disavowed the practice there is still a great deal of open source software out there which requires it.

In practice the advertising clause results in a long appendix in the product documentation listing all of the various contributors. Honestly nobody will ever read that appendix, but nonetheless it is worth putting together. You can also include a notice that the GPL code is available for download from the following URL... so if despite your best efforts the company does get sued, you'll have something concrete to point to in defense.


 
Now for the hard part

The Gentle Reader may have noticed that we have not covered how to locate the GPL code used within a product. Really I'm hoping that the source tree is sufficiently organized to be able to browse the top few levels of directories and look for files named LICENSE and for copyright notices at the top of files. If it is difficult to determine whether the product contains any open source code, there is an article at Datamation which might be helpful. It discusses compliance tools, including tools which look for signatures from well-known codebases to track down more serious GPL violations.

What about the difficult case, where GPL code is being used and has been extended with proprietary code which cannot simply be posted to a website? Even if one doesn't care about the free software ethos, pragmatically this is a ticking time bomb and one that should not be ignored. I'd recommend putting up an opensource website anyway to post what you can, and working as soon as possible to disentangle the rest. Development of new features in that area of the code can be used as the lever to refactor it in a GPL-compliant way.

Update 8/2008: The Software Freedom Law Center has published a GPL compliance guide.

10/2008: Linux Hater's Redux holds up this blog post as an example of why Linux should be avoided. Okey dokey.

12/2008: Add Cisco to the list of companies sued by the SFLC over GPL issues. This time the suit was filed on behalf of the FSF for glibc, coreutils, and other core GNU components. Reactions to the news from Ars Technica and Joe Brockmeier @ ZDNET have already appeared.

5/2009: Cisco and the FSF have settled their lawsuit. Cisco will appoint a Free Software Director, make attempts to notify owners of Linksys products of their rights under the GPL, and will make a monetary contribution to the FSF.

Wednesday, July 23, 2008

The Control Plane is not an Aircraft

In my industry at least, the high end of the margin curve is dominated by modular systems: a chassis into which cards can be inserted to add features, increase capacity, etc. Products at the cheaper end of the spectrum tend to be a fixed configuration, a closed box which does stuff but is not terribly expandable (often called a pizza box). The fixed configuration products sell in much larger volumes, but margins are lower.

Chassis has high margins and low volume, fixed config has lower margins and high volumes
Block diagram of fixed configuration system, showing CPU attached to ASICs via a PCI bus

In a pizza box system the control plane between the CPU and the chips it controls tends to be straightforward, as there is sufficient board space to route parallel busses everywhere. PCI is often used as the control interface, as most embedded CPUs contain a PCI controller and a lot of merchant silicon uses this interface.

In a chassis product, there is typically a central supervisor card (or perhaps two, for redundancy) controlling a series of line cards. There are a huge number of options for how to handle the control plane over the backplane, but they mainly fall into two categories: memory mapped and message oriented. In today's article we'll examine the big picture of control plane software for a modular chassis, and then dive into some of the details.


 
Memory Mapped Control Plane

A memory-mapped chassis control plane extends a bus like PCI over traces on the backplane.

Memory mapped chassis system with PCI over the backplane

To the software, all boards in the chassis appear as one enormous set of ASICs to manage.


 
Message Passing Control Plane

Alternately, the chassis may rely on a network running between the supervisor and line cards to handle control duties. There are PHY chips to run ethernet via PCB traces, and inexpensive switch products like the Roboswitch to fan out from the supervisor CPU to each slot in the chassis.

Message Passing chassis system with ethernet links

This system requires more software, as each line card runs its own image in addition to the supervisor card. The line cards receive messages from the supervisor and control their local ASICs, while the supervisor has to handle any local ASICs directly and generate messages to control the remote cards.


 
Programming Model

As the Gentle Reader undoubtedly already knows, the programming model for these two different system arrangements will be radically different.

Memory mapped: ASICs are mapped in as pointers.

volatile myHWRegisters_t *p;

p = <map in the hardware>
p->reset = 1;

Message passing: ASICs controlled using, erm, messages. Yeah.

myHWMessage_t msg;

msg.code = DO_RESET_ASIC;
send(socket, &msg, sizeof(msg), 0);

Perhaps that was obvious. Moving on... At first glance the memory mapped model looks pretty compelling:
  • it is the same as a fixed configuration product, allowing easy code reuse
  • in the message passing model you still have to write the memory map code to run on the line card CPUs, and then you have to write the messaging layer on top of it

 
Hot Swap Complicates Things

A big issue in the software support for a chassis is support for hot swap. Cards can be inserted or removed from the chassis at any time, while the system runs. The software needs to handle having cards come and go.

With a message passing system hotswap is fairly straightforward to handle: the software will be notified of insertion or removal events, and starts sending messages. If there is a race condition where a message is sent to a card which has just been removed, nothing terrible happens. The software needs to handle timeouts and gracefully recover from an unresponsive card, but this isn't too difficult to manage.

With a memory mapped chassis, hot insertion is relatively simple. The software is notified of an insertion event, and maps in the hardware registers. Removal is not so simple. If the software is given sufficient notice that the card is being removed, it can be cleanly unmapped and safely removed. If the card is yanked out of the chassis before the software is ready, havoc can ensue. Pointers will suddenly reference a non-existent device.


 
Blame the Hardware

So card removal is difficult to handle robustly in a chassis which memory-maps the line cards.

CompactPCI card showing microswitch and hot-swap LED

Ideally the hardware should help the software handle card removal. For example CompactPCI cards include a microswitch on the ejector lever, which signals the software of an imminent removal. The user is supposed to flip the switch and wait for an LED to light up, an indication that the card is ready to be removed. Of course, people ignore the LED and pull the card out anyway if it takes longer than 1.26 seconds... we did studies, and stuff... ok, I just made that number up. Card removal then often gets turned into a software problem: "Just add a CLI command to deprovision the card, or something."

This makes for a reasonably nasty programming environment: to be robust you have to constantly double-check that the card is still present. Get a link up indication from one of the ports? Better check the card presence before trying to use that port, in case the linkup turns out to be the buswatcher's 0xdeadbeef pattern. Read an indication from one of the line cards that it's waaaaay over temperature? Check that it is still there before you begin shutting the system down; it might just be a garbage reading.


 
Pragmatism is a Virtue

There is a maxim in the marketing strategy for a chassis product line: never let the customer remove the chassis from the rack - they might replace it with a competitor's chassis. You can evolve the design of the line cards and other modules, but they must function in the chassis already present at the customer site. Backplane designs thus remain in production for a long time, often lasting through several generations of card designs before finally being retired. Though the Gentle Reader might have a firm preference for one control plane architecture or another, the harsh reality is that one likely has to accept whatever was designed in years ago.

So we'll spend a little time talking about the software support for the two alternatives.


 
Thoughts on Memory Mapped Designs

Driver code can run in user space

Software to control a hardware device does not automatically have to run in the kernel. There are only a few things which the kernel absolutely has to do:

  1. map the physical address of the device in at a virtual address
  2. handle interrupts and mask the hardware IRQ
  3. deal with DMA, as this involves physical addressing of buffers

Everything else, all of the code to initialize the devices and all of the higher level handling in response to an interrupt, can go into a user space process where it will be easier to debug and maintain. The kernel driver needs to support an mmap() entry point, allowing the process to map in the hardware registers. Once mapped, the user process can program the hardware without ever calling into the kernel again.


 
Thoughts on Message Passing Designs

First, an assertion: RPC is a terrible way to implement a control plane. One of the advantages of having CPUs on each card is the ability to run operations in parallel, but using remote procedure calls means the CPUs will spend a lot of their time blocked. The control plane should be structured as a FIFO of operations in flight, without having to wait for each operation to complete. If information is needed from the remote card it should be structured as a callback, not a blocking operation.

It is tempting to implement the control communications as a series of commands sent from the supervisor CPU to the line cards. Individual commands would most likely be a high level operation, requiring the line card CPU to implement a series of accesses to the hardware. The amount of CPU time it takes for the supervisor to send the command would be relatively small compared to the amount of time the line card will spend implementing the command, likely accentuated by a significantly faster CPU in the supervisor. Therefore the supervisor will be able to generate operations far faster than the line cards can handle them. In networking gear this is most visible when a link is flapping [an ethernet link being established and lost very rapidly], where commands are sent each time to reconfigure the other links. If the flapping persists, you either cause a failure by overflowing buffers in the control plane or start making the supervisor block while waiting for the line card to drain its queue. Either way, it's bad.

One technique to avoid these overloads is to have the supervisor delay sending a message for a short time. If additional operations need to be done, the supervisor can discard the earlier updates and send only the most recent ones. The downside of delaying messages in this way is that it is a delay, and responsiveness suffers.

Another technique involves a somewhat more radical restructuring. The line card most likely contains various distinct bits of functionality which are mostly decoupled. Falling back to my usual example of networking gear, the configuration of each port is mostly independent of the other ports. Rather than send a message describing the changes to be made to the port, have the supervisor send a message containing the complete port state. Because each message contains a complete snapshot of the desired state of the port, the line card can freely discard older messages so long as it implements the one most recently sent.

Control Plane State Coalescing

By structuring the control messages to contain a desired state, you allow the remote card to degrade gracefully under load. Under light load it can probably handle every message the supervisor sends, while under heavier load it will be able to skip multiple updates to the same state.

Note that the ordering of updates is lost, as events can be coalesced into a different order than that in which they were sent. This coalescing scheme can only work if the various bits of state really are independent; if one state depends on an earlier update to a different state, then they are not independent.


 
Closing Thoughts

Gack, enough about control plane implementation already.

The comments section recently converted over to Disqus, which allows threading so the Gentle Reader can reply to an earlier comment. Anonymous comments are enabled for now, though if comment spam becomes a problem that may change.

Wednesday, July 2, 2008

gdb lies to you

Because I'm a terrible programmer, I spend a lot of time in gdb. gdb is a fantastic tool that has (so far) kept me from getting fired, but it has its limitations. Today, Gentle Reader, we will talk about one of the particularly vexing limitations.


 

Not Always Correct, but Never Uncertain
Have you ever opened a core and immediately spotted the problem: "Oh, foo() was called with a bad argument. That isn't a valid widget ID." So you dig around in the data structures which were used to call foo(), but you cannot see how it could possibly have been called with the argument gdb is showing?

You're right, the program did not call foo with a bizarre argument. gdb is displaying incorrect information. Let's take a simple example:

int foo(int a);
int bar(int a);

int testgdb(void)
{
    return foo(123);
}

int foo(int a)
{
    return bar(456);
}

int bar(int a)
{
    *((int *)NULL) = 0;  /* Deliberately crash */
    return a;
}

foo is called with a constant argument of 123. foo calls bar(456), and bar deliberately segfaults. Now let's examine the backtrace:

(gdb) bt
#0  0x80045a1c in bar (a=456) at foo.c:4
#1  0x80045a30 in foo (a=456) at foo.c:10
#2  0x80045a4c in testgdb () at foo.c:15

Weird, huh? How did foo get called with an argument of 456? The answer, of course, is that it didn't. It was called with 123 as its argument, and gdb is displaying incorrect information.

 

More MIPS Assembly, Please
Because it is one of my favorite techniques, let's disassemble the object code to see why this happens. This was compiled for MIPS32 using gcc -O2, but with inlining disabled (because we need to have function calls if we're going to talk about function arguments).

We'll start with testgdb():

<testgdb>:
 addiu sp,sp,-24  push a stack frame
 sw ra,16(sp)  store the return address to the stack
 jal <foo>  jump to foo()
 li a0,123  pass 123 as the first argument to foo (in delay slot)
 lw ra,16(sp)  load return address back from the stack
 jr ra  return to the caller
 addiu sp,sp,24  pop the stack frame

MIPS passes arguments in CPU registers, and this is crucial to understanding why gdb sometimes displays the arguments incorrectly. Every CPU architecture has a set of "calling conventions" for how the compiler should pass arguments to functions. A long time ago in architectures like VAX and 680x0, arguments would be pushed on the stack. Since then CPUs have gotten dramatically faster while memory speed has improved less rapidly, so modern calling conventions pass arguments in registers whenever possible. The original calling conventions for x86 used the stack, but the more recent "fastcall" convention passes two arguments in registers, while x86_64 passes up to six integer arguments in registers.

In the assembly example above the argument to foo is loaded into register a0, using a "load immediate" instruction. This instruction happens to be in a delay slot: with MIPS the instruction immediately following a branch is always executed.

<foo>:
 addiu sp,sp,-24  push a stack frame
 sw ra,16(sp)  save return address to the stack
 jal <bar>  jump to bar()
 li a0,456  pass 456 as first argument to bar (in delay slot)
 lw ra,16(sp)  after returning from bar(), load return address
 jr ra  return to the caller
 addiu sp,sp,24  pop the stack frame

foo() pushes a stack frame and loads 456 into a0 in order to call bar(). The crucial thing to notice is that the current value of a0 was not saved anywhere before overwriting it with 456. The compiler determined that the original value was no longer required for any further computation, and did not need to be saved. So the "123" value stored in a0 has been irrevocably lost. foo next calls bar, and crashes.

In the gdb backtrace the function arguments will be displayed from the stack if they are available. Otherwise, gdb displays the values of the argument registers and hopes for the best. The arguments in the backtrace are mostly correct because gdb can often pull them from the stack. When compiled with -O0, functions always save their argument registers to the stack, so it is only optimized code which can make debugging difficult in this way.

 

Et tu, gdb?

What can one do about this? Mainly, be aware that what the debugger shows is not always correct. I believe this is good practice in general: always be a little bit suspicious of the tools.

Personally I'd find it more helpful if gdb were to display "<argument unknown>" rather than an incorrect value, if it can determine that the argument register has been re-used. I've tried digging through the gdb sources to see if this could be improved, but got lost in mips-tdep.c (reference first paragraph about me being a terrible programmer). Information about register re-use is typically not included in the compiler's debugging information anyway, so gdb may not have sufficient information to know when a register still holds the correct argument value.