Sunday, April 27, 2008

Four Circles of Product Management Hell

I typically work on software for hardware platforms, where the design cycle is considerably longer than that of a desktop or web application. Eighteen months for a new platform is not uncommon, and it is important to nail down the main feature set early enough in the process for the hardware capabilities to be designed in.

Nailing down these requirements is the job of the product manager, and at this point in my career I've worked closely with about a dozen of them. They generally approach the requirements definition using a similar set of techniques:

  • visit current customers to find out what they like and dislike about the product, and what they need the product to do
  • visit prospective customers to get first impressions and feedback
  • work with the sales teams to break down barriers to a big sale (which can mean one-off feature development if the deal is big enough)
  • keep track of what the competition is doing
  • generally stay abreast of where the market is going

Over time my opinion of what constitutes "good product management" has evolved considerably. I'll give a few examples that portray what the younger, naive me thought of a particular style of product management versus what the older, cantankerous me thinks.



1. All requirements generated directly from customer requests

What I used to think:

"Wow, this is great. These are all features which obviously appeal to customers. We won't waste time on stuff that nobody will ever use; everything we develop will help sell the product."

What I now realize:

This is "Building Yesterday's Technology, Tomorrow!" Customers will ask for improvements and extensions to the existing functionality of your product, and that is fine. When a customer asks for something completely new, it is generally because they can already buy it from a competitor. In fact, if the input comes via an RFQ, they are probably trying to steer the deal to that particular vendor by putting in a set of criteria which only that one vendor can meet. The competitor likely helped the customer put the RFQ together, as any good salescritter can supply a list of specifications which only their product can meet.

Certainly, there will always be a healthy amount of catch-up requirements in any ongoing product development. Competitors are not sitting idle, and they're probably not incompetent, so they will come up with things which your product will need to match. Any product manager will end up listing requirements which simply match what competitors already have.

However there must also be a significant amount of forward-looking development as well, building the product to anticipate features which the customer does not yet realize they need. Constantly playing catch-up only works for companies in truly commanding (i.e. monopolistic) market positions.



2. Requirements list both externally visible behavior and internal implementation details

What I used to think:

"Umm, ok. It's a little weird that the product requirements are so detailed."

What I now realize:

The product manager believes themself to be a systems architect trapped in a product manager's body. They almost certainly used to be an engineering individual contributor (many product managers come from a hardware or software engineering background). They likely feel they have such a firm grasp of the tradeoffs in developing the product that they can easily guide the engineers in how to proceed.

Also, and I am as guilty of this as anyone: engineers often have little respect for the product manager... and the reverse is true as well. The product manager may be trying to provide a top-level design for the product because they consider the engineering team to be incapable of doing it correctly.



3. Anticipates market needs with uncanny and unfailing accuracy, down to the smallest detail

What I used to think:

"Wow, this Product Manager really knows their stuff. This is great!"

What I now realize:

The company is doomed, because we're in a market which is too easy. We'll have five hundred competitors, who will all build exactly the same product and end up racing to the bottom of the price curve so nobody will make any money. Polish the resume.

The Gentle Reader might ask, "What about the product manager who can analyze a difficult and complex market to generate the perfect product requirements with 100% accuracy?" The answer is quite simple, really: such a person does not exist. We developers very much want this paragon of product management to exist, someone who can predict the unpredictable and know the unknowable, in order to satisfy some deep-seated need in our world view. However, there is no such thing as omniscience; the best product manager is one who can generate a market analysis good enough to get some initial traction and be refined in subsequent updates.



4. Product manager is often not in the office, and only revs the requirements document sporadically

What I used to think:

"Slacker."

What I now realize:

Actually, these are some of the signs of a really good product manager. When kicking off a new product they will spend a lot of time visiting customers and potential customers to gather inputs.

They apply a certain amount of skepticism, filtering and summarization to the raw inputs coming from outside the development environment, but without introducing too much delay. A product manager who passes on new inputs every time they meet with a customer or return from a trade show is not a good product manager; they're just a note taker. A product manager who suffers from analysis paralysis, not wanting to make a call until the correct course of action is absolutely clear, is also not a good product manager. There is a finite window of market opportunity; miss it and you'll find out from your competitors what the right thing to do was.

The best product managers know not to change the requirements too often, or the development team will never finish anything. So they will not constantly release new versions of the requirements spec, and any changes which are made will have gotten enough validation to be credible without sitting in validation so long as to become stale.

Unfortunately it is also possible that the reason they are not in the office and don't produce much is because they really are a slacker. Caveat emptor.



Afterthoughts

In fairness, I should acknowledge that the product manager job entails more than just defining the product. Occupying a role equidistant from engineering, marketing, and sales, the product manager is frequently pulled in to handle crises in any of the three areas.

  • as an expert user of the product, they do a lot of sales demos
  • they frequently end up training resellers (and customers) on the use of the product
  • they invariably get sent to all of the boring trade shows
  • they assist in preparing white papers, sales guides for particular vertical markets, etc.

Finally, for a perspective from the other side of the fence I highly recommend The Cranky Product Manager. Entertaining, with a dash of truthiness.

Sunday, April 20, 2008

The Secret Life of Volatile

The C99 specification says the following about the volatile keyword (section 6.7.3):

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.
 
A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be "optimized out" by an implementation or reordered except as permitted by the rules for evaluating expressions.

Now perhaps that explanation is sufficient for some of you to have the complete picture of when to use the volatile declaration. If so, you can stop reading now. You probably don't need to bother with any of my other articles, either.

Instead of trying to understand what volatile means by starting from the C language level, I find it more illustrative to look at the differences in the assembly code gcc generates. The gcc developers are the type of people who can read that specification and understand its ramifications. By looking at what the compiler produces, we can examine some of the problem areas which the volatile declaration can resolve.


 

Volatile Effect #1: Memory References

Let's start with a trivial example: a function which busy-waits as long as a flag variable is non-zero. The flag would be cleared by another thread, or by an interrupt or signal handler, to allow this function to break out of the loop.

int flag = 0;
void foo(void) {
    while (flag) {
    }
}

The resulting MIPS assembly is:

00: lw   v0,0(gp)   load address of flag into register v0
04: lw   v0,0(v0)   load value of flag into register v0
08: bnez v0,8       branch if nonzero, to 0x8 (to itself!)
0c: nop             no-op, in branch delay slot
10: jr   ra         jump to return address (return to caller)
14: nop             no-op, in branch delay slot

The instructions shown above load from the flag variable exactly once, into a register. The loop then spins until the contents of that register become zero ... which cannot possibly happen, because nothing is ever loaded into the register again. This seems silly, but the compiler is doing exactly what it has been told. In general, when there is a loop you want the loop condition kept in a register rather than fetched from memory. Otherwise every "for (i = 0; i < SOME_BIG_NUMBER; i++)" would result in useless memory references to repeatedly fetch the value of i.

This example has another interesting tidbit which will be important in some of the later examples: the branch delay slot. In order to run at higher clock frequencies, processor designs begin working on the next instruction before the previous instruction has completely finished; this is called pipelining. Branch instructions are particularly difficult to pipeline: it takes a while to determine whether the branch will be taken, and therefore what the next instruction will be. Thus MIPS (and many other architectures) have the concept of a branch delay slot: the instruction immediately following the branch will always be executed, regardless of whether the branch is taken or not.

Now we'll examine the effect of adding a volatile modifier to the flag:

volatile int flag = 0;
void foo(void)
{
    while (flag) {
    }
}

The resulting MIPS assembly is:

00: lw   v0,0(gp)   load address of flag into register v0
04: lw   v0,0(v0)   load value of flag into register v0
08: bnez v0,0       branch if nonzero, back to 0x0
0c: nop             no-op, in branch delay slot
10: jr   ra         jump to return address (return to caller)
14: nop             no-op, in branch delay slot

Now the loop branches back to load the value of flag from memory into the register for each iteration. If the value of flag is changed by some external entity, the loop will properly terminate. In fact the loop not only reloads the value of flag from memory, it also reloads the address of flag into the register. Loading the address again is not necessary; I suspect it is the result of something specific to the MIPS backend.

These examples illustrate another interesting tidbit about branch delay slots: it is frequently difficult to find a suitable instruction for them. Not only must it be an instruction which will be executed whether or not the branch is taken, but its result cannot be involved in deciding the branch, as the branch instruction has already executed. In every delay slot in these examples the compiler had to use a no-op because it had no useful work to do.


 

Volatile Effect #2: Memory Access Ordering

Let's look at another example, this time involving a pointer to a hardware device:

typedef struct some_hw_regs_s {
    unsigned int words[4];
} some_hw_regs_t;

void foo(void)
{
    some_hw_regs_t  *regs = some_address;

    regs->words[2] = 0x0000dead;
    regs->words[0] = 0x0000f00d;
}

and the resulting MIPS assembly is:

00: li v0,0xf00d load 0xf00d into a staging register
04: li v1,0xdead load 0xdead into a staging register
08: sw v0,0(a0)  store 0xf00d to regs->words[0]
0c: jr ra  return to caller
10: sw v1,8(a0)  store 0xdead to regs->words[2], in branch delay slot

The problem here is more subtle: in the C code we set words[2] first, yet in the generated assembly words[0] is set first. The compiler reversed the order of the memory operations. Why? I have no idea; perhaps storing to increasing memory addresses is faster on some architectures.

When operating on data structures in RAM, changing the order of writes is generally harmless. Even with multiple threads running on independent processors there should be some form of locking to prevent other threads from seeing the structure in an inconsistent state. However if the structure is actually a memory-mapped hardware device, the order of accesses likely does matter. If the hardware registers are written in the wrong order the device may not function or, even worse, perform the wrong operation.

Now we'll examine the effect of adding the volatile modifier:

typedef struct some_hw_regs_s {
    unsigned int words[4];
} some_hw_regs_t;

void foo(void)
{
    volatile some_hw_regs_t *regs = some_address;
    regs->words[2] = 0x0000dead;
    regs->words[0] = 0x0000f00d;
}

The resulting MIPS assembly is:

00: li v0,0xdead load 0xdead into a staging register
04: li v1,0xf00d load 0xf00d into a staging register
08: sw v0,8(a0)  store 0xdead to regs->words[2]
0c: jr ra return to caller
10: sw v1,0(a0)  store 0xf00d to regs->words[0], in branch delay slot

Now words[2] is written first, exactly as intended.


 

Volatile Effect #3: Write Coalescing

One final example, again involving a pointer to a hardware device:

typedef struct some_hw_regs_s {
    unsigned int words[4];
} some_hw_regs_t;

void foo(void)
{
    some_hw_regs_t *regs = some_address;

    regs->words[2] = 0x0000dead;
    regs->words[2] = 0x0000beef;
    regs->words[2] = 0x0000f00d;
}

We're writing three different values in quick succession to the same memory location. This would be a weird thing to do with ordinary memory, but when writing to a memory-mapped hardware device it is quite common. Many hardware devices have a command register, which the software writes to for every operation it wishes to initiate. The resulting MIPS assembly is:

00: li v0,0xf00d load 0xf00d into register
04: jr ra return to caller
08: sw v0,8(v1) store 0xf00d to memory, in branch delay slot

There is only one store instruction, for 0xf00d. The compiler determined that the first two writes were useless, as they are overwritten by the final one, and optimized them away. This would be fine for ordinary memory, but would cause great consternation in a hardware device driver.

As you can probably anticipate by now, adding the volatile modifier to the pointer will resolve the issue:

typedef struct some_hw_regs_s {
    unsigned int words[4];
} some_hw_regs_t;

void foo(void)
{
    volatile some_hw_regs_t *regs = some_address;
    regs->words[2] = 0x0000dead;
    regs->words[2] = 0x0000beef;
    regs->words[2] = 0x0000f00d;
}

The resulting MIPS assembly is:

00: li v0,0xdead    load 0xdead into a staging register
04: sw v0,8(a0)     store 0xdead to regs->words[2]
08: li v1,0xbeef    load 0xbeef into a staging register
0c: li v0,0xf00d    load 0xf00d into another staging register
10: sw v1,8(a0)     store 0xbeef to regs->words[2]
14: jr ra           return to caller
18: sw v0,8(a0)     store 0xf00d to regs->words[2], in branch delay slot

With a volatile modifier all three values will be written to regs->words[2], in the order they appear in the C code.


 
The Limits of Volatile

A volatile modifier tells the compiler not to optimize away reads or writes to the variable, but there are other aspects of the system you need to be aware of when accessing hardware devices. For example, the CPU data cache should be bypassed when reading from a hardware device, as stale data cached from a previous read is generally not useful. Marking a pointer as volatile does not automatically make it non-cacheable: the compiler has no control over whether a particular address is cacheable, as that is determined when the memory mapping is created. On Linux systems, the kernel driver which exports the mmap() entry point has to create a noncacheable mapping.

Similarly, most modern system designs contain write buffering between the CPU and its memory bus. The CPU can perform a write and continue on to subsequent instructions before the write has actually made it all the way to its destination. Adding the volatile modifier does not automatically insert the appropriate memory barrier instructions; the developer must handle these manually.


 
Update

Commenting on reddit, dmh2000 suggested two other articles on this topic for your edification and bemusement. I found them both interesting, so I'll pass them on here as well:

I also updated the formatting of this article, because it looked terrible in the RSS feed. I'm still figuring out this newfangled CSS thing all the youngsters seem to like.

Update 2/2009: Commenting on a different article, Tom Ziomek pointed out an excellent paper to be published in the Proceedings of the Eighth ACM and IEEE International Conference on Embedded Software. Extensive testing of the volatile keyword against a number of different C compilers shows that very often, incorrect code is generated. Proceed with caution.