Friday, January 13, 2012

Merchant Silicon

Last week Greg Ferro wrote about the use of merchant silicon in networking products. I'd like to share some thoughts on the topic.


Chip cost

We know the story by now: because commodity chips are sold into products from multiple networking companies, they ship in large volume and benefit from larger discounts. This is a compelling factor in the low end of the switching market, where margins are thin and a primary selling point is price.

Yet the low price of the switch silicon is not a decisive factor in the midrange and high end of the switching market, where products are more featureful and sell at higher prices. The price of those products is not based on the cost of materials; it's based on what the market will bear. The market has traditionally borne a lot: Cisco's profit margins in these segments have been legendary for a decade.

In my experience, chip price was not a decisive factor in the wholesale move to merchant silicon.


Non-Recurring Engineering (NRE) cost

Say it costs $10 million to design a chipset for a high end switch, and the resulting set of silicon costs $500 in the expected volumes. If that high end switch sells 10,000 units in its first year then the NRE cost for developing it amounts to $1,000 per unit, double the cost of buying the chip itself. The longer the model remains in production the more its cost can be amortized... but the company has to pay the complete cost to develop the silicon before the first unit is sold.
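The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the $10 million, $500, and 10,000-unit figures come from the example, while the function names and the assumption that sales continue at the same 10,000 units in later years are mine.

```python
def nre_per_unit(nre_cost, units_sold):
    """Up-front design cost spread evenly over the units sold so far."""
    return nre_cost / units_sold


def effective_cost_per_unit(nre_cost, chip_cost, units_sold):
    """Per-unit silicon cost once the amortized NRE is added in."""
    return chip_cost + nre_per_unit(nre_cost, units_sold)


NRE_COST = 10_000_000    # design cost from the example above
CHIP_COST = 500          # per-unit silicon cost from the example above
UNITS_PER_YEAR = 10_000  # first-year volume; assumed constant in later years

for years in (1, 2, 5):
    units = UNITS_PER_YEAR * years
    print(
        f"after {years} year(s): "
        f"NRE ${nre_per_unit(NRE_COST, units):,.0f}/unit, "
        f"total ${effective_cost_per_unit(NRE_COST, CHIP_COST, units):,.0f}/unit"
    )
```

Under these assumptions the amortized NRE only drops to the level of the chip price itself after two years of production, and the full $10 million has still been spent before any of those units ship.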

In the midrange and high end switch markets, the strongest pitch made by merchant semiconductor suppliers wasn't the per-chip cost. A stronger pitch for those segments was elimination of NRE. The networking company didn't have to bear the cost of chip development up front. The company still paid for the development, but that cost was factored into the unit price and paid as units shipped rather than up front.

Yet even NRE savings weren't usually enough to convince a networking company to give up its own ASIC designs. Most realized that to do so was to give up a substantial portion of their ability to differentiate their products. Several vendors adopted a hybrid approach. They used merchant silicon to provide the fabric and handle simple cases of packet forwarding, and configured flow rules to steer interesting packets out to a separate device for additional handling. Costs were reduced because that specialized chip only had to be designed for a subset of the total traffic through the box, yet the vendors retained an ability to differentiate features.

In my experience, eliminating the burden of NRE was not a decisive factor in the move to merchant silicon.


Schedule

The merchant silicon vendors of the world can dedicate more ASIC engineers to their projects. This isn't as big a win as it sounds: tripling the size of the design team does not result in a chip with three times the features, or the same chip in one third of the time. As with software projects (see The Mythical Man-Month), the increasing coordination overhead of a larger team results in steeply diminishing returns.

Instead, merchant silicon vendors have the luxury of working on multiple projects in parallel. They can have two teams leapfrogging each other, each working on a multiyear timeline and introducing their products in alternating years. Alternatively, they can target different chips at different market segments. They rely on their SDK to hide the gratuitous differences they happened to introduce, and only make their customers deal with the truly differentiating features of the different chips.

It is difficult to make a case to spend two years to develop custom silicon for a product when merchant silicon with sufficient features is expected to be available a year earlier. Merchant silicon suppliers share details of their roadmap very early, even before the feature set is finalized. This lets them incorporate feedback into the features for the final product, but they also do it to derail in-house silicon efforts.

Yet in my experience at least, though schedule is a decisive factor, this isn't the full story.


Misaligned Incentives

When leading a chip development effort, the biggest fear is not that the chip will have bugs. Many ASIC bugs can be worked around in software.

The biggest fear is not that the chip will be larger and more costly than planned. That is a negotiation issue with the silicon fab, or a business issue in positioning the product.

The biggest fear is that the chip will be late. Missing the market window is the worst kind of failure for an ASIC. The design team produces a chip which meets all requirements, but comes at a time when the market no longer cares. The tardy product will face significant pricing pressure on the day of introduction, more so the longer competitive products have been available.

The technical leadership of an internal ASIC project is therefore incented to plan a schedule which they are sure they can meet. They'll use realistic timelines for the different phases of the product, and include sufficient padding to handle unexpected problems. They will produce best case timelines as well, but those tend to be discounted by the project leadership as unrealistic.

The technical leadership inside merchant semiconductor companies faces the same issues, and produces the same sort of schedule which they are confident they can meet. The difference is that the conservative schedule is not handed out to the decision makers at the customer networking vendors. A more optimistic schedule is maintained and presented to customers - not rosy best case, but certainly optimistic. Everybody knows that schedule will slip, even the customers themselves... but nonetheless it works. Customers work from the optimistic schedule because that is all they have. It increases the difference in schedule between in-house and merchant options by several quarters.


The Point of No Return

ASIC design requires some rather specialized skill sets. There is a great deal of similarity between chip design and software design, but not so much that one can switch freely back and forth. If there is not an active chip development effort underway, the ASIC team tends to run out of interesting things to work on.

When a company begins seriously contemplating building their high end products using merchant silicon, even if the management tries to keep it low key, it becomes pretty obvious internally. You have to pull in senior technical folks from the software, hardware, and ASIC teams to help with the evaluation. News spreads. Gossip spreads faster. If the ASIC team becomes convinced that there will be no further chip projects, they start to move on.

It can easily become a self-fulfilling prophecy: serious consideration of a move to merchant silicon leads to loss of the capability to develop custom ASICs.


Why it Matters

We talk a lot about Software Defined Networking. The term, consciously or not, tends to make people think the networking is all in software and the hardware is insignificant. That isn't actually true, as the SDN can only utilize actions which the hardware can actually perform, but it illustrates how much less value we place on hardware now.

In the context of SDN, reducing switch hardware diversity is actually a good thing. It results in a more uniform set of capabilities in networks, and a smaller set of cases for the SDN controller to have to handle. Networking used to be dominated by the hardware designs, but it has moved on now. I think that is a good thing.