Coding Relic: Product Development

Showing posts with label Product Development. Show all posts

Friday, September 26, 2025

On the Nature of Microservices

a series of interlocking sawtoothed gears, intended to represent a set of microservices

Microservices entered the lexicon of systems engineering a number of years ago, breaking up a system which might once have been delivered as a single large binary — though usually with separate relational database at least — to instead consist of a series of small microservices each performing a specific function and communicating amongst themselves to provide an overall service.

In my experience at least, moving to a microservices architecture does solve real problems but the nature of the problems it solves are as much organizational as technical.

In a very large engineering team all trying to work together to deliver a solution, one frequently loses velocity simply because of the number of teams jostling against each other:

The release process grows over time, and tends to add process of the form "make sure that problem never happens again."
Rollbacks roll back everything. When every team has P0 deliverables, no team has P0 deliverables.
The internal architecture might have been carefully planned... or might not. Even without a "if I can call it, I can use it" mentality, there still can be significant underspecification.

Moving to Microservices

A team which moves to microservices, devoting considerable effort to do so, has every incentive to solve these issues:

The release process for each microservice can be (mostly) decoupled with disciplined versioning of API between producers and consumers.
Rollbacks impact only a portion of the overall service.
The interfaces between microservices will have an API boundary, though Hyrum's Law still applies that un-promised behavior can become a load-bearing dependency.

The tradeoff is that one now has a distributed system, and distributed systems are hard.

Debugging is much more likely to cross multiple processes.
The 95% and 99% tail latency will suffer, as long code paths likely incur extra cross-process messaging.
A robust design will degrade if subsystems are unavailable, transforming the potential for outage into more frequent partial systems failure.

If the problem being solved is "maximize effectiveness of a large engineering team" then these are perhaps good tradeoffs. One has the resources to develop tooling for debugging and latency and reliability. If one has further chosen to deploy those services via an orchestration system like Kubernetes, one has the personnel to cover the operational burden of that too.

If one doesn't have the personnel to cover the cost, microservices become a more questionable choice. Fundamentally: delivering via monolithic binary is not inherently bad. It isn't automatically a poor choice. It isn't poor engineering.

How to deliver software is influenced by the size and composition of the team delivering it.

Friday, January 13, 2012

Merchant Silicon

Last week Greg Ferro wrote about the use of merchant silicon in networking products. I'd like to share some thoughts on the topic.

Chip cost

Fistful of dollars We know the story by now: by selling chips into products from multiple networking companies, commodity chips sell in large volume and benefit from larger discounts. This is a compelling factor in the low end of the switching market, where margins are thin and a primary selling point is price.

Yet low price of the switch silicon is not a decisive factor in the midrange and high end of the switching market, where products are more featureful and sell at higher prices. The price of those products is not based on the cost of materials, its based on what the market will bear. The market has traditionally borne a lot: Cisco's profit margins in these segments have been legendary for a decade.

In my experience, chip price was not a decisive factor in the wholesale move to merchant silicon.

Non Recurring Engineering (NRE cost)

Silicon chip Say it costs $10 million to design a chipset for a high end switch, and the resulting set of silicon costs $500 in the expected volumes. If that high end switch sells 10,000 units in its first year then the NRE cost for developing it amounts to $1,000 per unit, double the cost of buying the chip itself. The longer the model remains in production the more its cost can be amortized... but the company has to pay the complete cost to develop the silicon before the first unit is sold.

In the midrange and high end switch markets, the strongest pitch made by merchant semiconductor suppliers wasn't the per-chip cost. A stronger pitch for those segments was elimination of NRE. The networking company didn't have to bear the cost of chip development up front. The company did pay the cost of development, but it would be factored into the unit price and pay-as-you-go rather than upfront.

Yet even NRE savings wasn't usually enough to convince a networking company to give up its own ASIC designs. Most realized that to do so was to give up a substantial portion of their ability to differentiate their products. Several vendors adopted a hybrid approach. They used merchant silicon to provide the fabric and handle simple cases of packet forwarding, and configured flow rules to steer interesting packets out to a separate device for additional handling. Costs were reduced by only having to design that specialized chip for a subset of the total traffic through the box, but they retained an ability to differentiate features.

In my experience, eliminating the burden of NRE was not a decisive factor in the move to merchant silicon.

Schedule

Gantt chart The merchant silicon vendors of the world can dedicate more ASIC engineers to their projects. This isn't as big a win as it sounds: tripling the size of the design team does not result in a chip with 3x the features or in 1/3rd the time. As with software projects (see The Mythical Man Month), the increasing coordination overhead of a larger team results in steeply diminishing returns.

Instead, merchant silicon vendors have the luxury of working on multiple projects in parallel. They can have two teams leapfrogging each other, each working on a multiyear timeline and introducing their products in interleaving years. Alternately, they can target different chips at different market segments. They rely on their SDK to hide gratuitous differences which they happened to introduce, and only make their customers deal with the truly differentiating features of the different chips.

It is difficult to make a case to spend two years to develop custom silicon for a product when merchant silicon with sufficient features is expected to be available a year earlier. Merchant silicon suppliers share details of their roadmap very early, even before the feature set is finalized. This lets them incorporate feedback into the features for the final product, but they also do it to derail in-house silicon efforts.

Yet in my experience at least, though schedule is a decisive factor, this isn't the full story.

Misaligned Incentives

When leading a chip development effort, the biggest fear is not that the chip will have bugs. Many ASIC bugs can be worked around in software.

The biggest fear is not that the chip will be larger and more costly than planned. That is a negotiation issue with the silicon fab, or a business issue in positioning the product.

The biggest fear is that the chip will be late. Missing the market window is the worst kind of failure for an ASIC. The design team produces a chip which meets all requirements, but comes at a time when the market no longer cares. The tardy product will face significant pricing pressure on the day of introduction, more so the longer competitive products have been available.

The technical leadership of an internal ASIC project is therefore incented to plan a schedule which they are sure they can meet. They'll use realistic timelines for the different phases of the product, and include sufficient padding to handle unexpected problems. They will produce best case timelines as well, but those tend to be discounted by the project leadership as unrealistic.

The technical leadership inside merchant semiconductor companies face the same issues, and produce the same sort of schedule which they are confident they can meet. The difference is, that conservative schedule is not handed out to the decision makers at the customer networking vendors. A more optimistic schedule is maintained and presented to customers - not rosy best case, but certainly optimistic. Everybody knows that schedule will slip, even the customers themselves... but nonetheless it works. Customers work from the optimistic schedule because that is all they have. It increases the difference in schedule between in-house and merchant options by several quarters.

The Point of No Return

ASIC design requires some rather specialized skill sets. There is a great deal of similarity between chip design and software design, but not so much that one can switch freely back and forth. If there is not an active chip development effort underway, the ASIC team tends to run out of interesting things to work on.

When a company begins seriously contemplating building their high end products using merchant silicon, even if the management tries to keep it low key, it becomes pretty obvious internally. You have to pull in senior technical folks from the software, hardware, and ASIC teams to help with the evaluation. News spreads. Gossip spreads faster. If the ASIC team becomes convinced that there will be no further chip projects, they start to move on.

It can easily become a self-fulfilling prophecy: serious consideration of a move to merchant silicon leads to loss of the capability to develop custom ASICs.

Why it Matters

We talk a lot about Software Defined Networking. The term, consciously or not, tends to make people think the networking is all in software and the hardware is insignificant. That isn't actually true, as the SDN can only utilize actions which the hardware can actually do, but it illustrates how much less we value we put in hardware now.

In the context of SDN, reducing switch hardware diversity is actually a good thing. It results in a more uniform set of capabilities in networks, and a smaller set of cases for the SDN controller to have to handle. Networking used to be dominated by the hardware designs, but it has moved on now. I think that is a good thing.

Thursday, May 19, 2011

Perspective on a Hot Technology

An observation for those working in a hot technology area.

Hype Cycle leading to acceptance or deadpool

An inflection point is coming. Your actions at the low point influence what happens after.

Tuesday, May 17, 2011

Just Weld It to the Rack

Network equipment chassis There is a mantra amongst network equipment vendors: never let the customer unscrew the chassis from the rack. Replace every component with upgraded equipment, but never upgrade the chassis itself. Doing so often triggers the customer to send the whole thing out for bid to multiple vendors, and you might not win. Somebody else's chassis might get slotted into the hole you left.

So vendors work very, very hard to keep the old chassis competitive. Line cards and supervisor modules will be updated repeatedly over the lifetime of the system. Though the design of the backplane might also be upgraded in new systems, the new cards have compatible modes of operation for even the oldest deployments.

This has a couple impacts:

It favors passive backplanes. Active electronics can't be upgraded, future generations of cards find it a hinderance.
It means backplanes are overdesigned, and therefore expensive. Even if you plan to clock traces for 10 Gbps operation, you make the PCB handle 25 or 40 Gbps to support future upgrades. You run as many traces as will fit, even if they won't be used by the first gen cards.

If you ever see a backplane PCB removed from the sheet metal, notice how thick it is. That isn't just for stiffness over the long span of board, backplane PCBs have many signal layers. PCB fabrication gets expensive at 20 layers, but backplanes often have 26 or more. Some of those layers end up using more expensive PCB material, needing better electrical characteristics.

Over the weekend Greg Ferro wrote about a prototype optical backplane from HP on display at Interop, which manages 120 Gbps per optical tap. I think that results in 120 Gbps per slot, though I'm not sure. Optical backplanes have been in development for a long time (consider this announcement from 2005), and even the HP design is 3-5 years away from showing up in a product.

Nonetheless the potential is clear: this is a 10 year backplane. Though the waveguides are plastic not glass, over the short distance of a chassis they should have a vastly larger useable spectrum than copper traces. Subsequent generations of cards can use faster lasers and/or more colors to get the bandwidth they need. Optics (and especially multiple wavelengths) over the backplane adds cost, but the future proofing is worth it.

footnote1: Greg Ferro also hosts the Packet Pushers podcast, which I highly recommend.

footnote2: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet and Product Development labels.

Wednesday, May 12, 2010

Death of Copper Predicted. Film at 11.

copper RJ45 and fibers held in a hand Every handful of years we ratchet up the Ethernet link speed: from 10 Mbps to 100 Mbps in the early 1990s, to 1 Gbps in the mid 1990s, to 10 Gbps in the early part of this century. 40 Gbps is the next target. At the 1 Gbps and 10 Gbps transitions naysayers maintained that copper cables would never be able to meet the required signaling rates and that optical would prevail. The same doubt is now being voiced about 40 Gbps.

During the 1 Gbps and 10 Gig transitions, optical media became available several years before copper, and then the initial 10 Gig copper specs were limited to patch cable distances of 10-15 meters. 40G will repeat the story with optical products already available, substantially before copper. Nonetheless I'd wager 40G copper transceivers will eventually appear in some form.

Yet this time, optical will win. Not because of the technology or limitations of copper wire, but because of economics. Economics used to be in copper's favor: simple install and no expensive lasers. Copper could ride the silicon technology curve, throwing ever more DSP power at the problem. Times have changed: cat6a and cat7 cabling is as difficult and expensive to install as fiber, and solid state laser components allow optical transports to ride the silicon technology curve.

Like fiber, cat6/7 cables have a minimum bending radius. Pull too tight and the cable can no longer handle long distances.
Like fiber, cat7 does not tolerate being stretched. Stretch a 100m cable by a centimeter and its performance suffers.
Even padded cable staples put too much pressure on the cable. cat7 must run in a tray or conduit, and the bulky shielding means fewer of them will fit.
cat7 cables are very sensitive to connectorization. The crimp tool you used for cat5e won't do.

The other problem with copper cables is that they are made of copper, an actively traded commodity. The chart below shows the raw material cost of copper over the last century, normalized to the US Dollar in 1998. During much of the late 1990s and early 2000s copper was cheap by historic standards. In the last few years the commodity price has trended back up due to demand, without a matching increase in new supply. If there is a natural ceiling for copper pricing where the market will seek alternatives, we do not appear to have hit it yet.

Price of copper since 1900 in 1998 dollars

(data source: US Geological Survey)

I'm not predicting that 40 Gig copper transceivers will be impossible. On the contrary, I suspect there will be two solutions brought to market: a very short reach spec using RJ45 patch cables, and a 100m spec which imposes more painful requirements like cat7a/cat8, use of multiple cables, and electrically better connectors (presumably also manufactured, not connectorized on site). These products will eventually appear, substantially lagging optical product availability.

I simply suspect that the economics no longer work in coppers favor: patch cables from one side of the rack to the far corner will be long enough to have to worry about install quality. If the pressure from zip-ties fastening the cable to the rack threaten the operation of your network, you're better off using fiber.

The genesis of this post came as a comment on Stephen Foskett's excellent Pack Rat blog. It is an excellent resource, highly recommended.

Friday, April 2, 2010

Distribution Schmistribution

Avnet Logo Electronics distribution giant Avnet recently announced it would acquire Bell Microproducts. The deal values Bell at $250 million after paying off debt. Bell's annual sales are north of $3 billion, but electronics distribution is not a high margin business.

Some of the commentary surrounding the announcement expresses a lack concern about consolidation amongst distributors, because the chip manufacturers have firm control of pricing. I'm not sure that faith is warranted: distributors lack the ability to control pricing because there is so much fragmentation in the market. Each manufacturer has multiple channels for distribution and can play them off against one another. As the number of strong distributors dwindles, that power shifts to the remaining middlemen.

Distributors regularly show they will take complete advantage of any leverage they get. For example, chip manufacturers give preferential terms to a distributor who wins the design-in at a particular customer: no other distributor will be allowed to offer better prices to that customer. This is reasonable... until the distie takes it to the next level. Distributors do a great deal more business with contract manufacturers than almost any individual customer. A distributor can legitimately win the design for a particular chip, then request the bill of materials from the CM. In almost all cases the CM will give it to them. The distributor then registers as winning the design for every component on the BOM, even those they had nothing to do with. As a customer you suddenly find yourself unable to obtain better pricing on anything in your product, based on a single deal for one component.

If consolidation amongst distributors gives those which remain significantly more power over pricing, does anyone think they won't abuse it?

Friday, February 5, 2010

Apple == A, Plus Four More Letters

What the web needs right now is another blog post about the iPad.

Apple A4 chip No, don't run away! This will be different, I promise. We'll focus on Apple's A4, a custom CPU first used in the iPad. It has been widely assumed that A4 uses a licensed ARM Cortex A8 core. I have no reason to dispute this assertion, it seems like a fine choice. It has also been asserted that because the same core is used in parts from Samsung, TI, and Qualcomm, Apple should not have bothered making its own chip. Today, Gentle Reader, we'll explore that notion a bit.

Apple ASIC Expertise

Apple has a long history of ASIC design. Apple produced custom silicon for various Macintosh models since at least the late 1980s, when they designed the audio chips used in the Quadra 700 and 900 (a chip called "Batman"). Later, Apple designed entire chipsets to interface with the PowerPC 60x bus. Apple licensed a gigabit Ethernet MAC design from Sun, and used it plus IDE controller and other peripherals in chipsets for several Powermac models. With the switch to x86, Apple's efforts became much more constrained. The x86 bus interface is difficult to license, and Intel's own chipsets are quite reasonable. So far as I know, x86 Macintoshes no longer use custom Apple ASICs.

Custom chip design isn't a radical departure for Apple.

With that as background, what might Apple have done in the A4 chip? I have absolutely no inside information about the iPad or A4 processor, I'm going to make stuff up because its fun to speculate.

Graphics & OpenCL

PowerVR SGX CPU Overview Apple holds a nearly 10% stake in Imagination Technologies Group, which designs the PowerVR graphics accelerator and other IP relating to massively threaded processing. Apple uses their PowerVR SGX 535 in iPhone 3GS, and used various PowerVR graphics in earlier iPhone models. The A4 chip will certainly integrate a graphics core from PowerVR. As with essentially all GPU designs today, the PowerVR makes use of multiple, specialized CPU cores. There is relatively little information about its instruction set on the web, ~~only that it is called META MTX and uses 16 bit RISC-ish instruction words~~. Update: PowerVR SGX does not use the META architecture, it has a distinct architecture of its own. Additional information can be downloaded after registration.

Apple has also invested heavily in two relevant technologies: OpenCL and LLVM.

OpenCL allows processing to be distributed across multiple CPUs in the system, even if they have different instruction sets. OpenCL algorithms are written in a language with syntax very similar to C99, and the framework handles the rest.
LLVM is a compiler toolkit, one aspect of which is a machine independent instruction set. Source code can be compiled to the LLVM virtual machine, and from there be translated into the equivalent opcodes for the target CPU. The compilation can be done statically before running it, or by a Just-In-Time compiler while interpreting the LLVM bytecodes.

iPhone applications are compiled to ARM instructions, but it is not much of a stretch to imagine support for sections of LLVM bytecodes as well. If the hardware has sufficient GPU power, the bytecode could be translated to the GPU instruction set and offloaded. Devices with less sophisticated GPUs would use the ARM instead. Apple does not allow iPhone apps to include their own virtual machine in this way, but would be free to provide the VM function as part of the OS.

I suspect this is the most compelling reason for Apple to build its own chip as opposed to buying off the shelf. The rest of the mobile industry is satisfied to offload 3D graphics and video decoding to the GPU. Apple has greater ambitions, and could make use of significantly more GPU pipelines. By controlling the complete platform from CPU to software, Apple can make tradeoffs which are not practical for the rest of the market. For example: a very large GPU plus very fast ARM would generate more heat than can be dissipated in a small form factor like a phone. Apple has the option to dynamically throttle the ARM clock speed in order to open up more thermal envelope for the GPUs, if sufficient OpenCL workload is ready to run. When the GPUs are less busy, the ARM clock speed can be brought back up.

Multi Package Modules

The CPU in the iPhone 3GS is a Samsung S5PC100. This is a multi package module with CPU, I/O chip, and SDRAM sandwiched tightly together. Multi chip modules have been around for a long time, where multiple dies wired together in one big package. The amount of testing which can be done on a raw die is rather limited, so MCM yields suffer as one bad die ruins the whole assembly.

Multi package modules are relatively new: each chip is in its own package, but use very tight pin spacings and do not have a heat spreader. They are soldered together on a small PCB, which in turn has a Ball Grid Array on the bottom with normal pin spacings. Because each chip is packaged separately a full suite of test vectors can be run before the final assembly is put together, improving the yield considerably and lowering the cost of the final product.

If we examine the main board of an iPhone 3GS, the largest component is not the Samsung processor - it is the Flash memory, an MPM containing a number of flash chips. The Samsung CPU in the iPhone 3GS comes in a close second in size, and is also a multi package module with CPU, I/O chip, and SDRAM.

With the A4, Apple will probably have one die containing both CPU and I/O. Samsung uses different I/O chips to tailor their offering to many market segments, which is not a goal for Apple. By arranging the pinout carefully, Apple might be able to make an MPM containing CPU, SDRAM, and Flash, reducing the total board area. Different Flash capacities could be offered by not stuffing portions of the MPM. The iPad itself might not need such an MPM as it is a much larger device, but future iPhones would benefit more.

To be clear: assembling an MPM is not something you can easily do when buying merchant silicon. The pins on one package have to be arranged so as to be easy to route to the pins on the other packages within the MPM.

Wrapping up

I'll say it again: I made this all up. I have no information on the specifics of the A4, just speculation. In the unlikely event that anyone reads this (instead of running away from yet another iPad blog post), don't copy it into Wikipedia as though it were verified information.

What about future iterations? Its tempting to consider a single chip containing the entire iPhone feature set, including radio and wireless networking. The A4 itself clearly doesn't do this, as GSM support is optional in the iPad. I suspect that even in future chips, Apple won't pull in the baseband radio. The front end portions of that chip are rather sensitive to noise, and generally don't work well when integrated in the corner of a gigantic ASIC. Also integrating the radio functionality would make it that much harder to keep up with advancements in wireless networks.

Another future possibility is to use this chip in Apple's other small form factor products, like the Airport Base Stations, Time Capsule, and AppleTV. This is certainly possible, but aside from obvious additional peripherals like SATA I'm not sure it adds many requirements to the chip.

Other articles about A4 you might find interesting:

The New York Times writes about the history of the A4 design team.
Louis Gray writes about Apple's heavy recruiting push to staff up their ASIC team.
Wikipedia already has an article, which will improve over time as more details emerge.

P.S.: While we're at it, the title of this post is a guess about the origins of the "A4" nomenclature: "Apple" is a capital A followed by four more letters.

Sunday, October 11, 2009

A Cloud Over the Industry

Unbelievable:

Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low.

This kind of monumental failure casts a cloud over the entire industry (pun intended). How did it happen? I do not believe Danger could have operated without sufficient redundancy and without backups for so many years, it simply does not pass the smell test. There must be more to the story.

On possible theory for unrecoverable loss of all customer data is actually sabotage by a disgruntled insider. This is, itself, pretty unbelievable. Nonetheless according to GigaOM, when Microsoft acquired Danger "...while some of the early investors got modest returns, I am told that the later-stage investors made out like bandits." I wonder if the early employees also made only a modest payout in return for their years of effort, and had to watch the late stage investors take everything. Danger filed for IPO in late 2007, but was suddenly acquired by Microsoft in early 2008. What if the investors determined they would not make as much money in a large IPO as they would in a carefully arranged, smaller acquisition by Microsoft? For example, the term sheet might have included a substantial liquidation preference but specify that all shares revert to common if the exit is above N dollars. For the investors, the best exit is (N-1) dollars; they maximize their return by keeping a larger fraction of a smaller pie. If they had drag along rights, they could force the acquisition over the objections of employees and management. If the long-time employees watched their payday be snatched away... a workforce of disgruntled employees seems quite plausible.

This is all, of course, complete speculation on my part. I simply cannot believe that the company could accidentally lose all data, for all customers, irrevocably. It doesn't make sense.

Monday, August 24, 2009

Plummeting Down the Chasm

Crossing the Chasm book cover Crossing the Chasm is a seminal book in technology marketing, whose ideas quickly spread through the industry. Originally written in 1991 by Geoffrey Moore, it showed a new take on the technology adoption lifecycle. The lifecycle starts with tech enthusiasts willing to buy an immature product and runs through the majority buyers who make up the bulk of the market, finally trailing off when market saturation is reached. It had been commonly depicted as a bell curve:

Moore's key observation is that while the bell curve implies there is a smooth transition from early market to majority, in reality the buyers in the early market are fundamentally different from the majority that comes later. Technology companies who don't appreciate this gap will stumble and often fail once they saturate the small pool of early buyers. Moore referred to this gap as "the chasm."

Early adopters are visionaries. They are willing to look at an immature product and figure out how to use it in their operations to get a competitive advantage. They will sponsor integration work within their IT departments, and generate long lists of product feedback to better fit their needs. They are fundamentally different from the majority buyers in that they will look at an interesting product and figure out what problem it can solve. The majority market comes from the opposite direction with a problem to solve, looking for a solution. Product planning and marketing which works in the early part of a product lifecycle will fail utterly later in the game.

Yon Chasm Approacheth

It is quite possible to get stuck in the early market, continuously trying to meet the needs of early adopters and never enjoying the big sales of the majority market. Having spent the last several years in this predicament, I'll offer my version of the lifecycle chart. What it lacks in precision, it makes up for in snark.

As a development engineer it can be difficult to tell how well the sales cycle is working as one rarely gets direct visibility, but the indirect evidence is plentiful.

If nearly every deal comes with a list of new product features to be implemented, you have not crossed the chasm.
If in every medium to large deal it is not clear whether the cost of getting the business is greater than the revenue it would bring, you have not crossed the chasm.
If every deal is "high touch," requiring multiple visits by a salesperson and sales engineer and possibly a consultation with the development team, you have not crossed the chasm.
If your product cannot be sold via a web site but instead always requires an evaluation period and report, you have not crossed the chasm.
If every customer is using a different subset of the product functionality, you have not crossed the chasm.

This is important: if the company does not realize that the real problem is in the approach to the market, all of these things will be blamed on the product. The reasoning will be that "if we just pound out a few more of these deals, we'll have finally implemented everything that everybody wants and sales will take off." There may even be an element of truth in this sentiment, if the product shipped early before its natural feature set was complete. However if the company has not crossed the chasm, the fundamental problem is elsewhere.

You are not getting a list of feature requirements because the product is incomplete, but because you are selling to the type of customer who generates lists of requirements.

You are getting that list of requirements because you are still selling into the visionary and early adopter segments of the market, the people who are willing to think about how best to integrate the product into their operations. If you were selling into the mainstream market there would be no list of requirements because the mainstream won't do an extensive integration on their own. The product either meets their needs or it doesn't, and you'll either get the sale or you won't. In the mainstream there will be no back and forth of what the product could do to win the business. At most, the mainstream buyer might tell you why you lost the business.

Don't get stuck wandering around in the chasm. Trust me, it sucks.

Friday, August 21, 2009

Smartbooks and Handheld PCs

Intel markets the Atom processor for netbooks. It trades lower processing power for very low power consumption, and is quite inexpensive in large quantities. These are product features where ARM/PowerPC/MIPS have long focussed, though they have aimed at non-PC form factor devices. Now, a new round of small laptops is hitting the market using non-x86 processors with either Windows CE or a Linux software stack like Google's Android or, eventually, Chrome OS. Most notably, Dell appears to be on the verge of introducing such a device - the true measure of whether a product category has entered the mainstream. These devices are generally called Smartbooks, owing to Qualcomm's extensive marketing push for its ARM Snapdragon chips in such a role.

Vadem Clio CL-1050 What strikes me about these devices is that we've been down a similar road before. In the late 1990s there was a flurry of activity around HandHeld PCs, which were relatively small and inexpensive compared to the laptops of the day. They typically used MIPS processors, ran Windows CE, and supplied basic word processing and communications software. Handheld PCs didn't last long on the market, and Microsoft ceased work on the software after just two years. The devices simply were not useful enough when compared to a full laptop.

This is another example of how infrastructure and market can make more of a contribution to the success of a new product as its design. If a product is too early, it won't get enough traction to drag the rest of the environment along. In 2009 these devices can depend on a fast wireless infrastructure and plentiful cloud-hosted applications, which did not exist in 1999. The world has changed.

Wednesday, April 1, 2009

The Target Market is Not Obvious

munchkin bottle warmer Exhibit A: a bottle warmer by munchkin, a company focussed mainly on baby products. You pour water into the base, put the bottle in the holder, and press the button. A heating element boils the water to steam, which warms the bottle.

Exhibit B: a piezoelectric buzzer. When the bottle is done heating, the lighted button on the front changes from red to green and the product emits a series of four shrill beeps to announce its successful completion. It is obviously very proud of itself.

The issue, and point of today's rant? When using the bottle warmer you are likely holding a squirming, ravenously hungry baby who really isn't interested in the details of the food preparation equipment. You don't need an audible alarm, as you'll be staring desperately at the warmer ready to snatch the bottle out the instant it has made it from "cold" to "tepid." So at best, the buzzer is annoying. With twins this feature is actively harmful: you really, really don't want to wake the second baby until the first is done eating. Trust me on this one.

PCB showing removed buzzer Exhibit C: the empty space on the PCB where the buzzer used to be.

I have to say, I found their user interface to disable the buzzer feature somewhat difficult: a Torx screwdriver and soldering iron.

Who was this thing designed for anyway, and why did they feel a buzzer was necessary? One can only speculate about the product requirements document and user story which led to this feature...

This is Evilyn, a new parent.

When the baby is hungry she puts a bottle in the warmer, turns on the vacuum cleaner to drown out the crying, and goes to watch a little TV. When the bottle is heated there needs to be an indication noticeable despite both the vacuum cleaner and Oprah.

I'm kidding of course, I don't believe there was ever any such product plan. I think the problem is somewhat more subtle: the bottle warmer wasn't really designed for the parent at all — and no, it wasn't designed for the baby either.

Winning the Business

The munchkin corporation might well design some of its products, but as with any large company they would face the "build versus buy" decision for each niche product filling out their lineup. Their main concern is marketing and brand management. So the bottle warmer was quite possibly designed and manufactured by another company, and that other company does not sell directly to parents. Though the product obviously needs to perform its function, their customer is not the end user of the bottle warmer. Their customer is the buyer at the munchkin corporation. They might be designing it under contract or, quite possibly, designing it speculatively and hoping it will be picked up by one or more baby product companies. They have to make a good pitch, or be the most impressive offering at the trade show, to get the buyer to pick them instead of some other supplier.

In that sort of competition, a longer feature list can be the key to winning the business. The munchkin bottle warmer has several extra features: it can sterilize pacifiers and bottle nipples using steam, it has a lighted button which changes from red to green during operation, and it has the aforementioned infernal buzzer.

So there you have it. When you are vaguely dissatisfied with something you bought, if it just doesn't seem to have been completely thought through, this might be the reason. The designer's target was not you. It was targeted to catch the attention of the buyer for another company. This is also a primary reason why Apple has been so successful in consumer electronics: though they might buy components and software from outside, they never pick up a complete product to slap their logo onto. They retain control of the product design.

Wednesday, September 3, 2008

The Good, Bad, and Ugly of Gizmo Construction

Rather by definition, embedded software involves building a gizmo of some sort. Manufacturing the hardware portion of the gizmo turns out to be somewhat more complicated than writing a Makefile and starting the build... who knew?

Today, Gentle Reader, we'll discuss the realities of building hardware products by meandering through a few topics:

Contract Manufacturing
Distributors
Contract Engineering
DVT and Compliance

Contract Manufacturing

Contract manufacturers build a product according to a completed design including the Bill of Materials, the layout of the printed circuit board, assembly instructions, system tests, etc. They can do as much or as little as desired, from a single board up to a completed and tested system directly shipped to your customer.

In the last 25 years contract manufacturing has essentially taken over the production of all electronic goods. The economies of scale from producing such large volumes are overwhelming, and there are very few companies who maintain their own production facilities now. The capital expenses for kitting out an assembly line are daunting, the CM can amortize the cost over a huge number of products. Some well known contract manufacturing firms are Foxconn (also known as Hon Hai), Flextronics (which owns Solectron, another large CM), Celestica, and Sanmina-SCI.

Contract Manufacturers thrive on volume. If building something in really small volumes it is worth looking into production shops aimed at hobbyists, such ExpressPCB.

One thing to keep firmly in mind: the value of a customer to a CM is measured entirely by the volume of future business they expect. If order volumes drop off, the CM will rapidly lose interest. If your market suffers a serious downturn you may find that the quality of the product drops precipitously as the CM rushes through the build in order to get to another, more profitable customer. Similarly if you decide to terminate business with a CM and move to a competing firm, expect astonishingly bad build quality on the final order.

Distributors Fistful of Dollars

Like it or not, distributors are absolutely necessary in the embedded market. There are tens of thousands of customers buying millions of components. The component manufacturers simply cannot afford to maintain individual relationships with each customer when a distributor can represent multiple manufacturers with the same effort and expense.

Many Distributors provide FAEs (Field Application Engineers) to assist their customers in selecting components. A good FAE is extremely valuable, due to the sheer number of product designs they have been involved in. They will often be able to suggest alternatives which you might not otherwise have found, and know which parts have been problematic in other designs.

Distributors and Registration

Distributors who perform a significant amount of technical support during the design of a product cannot allow themselves to be undercut by a cheaper alternative in production. Therefore the component manufacturers allow distributors to register as being responsible for the design win at a particular customer. Only that distributor will be allowed to offer advantageous pricing, other distributors will not be allowed to undercut them.

Distributors do a lot more business with the big Contract Manufacturers than any individual customer, and CMs value their relationship with the distributor more than any individual customer. It is not unknown for a Distie to legitimately claim the design win for one particular component, then obtain the Bill of Materials from the CM and register themselves for every chip in the design. Choosing a Distributor is much like getting married: be very certain that the relationship will work in the long term before signing on the dotted line.

Contract Engineering Schematic

For a niche market with well-defined needs, a product can sell for years with minimal changes. In these cases its better to contract out the design than recruit a team. The contract will include the functional spec to be designed, timelines for completion, etc. The engineering firm will supply the requisite hardware, software, and firmware expertise, and the resulting design will belong to the customer.

Alternately, Contract Engineering firms can fill in gaps for an in-house design team. For example after components have been chosen and the design competed, a layout for the printed circuit board must be created. A good layout of a complex PCB requires an experienced designer and expensive CAD tools. It makes no sense to keep such a person on staff if only a few designs are done each year, so it is often contracted out. Mechanical design of the chassis and other sheet metal is also often done outside, for the same reason.

Contracts in this area are essentially always on a time and materials basis. The upfront estimate of total cost is not guaranteed. Fixed price contracts are exceedingly rare, because they represent an enormous risk to the engineering firm if the design time goes over estimates. In most cases this will work out fine: the engineering firm will want to get the design done quickly in order to move on to the next customer. However if business slows, watch out for unjustified padding of billable hours.

Design Verification Test Eye pattern

Not every unit coming out of the factory will be identical. Each component in the design has a tolerance, an allowed deviation which is still considered within specification. Every system coming off the line will contain parts slightly above spec or below, and it is important to insure that the complete system will still function reliably even if it contains components at the extreme ends of the tolerance. Though the hardware design takes these tolerances into account, in the real world it is difficult to anticipate every possible interaction of components and PCBs.

This is where DVT, for Design Verification test, comes in. During DVT the hardware engineers measure the system to ensure it not only does what it is supposed to do, but does so with sufficient margin to handle variances in its components. DVT is time-consuming work, and changes are often made in the production system based on what is found in the prototypes.

Compliance

The ubiquitous Underwriters Laboratories logo is likely the most widely known example of a compliance certification, but it is not the only one. There are a huge number of standards for product safety or performance, some recognized around the globe and others specific to each geographic market.

Some industries place much more stringent requirements on their products than others. For example, the NEBS Level 3 guidelines for the telecommunications market specifies that there be no noxious chemicals released if the equipment catches fire.

Conclusions

One of the aims of this blog is to provide insight into fields of software development which don't get as much exposure as Ruby on Rails or web frameworks. I've tried to provide an overview of some of the gritty details of building products. Its really quite exciting when the first prototype of a new system comes back from the factory, and you try to boot the software for the first time. When it doesn't boot, reality hits.

I'd like to thank John Walsh for much feedback and good suggestions of topics to cover in this posting.

Thursday, August 7, 2008

opensource.mycompany.com

Using GPL software imposes the requirement to redistribute the source code, but this requirement is routinely ignored in commercial products. That is a shame: even if one doesn't care about the goals of the free software movement, simple pragmatism would still favor providing the source code. Violating the GPL can cause Bad Things™ to happen, and compliance isn't very difficult. It is quite common for products to incorporate an almost unmodified busybox, glibc, and Linux kernel. Providing the source code for these cases is straightforward, and doesn't risk inadvertently giving away intellectual property.

Section 3 of version 2 of the GNU Public License concerns the responsibility to distribute source code along with a binary:

  3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code.  (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it.  For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

I am not a lawyer, though I think it might be fun to play one on TV. There is a lot of detail in the GPL about the requirements for distribution of source code, and maybe I'm dense but I don't understand what half of it means. However I would contend that if you get to the point of needing to argue over the precise definition of the terms in a legal context, you've already failed.

The problem with violating the GPL is not that you'll get sued. Of course, it is quite possible you'll be sued for violating the GPL...

... but getting sued is not the real problem. The real problem is when a posting about misappropriation of GPL software shows up on Slashdot and LWN. The real problem is when every public-facing phone number and email address for your company becomes swamped by legions of Linux fans demanding to know when you will provide the source code. The real problem persists for years after the event, when Google searches for the name of your products turn up links about GPL violations coupled with ill-informed but damaging rants.

So we want to avoid that outcome. If you read the legal complaints filed by the Software Freedom Law Center, they follow a similar pattern:

Someone discovers a product which incorporates GPL code such as busybox, but cannot find the source code on the company web site (probably because the company hasn't posted it).
This person sends a request for the source code to an address they find on that website, possibly support@mycompany.com.
This request is completely ignored or receives an unsatisfactory response.
The person contacts SFLC, who sends a letter to the legal department of the infringing company demanding compliance with the license and that steps be taken to ensure no future infringements take place.
SFLC also demands compensation for their legal expenses; thats how they fund their operation.
The corporate legal team, misreading the complaint as a shakedown attempt, stonewalls the whole thing or offers some steps but refuses to pay legal costs.
Lawsuit is filed, and the PR nightmare begins in earnest.

Keeping Bad Things From Happening

There are two points in that progression where the bad result could be averted, in steps #2 and #4. Unfortunately it is not likely you can influence either one:

In step #2 you have no idea where that initial request for the source code will go. They might send email to the sales department, or tech support. They might call the main corporate number and chat with the answering service. The request will very likely be filtered out before it makes it to someone who would realize its significance.
By the time the lawyers get involved in step #4, you're already toast. Corporations, particularly medium to large corporations, are routinely targeted to extract money for licensing intellectual property, business partnerships, or any number of reasons. The GPL claim will look like all the rest, and be treated in the same way.

This is a case where it is best to be proactive. One can't realistically wait until the first time someone requests the source code, too many things can go wrong and lead to the PR nightmare. Instead, Gentle Reader, it is best to post the GPL code somewhere that it can be found with little difficulty by someone looking for it, but otherwise draw little attention to itself.

When the GPL was created software was delivered via some physical medium (magnetic tapes, later supplanted by floppy disks, CDs, DVDs, etc). One was expected to include the source code on the same medium, or at least be willing to provide another tape containing the source. Nowadays many embedded systems are delivered with the software pre-installed and updates delivered via the Internet, so adding a CD of source code would add to the Cost of Goods Sold. Anything which adds COGS is probably a non-starter, so we'll move on.

It is certainly an option to tar up all of the GPL packages from the source tree and try to get it linked from the corporate website, likely controlled by someone in the marketing department. That conversation may not go the way you want:

"Tell me again why we need to do this?"

"We're not an opensource company, we build widgets."

"Isn't Montavista supposed to take care of this for us?"

"Our market messaging revolves around the power of our brand and the strength of our secret sauce, not opensource code. End of discussion, you commie punk."

The (hypothetical) marketing person is not being unreasonable. Ok, the last one would be unreasonable, but I thought it would be funny. Nonetheless putting GPL source code right up on the corporate website implies it is a primary focus of the corporation, when in reality it probably is just one of many tools you use in building a product. Rather than find a place on the corporate website, I advise a separate site specifically for opensource code. It needs to be something which people can easily find if they are motivated to look for it, but otherwise not draw much attention to itself. opensource.<mycompany>.com or gpl.<mycompany>.com are reasonable conventions.

Next you need a web server. Your company may already work with a web host, otherwise Google Sites is a reasonable (and free) choice. You'll need IT to set up a DNS CNAME directing opensource.<mycompany>.com to point to the new web site. If you're using Google Sites there is a Help Topic on how to do this.

The goal here is to avoid the bad result (GPL violation being posted to slashdot), not draw attention. You shouldn't spend time putting together a snazzy web site, a simple background with links to tarballs is fine. Ideally nobody will ever look at these pages.

Documentation

Lets talk about documentation. There are a number of other open source software licenses, besides the GPL. Many of them carry an "advertising clause," a requirement that "all advertising materials mentioning features or use of this software must display" an acknowledgement of the code used. The use of this clause derives from Berkeley's original license for BSD Unix, and though Berkeley has disavowed the practice there is still a great deal of open source software out there which requires it.

In practice the advertising clause results in a long appendix in the product documentation listing all of the various contributors. Honestly nobody will ever read that appendix, but nonetheless it is worth putting together. You can also include a notice that the GPL code is available for download from the following URL... so if despite your best efforts the company does get sued, you'll have something concrete to point to in defense.

Now for the hard part

The Gentle Reader may have noticed that we have not covered how to locate the GPL code used within a product. Really I'm hoping that the source tree is sufficiently organized to be able to browse the top few levels of directories and look for files named LICENSE and for copyright notices at the top of files. If it is difficult to determine whether the product contains any open source code, there is an article at Datamation which might be helpful. It discusses compliance tools, including tools which look for signatures from well-known codebases to track down more serious GPL violations.

What about the difficult case, where GPL code is being used and has been extended with proprietary code which cannot simply be posted to a website? Even if one doesn't care about the free software ethos, pragmatically this is a ticking time bomb and one that should not be ignored. I'd recommend putting up an opensource website anyway to post what you can, and working as soon as possible to disentangle the rest. Development of new features in that area of the code can be used as the lever to refactor it in a GPL-compliant way.

Update 8/2008: The Software Freedom Law Center has published a GPL compliance guide.

10/2008: Linux Hater's Redux holds up this blog post as an example of why Linux should be avoided. Okey dokey.

12/2008: Add Cisco to the list of companies sued by the SFLC over GPL issues. This time the suit was filed on behalf of the FSF for glibc, coreutils, and other core GNU components. Reactions to the news from Ars Technica and Joe Brockmeier @ ZDNET have already appeared.

5/2009: Cisco and the FSF have settled their lawsuit. Cisco will appoint a Free Software Director, make attempts to notify owners of Liksys products of their rights under the GPL, and will make a monetary contribution to the FSF.

Sunday, April 27, 2008

Four Circles of Product Management Hell

I typically work on software for hardware platforms, where the design cycle is considerably longer than that of a desktop or web application. 18 months for a new platform is not uncommon, and it is important to nail down the main feature set early enough in the process for the hardware capabilities to be designed in.

Nailing down these requirements is the job of the product manager, and at this point in my career I've worked closely with about a dozen of them. They generally approach the requirements definitions using a similar set of techniques:

visit current customers to find out what they like and dislike about the product, and what they need the product to do
visit prospective customers to get first impressions and feedback
work with the sales teams to break down barriers to a big sale (which can mean one-off feature development if the deal is big enough)
keep track of what the competition is doing
generally stay abreast of where the market is going

Over time my opinion of what constitutes "good product management" have evolved considerably. I'll give a few examples that portray what the younger, naive me thought of a particular style of product management versus what the older, cantankerous me thinks.

1. All requirements generated directly from customer requests

What I used to think:

"Wow, this is great. These are all features which obviously appeal to customers. We won't waste time on stuff that nobody will ever use, everything we develop will help sell the product."

What I now realize:

This is "Building Yesterdays Technology, Tomorrow!" Customers will ask for improvements and extensions to the existing functionality of your product, and that is fine. When a customer asks for something completely new, it is generally because they can already buy it from a competitor. In fact if the input is via an RFQ they are probably trying to steer the deal to that particular vendor by putting in a set of criteria which only that one vendor can meet. The competitor likely helped the customer put the RFQ together, as any good salescritter can supply a list of specifications which only their product can meet.

Certainly, there will always be a healthy amount of catch-up requirements in any ongoing product development. Competitors are not sitting idle, and they're probably not incompetent, so they will come up with things which your product will need to match. Any product manager will end up listing requirements which simply match what competitors already have.

However there must also be a significant amount of forward-looking development as well, building the product to meet anticipated features which the customer does not yet realize they need. Constantly playing catchup only works for companies in truly commanding (i.e. monopolistic) market positions.

2. Requirements list both externally visible behavior and internal implementation details

What I used to think:

"Umm, ok. Its a little weird that the product requirements are so detailed."

What I now realize:

The product manager believes themself to be a systems architect trapped in a product managers body. They almost certainly used to be an engineering individual contributor (many product managers come from a hardware or software engineering background). They likely feel they have such a firm grasp of the tradeoffs in developing the product that they can easily guide the engineers in how to proceed.

Also, and I am as guilty of this as anyone: engineers often have little respect for the product manager... and the reverse is true as well. The product manager may be trying to provide a top-level design for the product because they consider the engineering team to be incapable of doing it correctly.

3. Anticipates market needs with uncanny and unfailing accuracy, down to the smallest detail

What I used to think:

"Wow, this Product Manager really knows their stuff. This is great!"

What I now realize:

The company is doomed, because we're in a market which is too easy. We'll have five hundred competitors, who will all build exactly the same product and end up racing to the bottom of the price curve so nobody will make any money. Polish the resume.

The Gentle Reader might ask, "What about the product manager who can analyze a difficult and complex market to generate the perfect product requirements with 100% accuracy?" The answer is quite simple really: such a person does not exist. We developers very much want the paragon of product management to exist, someone who can predict the unpredictable and know the unknowable, in order to satisfy some deep-seated need in our world view. However there is no such thing as omniscience, the best product manager is one who can generate a market analysis which is good enough to get some initial traction and be refined in subsequent updates.

4. Product manager is often not in the office, and only revs the requirements document sporadically

What I used to think:

"Slacker."

What I now realize:

Actually, these are some of the signs of a really good product manager. When kicking off a new product they will spend a lot of time visiting customers and potential customers to gather inputs.

They apply a certain amount of skepticism, filtering and summarization to the raw inputs coming from outside the development environment, but without introducing too much delay. A product manager who passes on new inputs every time they meet with a customer or return from a trade show is not a good product manager, they're just a note taker. A product manager who suffers from analysis paralysis, not wanting to make a call until the correct course of action is absolutely clear, is also not a good product manager. There is a finite window of market opportunity, miss it and you'll find out from your competitors what the right thing to do was.

The best product managers know not to change the requirements too often, or the development team will never finish anything. So they will not be constantly releasing new versions of the requirements spec, and any changes which are made will have gotten enough validation to be credible without taking so long in validation as to be stale.

Unfortunately it is also possible that the reason they are not in the office and don't produce much is because they really are a slacker. Caveat emptor.

Afterthoughts

In fairness, I should acknowledge that the product manager job entails more than just defining the product. As they fit in a role equidistant between engineering, marketing and sales, the product manager is frequently pulled in to handle crises in any of the three areas.

as an expert user of the product, they do a lot of sales demos
they frequently end up training resellers (and customers) on the use of the product
they invariably get sent to all of the boring trade shows
they assist in preparing white papers, sales guides for particular vertical markets, etc

Finally, for a perspective from the other side of the fence I highly recommend The Cranky Product Manager. Entertaining, with a dash of truthiness.