Monday, October 24, 2011

Well Trodden Technology Paths

Modern CPUs are astonishingly complex, with huge numbers of caches, lookaside buffers, and other features to optimize performance. Hardware reset is generally insufficient to initialize these features for use: reset leaves them in a known state, but not a useful one. Extensive software initialization is required to get all of the subsystems working.

Its quite difficult to design such a complex CPU to handle its own initialization out of reset. The hardware verification wants to focus on the "normal" mode of operation, with all hardware functions active, but handling the boot case requires that a vast number of partially initialized CPU configurations also be verified. Any glitch in these partially initialized states results in a CPU which cannot boot, and is therefore useless.

Large CPU with many cores, and a small 68k CPU in the corner.

Many, and I'd hazard to guess most, complex CPU designs reduce their verification cost and design risk by relying on a far simpler CPU buried within the system to handle the earliest stages of initialization. For example, the Montalvo x86 CPU design contained a small 68000 core to handle many tasks of getting it running. The 68k was an existing, well proven logic design requiring minimal verification by the ASIC team. That small CPU went through the steps to initialize all the various hardware units of the larger CPUs around it before releasing them from reset. It ran an image fetched from flash which could be updated with new code as needed.


Warning: Sudden Segue Ahead

Networking today is at the cusp of a transition, one which other parts of the computing market have already undergone. We're quickly shifting from fixed-function switch fabrics to software defined networks. This shift bears remarkable similarities to the graphics industry shifting from fixed 3D pipelines to GPUs, and of CPUs shedding their coprocessors to focus on delivering more general purpose computing power.

Networking will also face some of the same issues as modern CPUs, where the optimal design for performance in normal operation is not suitable for handling its own control and maintenance. Last week's ruminations about L2 learning are one example: though we can make a case for software provisioning of MAC addresses, the result is a network which doesn't handle topology changes without software stepping in to reprovision.

The control network cannot, itself, seize up whenever there is a problem. The control network has to be robust in handling link failures and topology changes, to allow software to reconfigure the rest of the data-carrying network. This could mean an out of band control network, hooked to the ubiquitous Management ports on enterprise and datacenter switches. It might also be that a VLAN used for control operates in a very different fashion than one used for data, learning L2 addresses dynamically and using MSTP to handle link failures.

All in all, its an exciting time to be in networking.