Coding Relic: WWMD?

When Apple announced its A4 ARM CPU, I speculated about what would be in it. This speculation turned out to be completely wrong, but it was fun to write and engendered some good conversations about the possibilities. Now that Microsoft has signed an architecture license for ARM, I'm going to do it again. This article is complete speculation, and therefore rubbish. I am reliant on the same public sources of information as everyone else. Here we go.

What Will Microsoft Do?

Many companies license core designs from ARM, building them into chips with peripherals to add functionality. The various levels of ARM license offer both synthesized gate-level netlists and encrypted synthesizable RTL. An architectural license is much more extensive, allowing development of entirely new implementations of the ARM instruction set. Only the architectural license conveys the unencrypted, modifiable source code for the processor design. According to news reports, three other companies currently have an architectural license: Qualcomm, Marvell Semiconductor and Infineon Technologies. Qualcomm develops the Snapdragon, a line of ARM CPUs with integrated DSP and various mobile-related features. Marvell now owns the license originally used by DEC for development of the StrongARM, and under which Intel later produced the XScale. Infineon entered into the licensing agreement in late 2009, and will focus on security applications. Infineon is one of the few companies which embeds DRAM cells into the same die as the logic, and presumably will use this capability for HD SIM cards and other applications.

More notable than the list of architecture licensees is the list of companies which do not license the architecture. Samsung, which produces vast numbers of ARM CPUs, is content to work with ARM cores. Apple uses a Cortex A8 core in the A4, and does not have an architectural license either. Neither do TI, Cirrus Logic, or Atmel. Most companies use ARM core designs, and spend their efforts on the logic surrounding the CPU.

I tend to agree with The Register's take on it: Microsoft's ARM license is all about the XBox. Windows Mobile phones and the Zune use ARM, but there is little justification in producing their own chip for these markets. The XBox is the one hardware product which Microsoft produces itself, which can gain a competitive advantage via a unique CPU design, and which sells in large enough volume to be worth it. Microsoft already employs a great deal of custom silicon from suppliers in the product, such as the XCGPU used in the XBox 360-S. This chip combines the main PowerPC CPU and the ATI GPU onto a single die, with a second DRAM die incorporated into the package.

Its the Architecture, Stupid

So what might Microsoft do? I'll speculate that they won't design their own entirely new pipeline, the return on investment seems slim compared to other things they could spend time on. Its more likely they'd start from an existing ARM core and begin making changes. Microsoft will certainly integrate a powerful GPU onto the processor die, not doing so would be a step backwards from the existing XBox 360-S. I'll speculate they will tightly couple the GPU, allowing very low latency access to it as an ARM coprocessor in addition to the more straightforward memory mapped device. This is not unique: some of the on-chip XScale functional units can be accessed both as coprocessors for low latency and as memory mapped registers to get to the complete functionality of the unit. Having very low latency access to the GPU would allow efficient offloading of even small chunks of processing to GPU threads.

Yet even ARM coprocessors can be designed without needing an architectural license. TI implements its DaVinci DSP as a coprocessor, and Cirrus Logic had its own Maverick Crunch FPU. Neither company is an architectural licensee. So why would Microsoft feel it needs one?

One possibility is to let the GPU directly access the ARM processor cache and registers. This would allow GPU offloading to work almost exactly like a function call, putting arguments into registers or onto the stack with a coprocessor instruction to dispatch the GPU. When the GPU finishes, the ARM returns from the function call. For operations where the GPU is dramatically better suited, the ARM CPU would spend less time stalled than it would take to compute the result itself. If the ARM CPU supported hardware threads, it could switch to a different register file and run some other task while the GPU is crunching.

Part of the success of the XBox is due to its straightforward programming model compared to the Sony PS3. XBox has a fast SMP CPU paired with a GPU, where PS3 has an unruly gaggle of Cell processors to be managed explicitly. XBox cannot rely on the individual cores getting faster, as single core performance has leveled off due to power dissipation constraints. XBox has to make it easy for game developers to take advantage of more cores. Tightly coupling the GPU threads so they can function more like one big SMP system is one avenue to do this.

Wrapping up

I'll say it again: I made this all up. I have no insight into the specifics of Microsoft's intentions, just speculation. In the unlikely event that anyone reads this, don't copy it into Wikipedia as though it were verified information.

Saturday, July 24, 2010

WWMD?