Wednesday, April 20, 2011

A Tale of Two MACs

If you've looked at the spec sheets for 10 Gig server NICs, you may have noticed something interesting: the feature set supported when operating at 10 Gig is often not the same as the feature set for 10/100/1000 Mbps. Usually, the 10G features are a subset of the lower speed options.

Modern NIC designs essentially always contain a CPU helping out with datapath operation, and sometimes this feature disparity is due to an inability to keep up with processing at the higher link rate. However, that isn't the entire story.

The Care and Feeding of Half Duplex

Let's discuss what goes into an Ethernet NIC. The block diagram shown here isn't comprehensive; it's intended to highlight only those aspects to be discussed further. We start with a DMA engine, plus buffering for sent and received packets. The MAC design is typically split into TX and RX modules for chip layout reasons. Control signals run between RX and TX to support flow control, where a received pause frame will make the transmitter cease sending packets. As Ethernet pause takes effect frame by frame, the timing for this control signal is fairly relaxed. NIC ASICs also generally integrate the PHY to reduce cost, but 10G copper PHYs are new enough that this is not yet always done.
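To make the RX-to-TX pause signal concrete, here is a minimal sketch of how a transmitter might honor a received pause frame. It assumes the standard 802.3x definition of a pause quantum as 512 bit times; the `TxMac` class and its method names are hypothetical, invented for illustration, and real hardware does this with timers rather than floating point timestamps.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times (IEEE 802.3x)


def pause_duration_seconds(quanta: int, link_bps: int) -> float:
    """Convert the 16-bit pause_time field into seconds at a given link rate."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


class TxMac:
    """Hypothetical transmit MAC that only checks the pause state between frames."""

    def __init__(self, link_bps: int) -> None:
        self.link_bps = link_bps
        self.paused_until = 0.0

    def on_pause_frame(self, quanta: int, now: float) -> None:
        # Signal from the RX MAC: a pause frame arrived. A value of zero
        # cancels any pause already in effect.
        self.paused_until = now + pause_duration_seconds(quanta, self.link_bps)

    def may_start_frame(self, now: float) -> bool:
        # Checked only at frame boundaries; a frame already on the wire
        # is never cut short, which is why the timing is relaxed.
        return now >= self.paused_until
```

Note that the same quanta value pauses for a thousand times longer at 10 Mbps than at 10 Gbps, since the unit is bit times rather than seconds.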

Ethernet NIC showing MAC, packet buffering, and DMA

You'll note that the TX and RX MACs are further subdivided, with a red line running from the middle of RX to the middle of TX. This is used for half duplex operation. While transmitting half duplex, the MAC compares what it sees on the wire to what it is transmitting. When the received bits don't match the sent bits, it means another station is transmitting at the same time and they have collided. Both stations cease transmitting and back off.
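The compare-and-back-off behavior can be sketched in a few lines. This is an illustration, not how any particular chip does it: the bit-by-bit comparison follows the description above, while the backoff rule is the truncated binary exponential backoff from the 802.3 standard (wait a random number of slot times in the range 0 to 2^k - 1, where k is the collision count capped at 10, and give up after 16 attempts). The function names are made up for this example.

```python
import random

BACKOFF_LIMIT = 10   # exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16   # frame is discarded after 16 failed attempts


def first_collision(sent_bits, wire_bits):
    """Return the index of the first bit where the wire differs from what
    we sent, or None if the transmission went out cleanly."""
    for index, (sent, seen) in enumerate(zip(sent_bits, wire_bits)):
        if sent != seen:
            return index
    return None


def backoff_slots(attempt: int, rng=random) -> int:
    """Number of slot times to wait after the given collision count."""
    if not 1 <= attempt <= ATTEMPT_LIMIT:
        raise ValueError("attempt must be between 1 and 16")
    k = min(attempt, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)
```

Because each station picks its delay independently, two stations that just collided are unlikely to pick the same slot again, and the range doubles each time they do.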

Further, there are two switches in the middle of the red line. While the station is transmitting with the received signal fed to the TX MAC, it is important that the RX MAC not process the data. It isn't a packet: the rx counters should not be incremented and the payload should not be handed to the software as a received frame. The RX MAC is disconnected until the transmission finishes, then resumes listening for packets.
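In software terms, the switch on the RX side behaves like a gate that is held closed for the duration of our own transmission. The sketch below is a toy model with hypothetical names (`RxMac`, `transmitting`); it only shows that data seen while the gate is closed neither bumps the counters nor reaches the host.

```python
from contextlib import contextmanager


class RxMac:
    """Toy receive MAC: counts frames and hands payloads to software."""

    def __init__(self) -> None:
        self.connected = True
        self.rx_frames = 0
        self.delivered = []

    def on_wire_data(self, data: bytes) -> None:
        if not self.connected:
            # This is our own transmission looping back, not a packet.
            return
        self.rx_frames += 1
        self.delivered.append(data)


@contextmanager
def transmitting(rx: RxMac):
    """Disconnect the RX MAC while we transmit, reconnect afterwards."""
    rx.connected = False
    try:
        yield
    finally:
        rx.connected = True
```

Wrapping a transmit in `with transmitting(rx):` guarantees the RX MAC resumes listening even if the transmit is aborted by a collision.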

Support for Gigabit half duplex comes with additional complexity. For reasons which would take too long to describe here, half duplex at gigabit speed requires the MAC to implement carrier extension, which stretches short transmissions so that collisions can still be detected. The standard also defines frame bursting, where the MAC transmits multiple frames without dropping the carrier to recover some of the lost efficiency. Though this isn't terribly difficult, it is yet another bit of complexity which has to be tolerated to support a feature which hardly anyone actually uses.
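For a rough feel of what the gigabit half duplex logic has to track, here is a simplified model of a burst. It rests on my reading of the 802.3 parameters: a 512 byte slot time at gigabit speed, a burst limit of 65,536 bit times (8192 bytes), and a 96 bit interframe gap that is filled with extension symbols inside a burst. Only the first frame is extended to the slot time, and a new frame may be started as long as the burst limit has not yet been reached. Preamble is ignored, and `plan_burst` is a made-up name.

```python
SLOT_TIME_BYTES = 512      # gigabit half duplex slot time (4096 bit times)
BURST_LIMIT_BYTES = 8192   # 65,536 bit times
IFG_BYTES = 12             # 96 bit interframe gap, carrier held during a burst


def plan_burst(frame_sizes):
    """Return (frames_sent, bytes_of_carrier) for one burst.

    frame_sizes is the transmit queue, in bytes per frame, in order.
    """
    if not frame_sizes:
        return 0, 0
    # The first frame is padded with carrier extension up to the slot time.
    elapsed = max(frame_sizes[0], SLOT_TIME_BYTES)
    count = 1
    for size in frame_sizes[1:]:
        if elapsed >= BURST_LIMIT_BYTES:
            break  # burst timer expired, carrier must be dropped
        elapsed += IFG_BYTES + size
        count += 1
    return count, elapsed
```

A lone 64 byte frame occupies the wire for 512 bytes, which is the cost of carrier extension; three 64 byte frames sent as a burst occupy 664 bytes rather than 1536.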

A Tale of Two MACs

Half duplex was the only option for Ethernet networks until just slightly before 100 Mbps Ethernet debuted. For the most part the transition to switched networks running full duplex happened during the 100 Mbps era. By the time Gigabit Ethernet debuted, full duplex operation was the norm with half duplex used by an ever diminishing sliver of the market. Though Gigabit Ethernet defines a half duplex mode, it is rarely used and a number of early gigabit products didn't work properly in half duplex mode.

Ethernet NIC showing two MACs, one for 10G and one for 10/100/1000

10 Gig Ethernet does not have a half duplex mode. It always operates full duplex.

It is difficult to implement a MAC hardware design which handles the full range of link speeds from 10 Mbps all the way up to 10 Gbps, three orders of magnitude faster. Add in a requirement to run a time-critical signal all the way across the chip and between TX/RX clock domains, plus gigabit frame bursting, and it becomes even harder.

Therefore some hardware designs punt, and include what are recognizably two MACs. One is used for 10/100/1000 operation, supports half duplex operation, and is most likely derived from an existing design from older products. The 10G MAC is new, only supports full duplex, and has wider datapaths needed for higher speed operation. It only supports features useful for server deployments, because at this point 10G is too expensive for desktop or other uses. The chip chooses between the two MACs based on the link speed, the result of autonegotiation or explicit configuration.
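The selection step amounts to a lookup on the resolved link speed. The sketch below is purely illustrative: `MacBlock`, `LEGACY_MAC`, `TEN_GIG_MAC` and `select_mac` are hypothetical names, and in a real chip this would be a mux controlled by the autonegotiation result or a configuration register rather than a function call.

```python
from dataclasses import dataclass
from enum import Enum


class Duplex(Enum):
    HALF = "half"
    FULL = "full"


@dataclass(frozen=True)
class MacBlock:
    name: str
    speeds_mbps: frozenset
    supports_half_duplex: bool


# Hypothetical descriptions of the two MACs on the chip.
LEGACY_MAC = MacBlock("10/100/1000", frozenset({10, 100, 1000}), True)
TEN_GIG_MAC = MacBlock("10G", frozenset({10000}), False)


def select_mac(speed_mbps: int, duplex: Duplex) -> MacBlock:
    """Pick the MAC for a link speed from autonegotiation or explicit config."""
    for mac in (LEGACY_MAC, TEN_GIG_MAC):
        if speed_mbps in mac.speeds_mbps:
            if duplex is Duplex.HALF and not mac.supports_half_duplex:
                raise ValueError(f"{mac.name} MAC is full duplex only")
            return mac
    raise ValueError(f"unsupported link speed: {speed_mbps} Mbps")
```

For example, `select_mac(100, Duplex.HALF)` returns the legacy MAC, while asking for half duplex at 10000 Mbps is rejected outright.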

The feature set is different for 10G operation because internally it really is different. Nonetheless, it operates as just one interface. The two MACs might be visible to the driver software, but not above that. To the rest of the software stack, it's just one NIC.
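One way a driver might hide the split is to keep per-MAC state internally and report a single merged view upward. This is only a guess at the shape of such a driver, with hypothetical names throughout; the point is that the caller of `stats()` cannot tell there are two MACs underneath.

```python
from dataclasses import dataclass, field


@dataclass
class MacCounters:
    rx_frames: int = 0
    tx_frames: int = 0


@dataclass
class Nic:
    """Hypothetical driver object: two MACs inside, one interface outside."""

    legacy: MacCounters = field(default_factory=MacCounters)
    ten_gig: MacCounters = field(default_factory=MacCounters)

    def stats(self) -> dict:
        # The rest of the stack sees one set of counters, regardless of
        # which MAC handled the traffic.
        return {
            "rx_frames": self.legacy.rx_frames + self.ten_gig.rx_frames,
            "tx_frames": self.legacy.tx_frames + self.ten_gig.tx_frames,
        }
```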