This is the second in a series of posts exploring the Juniper QFabric. Juniper says the QFabric should not be thought of as a network but as one large distributed switch. This series examines techniques used in modular switch designs, and tries to apply them to the QFabric. This article focuses on link speeds.
How Fast is Fast?
Ethernet link speeds are rigidly specified by the IEEE 802.3 standards, for example at 1/10/40/100/etc Gbps. It is important that Ethernet implementations from different vendors be able to interoperate.
Backplane links face fewer constraints, as there is little requirement for interoperability between implementations. Even if one wanted to it isn't possible to plug a card from one vendor into another vendor's chassis, they simply don't fit. Therefore backplane links within a chassis have been free to tweak their links speeds for better performance. In the 10 Gbps generation of products backplane links have typically run at 12.5 Gbps. In the 40 Gbps Ethernet generation I'd expect 45 or 50 Gbps backplane links (I don't really know, I no longer work in that space). A well-designed SERDES for a particular speed will have a bit of headroom, enabling faster operation over high quality links like a backplane. Running them faster tends to be possible without heroic effort.
A common spec for switch silicon in the 1Gbps generation is 48 x 1 Gbps ports plus 4 x 10 Gbps. Depending on the product requirements, 10 Gbps ports can be used for server attachment or as uplinks to build into a larger switch. At first glance the chassis application appears to be somewhat oversubscribed, with 48 Gbs of downlink but only 40 Gbps of uplink. In reality, when used in a chassis the uplink ports will run at 12.5 Gbps to get 50 Gbps of uplink bandwidth.
Though improved throughput and capacity is a big reason for running the backplane links faster, there is a more subtle benefit as well. Ethernet links are specified to run at some nominal clock frequency, modulo a tolerance measured in parts per million. The crystals driving the links can be slightly on the high or low side of the desired frequency yet still be within spec. If the backplane links happen to run slightly below nominal while the front panel links are slightly higher, the result would be occasional packet loss simply because the backplane cannot keep up. When a customer notices packet loss during stress tests in their lab it is very, very difficult to convince them it is expected and acceptable. Running the backplane links at a faster rate avoids the problem entirely, the backplane can always absorb line rate traffic from a front panel port.
This is one area where QFabric takes a different tack from a modular chassis: I doubt the links between QF/Node and Interconnect are overclocked even slightly. In QFabric those links are not board traces they are lasers, and lasers are more picky about their data rate. In the Packet Pushers podcast the Juniper folks stated that the uplinks are required to be faster than the downlinks. That is, if the edge connections are 10 Gbps then the uplinks from Node to Interconnect have to be 40 Gbps. One reason for this is to avoid overruns due to clock frequency variance, and ensure the Interconnect can always absorb the link rate from an edge port without resorting to flow control.
Next article: flow control.
footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.