Friday, August 26, 2011

QFabric Conclusion

This is the fourth and final article in a series exploring the Juniper QFabric. Earlier articles provided an overview, a discussion of link speed, and musings on flow control. Juniper says the QFabric should not be thought of as a network but as one large distributed switch. This series examines techniques used in modular switch designs, and tries to apply them to the QFabric. This article attempts to cover a few loose ends, and wraps up the series.

As with previous days, the flow control post sparked an interesting discussion on Google+.

Whither Protocols?

[Figure: Director connected to edge node and peered with another switch]

For a number of years switch and router manufacturers competed on protocol support, implementing various extensions to OSPF, BGP, Spanning Tree, etc. in their software. QFabric is almost completely silent about protocols. In part this is a marketing philosophy: positioning the QFabric as a distributed switch instead of a network means that the protocols running within the fabric are an implementation detail, not something to talk about. I don't know what protocols are run between the nodes of the QFabric, but I'm sure it's not Spanning Tree and OSPF.

Yet QFabric will need to connect to other network elements at its edge, where the datacenter connects to the outside world. Presumably the routing protocols it needs are implemented in the QF/Director and piped over to whichever switch ports connect to the rest of the network. If there are multiple peering points, they all need to talk to the same entity and share a common routing information base.
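To make the "common routing information base" idea concrete, here is a minimal Python sketch of one table fed by several peering ports. It is purely illustrative and says nothing about how Juniper actually implements this: the `Rib` class, the single numeric `metric`, and the port names are all hypothetical, and real routing protocols use far richer route selection than "lowest metric wins."

```python
import ipaddress

class Rib:
    """One shared table for routes learned at any peering port (hypothetical)."""

    def __init__(self):
        # network prefix -> (metric, peering port the route was learned on)
        self.routes = {}

    def learn(self, prefix, metric, peering_port):
        # Keep only the lowest-metric route per prefix. The single "metric"
        # is an assumption standing in for real protocol attributes.
        net = ipaddress.ip_network(prefix)
        best = self.routes.get(net)
        if best is None or metric < best[0]:
            self.routes[net] = (metric, peering_port)

    def lookup(self, addr):
        # Longest-prefix match over everything learned, whichever port saw it.
        ip = ipaddress.ip_address(addr)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        longest = max(matches, key=lambda net: net.prefixlen)
        return self.routes[longest][1]

rib = Rib()
rib.learn("0.0.0.0/0", 10, "node-1:port-1")     # default route via first peer
rib.learn("0.0.0.0/0", 5, "node-4:port-2")      # better default via second peer
rib.learn("192.0.2.0/24", 20, "node-1:port-1")  # more specific route
```

Because both peering points feed the same table, a lookup gives the same answer regardless of which Node the packet entered on, which is the property the paragraph above is asking for.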

Flooding Frowned Upon

The edge Nodes have an L2 table holding 96K MAC addresses. This reinforces the notion that switching decisions are made at the ingress edge: every Node can know how to reach destination MAC addresses at every port. There are a few options for distributing MAC address information to all of the nodes, but I suspect that flooding unknown addresses to all ports is not the preferred mechanism. If flooding is allowed at all, it would be carefully controlled.

Much of modern datacenter design revolves around virtualization. The VMware vCenter (or equivalent) is a single, authoritative source of topology information for virtual servers. By hooking to the VM management system, the QFabric Director could know the expected port and VLAN for each server MAC address. The Node L2 tables could be pre-populated accordingly.

By hooking to the VM management console QFabric could also coordinate VLANs, flow control settings, and other network settings with the virtual switches running in software.
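As a rough illustration of pre-populating the L2 tables from a VM inventory, here is a short Python sketch. Everything in it is an assumption made for the example: the `VmNic` record, the node and port names, and the reading of "96K" as 96 * 1024 entries. It does not use any real vCenter or Juniper API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VmNic:
    """Hypothetical inventory record exported by the VM management system."""
    mac: str
    vlan: int
    node: str  # edge Node the server attaches to
    port: int  # port on that Node

# Assumption: "96K" taken to mean 96 * 1024 entries.
L2_CAPACITY = 96 * 1024

def build_l2_table(inventory):
    """Map (vlan, mac) -> (egress node, egress port) for every known NIC."""
    table = {}
    for nic in inventory:
        table[(nic.vlan, nic.mac.lower())] = (nic.node, nic.port)
    if len(table) > L2_CAPACITY:
        raise ValueError("inventory exceeds L2 table capacity")
    return table

def lookup(table, vlan, mac):
    """Ingress lookup: egress (node, port), or None for an unknown address."""
    return table.get((vlan, mac.lower()))

inventory = [
    VmNic("00:50:56:AA:00:01", 10, "node-1", 3),
    VmNic("00:50:56:aa:00:02", 10, "node-7", 12),
]
table = build_l2_table(inventory)
```

With the table loaded ahead of time, an ingress Node resolves a known destination without learning it from traffic, and a `None` result can be handed to whatever controlled policy applies instead of being flooded to every port.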

NetOps Force Multiplier

Where previously network engineers would be configuring dozens of switches, QFabric now proposes to manage a single distributed switch. Done well, this should be a substantial time saver. There will of course be cases where the abstraction leaks and the individual Nodes have to be dealt with. The failure modes in a distributed switch are simply different. It's unlikely that a single line card within a chassis will unexpectedly lose power, but it's almost certain that Nodes occasionally will. Nonetheless, the cost to operate QFabric seems promising.


QFabric is an impressive piece of work, clearly the result of several years' effort. Though the Interconnects use merchant silicon, Juniper almost certainly started working with the manufacturer at the start of the project to ensure the chip would meet their needs.

The most interesting part of QFabric is its flow control mechanism, for which Juniper has made some pretty stunning claims. A flow control mechanism with fairness, no packet loss, and quick reaction to changes over such a large topology is an impressive feat.

footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.