Friday, October 7, 2011

Finding Ada, 2011

Ada Lovelace Day aims to raise the profile of women in science, technology, engineering and maths by encouraging people around the world to talk about the women whose work they admire. This international day of celebration helps people learn about the achievements of women in STEM, inspiring others and creating new role models for young and old alike.

For Ada Lovelace Day 2010 I analyzed a patent for a frequency hopping control system for guided torpedoes, granted to Hedy Lamarr and George Antheil. For Ada Lovelace Day this year I want to share a story from early in my career.

After graduation I worked on ASICs for a few years, mostly on Asynchronous Transfer Mode NICs for Sun workstations. In the 1990s Sun made large investments in ATM: it designed its own Segmentation and Reassembly ASICs, wrote a Q.2931 signaling stack, adapted NetSNMP as an ILMI stack, wrote LAN Emulation and MPOA implementations, and more.

Yet ATM wasn't a great fit for carrying data traffic. Its cell header overhead was very high (5 bytes of header in every 53-byte cell), it had an unnatural fondness for SONET as its physical layer, and it required a signaling protocol far more complex than the simple ARP of Ethernet.
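To put rough numbers on the cell overhead, here is a minimal sketch in Python. It assumes AAL5 framing (an 8-byte trailer, with the packet padded out to a whole number of 48-byte cell payloads) and ignores any LLC/SNAP encapsulation. The function names are made up for illustration.

```python
import math

CELL_SIZE = 53                            # bytes per ATM cell on the wire
CELL_HEADER = 5                           # bytes of header in each cell
CELL_PAYLOAD = CELL_SIZE - CELL_HEADER    # 48 bytes of payload per cell
AAL5_TRAILER = 8                          # assumed AAL5 trailer appended to each packet


def cells_for_packet(packet_bytes):
    """Number of cells needed to carry one packet, including padding."""
    return math.ceil((packet_bytes + AAL5_TRAILER) / CELL_PAYLOAD)


def wire_efficiency(packet_bytes):
    """Fraction of bytes on the wire that are the packet itself."""
    return packet_bytes / (cells_for_packet(packet_bytes) * CELL_SIZE)


if __name__ == "__main__":
    for size in (40, 41, 576, 1500):
        print(size, cells_for_packet(size), round(wire_efficiency(size), 3))
```

Under these assumptions a 1500-byte packet needs 32 cells, so about 88% of the bytes on the wire are payload, while a 41-byte packet spills into a second cell and drops below 40%.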

Its most pernicious problem for data networking was in dealing with congestion. There was no mechanism for flow control, because ATM evolved out of a circuit-switched world with predictable traffic patterns. Congestion problems arise when you try to switch packets and deal with bursty traffic. In an ATM network cell loss meant packet loss: the loss of a single cell would render the entire packet unusable, yet the network would be further congested carrying the remaining cells of that packet's corpse.
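A back-of-the-envelope sketch of why this hurts: if each cell were lost independently with some probability, a packet survives only when every one of its cells does. Independent loss is an assumption made here for simplicity (real congestion loss is bursty), and the function name is hypothetical.

```python
def packet_loss_probability(cell_loss_probability, cells_per_packet):
    """Chance that at least one cell of a packet is lost.

    Assumes each cell is dropped independently, which is a simplification.
    """
    survive = (1.0 - cell_loss_probability) ** cells_per_packet
    return 1.0 - survive


if __name__ == "__main__":
    # A 32-cell packet with 1% of cells lost.
    print(packet_loss_probability(0.01, 32))
```

With 32 cells per packet, a 1% cell loss rate turns into more than a quarter of packets lost under this model.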

Allyn Romanow at Sun Microsystems and Sally Floyd from Lawrence Berkeley Laboratory conducted a series of simulations, ultimately resulting in a paper on how to deal with congestion. If a cell had to be dropped, drop the rest of the cells in that packet. Furthermore, deliberately dropping whole packets early as buffering approached capacity was even better, and brought ATM links up to the same efficiency for TCP transport as native packet links. Allyn was very generous with her time in explaining the issues and how to solve them, both in ATM congestion control and in a number of other aspects of making a network stable.
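The two ideas can be sketched as a toy switch buffer in Python. This is an illustration of the policy described above, not the algorithm from the paper or any real switch: the `Cell` and `DiscardQueue` names, the per-VC bookkeeping, and the `epd_threshold` parameter are all assumptions. As a simplification it also discards the final cell of a damaged packet; real implementations may treat that cell differently, for example forwarding it so the receiver can find the packet boundary.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Cell:
    vc: int        # virtual circuit the cell belongs to
    last: bool     # True for the final cell of a packet


class DiscardQueue:
    """Toy output buffer: drop the rest of a damaged packet, and
    optionally refuse whole new packets once the buffer is filling up."""

    def __init__(self, capacity, epd_threshold=None):
        self.capacity = capacity
        self.epd_threshold = epd_threshold   # None disables early discard
        self.queue = deque()
        self.dropping = set()     # VCs whose current packet is being discarded
        self.mid_packet = set()   # VCs with a partially accepted packet

    def enqueue(self, cell):
        vc = cell.vc
        if vc in self.dropping:
            # Rest of a packet that already lost a cell: useless, so drop it.
            if cell.last:
                self.dropping.discard(vc)
            return False
        first = vc not in self.mid_packet
        early = (first and self.epd_threshold is not None
                 and len(self.queue) >= self.epd_threshold)
        if early or len(self.queue) >= self.capacity:
            # Either refuse a new packet early, or overflow mid-packet.
            self.mid_packet.discard(vc)
            if not cell.last:
                self.dropping.add(vc)
            return False
        self.queue.append(cell)
        if cell.last:
            self.mid_packet.discard(vc)
        else:
            self.mid_packet.add(vc)
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

Once a VC lands in `dropping`, later cells of that packet are refused even if buffer space has freed up in the meantime, which is the point: the space goes to packets that can still arrive intact.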

ATM also had a very complex signaling stack for setting up connections, so complex that many ATM deployments simply gave up and permanently configured circuits everywhere they needed to go. PVCs only work up to a point: the network size is constrained by the number of available circuits. Renee Danson Sommerfeld took on the task of writing a Q.2931 signaling stack for Solaris, requiring painstaking care with specifications and interoperability testing. Sun's ATM products were never reliant on PVCs to operate; they could set up switched circuits on demand and close them when no longer needed.

In this industry we tend to celebrate engineers who spend massive effort putting out fires. What I learned from Allyn, Sally, and Renee is that the truly great engineers see the fire coming, and keep it from spreading in the first place.

Update: Dan McDonald worked at Sun in the same timeframe, and posted his own recollections of working with Allyn, Sally, and Renee. As he put it on Google+, "Good choices for people, poor choice for technology." (i.e. ATM Considered Harmful).