Wednesday, March 30, 2011

Non-blocking Programmers

It is often stated that the productivity of individual programmers varies by an order of magnitude, and there is significant research supporting the 10x claim. More subjectively, I suspect every working developer quickly realizes that the productivity of their peers varies tremendously. Having been at this for a while, I suspect there is no single factor or even small number of factors which cause this variance. Instead there is a whole collection of practices, all of which add up to determine an individual developer's productivity.

To make things more interesting, many of the practices conflict in some way. We'll discuss three of them today.


 
1. Non-Blocking Operation

We don't just write code right up until the moment the product ships. Numerous steps depend on other people: code reviews, dependent modules or APIs, a test cycle, etc. When faced with a delay to wait for someone else, a developer can choose several possible responses.

blocking: while waiting for the response, do something other than produce code for the project. Codelabs, reading related documentation, and browsing the programming reddit are all examples.

non-blocking: switch to a different coding task in another workspace.

Versatility and wide-ranging knowledge are definite positives (see point 2), and people who spend time satisfying intellectual curiosity grow into better developers. The blocking developer spends time pursuing those interests. We'll ignore the less positive variations on this.

The non-blocking programmer makes progress on a different development task. This can of course be taken too far: having a dozen workspaces and context switching to every one of them each day isn't productivity, it's ADHD.

One could also label these as single-tasking versus multi-tasking, but that analogy implies more than I intend.

Sometimes developers maximize their own productivity by immediately interrupting the person they are waiting for, generally with the lead-in of "I just sent you email." This impacts point 3, the amount of time developers can spend in a productive zone, and is one of the conflicts between practices which impact overall productivity.


 
2. Versatile Techniques

Here I'm obliged to make reference to a craftsman's toolbox, with hammers and nails and planers and other woodworking tools I haven't the slightest idea what to do with. The essential point is valid without understanding the specifics of carpentry: a developer with wide ranging expertise can bring more creative solutions to bear on a problem. For example,

[Image: sparse graph]
  • Realizing that complex inputs would be better handled by a parser than an increasingly creaky collection of string processing and regexes.
  • Recognizing that a collection of data would be better represented as a graph, or processed using a declarative language.
  • Recalling having read about just the right library to solve a specific problem.

Developers with a curiosity about their craft grow into better developers. This takes time away from the immediate work of pounding out code (point 1), but makes one more effective over the long run.


 
3. Typing Speed

This sounds too trivial to list, but the ability to type properly does make a difference. Steve Yegge dedicated an entire post to the topic. I concur with his assessment that the ability to touch type matters, far more than most developers think it should. I'll pay further homage to Yegge with a really long explanation as to why.

[Image: filthy old keyboard]

Developers work N hours per day, where N varies considerably, but the entire time is not spent writing code. We have interruptions, from meetings to questions from colleagues to physical necessities. The amount of time spent actually developing can be a small slice of one's day. More pressingly, we don't just sit down and immediately start pounding out program statements. There is a warm up period, to recall to mind the details of what is being worked on. Reviewing notes, re-reading code produced in the previous session, and so forth get one back into the mode of being productive. Interruptions which disrupt this productive mode have far greater impact than the few minutes it takes to answer the question.

Peopleware, the classic book on the productivity of programmers, refers to this focused state as "flow" and devotes sections of the book to suggestions on how to maximize it. As the book was published in 1987, some of the suggestions now seem quaint, like installing voice mail and allowing developers to turn off the telephone ringer. The essential point remains though: a block of time is far more useful than the same amount of time broken up by interruptions, and developers do well to maximize these blocks of time.

Once in the zone, thoughts race ahead to the next several steps in what needs to be done. Ability to type quickly and accurately maximizes the effectiveness of time spent in the flow of programming. Hunting and pecking means you only capture a fraction of what could have been done.

There are other factors relating to flow which can be optimized. For example one can block off chunks of time, or work at odd hours when interruptions are minimal. Yet control of the calendar isn't entirely up to the individual, while learning to type most definitely is.


 
Conclusion

The most effective, productive programmer I know talks very fast and types even faster. He has worked in a number of different problem spaces in his career, and stays current by reading Communications of the ACM and other publications. He handles interruptions well, getting back into the flow of programming very quickly. He also swears profusely, though I suspect that isn't really a productivity factor.

Other highly effective programmers have different habits. The most important thing is to be aware of how to maximize your own effectiveness, rather than look for a single solution or adopt someone else's techniques wholesale. Especially not the swearing.

Monday, March 28, 2011

Computer Science Terminology

Priority Inversion When a source tree closes early for low priority bugs, resulting in developers working on those first and putting the critical stuff off for later.
Catastrophic subtraction When code to optimize a "bizarre corner case that never really happens" is removed.
Fire Fighting Desperately working to fix bugs to avoid being fired.
Infinite loop Inevitable result of programming languages which use an iterator variable but require the iterator be incremented explicitly.
Lightweight, simple Attributes applied to one's preferred solution. For solutions preferred by others, see heavyweight, complex.

Tuesday, March 22, 2011

Ada Lovelace Day: October 7, 2011

Ada Lovelace Day is an international day of celebration of the achievements of women in science, technology, engineering and maths. Last year it was in late March, but it is moving to October 7, 2011.

For Ada Lovelace Day in 2010 I wrote an article describing the guided torpedo patent issued to Hedy Lamarr and George Antheil. The research for that article was quite interesting, and I plan to do a similar writeup this year.

Thursday, March 17, 2011

Random Early Mea Culpa

Long, long ago I was an ASIC designer. I worked mostly on devices for ATM networks. Try not to judge too harshly, I was young and back then people said ATM was a good idea.

In the early 1990s there was a new concept for how to manage congestion in an IP network: Random Early Discard. Its basic premise is that TCP detects congestion via packet loss. If you wait until the switch buffers are completely full, you end up dropping a bunch of packets before TCP can respond. With RED the switch begins deliberately dropping packets before the queues are completely full, providing an early indication of a problem and triggering TCP to slow down more gracefully.
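
For concreteness, here is a minimal sketch of that style of early-drop decision (Python, with invented thresholds; the actual RED algorithm also smooths the queue depth with a moving average rather than using the instantaneous value):

    import random

    # RED-style early drop: start discarding with small but increasing
    # probability once the queue passes a low-water mark, long before
    # it is actually full. MIN_TH, MAX_TH and MAX_P are invented for
    # illustration; real implementations tune them carefully.
    MIN_TH, MAX_TH, MAX_P = 0.25, 0.75, 0.10

    def should_drop(depth, capacity):
        fill = depth / capacity
        if fill < MIN_TH:
            return False              # plenty of headroom: never drop
        if fill >= MAX_TH:
            return True               # nearly full: always drop
        # between the thresholds, drop probability ramps up linearly
        p = MAX_P * (fill - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p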

As described in the paper proposing it, the hardware should choose a random packet already within its queue to drop. To us ASIC designers, that seemed ludicrous.

  1. We'd already stored that packet, and found resources to hold it. Why spend all those resources and then just throw it away?
  2. Hardware at that time often used FIFOs. We couldn't drop a packet and immediately reclaim its buffering. We could only drop it when it finally exited the FIFO, some time in the future. Madness!
[Figure: drop probability ramping from 0.0 to 1.0 as queue depth goes from 0% to 100%]

So instead I came up with a drop probability at ingress. As the queue depth increased, the ASIC would begin dropping packets with increasing probability. The external behavior would match the requirements by dropping packets as the queue filled, thought I. It would also better align with the properties of a FIFO, thought I.
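
In code, the gap between what the paper asked for and what I built is small but crucial: only the choice of victim differs. A sketch, with the queue as a simple list and a crude fill threshold standing in for the probability curve above:

    import random

    def handle_arrival(queue, capacity, packet, ingress_drop=True):
        # The early-drop trigger is the same either way; the schemes
        # differ only in which packet pays the price.
        if len(queue) >= 0.75 * capacity:
            if ingress_drop:
                return queue          # my ASIC: discard the new arrival
            victim = random.randrange(len(queue))
            del queue[victim]         # the paper: drop from within the queue
        queue.append(packet)
        return queue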

Unfortunately this only superficially matched the desired behavior, in that it did drop packets before the queues became completely full. It took several years to fully understand how badly I'd misunderstood the idea.


 
Propagation Delay

The first issue is with the amount of time for the indication of a problem to reach the entity which could do something about it. The sending TCP will realize there is a problem when it times out on receiving an ACK. Dropping a packet at ingress to the FIFO delays the indication to the sender. Had I dropped a packet somewhere in the queue, its timer would be further along and the indication of a problem would come sooner.
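
A rough worked example, with invented numbers, shows the size of the effect:

    # A packet at the head of the queue left its sender roughly one
    # queue-drain-time earlier than a packet arriving now, so its
    # retransmit timer is that much further along when it is dropped.
    # The FIFO size and link rate here are purely illustrative.
    queue_bytes = 512 * 1024          # a 512 KB FIFO
    link_rate   = 100e6 / 8           # 100 Mbps link, in bytes per second
    head_start  = queue_bytes / link_rate
    print(f"loss signal arrives ~{head_start * 1000:.0f} ms sooner")  # ~42 ms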


 
Burstiness

However, this wasn't the biggest mistake. A more serious problem was which packet would be dropped. TCP flows tend to be bursty: a host gets a chunk of data to send, and it sends as much as its current transmission window allows. When congestion occurs in a switch it is usually not because the overall level of traffic on the network has increased; it's most often because a small number of flows are sending large bursts at the same time. To ameliorate it, you need to slow down those particular flows.

ASIC buffers are designed with bursty behavior in mind. Estimating the burst size is straightforward: you can guess at the round trip time based on whether it's a LAN or WAN, and you know the bit rate. The ASIC queues are sized to ensure they can absorb one or more bursts, with some extra padding for safety.

[Illustration: a queue occupied mostly by one flow in the first 80%]

Unfortunately this means that as the buffer fills, it is all but guaranteed to have absorbed the burst(s) which caused the congestion. The packets which arrive later are innocent, and are not occupying the majority of queue space. In the illustration above, the red flow clearly occupies most of the queue but has finished its burst. Had packets been dropped from within the queue, the offending flow would have suffered proportionally. By dropping packets only at ingress, the flows which suffer are those which haven't yet finished their bursts. It will almost certainly punish the wrong flow. It blames the victims of congestion, not the perpetrators.
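
A toy simulation, with invented sizes and rates, makes the mismatch stark: the queue stays full of the burst which caused the problem, while every drop lands on a later, better-behaved flow.

    import random
    from collections import Counter

    # Toy model: flow A's burst fills most of the FIFO and then stops,
    # while flows B and C keep trickling in faster than the queue
    # drains. All numbers are invented for illustration.
    CAPACITY = 500
    queue, drops = [], Counter()

    def enqueue(flow):
        if len(queue) >= CAPACITY:
            drops[flow] += 1          # ingress drop: the new arrival pays
        else:
            queue.append(flow)

    for _ in range(450):              # A's burst arrives, then A goes quiet
        enqueue("A")
    for _ in range(300):              # later arrivals outpace the drain
        enqueue(random.choice("BC"))
        if random.random() < 0.5:
            queue.pop(0)              # slow drain from the head

    print("occupancy:", Counter(queue))   # still mostly flow A
    print("dropped:", drops)              # only B and C ever get dropped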


 
Bufferbloat

Yet even this wasn't the biggest mistake. At this point I have to include the entire networking industry, not just me personally.

Our biggest mistake was in making queue management optional, and making it scary.

Instead of describing RED as a feature to control congestion in the network, we described it as a feature which would deliberately drop your packets. I attribute this to the same attitude which made ASIC designers want to hold onto the packets which had already been stored in the buffers. We made RED sound like a dangerous thing, which you should only use if you know exactly what you're doing and also have some very special network with obscure requirements.

The result is that it is widespread practice to leave all forms of active queue management turned off, considering it risky and unnecessary. There have been some efforts to rectify this portrayal. We now define RED as Random Early Detection, to avoid using the word "discard." The industry also now offers Explicit Congestion Notification, which marks packets rather than dropping them. Nonetheless even ECN isn't widely used.

Instead of pushing queue management, the networking industry has relied on Moore's law to vastly increase the amount of buffering in switches. There is equipment with so much buffering that it is no longer described in terms of packets or bytes, but in how many seconds of traffic it can absorb. There are reports of packets on subscriber networks being delayed a full 8 seconds before being successfully delivered. We have avoided the need for queue management by never allowing the queues to fill.
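
The arithmetic behind "seconds of traffic" is simple, and shows how cheap memory got us here:

    # Why buffers are now quoted in seconds: at subscriber-link speeds,
    # even a modest amount of cheap RAM holds an enormous queue. The
    # figures are illustrative.
    link_rate_bps = 1_000_000         # a 1 Mbps subscriber uplink
    buffer_bytes  = 1_000_000         # 1 MB of buffering, trivial to provide
    print(buffer_bytes * 8 / link_rate_bps, "seconds of queued traffic")  # 8.0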

This is congestion control via infinite buffering. Unfortunately there are two related problems with it:

  1. It isn't really infinite.
  2. It is addictive.

There is now so much buffering in the network that TCP's own attempts at congestion control are undermined. By the time TCP realizes there is a problem, there is a vast amount of data sitting in queues. Even if TCP reacts immediately and forcefully, it won't have an impact until the mass of packets already in the network sort themselves out. We've created a feedback loop where the control delay is enormous. Most of the time it works, but when it doesn't work the results are astonishingly bad.

It is also addictive, and the patient develops a tolerance. The solution is always more buffering, to kick the can even further down the road. As traffic grows the need for doses of buffering becomes ever larger.

As an industry, we have some work to do.

Wednesday, March 16, 2011

RIP www.sun.com

www.sun.com will be retired on June 1, 2011. Some of the content will move to Oracle sites. The rest will be discarded.

My recollection is that the Sun web site started in about 1994. When the Wayback Machine started in 1996, www.sun.com had been taken over by Sun corporate. Before that the earliest pages were put together by engineers. It was all very basic stuff, just text and images. Even the <TABLE> tag did not exist at that point.

[Image: tombstone for www.sun.com]

Tuesday, March 15, 2011

Program Trading Overdrive

Several decades ago Wall Street began to experiment with automated trading to respond to market moves more quickly than a human could react. They wrote programs to take a data feed in over the network, and output trading orders. The programs were viewed skeptically at first, but produced excellent returns.

As time went by they switched these programs from TCP to UDP. There was no value in retransmitting lost packets: it was too late to act on them anyway, and TCP added too much latency. The UDP program traders reacted more quickly, and produced better returns than TCP.

Later they switched these programs over to a raw socket, reading in Ethernet frames and implementing any needed protocols in user space. The kernel protocol stack added too much latency, with its queuing disciplines and general bloat. The raw socket program traders reacted more quickly, and produced better returns than UDP.
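
As a sketch of what that raw-socket stage looks like (Linux-specific, requires root, and the field offsets assume plain IPv4 over Ethernet with no VLAN tag):

    import socket

    # Read whole Ethernet frames and extract UDP payloads in user
    # space, bypassing the kernel's TCP/UDP stack. A minimal sketch,
    # not production market-data code.
    ETH_P_ALL = 0x0003
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))

    while True:
        frame = s.recv(2048)
        if int.from_bytes(frame[12:14], "big") != 0x0800:
            continue                  # not an IPv4 frame
        ihl = (frame[14] & 0x0F) * 4  # IP header length in bytes
        if frame[23] != 17:
            continue                  # IP protocol 17 is UDP
        payload = frame[14 + ihl + 8:]  # skip Ethernet + IP + UDP headers
        # ... parse the market-data feed and emit an order here ...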

Then they started running the programs within the NIC firmware, with help from the vendor. Interrupts added too much latency. The NIC firmware reacted more quickly, and produced better returns.

Now Wall Street is implementing the highest speed trading in FPGAs, pre-loaded with rules of what to match in incoming packets and what to do if found. Software adds too much latency.

It's a brave new world.

Update, as this came up almost immediately: I suspect the Flash Crash of 2010 is only the beginning, but I believe we crossed that particular line long ago. I don't think the differences between 1 msec -> 10 usecs -> 100 nsecs are making us vulnerable to out-of-control feedback loops in the stock market. We're already vulnerable, and have been for years.

Thursday, March 10, 2011

Experiential versus Informational Search

Yesterday Louis Gray posed a set of questions which seem like they should be easy to answer, but aren't. While reading the post, one of the questions stood out, for obvious reasons.

"2. When was the first time Denton Gentry left a comment on my blog?"

Louis had sent that question via email earlier in the day, and it turned out to be very difficult to answer. My profile on disqus.com shows comments going back to July 2009. That should be definitive, but unfortunately isn't correct, as manual checking had already turned up earlier comments. In the end Louis answered his own question by searching his email for Disqus notifications: the real first comment came a full year before the earliest one shown on my profile page. The other questions were similar: find the first citation. From a technical perspective it should be easy to answer questions like this, as all of the information is available. That it isn't easy is a reflection of economic reality: there is infrequent demand for it.

I'd like to flip it around, though: why is email able to answer questions like this? You can search email and sort it by date. You can find emails around a particular time. You can find emails which happened at about the same time as some other event which is unrelated, but intertwined in your memories. Why is email structured this way?

I suspect this is a reflection of human psychology. Email is information which we personally experienced. It exists in our own memories, albeit dimly or imperfectly. When we go to search for it, we're searching as an extension of our own memory. It's Experiential search, not Informational, and email services which don't match our expectations in this regard get less traction. This is also what makes services like Evernote so useful, letting us organize and search arbitrary information Experientially.

In comparison, when searching for something we never personally experienced we're looking for information which we know must exist, and we just need to find it. Search engines are designed around this expectation.

The disconnect occurs when we want an Experiential search over an Informational dataset. Organizing arbitrary information in a way which maps to what we'd expect had we personally experienced it is an unsolved problem. It has been a rich field of speculation in science fiction, as authors have postulated implanted memories and neural interfaces.

Will there be developments in this area? Clearly there is at least some demand, as LexisNexis can answer such queries for the subset of publications they handle. It's something we'll need to work on if we're going to make the world even more like science fiction.

Monday, March 7, 2011

Content-Type: joke; genre=bar/packet

  1. An IPv6 packet wants to walk into a bar, but can't cross the street to get there.
  2. An STP packet closes the bar so the bar next door can continue to operate.
  3. A VRRP packet gets confused about which bar it is in. Then it sets both bars on fire. Nobody knows why.
  4. A BGP packet gives you a list of every bar in the world, everywhere. China crosses 25% of them off the list.
  5. A multicast packet bar hops.
  6. A forlorn BootP packet sits in a corner crying into its beer. Nearby, a DHCP packet is the center of attention.
  7. An SCTP packet is stopped at the doorway and sent away.
  8. An anycast packet doesn't really care which bar it's in.
  9. The LLC/SNAP packet has such a funny accent the bartender doesn't understand it.
  10. An LACP packet comes in through the window, just checking that all of the paths work.
  11. A large group of ARP packets storm in and make a general nuisance of themselves.
  12. The bar hasn't been allowed to segregate customers by VLAN since the 1960s.
  13. An IP fragment walks into the bar. The bartender makes it wait to be seated until the rest of its party arrives.
  14. A Kerberos packet walks into the bar, but is confused by the clock being 6 minutes fast. The NTP packet hands it a drink, and all is well.
  15. An sFlow packet gossips with the bartender about all of the other packets.
  16. The rest of the packets look pityingly at the old IPX packet.
  17. The strict source routed packet tries to get to the bar but ends up at the dry cleaners instead.
  18. The GRE packet carries another packet into the bar, sets it on the barstool, orders it a drink, and goes back to help another.
  19. The TTL=1 packet barely makes it to the bar.
  20. The IP Options packet arrives late. It had to take the slowpath.

I suspect many of these will only be funny to about 3 people in the world. If you are not one of them, my deepest apologies.

Thursday, March 3, 2011

Home Made Cable Spaghetti

[Image: rack of equipment entangled in a messy mass of cables]

I wrote some thoughts for a colleague about home installation of a rack for computer equipment. Much of it is generally applicable for anyone considering such a thing, presented here for your edification and bemusement.

General
  • Buy extra rack screws, maybe 20. The cheap ones strip easily, and nothing sucks harder than getting a new system in only to discover you've run out of rack screws.
  • Get an electric screwdriver if you don't already have one.
Physical Installation
  • Bolt the rack to the floor and the ceiling. Otherwise an earthquake which does no other damage to the house can rip the rack out of the floor. The rack itself will be shorter than ceiling height; you get an extension brace to bolt to the ceiling. You want to bolt it to joists, not just drywall.
  • There are also racks made to bolt to the wall rather than stand free from floor to ceiling. These tend not to be as deep front to back, which limits the gear you can put in them. They can also be an airflow problem if the gear vents front to back.
  • Perhaps obviously, when filling the rack start from the bottom and put the heaviest gear at the very bottom. A top-heavy rack is a disaster.
  • The industry never settled on whether airflow is front to back or side-to-side. You'll find equipment with both layouts. With one rack it doesn't particularly matter, and you can mix them. With multiple racks it matters a lot.
  • The industry also never settled on whether rack ears go at the very front of the equipment or the midpoint. Front is most common, to accommodate boxes of differing depths. Lots of gear has threaded screw holes at both front and midpoint; just be consistent.
  • 19 inch racks are by far the most common, but be aware that 17 inch and 23 inch both exist. They will be well-labelled in catalogs as they are not common. If you buy second hand, bring a tape measure.
Cable Management
  • Spend as much time thinking about cable management as you do about how to rack the machines. Otherwise you end up with a beautiful rack covered in cable spaghetti.
  • It's customary to put the network switch at the top of the rack, because gravity makes cable management easier. However it's not essential, and you can put it anywhere you like.
  • There are cable trays with removable fronts made to bolt vertically to the side of the rack or between adjacent racks. Highly recommended.
  • Label both ends of each cable. Label them in a way which will still make sense in a few years when you replace these machines and have forgotten everything about the construction.
  • Avoid labeling cables according to their destination within the rack. That changes over time, and relabeling cables is a pain.