Monday, May 30, 2011

Lamenting the Lack of Love for Lists

[Image: the old Twitter lists menu]

Twitter introduced Lists about a year and a half ago. When first introduced on twitter.com, Lists were shown in a drop-down menu on the profile page. You could select the drop-down and check off whichever of your lists you'd like to add that account to.

[Image: the new Twitter lists menu item]

Some time in the last several days this changed: Lists have moved one level deeper into the menus. Selecting "Add to list" brings up a floating window of checkboxes where list memberships can be changed.

Presumably a relatively small percentage of Twitter users made use of the Lists feature, and removing the dedicated icon declutters the UI. It's too bad that use of Lists was not more widespread. Akin to anchor text for web pages, the names of the Lists to which an account has been added are an independent signal of that account's content and quality.


Thursday, May 26, 2011

Don't Cross the Tweetstreams

Yesterday at GlueCon, Jud Valeski of Gnip gave a talk on "High-Volume, Real-time Data Stream Handling" where he discussed some characteristics of the Gnip data feed. I wasn't at the talk, but the snippets tweeted by Kevin Marks were fascinating:

35 Megabits per second may not sound like much when the links are measured in Gigabits, so I thought I'd take a guess at the kinds of issues being seen.

[Image: graph of link speed over time]

For several decades the network link speed has roughly doubled every 18 months, pacing Moore's law on the CPU. Silicon advancements for signal processing have certainly helped with that, but a host of photonic advances have also been required. As with CPUs, we've been further increasing network capacity via parallelism: multiple links are run, with the traffic load spread across all links in the group. This means aggregate performance has increased faster than that of the individual connections.

One might think that distributing traffic across multiple parallel links would be simple: send the first packet to the first link, the second packet to the second link, etc. This turns out not to work very well, because packets vary in size. If packet #1 is large and #2 is small, packet #2 can pop out of its link before packet #1. TCP interprets packet reordering as a sign of network congestion, so if packets 1 and 2 are part of the same TCP flow, that flow will slow down. Therefore link aggregation has to be aware of flows, keeping packets of the same flow in order even though packets of different flows may leave in a different order than they arrived.
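A toy calculation, entirely my own illustration rather than anything from the talk, makes the reordering concrete: on two identical links, a small packet sent second finishes serializing long before a large packet sent first.

```python
# Toy illustration: per-packet round-robin can reorder a flow because
# serialization time depends on packet size.
LINK_RATE_BPS = 1_000_000_000            # assume two identical 1 Gbps links

packets = [(1, 1500), (2, 64)]           # (sequence number, size in bytes)

exits = []
for seq, size in packets:
    link = (seq - 1) % 2                 # naive round-robin link selection
    serialization_s = size * 8 / LINK_RATE_BPS
    exits.append((serialization_s, seq, link))

for t, seq, link in sorted(exits):
    print(f"packet {seq} exits link {link} after {t * 1e6:.2f} us")
# Packet 2 comes out first, and TCP reads that reordering as congestion.
```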

Aside: lots of networking gear actually does have a mode for sending packet #1 to link 1 and #2 to link 2. It is cynically referred to as "magazine mode," as in: "In Network Computing's magazine article we'll be able to show 100% utilization on all links." In real networks, magazine mode is rarely useful.


 
Hashing

[Image: multiple flows being distributed across a group of links]

Traffic is distributed across parallel links via hashing. One or more of the MAC addresses, IP addresses, or TCP/UDP port numbers will be hashed to an index and used to select the link. All packets from the same flow hash to the same value and choose the same link. Given a large number of flows, the distribution tends to be pretty good.
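A minimal sketch of the idea, with a CRC32 over the 5-tuple standing in for whatever hash function a given switch actually implements:

```python
import zlib

NUM_LINKS = 4

def pick_link(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the flow's 5-tuple to a link index; every packet of a flow maps the same way."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % NUM_LINKS

print(pick_link("10.0.0.1", "192.0.2.7", "tcp", 51515, 80))   # some link, 0-3
print(pick_link("10.0.0.1", "192.0.2.7", "tcp", 51515, 80))   # always the same link
print(pick_link("10.0.0.9", "192.0.2.7", "tcp", 40000, 80))   # a different flow may land elsewhere
```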

[Image: one high bandwidth flow skewing the distribution]

The distribution doesn't react to load. If one link becomes overloaded, the load isn't redistributed to even it out. It can't be: the switch isn't keeping track of the flows it has seen and which link they should go to, it just hashes each packet as it arrives. The presence of a few very high bandwidth flows causes the load to become unbalanced.
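Extending the sketch above: link load is the sum of the bandwidths of the flows that hash onto it, so one 35 Mbps elephant dominates whichever link it lands on. The flow mix below is invented purely for illustration.

```python
import zlib
from collections import defaultdict

NUM_LINKS = 4
flows = [("firehose", 35.0)] + [(f"mouse-{i}", 0.1) for i in range(400)]   # (name, Mbps)

load_mbps = defaultdict(float)
for name, mbps in flows:
    load_mbps[zlib.crc32(name.encode()) % NUM_LINKS] += mbps

print(dict(load_mbps))   # the link carrying the firehose sits far above the others
```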

35 Megabits per second is only a fraction of the capacity of an individual link, but that one flow is by no means the only traffic on the network. Once delivered, the tweets have to be acted upon in some fashion, which results in additional traffic. It would be easy to end up with a number of flows at 35 Mbps each.


 
IRQs

[Image: one high bandwidth flow skewing the distribution of IRQs to CPUs]

Networks have increased aggregate capacity via parallel links; servers have increased aggregate capacity via parallel CPUs, and the same ordering issue arises. Server NICs distribute the interrupt load across CPUs by hashing, so a single flow has to go to a single CPU, creating a hotspot in the interrupt handling. The software can immediately hand processing off to other CPUs, but that one CPU will be a bottleneck.
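Receive-side scaling works roughly like the sketch below, with crc32 standing in for the Toeplitz hash NICs commonly use and the flow key invented for illustration; every packet of the firehose flow interrupts the same CPU.

```python
import zlib

NUM_CPUS = 8
indirection_table = [i % NUM_CPUS for i in range(128)]   # hash bucket -> CPU number

def cpu_for_flow(flow_key: str) -> int:
    bucket = zlib.crc32(flow_key.encode()) % len(indirection_table)
    return indirection_table[bucket]

print(cpu_for_flow("198.51.100.5:443 -> 203.0.113.9:55123"))   # every packet of this flow
print(cpu_for_flow("198.51.100.5:443 -> 203.0.113.9:55123"))   # lands on the same CPU
```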


 
Peak vs Sustained

Right now, the unbalanced load should be manageable. The tweetstream is only 35 Mbps, 1/30th the capacity of a network link and 1/10th that of a single CPU core. There is currently some headroom, but two trends will make the imbalance more of a problem in the future:

  1. Tweet volume triples in a year.
  2. 35 Mbps is just an average.

The volume of tweets isn't constant; it suddenly increases in response to events. In March 2011 the record number of tweets per second was 6,939, which works out to 138 Megabits per second. In a year, the peak rate can be expected to be 416 Mbps. In two years it will be over a Gigabit.
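The arithmetic behind those numbers, as a quick sketch; the roughly 2.5 KB per tweet is inferred from 6,939 TPS working out to about 138 Mbps, and the tripling is the growth trend noted above.

```python
peak_tps = 6939                      # March 2011 record tweets per second
bits_per_tweet = 2500 * 8            # inferred average size, JSON plus metadata

peak_mbps = peak_tps * bits_per_tweet / 1e6
print(f"today:     {peak_mbps:.0f} Mbps")        # ~139 Mbps
print(f"one year:  {peak_mbps * 3:.0f} Mbps")    # ~416 Mbps
print(f"two years: {peak_mbps * 9:.0f} Mbps")    # well over a Gigabit
```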

Hardware advances won't keep up with that. At peak times it's already causing some heartburn, and it will get a little worse every day. The load needs to be better distributed.

Sunday, May 22, 2011

Tweetbots Need Exercise Too

[Image: tweetbots repeating the same tweet over and over]

Search Twitter for the phrase "Right, off to the gym and to listen to the Packetpushers Openflow podcast"

I'll wait.

Note the large number of results, at least as of May 22, 2011. Apparently Twitter is just chock-full of exercise nuts who listen to techie networking content while working out. Yet it's odd that all of them use the exact same phrase. It is also odd that a couple of accounts use the same photo.

Of course, it's not odd at all: these are all bots. Twitter bots started out very simple, harvesting a random selection of tweets from the stream and using them as their own. They've evolved and become more believable, harvesting tweets of a particular theme. In this case, they selected tweets via exercise-related keywords like "gym" and "workout." That they happened to pick up a highly unusual topic is just dumb luck.

If you examine the tweetstream of any individual bot, it's quite believable. Each comes across as an exercise-obsessed but otherwise normal person. The machine algorithms still fall prey to silly things like a tweet about getting up in the morning followed shortly thereafter by one about going to bed, but on the whole this crop of bots has advanced considerably since the last time I looked into it. The game is definitely afoot.

One final note: the podcast these bots mention is Packetpushers show 40, and if you are interested in techie networking content it is a good one to listen to. Perhaps you can listen while working out.

Thursday, May 19, 2011

Perspective on a Hot Technology

An observation for those working in a hot technology area.

[Image: hype cycle leading to acceptance or deadpool]

An inflection point is coming. Your actions at the low point influence what happens after.


Tuesday, May 17, 2011

Just Weld It to the Rack

[Image: network equipment chassis]

There is a mantra amongst network equipment vendors: never let the customer unscrew the chassis from the rack. Replace every component with upgraded equipment, but never upgrade the chassis itself. Doing so often triggers the customer to send the whole thing out for bid to multiple vendors, and you might not win. Somebody else's chassis might get slotted into the hole you left.

So vendors work very, very hard to keep the old chassis competitive. Line cards and supervisor modules will be updated repeatedly over the lifetime of the system. Though the design of the backplane might also be upgraded in new systems, the new cards have compatible modes of operation for even the oldest deployments.

This has a couple impacts:

  • It favors passive backplanes. Active electronics can't be upgraded, and future generations of cards would find them a hindrance.
  • It means backplanes are overdesigned, and therefore expensive. Even if you plan to clock traces for 10 Gbps operation, you make the PCB handle 25 or 40 Gbps to support future upgrades. You run as many traces as will fit, even if they won't be used by the first generation of cards.

If you ever see a backplane PCB removed from the sheet metal, notice how thick it is. That isn't just for stiffness over the long span of board: backplane PCBs have many signal layers. PCB fabrication gets expensive at 20 layers, but backplanes often have 26 or more. Some of those layers end up using more expensive PCB material with better electrical characteristics.

Over the weekend Greg Ferro wrote about a prototype optical backplane from HP on display at Interop, which manages 120 Gbps per optical tap. I think that results in 120 Gbps per slot, though I'm not sure. Optical backplanes have been in development for a long time (consider this announcement from 2005), and even the HP design is 3-5 years away from showing up in a product.

Nonetheless the potential is clear: this is a 10-year backplane. Though the waveguides are plastic, not glass, over the short distance of a chassis they should have a vastly larger usable spectrum than copper traces. Subsequent generations of cards can use faster lasers and/or more colors to get the bandwidth they need. Optics (and especially multiple wavelengths) over the backplane adds cost, but the future proofing is worth it.


footnote1: Greg Ferro also hosts the Packet Pushers podcast, which I highly recommend.

footnote2: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet and Product Development labels.


Wednesday, May 11, 2011

Why Ten Bits is Better Than Eight

Gigabit Ethernet carries 1 Gbps of data, yet if we watch the bits fly by on the wire there are actually 1.25 gigabits every second. Why is that?

To see why, let's examine a hypothetical Ethernet which does send bits at exactly its link speed. A 0 in the packet is transmitted as a negative voltage on the wire, while a 1 bit is positive. Consider a packet where the payload is all 0xff.

[Image: Ethernet transmission]

The first problem is that with so many 1 bits, the voltage on the wire is held high for a very long period of time. This results in baseline wander, where the neutral voltage drifts up to find a new stable point.

[Image: neutral voltage creeping up toward the high voltage]

Even if we could control baseline wander, we have a more serious problem. There is no clock signal anywhere on the wire. The receiver recovers the clock using a Phase Locked Loop, which means it watches for bit transitions and uses them to infer the clock edges. If there are no transitions, the receive clock will start to drift. The illustration is exaggerated for clarity.

[Image: receive clock getting badly out of sync with the transmit clock]

 
Enter 8B10B

Ethernet uses 8B10B encoding to solve these issues. Each byte is expanded into a 10-bit symbol. Only a subset of the 1024 possible symbols is used, those for which:

  1. over time, the number of ones and zeros are equal
  2. there are sufficient bit transitions to keep the PLL locked

[Image: packet after 8b10b encoding expands 25%, but has sufficient bit transitions]

Wikipedia has an excellent description of how 8b10b works, in considerable detail.
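As a sketch of those two properties (this only checks a candidate symbol for balance and transition density; it is not an encoder, and the actual code tables are in the Wikipedia article):

```python
import itertools

def symbol_ok(bits: str) -> bool:
    """Check the balance and transition-density properties of a 10-bit symbol."""
    assert len(bits) == 10 and set(bits) <= {"0", "1"}
    disparity = bits.count("1") - bits.count("0")                     # ones minus zeros
    longest_run = max(len(list(g)) for _, g in itertools.groupby(bits))
    return disparity in (-2, 0, 2) and longest_run <= 5               # 8b10b limits runs to five bits

print(symbol_ok("1111111000"))   # False: far more ones than zeros
print(symbol_ok("1110100010"))   # True: balanced, with frequent transitions
```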

8b10b is used for Ethernet up to 1 Gbps. For 10 Gigabits, the 25% overhead of 8b10b was too expensive as it would have resulted in a 12.5 GHz clock. At those frequencies an extra gigahertz or two really hurts. Instead 10G Ethernet relies on a 64b66b code, which sounds similar but actually operates using a completely different principle. That will be a topic for another day.
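The clock-rate arithmetic, for reference: 8b10b would push a 10 Gbps link to 12.5 gigabaud, while 64b66b keeps the line rate just over 10.

```python
data_rate_gbps = 10
print(data_rate_gbps * 10 / 8)     # 8b10b line rate: 12.5 gigabaud
print(data_rate_gbps * 66 / 64)    # 64b66b line rate: 10.3125 gigabaud
```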

Monday, May 9, 2011

RSS Bucket Dipped Into the Stream

[Image: RSS icon]

Yesterday Jesse Stay observed that Twitter and Facebook have both discontinued RSS feeds from various parts of their service. Later that day he commented on the lack of blog reactions. So here you go. Some thoughts, expanded from a comment I left on the original article.

A social service is able to offer a better experience by knowing who its users are and what they are reading. Learning a user's interests allows the site to suggest related material, and also to target advertising to that specific person. Content publishers in turn can get data about who finds their material interesting: not necessarily the identities of individuals, but detailed demographics and related interests.

RSS doesn't fit into that world. The entity fetching the RSS feed is often not associated with an individual user at all, instead being an aggregator or other bit of infrastructure somewhere. Once content is syndicated via RSS, the originating service loses visibility into who accesses it. The aggregator might report a total number of readers, but not the rich detail which the service would get natively. Users on RSS are thus far less valuable than users who come to the site, or tools which use the site's (authenticated) APIs. For the originator, tracked activity is much preferred over anonymous content consumption. Paywalls are another symptom of the same underlying phenomenon: anonymous content consumption isn't working for the publishers.

Sunday, May 8, 2011

On the Nature of Premium Accounts

[Image: Twitter logo]

For almost as long as Twitter has existed, people have speculated about premium accounts as a way to monetize the service. Thus far, no such offering has appeared. Disappointment that premium accounts have never materialized was quite eloquently voiced by Suw Charman-Anderson.

I believe there are two fundamentally different models for premium accounts: either as power users of the free service, or as consumers of the data produced by the free users. Let's consider an example of each.

  • Evernote premium accounts get an enhanced version of the free service. They can upload and index additional file types, and have added features for collaboration.
  • LinkedIn premium accounts access a different type of service. They can examine everyone's connections, not just their own network. It is aimed at recruiters, salespeople, analysts, etc., who are looking for a contact rather than for someone they personally know.

Twitter already offers paid access to its firehose of data, both directly and via Gnip. A number of brand tracking, reputation measurement, and sentiment analysis tools use this firehose. Twitter is already well down the path of offering datamining services, but has yet to introduce added features for individual users. Why not? Speculating about some of the premium features which might be offered, and attempting to analyze the impact, is illuminating.

Longer search history: Twitter search goes back only a handful of days. So far as I can tell, it searches the volume of tweets which will fit into RAM across a reasonable number of servers. Searching a much larger volume of tweets would call for a different architecture, possibly involving databases on disk and a vastly larger pool of servers to handle the load.

Might Twitter offer enhanced search as a paid service, stretching back much further in time and adding more search operations? That seems likely, but I would point out that even today, search is not tied to your account. The search is of the public tweetstream, with no biasing toward those you follow. If Twitter offers an enhanced search product it could do so as a datamining feature, not tied to a premium account.

Analytics: How many people clicked on a t.co link? How many people saw a tweet (defined as their client actually fetching it)? These features appear tied to an account and seem like good material for premium features, but consider what people willing to pay for them are really trying to do: measure their effectiveness in spreading an idea, a brand, a celebrity name, etc. Knowing how many people saw their own tweet isn't enough: they need to know about retweets, and even quoted paraphrases of their tweet. Knowing how many times their own t.co link was clicked isn't enough; they want to know how many times any link to their URL was clicked.

If you're trying to measure effectiveness, analyzing just one account isn't enough. The demand for analytics is primarily a demand for datamining.

Group Messaging: Would Twitter offer a service which allowed premium accounts to send group DMs? Meaning, send a message which can be seen by multiple participants but is not publicly searchable. Presumably this would be tied in with Twitter lists. The existence of Beluga, GroupMe, Kik, etc. implies there is a demand for such a service not filled by existing tools like email.

In terms of Twitter's business, the downside of a group messaging facility is that it reduces the value of the datamining service. If taking a conversation off the record is simple, people will use it. Influencers with many contacts are perhaps even more likely to use it, and theirs is exactly the data Twitter wants in the zeitgeist firehose it offers.

Other features: People ask for the ability to retrieve more than the most recent 3,200 of their own tweets, for higher hourly rate limits, etc. It would be quite possible to offer premium accounts with substantially higher limits. Yet consider the reaction once such accounts are available: these are features which the free accounts already have, but which are artificially limited. Lifting the limit doesn't feel like a premium feature; it feels like extortion. Lifting limits isn't a good basis for a premium account; you need a strong core feature set.


 
A Conclusion

Offering both premium features for individual accounts and datamining services over the tweetstream is difficult, as the two are often in conflict. Individual users want to maximize their own effectiveness and, quite frankly, reserve the benefits of their use of the service to themselves by restricting access to their tweets. Removing data from the public stream reduces the value of the firehose. I suspect this is the reason Twitter has not offered such accounts, as datamining the firehose is held to be more valuable. Offering premium accounts would inevitably bring pressure to offer the features which damage the value of the firehose.

Thursday, May 5, 2011

Intel 22nm and Mobile Computing

Yesterday Intel announced details of its 22nm silicon process. There was a wealth of fascinating information about how the transistors are made, but one graph from the presentation presages what will happen in the mobile market over the next several years.

[Image: current versus voltage]

As the voltage goes to zero, the consumed current goes to zero. It sounds obvious, but really isn't. Even when nominally "off," transistors have always leaked current into the substrate. As silicon features have gotten smaller, the power transistors consume while active has declined rapidly, but the leakage current has declined less rapidly.

Other graphs in the presentation show a tradeoff between operating voltage and leakage current, which is to say power consumed while active versus power consumed at all times. Intel's production chips will likely tolerate a little more leakage to get a lower operating voltage, but the leakage will still be very low.
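To make the tradeoff concrete, here is a rough model with invented numbers (nothing here comes from Intel's presentation): dynamic power scales with the square of the voltage and the clock rate, while leakage power is drawn whenever the chip is powered at all.

```python
# Illustrative only: P_total = a*C*V^2*f (dynamic) + V*I_leak (leakage).
def total_power_watts(v, f_hz, i_leak_amps, c_farads=1e-9, activity=0.2):
    dynamic = activity * c_farads * v**2 * f_hz   # switching power
    leakage = v * i_leak_amps                     # drawn even when idle
    return dynamic + leakage

print(total_power_watts(v=1.0, f_hz=2e9, i_leak_amps=0.20))   # leaky process: 0.60 W
print(total_power_watts(v=0.8, f_hz=2e9, i_leak_amps=0.02))   # lower voltage, low leakage: ~0.27 W
```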

In 32nm silicon processes leakage current may already be the primary factor in power consumption. It is difficult to estimate how serious the effect is, but this article from March 2008 shows leakage current as relatively insignificant in 180 nm silicon but growing to nearly 40% of total power consumption in a 50 nm process. We're at 32nm now.

Except Intel just changed the game.

ARM has several advantages in the mobile space. Its products are available from many manufacturers and its support in software toolchains is nearly universal, but its biggest advantage has been low power consumption compared to x86 or other architectures. ARM did a great job designing chips which are very sparing of the power they consume while operating.

Except Intel just changed the game.

Intel now has a silicon process with radically lower leakage current. x86 still consumes more power while actively operating, but leakage has become the more significant component of total power consumption, and so ARM's competitive advantage has shrunk substantially. Expect to see a lot more x86 CPUs in mobile devices, starting in late 2011.