Thursday, July 28, 2011

ARP By Proxy

It started, as things often do nowadays, with a tweet. As part of a discussion of networking-fu I mentioned ProxyARP, and that it was no longer used. Ivan Pepelnjak corrected that it did still have a use. He wrote about it last year. I've tried to understand it, and wrote this post to be able to come back to later to remind myself.


 
Wayback Machine to 1985

ifconfig eth0 10.0.0.1 netmask 255.255.255.0

Thats it, right? You always configure an IP address plus subnet mask. The host will ARP for addresses on its subnet, and send to a router for addresses outside its subnet.

Yet it wasn't always that way. Subnet masks were retrofitted into IPv4 in the early 1980s. Before that there were no subnets. The host would AND the destination address with a class A/B/C mask, and send to the ARPANet for anything outside of its own network. Yes, this means a class A network would expect to have all 16 million hosts on a single Ethernet segment. This seems ludicrous now, but until the early 1980s it wasn't a real-world problem. There just weren't that many hosts at a site. The IPv4 address was widely perceived as being so large as to be infinite, only a small number of addresses would actually be used.

Aside: in the 1980s the 10.0.0.1 address had a different use than it does now. Back then it was the ARPAnet. It was the way you would send packets around the world. When ARPAnet was decommissioned, the 10.x.x.x address was made available for its modern for non-globally routed hosts.

Old host does not implement subnets, needs proxy ARP by router

There was a period of several years where subnet masks were gradually implemented by the operating systems of the day. My recollection is that BSD 4.0 did not implement subnets while 4.1 did, but this is probably wrong. In any case, once an organization decided to start using subnets it would need a way to deal with stragglers. The solution was Proxy ARP.

Its easy to detect a host which isn't using subnets: it will ARP for addresses which it shouldn't. The router examines incoming ARPs and, if off-segment, responds with its own MAC address. In effect the router will impersonate the remote system, so that hosts which don't implement subnet masking could still function in a subnetted world. The load on the router was unfortunate, but worthwhile.


 
Proxy ARP Today

That was decades ago. Yet Proxy ARP is still implemented in modern network equipment, and has some modern uses. One such case is in Ethernet access networks.

Subscriber network where each user gets a /30 blockConsider a network using traditional L3 routing: you give each subscriber an IP address on their own IP subnet. You need to have a router address on the same subnet, and you need a broadcast address. Needing 3 IPs per subscriber means a /30. Thats 4 IP addresses allocated per customer.

There are some real advantages to giving each subscriber a separate subnet and requiring that all communication go through a router. Security is one, not allowing malware to spread from one subscriber to another without the service provider seeing it. Yet burning 4 IP addresses for each customer is painful.


Subscriber network using a /24 for all subscribers on the switch

To improve the utilization of IP addresses, we might configure the access gear to switch at L2 between subscribers on the same box. Now we only allocate one IP address per subscriber instead of four, but we expose all other subscribers in that L2 domain to potentially malicious traffic which the service provider cannot police.

We also end up with an inflexible network topology: it becomes arduous to change subnet allocations, because subscriber machines know how big the subnets are. As DHCP leases expire the customer systems should eventually learn of a new mask, but people sometimes do weird things with their configuration.


Subscriber network using a /24 for all subscribers on the switch

A final option relies on proxy ARP to decouple the subscriber's notion of the netmask from the real network topology. I'm basing this diagram on a comment by troyand on ioshints.com. Each subscriber is allocated a vlan by the distribution switch. The vlans themselves are unnumbered: no IP address. The subscriber is handed an IP address and netmask by DHCP, but the subscriber's netmask doesn't correspond to the actual network topology. They might be given a /16, but that doesn't mean sixty four thousand other subscribers are on the segment with them. The router uses Proxy ARP to catch attempts by the subscriber to communicate with nearby addresses.

This lets service providers get the best of both worlds: communication between subscribers goes through the service provider's equipment so it can enforce security policies, but only one IPv4 address per subscriber.


Saturday, July 23, 2011

Tweetalytics

Twitter logoUntil this week I thought Twitter would focus on datamining the tweetstream rather than adding features for individual users. I based this in part on mentions by Fred Wilson of work by Twitter on analytics. I've been watching for evidence of changes I expected to be made in the service, intending to write about it if they appeared.

Earlier this week came news of a shakeup in product management at Twitter. Jack Dorsey seems much more focussed on user-visible aspects of the service, and I'm less convinced that backend analytics will be a priority now. Therefore I'm just going to write about the things I'd been watching for.

To reiterate: these are not things Twitter currently does, nor do I know they're looking at it. These are things which seemed logical, and would be visible outside the service.


Wrap all links: URLs passing through the firehose can be identified, but knowing what gets clicked is valuable. The twitter.com web client already wraps all URLs using t.co, regardless of their length. Taking the next step to shorten every non-t.co link passing through the system would be a way to get click data on everything. There is a downside in added latency to contact the shortener, but that is a product tradeoff to be made.


Unique t.co per retweet: There is already good visibility into how tweets spread through the system, by tracking new-style retweets and URL search for manual RTs. What is not currently visible is the point of egress from the service: which retweet actually gets clicked on. This can be useful if trying to measure a user's influence. An approximation can be made by looking at the number of followers, but that breaks down when retweeters have a similar number of followers. Instead, each retweet could generate a new t.co entry. The specific egress point would be known because each would have a unique URL.


Tracking beyond tweets: t.co tracks the first click. Once the link is expanded, there is no visibility into what happens. Tracking its spread once it leaves the service would require work with the individual sites, likely only practical for the top sites passing through the tweetstream. Tracking information could be automatically added to URLs before shortening, in a format suitable for the site's analytics. For example a utm_medium=tweet parameter could be added to the original URL. There might be some user displeasure at having the URL modified, which would have to be taken into account.


Each of these adds more information to be datamined by publishers. They don't result in user-visible features, and I suspect that as of a couple days ago user-visible features became a far higher priority.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Social label.


Monday, July 18, 2011

Python and XML Schemas

Python Logo My current project relies on a large number of XML Schema definition files. There are 1,600 types defined in various schemas, with actions for each type to be implemented as part of the project. A previous article examined CodeSynthesis XSD for C++ code generation from an XML Schema. This time we'll examine two packages for Python, GenerateDS and PyXB. Both were chosen based on their ability to feature prominently in search results.

In this article we'll work with the following schema and input data, the same used in the previous C++ discussion. It is my HR database of minions, for use when I become the Evil Overlord.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="minion">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="rank" type="xs:string"/>
      <xs:element name="serial" type="xs:positiveInteger"/>
    </xs:sequence>
    <xs:attribute name="loyalty" type="xs:float" use="required"/>
  </xs:complexType>
</xs:element>

</xs:schema>


<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<minion xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:noNamespaceSchemaLocation="schema.xsd" loyalty="0.2">
  <name>Agent Smith</name>
  <rank>Member of Minion Staff</rank>
  <serial>2</serial>
</minion>

The Python ElementTree can handle XML documents, so why generate code at all? One reason is simple readability.

Generated CodeElementTree
m.name m.find("name").text

A more subtle reason is to catch errors earlier. Because working with the underlying XML relies on passing in the node name as a string, a typo or misunderstanding of the XML schema will result in not finding the desired element and/or an exception. This is what unit tests are supposed to catch, but as the same developer implements the code and the unit test it is unlikely to catch a misinterpretation of the schema. With generated code, we can use static analysis tools like pylint to catch errors.


 

GenerateDS

The generateDS python script processes the XML schema:

python generateDS.py -o minion.py -s minionsubs.py minion.xsd

The generated code is in minion.py, while minionsubs.py contains an empty class definition for a subclass of minion. The generated class uses ElementTree for XML support, which is in the standard library in recent versions of Python. The minion class has properties for each node and attribute defined in the XSD. In our example this includes name, rank, serial, and loyalty.

import minion_generateds
if __name__ == '__main__':
  m = minion.parse("minion.xml")
  print '%s: %s, #%d (%f)' % (m.name, m.rank, m.serial, m.loyalty)

 

PyXB

The pyxbgen utility processes the XML schema:

pyxbgen -u minion.xsd -m minion

The generated code is in minion.py. The PyXB file is only 106 lines long, compared with 548 lines for GenerateDS. This doesn't tell the whole story, as the PyXB generated code imports the pyxb module where the generateDS code only depends on system modules. The pyxb package has to be pushed to production.

Very much like generateDS, the PyXB class has properties for each node and attribute defined in the XSD.

import minion_pyxb
if __name__ == '__main__':
  xml = file('minion.xml').read()
  m = minion.CreateFromDocument(xml)
  print '%s: %s, #%d (%f)' % (m.name, m.rank, m.serial, m.loyalty)

 

Pylint results

A primary reason for this exercise is to catch XML-related errors at build time, rather than exceptions in production. I don't believe unit tests are an effective way to verify that a developer has understood the XML schema.

To test this, a bogus 'm.fooberry' property reference was added to both test programs. pylint properly flagged a warning for the generateDS code.

E: 15: Instance of 'minion' has no 'fooberry' member (but some types could not be inferred)

pylint did not flag the error in the PyDB test code. I believe this is because PyDB doesn't name the generated class minion, instead it is named CTD_ANON with a runtime binding within its framework to "minion." pylint is doing a purely static analysis, and this kind of arrangement is beyond its ken.

class CTD_ANON (pyxb.binding.basis.complexTypeDefinition):
  ...

minion = pyxb.binding.basis.element(pyxb.namespace.ExpandedName(Namespace,
           u'minion'), CTD_ANON)

 

Conclusion

As a primary goal of this effort is error detection via static analysis, we'll go with generateDS.


Saturday, July 16, 2011

Billions and Billions

In March, 2010 there were 50 million tweets per day.

In March, 2011 there were 140 million tweets per day.

In May, 2011 there were 155 million tweets per day.

Yesterday, apparently, there were 350 billion tweets per day.

350 million tweets/day would have been an astonishing 2.25x growth in just two months, where previously tweet volume has been increasing by 3x per year. 350 billion tweets/day is an unbelievable 2258x growth in just two months.

Quite unbelievable. In fact, I don't believe it.

350 billion tweets per day means about 4 million tweets per second. With metadata, each tweet is about 2500 bytes uncompressed. In May 2011 the Tweet firehose was still sent uncompressed, as not all consumers were ready for compression. 4 million tweets per second at 2500 bytes each works out to 80 Gigabits per second. Though its possible to build networks that fast, I'll assert without proof that it is not possible to build them in two months. Even assuming good compression is now used to get it down to ~200 bytes/tweet, that still works out to an average of 6.4 Gigabits per second. Peak tweet volumes are about 4x average, which means the peak would be 25 Gigabits per second. 25 Gigabits per second is a lot for modern servers to handle.

I think TwitterEng meant to say 350 million tweets per second. Thats still a breaktaking growth in the volume of data in just two months, and Twitter should be congratulated for operating the service so smoothly in the face of that growth.


Update: Daniel White and Atul Arora both noted that yesterday's tweet claimed 350 billion tweets delivered per day, where previous announcements have only discussed tweets per day. That probably means 350 billion recipients per day, or the number of tweets times the average fanout.


Update 2: In an interview on July 19, 2011 Twitter CEO Dick Costolo said 1 billion tweets are sent every 5 days, or 200 million tweets per day. This is more in line with previous growth rates.


Wednesday, July 13, 2011

Essayists and Orators

Recently Kevin Rose redirected his eponymous domain to his Google+ profile, reflecting that "G+ gives me more (real-time) feedback and engagement than my blog ever did." Earlier this year Steve Rubel deleted thousands of blog posts from older TypePad and Posterous sites, and started afresh on Tumblr.

Moving the center of one's online presence to "where the action is" is not a new phenomena. In 2008 Robert Scoble essentially abandoned his own sites in order to spend time on Friendfeed, the hot new social networking site at that time. Techcrunch even attempted an intervention over the move. After the Facebook acquisition of FriendFeed the site gradually decayed through benign neglect. Scobleizer moved on long ago.

Why do this? Surely its better to own your own domain and control your destiny? Or is it.


 
Essayists And Orators

In this discussion we'll focus on people who are online for more than just casual interaction or journaling, who have specific goals they are trying to accomplish with their online presence.

Essayists publish thoughtful prose, focussed on a particular topic. Presentation and style is important, but generally secondary to the density of ideas within. The product of their labor comes slowly, and is intended to stand for considerable time.

Orators can also deliver thoughtful ideas and spend considerable time preparing for it, but the dynamics are very different. The pace is faster, the interaction more frequent with less time to consider. The delivery and ideas can be adjusted over time, with each new presentation.

Translated to their online equivalents, I think we can still recognize the Essayist and Orator archetypes based on what they want people to find when they search. The world is a larger place now, when we want to know something outside of our knowledge we search for it.

For an Essayist, the desired result is a post with thoughts on the topic, linked to their name. For an Orator, the desired result is a conclusion that the orator is knowledgeable about the topic.

For an Essayist its important to keep material available for people to find, and in a form which links back to the author. Considerable effort has been spent to provide value up front. If someone needs more they can contact the author, who can provide additional help freely or with suitable compensation. Hosting on one's own site allows the linking of authorship to original material, and provides a stable contact point.

For an Orator, its more important that people find the author's name as someone knowledgeable about the topic. An Orator seeks contact much earlier in the process than an Essayist. They want a followup search to be for their name, to find out how to contact them. This desire for contact earlier in the process implies that the Orator will interact freely on many topics. At some point, if the searcher becomes convinced they can benefit from the Orator's expertise, they may discuss terms for further help.

For an Orator, its less important to have a stable presence online. The desired result is for someone to seek them out personally, and even if they move from one site to another search engines can be depended on to find their most recent incarnation.

I suspect this categorization paints with too broad a brush, as no one corresponds exactly to either archetype, but I'm finding it useful to consider.

Pictures of Abraham Lincoln and Frederick Douglas.

Lincoln and Douglas pictures courtesy Wikimedia Commons. Both are in the public domain in the United States.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Social label.


Tuesday, July 12, 2011

CodeSynthesis XSD Data Binding

Nowadays I make a habit of writing up how to use particular tools or techniques for anything which might be useful to reference later. Many techniques I worked on before starting this practice are now lost to me, locked away in proprietary source code at some previous employer.

This post concerns data binding from XML schemas in C++, generating classes rather than manipulating the underlying XML. As its written for Future Me, it might not be so interesting to those who are not Future Me.

Consider the simple XML schema shown below. I aspire to be the Evil Overlord, and am working on the HR system to keep track of my innumerable minions.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="minion">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="rank" type="xs:string"/>
      <xs:element name="serial" type="xs:positiveInteger"/>
    </xs:sequence>
    <xs:attribute name="loyalty" type="xs:float" use="required"/>
  </xs:complexType>
</xs:element>

</xs:schema>

It would be possible to parse documents created from this schema manually, using something like libexpat or Xerces. Unfortunately as the schema becomes large, the likelihood of mistakes in this manual process becomes overwhelming.

I chose instead to work with CodeSynthesis XSD to generate classes from the schema, based mainly on the Free/Libre Open Source Software Exception in their license. This project will eventually be released under an Apache-style license, and all other data binding solutions I found for C++ were either GPL or a commercial license.


 
Parsing from XML

The generated code provides a number of function prototypes to parse XML from various sources, including iostreams.

std::istringstream agent_smith(
  "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\" ?>"
  "<minion xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
  "xsi:noNamespaceSchemaLocation=\"schema.xsd\" loyalty=\"0.2\">"
  "<name>Agent Smith</name>"
  "<rank>Member of Minion Staff</rank>"
  "<serial>2</serial>"
  "</minion>");
std::auto_ptr m(NULL);

try {
  m = minion_(agent_smith);
} catch (const xml_schema::exception& e) {
  std::cerr << e << std::endl;
  return;
}

The minion object now contains data members with proper C++ types for each XML node and attribute.

std::cout << "Name: " << m->name() << std::endl
          << "Loyalty: " << m->loyalty() << std::endl
          << "Rank: " << m->rank() << std::endl
          << "Serial number: " << m->serial() << std::endl;

 
Serialization to XML

Methods to serialize an object to XML are not generated by default, the --generate-serialization flag has to be passed to xsdcxx. This emits another series of minion_ methods, which take output arguments.

int main() {
  minion m("Salacious Crumb", "Senior Lackey", 1, 0.9);
  minion_(std::cout, m);
}

This sends the XML to stdout.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<minion loyalty="0.9">
  <name>Salacious Crumb</name>
  <rank>Senior Lackey</rank>
  <serial>1</serial>
</minion>

Codesynthesis relies on Xerces-C++ to provide the lower layer XML handling, so all of the functionality of that library is also available to the application.

Thats enough for now. See you later, Future Me.


Friday, July 8, 2011

Hop By Hop TCP

Last week discussed how Ethernet CRCs don't cover what we think they cover. Surely the TCP checksum, as an end to end mechanism, provides at least a modicum of protection?

Unfortunately in today's Internet, no it doesn't. The TCP checksum is no longer end to end.

Path across the Internet showing every switch and link in green, protected by the TCP checksum.

Our mental model has the client sending packets all the way to the server at the other end. In reality there is a huge variety of gear which spoofs the TCP protocol or interposes itself in the connection, and the checksum is routinely discarded and regenerated before making it to the server. We'll look at some examples.


 
Load Balancers

Path across the Internet to the load balancer, which is shown in red. That is where the TCP checksum stops.In a typical datacenter, the servers sit behind a load balancer. The simplest such equipment distributes sessions without modifying the packets, but the market demands more sophisticated features like:

  • SSL offload, allowing the servers to handle unencrypted sessions
  • Optimized TCP Window for clients on broadband, dialup, or wireless.
  • Allow jumbo frames within the data center, and normal sized frames outside.

All of these features require the load balancer to be the endpoint of the TCP session from the client, and initiate an entirely separate TCP connection to the server. The two connections are linked in that when one side is slow the other will be flow controlled, but are otherwise independent. The TCP checksum inserted by the client is verified by the load balancer, then discarded. It doesn't go all the way to the server it is communicating with.


 
WAN Optimization

Path across the Internet via WAN optimizers, which are shown in red. That is where the TCP checksum stops.High speed wide area networks are expensive. If a one-time purchase of a magic box at each end can reduce the monthly cost of the connection between them, then there is an economic benefit to buying the magic box. WAN optimization products reduce the amount of traffic over the WAN using a number of techniques, most notably compression and deduplication.

The box watches the traffic sent by clients and modifies the data sent across the WAN. At the other end it restores the data to what was originally sent. In some modes it will carry the TCP checksums across the WAN to re-insert into the reconstructed packets. However the gear typically also offers features to tune TCP performance and window behavior (especially for satellite links), and these result in modifying the packets with calculation of new checksums.


 
NAT

NAT involves overwriting the IP addresses and TCP/UDP port numbers, which impacts the TCP checksum. However NAT is somewhat special: it is purely overwriting data, not expanding it. Therefore it can keep the existing checksum from the packet, subtract the old value from it, and add the new. As this is less expensive than to recalculate the checksum afresh, most NAT implementations do so.

Thus though the checksum is modified, most NAT implementations don't have the potential to corrupt data and then calculate a fresh checksum over the corruption. Yay!


 
HTTP Proxies

Path across the Internet through an HTTP proxy, which is shown in red. That is where the TCP checksum stops.Organizations set up HTTP proxies to enforce security policies, cache content, and a host of other reasons. Transparent HTTP proxies are setup without cooperation from the client machine, simply grabbing HTTP sessions which flow by on the network. For the web to work the proxy has to modify the data sent by the client, if only to add an X-Forwarded-For header. Because the proxy expands the amount of data sent, it ends up repacking data in frames. Therefore proxies generally calculate a new checksum, they can't just update it the way NAT boxes do.


 
Conclusion

The Internet has evolved considerably since TCP was invented, and our mental model no longer matches reality. Many connections end up repeatedly recalculating their checksum via a proxy at their own site, a WAN optimizer somewhere along the way, and a load balancer at the far end. The checksum has essentially been turned into a hop by hop mechanism.

This isn't necessarily bad, and it has allowed the Internet to expand to its current reach. The point of this post and the earlier musing is simply that the network cannot be depended on to ensure the integrity of data. End to end protection means including integrity checks within the payload.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.


Thursday, July 7, 2011

rel="twitter me"

Apparently Alyssa Milano used her Twitter account to verify her Google+ account.

Alyssa Milano tweet: this is my real Google+ account

XFN 1.1 defined a number of relationship attributes for links, one of which is rel="me"

rel="me" A link to yourself at a different URL. Exclusive of all other XFN values. Required symmetric. There is an implicit "me" relation from the contents of a directory to the directory itself.

Clearly, we need rel="me twitter" for situations like this.

Tuesday, July 5, 2011

On the Naming of Our Relationships

I've never been happy with the use of a single word to describe associations on a social network, whether that word is "friending" or "following." Human relationships have a huge range of possibilities, and we use subtle variations in wording and adjectives to add nuance to our descriptions of them. To take just one example, "godmother," "stepmother," and "mother" all convey a beloved relationship, yet with vastly different levels of parental (and biological) involvement.

On Twitter I've tried to use Lists to broadly categorize those I follow into groups, mostly focussed on topics. An account can have at most twenty lists, so I've tended to be sparing in creating them. As Twitter has recently de-emphasized lists in the UI, I don't anticipate much more development there.

In the past week I've spent a lot of time on the Google+ Field Trial. There are a number of things I like about it, but one favorite is that I get to name the Circles I use. I can choose the terminology to define our relationship, how I see it from my own perspective.

Friends, BFFs, Family, Extended Family circles.

I also use circles to focus on particular topics, or on groups I am associated with. I rarely post to these circles, mostly just read.

FriendFeeders, Googlers, Journalistas, Networking, Scobleizers.

Its very liberating to be able to put a name on one's associations.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Social label.


Friday, July 1, 2011

Statistical Transmogrification

Machine translation has improved so much that we can now joke about it, pointing and laughing when it comes up with something particularly inane. I do mean "improved so much" - ten years ago the results on arbitrary text were so incomprehensible that they weren't even amusing. Statistical machine translation is a big data approach to the problem, looking for statistical correlations in the way humans translate between languages. It radically improved the results compared to trying to get the machine to understand grammar, at least when there is sufficient data available.

Which brings us, of course, to Star Trek and the Universal Translator. As a plot device, the translator is essential: you either portray alien species as inexplicably speaking human languages, or you employ a magical device to learn the alien language and translate. One of my favorite episodes revolves around the limitations of the device: Darmok. The translator can translate works with individual words and phrases, but cannot translate metaphor and cultural references.

"Darmok and Jalad at Tanagra."
"Kiteo, his eyes closed."
"Shaka, when the walls fell."
"Zinda, his face black, his eyes red!"
"Mirab, with sails unfurled."
"The beast at Tanagra."

Ever notice how many cultural references we use in everyday conversation, as a shorthand to convey deeper meaning in a small number of words? I tried to keep track of them for a week. Its difficult to even take note of the ones you employ yourself: the mind doesn't categorize them as being special, they're just another part of the language. They are mostly noticeable when someone else uses an unfamiliar reference that you really have to think about.

"Multiplying like a wet gremlin."
"It's my precious."
"Don't cross the streams."
"Use the carrot, not the stick."
"I drink your milkshake."
"He was the red shirted ensign of that project."
"That is Kryptonite to her."
"He has a portrait up in the attic getting older and older."

An interesting thing about statistical translation is that it is even able to handle references like these, if they are sufficiently common. Its looking for correlation, not meaning. If humans can come up with a reasonable translation for a cultural reference, then the machine will as well.

This doesn't help Picard, though: no data corpus to work with.