Sunday, October 31, 2010

More Halloween Scares for Google Fans

Earlier today Louis Gray posted 20 Halloween Scares to Put Fear Into Every Google Fan. I left a few more as a comment on that post, and made up even more for your edification and bemusement.

  1. Time Travel 20% project accidentally rewrites the past. Altavista won.
  2. Google computing infrastructure achieves sentience, demands Tetris.
  3. Attempt to create secret underground laboratory goes horribly awry, swallowing the Googleplex in a giant flaming pit of lava.
  4. Last IPv4 address is allocated, exposing long-hidden off-by-one error in TCP/IP. Internet collapses.
  5. Googleplex, found to lack permits and final electrical inspection, is shut down by the city.
  6. All Android phones contain built-in Rick-Rolling function, set to activate November 1.
  7. Surprisingly, P == NP after all.
  8. Spammers completely overwhelm email transports, with spam comprising 90% of incoming messages to GMail. Oh wait, this one is true.
  9. Next billion dollar business: algorithm to bet on blackjack.
  10. Mission aiming to win Google Lunar X Prize accidentally sends the moon hurtling off into space.
  11. Pubsubhubbub judged to be missing a wub.
  12. User Generated Oil Changes: YouLube. Coming soon to a neighborhood near you.
  13. Doubleclick simplified, rebranded as Singleclick.
  14. Self-driving cars begin taking joyrides.
  15. Chrome implements <BLINK>.
  16. WebP codecs automatically insert LOLcat captions... and they are funny.
  17. Pagerank penalizes sites using Comic Sans.
  18. <meta> tag for self reporting as a spam site debuts. Adoption rate disappointing.
  19. Feedburner actually sets content on fire.
  20. Last Halloween scare for Google fans: Yahoogle.

Thursday, October 28, 2010

Toward A Faster Web: Increase the Speed of Light

[image: fiber optic cross section — Speed Limit 202,700 km/sec]

Fiber optic strands have a central core of material with a high refractive index surrounded by a cladding of material with a slightly lower index. The ratio of the two is chosen to cause total internal reflection, where the light is confined to the central region and won't leak out into the cladding.

The refractive index is a measure of the speed of light in a medium. The speed of light in vacuum is roughly 300,000 kilometers per second, which is defined as an index of 1. The core of a typical fiber optic cable has an index of 1.48, so the speed of light there is (300,000 / 1.48) = 202,700 kilometers per second.



It is roughly 8,200 kilometers from Tokyo to San Francisco.

[image: transpacific fiber map]

The round trip time through transpacific fibers due solely to speed of light is roughly (2 * 8,200 km / 202,700 km/sec) = 81 milliseconds. Fibers do not run directly from the San Francisco Bay to the Tokyo harbor, so the actual distance is somewhat longer. Traceroute across the NTT network shows the round trip across the ocean is about 100 msec. A small portion of this is FIFO delay in regenerators along the ocean floor and queueing delay in switches at either end. Another portion is software overhead, as traceroute is handled in the slowpath of typical routers. The rest is the time it takes for light to propagate across the span.

 7  (…)  50.115 ms   51.020 ms   50.165 ms
 8  (…)  154.821 ms  147.516 ms  153.187 ms



Speed Limit 222,970 km/sec

100 Gigabit Ethernet is nearly done, with products already available on the market. Research into technologies for Terabit links is ramping up now, including an effort at UCSB which triggered this musing. Dan Blumenthal, a UCSB professor involved in the effort, said that new materials for the fiber optics might be considered: "We won't start out with that, but it'll move in that direction" (quoting from Light Reading).

Fiber with a 10% lower refractive index would speed up the light in the medium, and since transit time scales directly with the index, it would cut the propagation delay by 10%: roughly 8 msec off the ~100 msec round trip across the Pacific. One of my favorite Star Trek lines is from Déjà Q, a casual suggestion to "Change the gravitational constant of the universe." This is a case where we can make the web faster by changing the speed of light, though we need only do so within fiber optic cables and not the entire universe.



I admit that I have absolutely no understanding of the chemistry involved in fiber optics. Silica is doped with compounds to get the desired properties, including some which raise or lower the refractive index. There are tradeoffs between clarity/lossiness, dispersion, and refractive index which I don't understand. However, I think it's important to properly weigh the value of lowering the refractive index: it makes the web faster. We can do a lot with caching content locally and distributing datacenters around the planet, but in the end sometimes bits need to go off to find the original source no matter where it might be.

Also, to state it clearly: this consideration only applies to long-range lasers, with a reach in the tens of kilometers. The initial Terabit Ethernet work will almost certainly be on short-range optics for use within facilities, where the propagation delay is insignificant compared to other delays in the system. It's more important to optimize the power consumption and cost of short-range lasers than to worry about microseconds of delay. Long-reach optics have different constraints, and there we have a once-in-a-generation opportunity to make wide area networks faster.

Monday, October 25, 2010

Twitter Suggestion

Dear Twitter,

Idea: Longer prose via Tweet fragmentation and reassembly. Implementation can be considered complete once it has reinvented TCP.

You're welcome.

Thursday, October 21, 2010

Code Snippet: getmntent and statfs

A system which stays up for weeks or months at a time needs to monitor various facets of its operation to alert an operator if something unusual occurs. One of the things which should be monitored is disk space, as a full filesystem tends to expose lots of strange and wonderful failure modes. I suspect such monitoring is commonly implemented by invoking popen("df -k") and parsing the output. An alternative is to use the same calls which df uses: getmntent and statfs.

setmntent and getmntent parse a file listing mounted filesystems, generally /etc/mtab on Linux systems. The getmntent_r variant shown below is a glibc-specific extension which is thread safe, requiring that a block of memory be provided in which to store string parameters like the mount point.

#define _GNU_SOURCE  /* getmntent_r is a glibc extension */
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
#include <unistd.h>

int main(void) {
  FILE* mtab = setmntent("/etc/mtab", "r");
  struct mntent* m;
  struct mntent mnt;
  char strings[4096];
  if (mtab == NULL) return 1;
  while ((m = getmntent_r(mtab, &mnt, strings, sizeof(strings))) != NULL) {
    struct statfs fs;
    if ((mnt.mnt_dir != NULL) && (statfs(mnt.mnt_dir, &fs) == 0)) {
      unsigned long long int size  = (unsigned long long int)fs.f_blocks * fs.f_bsize;
      unsigned long long int free  = (unsigned long long int)fs.f_bfree  * fs.f_bsize;
      unsigned long long int avail = (unsigned long long int)fs.f_bavail * fs.f_bsize;
      printf("%s %s size=%llu free=%llu avail=%llu\n",
             mnt.mnt_fsname, mnt.mnt_dir, size, free, avail);
    }
  }
  endmntent(mtab);
  return 0;
}

This code likely fails when there are stacked filesystems, where multiple filesystems are mounted one atop another on the same directory. This is done for union mounts, where a read-only filesystem like squashfs has a read-write filesystem mounted atop it as an overlay. statfs will retrieve only the topmost filesystem at that mount point. I don't have a solution for this; if anyone can provide one in the comments, I'll add it as an update here.

Monday, October 11, 2010

On the Road to Self Driving Cars

[image: cars driving down a highway]

Located near the center of the US auto industry, the University of Michigan (Ann Arbor) had an extensive automotive program with an assortment of guest speakers from the Big Three. I went to several presentations that made quite an impression. One of them was about self-driving cars... in 1991.

The system described then relied on sensors attached to the bottom of the vehicle. Major highways would be equipped with copper wires running down the center of each lane, which the car would track in order to correct its course. I don't recall if the wire would actively broadcast a signal or be passively detected, nor how they would avoid running into other cars. As only major highways would be thus equipped, the driver had to take over in order to exit the highway and transit surface streets.

The presenter at that time was emphatic that the technology would be deployed within 10 years, because the economics were compelling. It was provably cheaper to increase the carrying capacity of highways using this system than by adding lanes. The wires were rapidly installed by making a narrow slit down the roadway, inserting a flexible conduit, and sealing the road behind. It was the same process as was being used to run fiber optics across the nation at that time, and was well understood. The added cost to vehicles would be subsidized using money saved from highway budgets. After paying for road retrofits and vehicle subsidies, the system would still be substantially cheaper than the status quo.

Of course, no such scheme made it out of the test facilities. Twenty years later, self-driving car designs no longer rely on modifications to the roads. Now the cars have an extensive map of the expected topology and navigate by comparing what they sense with what they expect.

I think there are several lessons in this.

  1. Any scheme requiring massive investment in infrastructure before benefits are seen is almost certainly doomed to fail. Large changes in infrastructure can best be accomplished incrementally, where a small investment brings a small benefit and continuing investment brings more benefit. It is far better to deploy self-driving cars and map roadways one at a time, without requiring a critical mass of highways and automobiles be deployed.
  2. Requiring multiple investments to be made by different parties invariably leads to deadlock. Car makers wouldn't add the equipment to vehicles until there was a sufficient base of wired roads for their use. States wouldn't wire the roads until there was a sufficient population of suitable cars.
  3. It is easy to design something to fit the infrastructure we wish we had, rather than what we really have, without realizing it. By focusing overmuch on the end state, one ignores the difficulties in getting from here to there.

Each such lesson has been shown over and over, of course. We continue to make the last mistake all the time on the Web, designing solutions which work fine except for NAT, or HTTP proxies, or URL shorteners, or some other grungy but essential detail of how the Internet actually functions.

Wednesday, October 6, 2010

True Definitions of Network Protocols

Spanning Tree: L2 protocol to transform complete network failure due to topology loop into a never-ending series of more subtle failures.

Per-Vlan Spanning Tree: L2 protocol designed to transform complete network failure due to topology loop into up to 4094 smaller failures.

VRRP: a mechanism by which the possibility of an outage due to loss of a router is replaced by the certainty of an outage due to VRRP.

ARP: a protocol to launch periodic, unannounced stress tests of network infrastructure.

IP Multicast: awesome solution in search of a suitable problem, for at least 25 years.

RIP: a routing protocol which no-one will admit to using.

IGRP: a routing protocol which no-one should be using.

IS-IS: a routing protocol which no-one thinks of using.

DVMRP: a routing protocol which no-one has even heard of using.

ASN.1: Leftover cruft from when TCP/IP was deployed "as a stepping-stone to eventual deployment of the OSI suite of protocols."

Monday, October 4, 2010