Monday, June 28, 2010

Toxoplasmowhosit?

Toxoplasmosis has long been known to modify the behavior of its host mice, influencing them in ways which make it more likely they will be eaten by a cat. The microbe reproduces in the intestines of felines. The chemicals it produces to effect this appear to also have an impact on other mammals, including humans. As cats do not normally eat humans, the effects on humans are presumably just an accident.

Natural selection will eventually produce a toxoplasmosis microbe which forces humans to take felines into their homes and care for them, producing a much larger population of suitable environments for their reproduction. When this happens, when large numbers of humans willingly share their homes with cats, then we will know that the microbes have taken over.

Thursday, June 24, 2010

Virtual Trouble

After many years of working in plain C, I'm back to writing C++. I feel like an unfrozen caveman, confused by the flashing lights of the big city. Here is something I ran into recently.

#include <stdio.h>

class BaseClass {
 public:
  BaseClass() { InitName(); }  // calls a virtual method from the constructor
  virtual void InitName() { name_ = "BaseClass"; }
  const char *name_;
};

class SubClass : public BaseClass {
 public:
  virtual void InitName() { name_ = "SubClass"; }
};

int main(int argc, char** argv) {
  BaseClass base;
  SubClass sub;

  printf("BaseClass name_ = %s\n", base.name_);
  printf("SubClass  name_ = %s\n", sub.name_);
}

A base class provides a virtual InitName() method, and calls it from the constructor. A subclass overrides InitName(), yet the overridden method is not called during construction. The BaseClass InitName() is used instead.

$ ./a.out
BaseClass name_ = BaseClass
SubClass  name_ = BaseClass

Why?


A Maze of Twisty Little Passages

Objects are constructed from the most basic class first. When the BaseClass() constructor runs, the SubClass methods and member variables have not yet been initialized; the object is a BaseClass object at that point, and virtual calls resolve to the BaseClass definitions. Only when BaseClass::BaseClass() returns does the object become a SubClass object, and only then will its overridden methods actually be used. Destructors work similarly: the most derived class is destroyed first, and by the time BaseClass::~BaseClass() runs the object is once again of BaseClass type. Any virtual methods called from ~BaseClass() will call the BaseClass definition.

Scott Meyers' Effective C++, Third Edition

Scott Meyers' Effective C++, 3rd Edition devotes a chapter to this topic, with considerably more detail. That chapter happens to be available online in an excerpt by the publisher.

For my specific issue, my object already had an Init() method to be called after object construction. It was straightforward to move the functionality from the constructor to Init(), with some checks to make it do something sensible if the caller neglects to call Init().
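As an illustration of that pattern (a minimal sketch, not the actual code in question; the class and member names here are made up), the work moves out of the constructor and into Init(), with a guard so the object still behaves sensibly if Init() is never called:

#include <stdio.h>

class Widget {
 public:
  Widget() : initialized_(false), name_("uninitialized") {}

  // Work formerly done in the constructor moves here. Once the object is
  // fully constructed, virtual dispatch works as expected, so a subclass
  // override of Init() will be honored.
  virtual void Init() {
    name_ = "Widget";
    initialized_ = true;
  }

  const char* name() {
    if (!initialized_) Init();  // sensible fallback if the caller forgot Init()
    return name_;
  }

 private:
  bool initialized_;
  const char* name_;
};

int main() {
  Widget w;
  w.Init();
  printf("name = %s\n", w.name());
}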

Monday, June 21, 2010

Stackoverflow Maintenance Page

I love how the stackoverflow.com maintenance page shows a binary dump of a Windows process at the point where the stack overflow error is defined. Clever.

Stackoverflow is down for maintenance.

Thursday, June 17, 2010

A Moon of Endor

At the time of this writing, 455 planets outside our own solar system have been discovered. Nearly all are gas giants like Jupiter and Saturn, and even the smallest are several times the mass of Earth. This doesn't mean smaller planets are uncommon; it means our current techniques, relying on optical occlusion and gravitational deflection, are far better at detecting massive planets.

When I read reports on these discoveries they have already been "dumbed down" for a mainstream audience. Invariably the lack of Earth-like planets is mentioned, followed by a reference to extraterrestrial life on those Earth-like planets. Yet I suspect that if we're really interested in finding planets where life is likely to have evolved, gas giants are what we should be looking for.

Mars as seen by the Hubble Space Telescope.

In our own solar system there are three "Earth-like" planets: Venus, Earth, and Mars. Of the three, only Earth is tectonically active with a strong magnetic field. Tectonics and vulcanism lead to temperature variation in the environment, which on Earth appears to spur evolution. A strong magnetic field protects the planet's surface from solar flares.

Something about Earth is different, resulting in it having a highly dynamic molten core where Mars and Venus are far more settled. One possibility is the Moon: Earth has a relatively enormous moon compared to Mars. The force of lunar gravity exerts considerable strain on the planet, and perhaps that keeps the inner dynamo from settling. Another possibility is related to the formation of the moon: if indeed it formed due to a massive asteroid strike on the Earth ejecting a huge volume of material into space, then perhaps the planet simply hasn't settled down yet.

The moons of a gas giant have some of the same properties as Earth. The massive gravity of their neighbor exerts considerable force, making a dynamic molten core more likely. If their orbit is close enough they also sit inside their host's magnetic field, protecting them from solar wind and flares. There is a considerable amount of radiation near a gas giant, but it's a constant level which becomes part of the environment. On Earth life seems able to evolve in extremely harsh environments, so perhaps life can evolve on a gas giant moon in spite of the radiation. In our own solar system it is possible that life exists on Titan, which would be incredibly exciting.

Advancements in our ability to detect Earth-like exoplanets are interesting, but to me it will be far more interesting when we can detect moons orbiting gas giants.

Monday, June 14, 2010

Expect the Unexpected Error

Sonos.com Unexpected error occurred.

Well, sure. If it were an expected error, you would have done something more useful.


Wednesday, June 9, 2010

4k Sectors Approacheth

Hard drive magnetic head.

It's amazing that hard drives work at all. A tiny little drive head flies just above a metallic tundra, manipulating minuscule dots of magnetism flying by at high speed. The dots have gotten small enough that advances in materials science are required to reliably detect the field.

As an industry, drive manufacturers have done a remarkable job in advancing the technology without breaking compatibility. For example, when drives added LBA48 to support capacities larger than 128 gigabytes (the LBA28 limit of 2^28 sectors of 512 bytes each), the older LBA28 commands were retained without modification. New drives could be put into existing LBA28 controllers without trouble in the common cases. No more than 128 GB would be used, but older controllers did not stop working the instant LBA48 came out. It allowed an orderly transition to newer designs.

We're on the verge of the next big transition: 4k sectors. For 30 years hard disk drives have used a 512 byte sector. I'm not sure of the original motivation for that specific size, though I suspect the VAX page size of 512 bytes was a factor. The drive industry began preparing for a transition to 4 kilobyte sectors nearly ten years ago, and the first products are now on the market.


Anatomy of a Disk Sector

Disks with 512 byte sectors currently allocate about 40 bytes of additional space for ECC. Thus the error correction occupies ~8% of the raw capacity of the disk. The density of bits on the platter continues to increase, while imperfections in the drive media tend to remain the same size. As more bits are packed into the same area, a media flaw will affect a larger span and require more ECC to recover. If drives stick with 512 byte sectors, one can see the day coming when ECC will consume unacceptable fractions of the disk: 20%, 30%, etc. Therefore the drive industry is moving to 4 kilobyte sectors, which amortize the ECC across larger swaths of data. Where a 512 byte sector uses 40 bytes of ECC, a 4096 byte sector requires about 100 bytes. Eight times more data is covered with only 2.5x more ECC.
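The arithmetic is easy to check. A quick sketch using the approximate ECC sizes quoted above (the 40 and 100 byte figures are rough numbers, not values from any particular datasheet):

#include <stdio.h>

int main() {
  // Approximate per-sector ECC sizes quoted above.
  const double ecc_512 = 40.0,  data_512 = 512.0;
  const double ecc_4k  = 100.0, data_4k  = 4096.0;

  printf("512B sector ECC overhead: %.1f%%\n", 100.0 * ecc_512 / data_512);  // ~7.8%
  printf("4KB  sector ECC overhead: %.1f%%\n", 100.0 * ecc_4k  / data_4k);   // ~2.4%
  printf("%.0fx the data covered by %.1fx the ECC\n",
         data_4k / data_512, ecc_4k / ecc_512);                              // 8x, 2.5x
}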

There are several other sources of overhead for each sector, including a synchronization region at the beginning (to prepare the read head to deserialize the data) and a gap between sectors. I do not know the size of these, but they should remain the same even as they amortize over 8x more data. These are a smaller win, but worth mentioning.

As with previous technology transitions, the drive will continue to accept the older commands for 512 byte sector accesses, transparently performing a read-modify-write on the enclosing 4096 byte sector. The first such write to a sector is relatively expensive: the drive head cannot read and write simultaneously, so it must first read in the full 4096 bytes and then allow a complete rotation of the platter before it can write the modification back. All modern drives contain 32 or 64 MBytes of cache, so subsequent sub-sector writes can be merged in the cache and written directly to the platter.

Most processor architectures and OS implementations use a page size of 4K or larger, and almost always write full pages to disk. No read+modify is needed if the entire 4K sector is being written. There is a caveat to this happy outcome: the OS page needs to start at the 4K sector boundary, which really means the disk partition needs to start at a 4K aligned boundary. If it doesn't, then even 4k writes will still turn into read-modify-write cycles.

According to Western Digital, Windows versions starting with Vista and all recent versions of MacOS X and Linux align their partitions to a multiple of 4 KBytes. Windows XP and earlier generally did not: the first partition traditionally started at sector 63, immediately after the track reserved for the Master Boot Record, leaving it one sector short of 4K alignment. If a partition's size was itself not a multiple of 4K, it would also throw off the alignment of the partitions which follow it.
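Checking for the problem is straightforward: with 512 byte logical sectors, a partition is 4K aligned when its starting LBA is a multiple of eight. A minimal sketch (the example start sectors are purely illustrative):

#include <stdint.h>
#include <stdio.h>

// A 4096 byte physical sector spans eight 512 byte logical sectors, so a
// partition is 4K aligned when its starting LBA is a multiple of 8.
static bool Is4kAligned(uint64_t start_lba) {
  return (start_lba % 8) == 0;
}

int main() {
  // The classic XP-era start at LBA 63 versus the 1 MiB boundary (LBA 2048)
  // used by newer partitioning tools.
  const uint64_t starts[] = {63, 64, 2048};
  for (unsigned i = 0; i < sizeof(starts) / sizeof(starts[0]); i++) {
    printf("partition starting at LBA %llu: %s\n",
           (unsigned long long)starts[i],
           Is4kAligned(starts[i]) ? "4K aligned" : "misaligned");
  }
}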

Embedded systems should also be on the list of potential problem areas for the 4k sector size: DVRs, security camera monitoring systems, various consumer electronics, etc. It's quite common to design a product, use existing tools like fdisk.exe to create a "golden software image," and bit-copy that image to the hard drive. If the fdisk of the day did not align its partitions, then the image won't have them aligned. Periodically, as components are discontinued, new substitutes have to be qualified. Qualification of new commodity components is often left to the contract manufacturer; the engineering team may not be involved at all. As hard drive models come and go relatively frequently, a design will see several different drive models through its production lifetime.

In this particular case, it's worthwhile for the engineering team to be proactive and not leave it up to the CM. A 4K sector drive will work: the software will boot and operate. Only the performance is impacted. It's quite conceivable for the CM to finish a change order for a new drive and ship a significant amount of product before the performance issues are noticed, if the problem is subtle.

WD has two solutions if unaligned writes are a problem:

  1. A jumper on the drive can add one to all 512B sector numbers.
  2. WDAlign.exe can re-image an existing installation to align the partitions.

If your existing product happens to have its partitions all off by one sector, presumably because an older Windows fdisk.exe was used to create it, the jumper is a potential solution. There is no telling how long drive manufacturers will keep the jumper in their products, of course. If the existing golden image has misaligned partitions, it's time to start working on a new image. This should be a matter of changing the partition table without having to touch the binaries. A QA cycle would be needed, checking for regression.

If the product accesses the raw disk device without a partition table, whether to "improve performance" or for some other reason, and those accesses are misaligned... you're screwed. Start a project to update the design, and don't hard-code sector numbers next time. Native 512 byte drives will be available for a while, which may provide enough time to re-engineer.


Random Postscript

This kind of change in block size has happened once before, by my recollection. Many very early CD-ROM drives used a 512 byte sector size, matching that of hard drives. Sometime in the early 1990s CD drives changed to a 2048 byte sector, which they still use today. A number of drives had jumpers to switch between the two sizes, and I recall Sun workstations of the time being unable to boot from a 2048 byte sector.

Monday, June 7, 2010

We Still Have Unlimited SMS

AT&T recently announced changes to their data plans, eliminating the unlimited data plans and replacing them with tiered plans offering up to 2GB of data per month. Yet AT&T still offers unlimited messaging, including SMS and MMS. Thus, there is a mechanism to game the system.


IP via SMS

SMS relies on the GSM 03.38 character set, which uses 7 bits per character plus an escape code for additional characters. UTF-16 is used for non-Latin alphabets, and an 8 bit clean data alphabet is also supported (but not required).

For widest support we should rely on GSM 03.38, the only character encoding which handsets are required to implement. TCP/IP packets can be encoded into at most 160 characters of 7 bits each, for a total of 1120 bits or 140 bytes. An <ESC> can be followed by only a few valid characters, so an <ESC> followed by any arbitrary character might be rejected at some point through the delivery path. An HDLC character-stuffing technique can be used to eliminate <ESC> from the data stream, details to be worked out later.
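Since the details of that stuffing are left for later, here is one possible sketch borrowing the escape-and-XOR trick from HDLC byte stuffing; the choice of 0x1A as the stuffing character is purely an assumption for illustration and appears in no SMS specification:

#include <stdio.h>
#include <vector>

// Keep <ESC> (0x1B) out of the septet stream: both the stuffing character
// and <ESC> itself are replaced by the stuffing character followed by the
// original value XORed with 0x20. The receiver reverses the transformation.
static const unsigned char kEsc = 0x1B;
static const unsigned char kStuff = 0x1A;  // illustrative choice

std::vector<unsigned char> Stuff(const std::vector<unsigned char>& in) {
  std::vector<unsigned char> out;
  for (size_t i = 0; i < in.size(); i++) {
    unsigned char c = in[i];
    if (c == kEsc || c == kStuff) {
      out.push_back(kStuff);
      out.push_back(c ^ 0x20);
    } else {
      out.push_back(c);
    }
  }
  return out;
}

int main() {
  unsigned char raw[] = {0x45, 0x1B, 0x1A, 0x7F};
  std::vector<unsigned char> packet(raw, raw + sizeof(raw));
  std::vector<unsigned char> stuffed = Stuff(packet);
  for (size_t i = 0; i < stuffed.size(); i++) printf("%02x ", stuffed[i]);
  printf("\n");  // 45 1a 3b 1a 3a 7f
}

Stuffing grows the output slightly, so a packet must leave a little headroom under the 140 byte limit.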

SMS does define a concatenated option to support longer messages, but it occupies 7 characters in the payload. As a compressed IP header would be smaller than 7 characters, it is recommended that SMS concatenation not be used. A 140 byte packet MTU will pack more efficiently than a larger MTU plus concatenation overheads.

With such a small MTU, header compression is a must. Robust Header Compression defines profiles for TCP+IP, UDP+IP, and RTP+UDP+IP compression, which can reduce the typical header stack down to about 3 bytes. For deployment use we'll need to define a new ROHC profile for IP Fragments, an RFC will be drafted later.


IP via MMS QR code

MMS can be used to send small pictures and movies. Initially this seemed very promising: wrap the IP packet inside a JPG header and send it. Unfortunately to save cost, most mobile operators re-sample and re-encode any images sent via MMS to reduce their size. Straight data inside an image payload would be destroyed.

Instead the IP packet should be encoded as an image of a QR code, which can be robustly decoded even if the image is resampled. QR Code also includes error correction, helpful in this application.

QR Code can handle over 2 KBytes of data in a single barcode, easily enough to handle the maximum 1.5K ethernet frame size. ROHC header compression is not required.


Closing Notes

Yes, this is a joke. Material posted on Mondays is not intended to be taken seriously. After all, nothing productive happens on Monday.

Friday, May 21, 2010

Hiatus

For the next month or so expect postings here to be infrequent. I'm more likely to post short snippets on Twitter or Google Buzz during this time.

Monday, May 17, 2010

/dev/tty

/dev/tty == Teletype machine

Any questions?

Friday, May 14, 2010

Uncanny Friending

There is an urban legend that Eskimos have many different words for snow. The truth is the Eskimo-Aleut languages have about as many root words for snow as English does, but allow descriptive suffixes to be attached to any word to form countless variations.

Consider the English words we use to describe human relationships, and the distinctions they convey in meaning:

sister / stepsister / half sister
significant other / fiancée / spouse
friend / just friends / friend with benefits
peer / coworker / colleague
mother / stepmother / godmother

We use adjectives to add huge amounts of information in a single word. "fiancée" conveys one meaning, that of a beloved person. "current fiancée" conveys an entirely different meaning, a disposable relationship given a label for convenience.

Now consider the words we use to describe relationships in social networks:

friend / friend / friend
friend / friend / friend
friend / friend / friend
friend / friend / friend
friend / friend / friend

Why do we find this unsatisfying? I believe it is a corollary to the Uncanny Valley effect in robotics and computer games: "friend" is close enough to the real description of the human relationship that we find it unsettling. If the term were more inhuman, less shaded with meaning, it would not be so maddening.

The term "like" has a similar problem: who wants to like something unpleasant or unsavory? Clicking "like" is meant is to express interest, but the terminology is close enough to the real intention to be maddeningly imprecise.

I also suspect this vaguely unsettling feeling will resolve itself in a few more years online: the words friend and like will simply lose all meaning. We'll know this has been achieved when people stop using air quotes to distinguish online friending versus real life friends.


The genesis of this musing came via an insightful tweet by Marshall Kirkpatrick:


"told my wife that google "results from your social circle" showed me because we are friends. she insists we are more than that. true :)" — Marshall Kirkpatrick (@marshallk), via TweetDeck

Wednesday, May 12, 2010

Death of Copper Predicted. Film at 11.

Copper RJ45 and fibers held in a hand.

Every handful of years we ratchet up the Ethernet link speed: from 10 Mbps to 100 Mbps in the early 1990s, to 1 Gbps in the mid 1990s, to 10 Gbps in the early part of this century. 40 Gbps is the next target. At the 1 Gbps and 10 Gbps transitions naysayers maintained that copper cables would never be able to meet the required signaling rates and that optical would prevail. The same doubt is now being voiced about 40 Gbps.

During the 1 Gbps and 10 Gig transitions, optical media became available several years before copper, and then the initial 10 Gig copper specs were limited to patch cable distances of 10-15 meters. 40G will repeat the story with optical products already available, substantially before copper. Nonetheless I'd wager 40G copper transceivers will eventually appear in some form.

Yet this time, optical will win. Not because of the technology or limitations of copper wire, but because of economics. Economics used to be in copper's favor: simple install and no expensive lasers. Copper could ride the silicon technology curve, throwing ever more DSP power at the problem. Times have changed: cat6a and cat7 cabling is as difficult and expensive to install as fiber, and solid state laser components allow optical transports to ride the silicon technology curve.

  • Like fiber, cat6/7 cables have a minimum bending radius. Pull too tight and the cable can no longer handle long distances.
  • Like fiber, cat7 does not tolerate being stretched. Stretch a 100m cable by a centimeter and its performance suffers.
  • Even padded cable staples put too much pressure on the cable. cat7 must run in a tray or conduit, and the bulky shielding means fewer of them will fit.
  • cat7 cables are very sensitive to connectorization. The crimp tool you used for cat5e won't do.

The other problem with copper cables is that they are made of copper, an actively traded commodity. The chart below shows the raw material cost of copper over the last century, normalized to the US Dollar in 1998. During much of the late 1990s and early 2000s copper was cheap by historic standards. In the last few years the commodity price has trended back up due to demand, without a matching increase in new supply. If there is a natural ceiling for copper pricing where the market will seek alternatives, we do not appear to have hit it yet.

Price of copper since 1900 in 1998 dollars

(data source: US Geological Survey)

I'm not predicting that 40 Gig copper transceivers will be impossible. On the contrary, I suspect there will be two solutions brought to market: a very short reach spec using RJ45 patch cables, and a 100m spec which imposes more painful requirements like cat7a/cat8, use of multiple cables, and electrically better connectors (presumably also manufactured, not connectorized on site). These products will eventually appear, substantially lagging optical product availability.

I simply suspect that the economics no longer work in copper's favor: patch cables from one side of the rack to the far corner will be long enough that you have to worry about install quality. If the pressure from zip-ties fastening the cable to the rack threatens the operation of your network, you're better off using fiber.

The genesis of this post was a comment on Stephen Foskett's Pack Rat blog. It is an excellent resource, highly recommended.

Wednesday, May 5, 2010

Privacy vs Voyeurism

Much has been written about privacy online. When Pandora reveals our friends' music tastes it makes us slightly uncomfortable, even if we enjoy the new music suggestions which result. When our friends can unknowingly reveal information about us, we find it disturbing. Facebook privacy currently dominates the discussion, but the trend of all online activity has been more sharing and less privacy.

I use foursquare, which allows friends on that service to see my location when I check in. Earlier this week a friend checked in to the Lucile Packard Children's Hospital.

Hospital Heart/Lung monitor screen

A checkin notification is devoid of context; there was no indication whether it was routine or an emergency. Certainly if one had just rushed a child to the hospital one wouldn't bother checking in... but what about hours later? What about an extended stay, after the initial panic subsides? Where detail is lacking, the mind fills in possibilities. After thinking about it for a while, worry overcame reservation and I sent email asking if there was anything I could do to help.

As it happens, the visit was completely routine.

It felt weird, asking if everything was ok. I was acting on the basis of information which even just a couple years ago would not have been available to me. Back then I would only have known if he'd informed me directly, and in that context asking if I could help wouldn't have seemed even slightly awkward.

Even if it hadn't been a routine visit, even if there had been help I could provide, reaching out on the basis of a foursquare checkin would have still felt weird. Why is that? I think it is a form of guilt, as using social media in this way feels a bit like voyeurism. In this case it was information the person had chosen to share by explicitly checking in on foursquare, but down in the subconscious it is still equated to clandestine spying.

As online privacy recedes, I think we're all going to be experiencing this feeling more often.




Thoughts on Sharing

Society does not inherently guarantee our privacy. It never did. The privacy most of us enjoy is actually anonymity. Celebrities struggle greatly to keep any portion of their lives out of public view; when you discard anonymity, privacy tends to go with it. As communications technology improves, the bar to achieve a degree of celebrity is lowered. I suspect the further back in history you go, the difference is mainly in the geographical radius of one's renown, not its impact.

We're rushing into a world where a huge percentage of the population will experience the advantages and disadvantages of losing anonymity in their daily lives.

  • The eCommerce site will know your approximate net worth.
  • The customer service response will be finely tuned to the likelihood your displeasure could damage their business.
  • Product companies will assemble marketing lists of people who are statistically more likely to buy their product. Not by placing ads in venues they are likely to frequent, but by targeting them directly.
  • When I look for a dance class for my daughter, I'll know if her friends are already enrolled somewhere without having to ask them.
  • Insurance as we know it today will fade away, uncompetitive. It will not use actuarial tables; it will be essentially an auction based on a tailored risk profile.

We might recoil from this, but I suspect it is not something which can be stopped. The technology has reached the point where these things are feasible, and there is a huge economic incentive to pursue them. A concerted effort to stop it would only make the technology less visible, not absent.

Update: Louis Gray, the friend whose hospital checkin triggered this musing, has posted some thoughts on location-based services and what information we make available to others.

Monday, May 3, 2010

Safe Food Design: Hot Dogs

The American Academy of Pediatrics released a policy statement calling for the redesign of foods which pose a choking hazard to children. Among the foods listed are hot dogs.

I humbly present my proposal:

Hot Dog cut to resemble octopus tentacles

Thursday, April 29, 2010

Deep Pockets

In the tech industry, how often do we hear this?

  • "Great technology, they just didn't market it well."
  • "Its a shame to see that product die.
  • "They need somebody with deep pockets to see it through."

Of course this brings us to Palm, acquired by HP for $1.2 billion. Brian Humphries, an HP executive in business development, reportedly said: "Our intent is to double down on webOS." Palm managed to find their deep-pocketed benefactor. Now we get to watch what happens.

This is the second time a savior has swooped in for Palm. In the early 1990s before the PDA had really established itself as a category, Palm nearly ran out of money. Its VCs were unwilling to put in more, but Palm was not generating enough revenue to operate. The company was purchased by US Robotics, which was later purchased by 3Com. Palm operated successfully for many years after that first brush with death.

We'll see what happens from here. HP likely believes that by owning the complete system, from hardware to OS to applications, they will be able to deliver compelling products and compete successfully with the iPhone. Time will tell.

Thursday, April 22, 2010

HTML5 is Hard, Lets Go Shopping!

I just wanted to embed two short audio clips in a web page. Just two little "play" buttons. That's all. I started with a Flash player, "borrowing" one used by Google Reader:

<embed
  type="application/x-shockwave-flash"
  src="audio-player.swf?audioUrl=myfile.mp3">
</embed>
 

This worked fine, but it's a brave new world. I decided to use HTML5's <audio> tag, falling back to the Flash player if <audio> is not supported. This results in:

<audio src="myfile.mp3" controls autobuffer>
  <embed type="application/x-shockwave-flash"
    src="audio-player.swf?audioUrl=myfile.mp3">
  </embed>
</audio>
 

Loaded it into Chrome, it looks nice and plays fine. Life is good. I feel like a real web-enabled kindof guy. Before posting I try it in Firefox... whoops, it doesn't play. Firefox 3.6 doesn't handle MP3 files, most likely due to patent issues. So Firefox has an empty gray box with a little "X" through it.

In fact there is no single audio format supported by all common browsers. Supplying both MP3 and Ogg Vorbis is recommended for maximum compatibility. Next step: re-encode the audio and supply multiple formats. Ogg has to be first, because apparently if Firefox cannot play the first format it does not try subsequent ones.

<audio controls autobuffer>
  <source src="myfile.ogg"/>
  <source src="myfile.mp3"/>
  <embed type="application/x-shockwave-flash"
    src="audio-player.swf?audioUrl=myfile.mp3">
  </embed>
</audio>
 

This sortof works. Not really, but sortof. Chrome doesn't seem to like the Ogg file and plays static for the last half second instead. It probably doesn't play in Opera, which considers the src attribute of the audio tag to be mandatory. I have no idea what IE will do. At least Firefox is happy.

To get an audio tag which will work in all browsers, it appears I have to use JavaScript. Detect the capabilities of the browser, assemble an audio object in the DOM which meets their various requirements and bogosities, and hope for the best.


It shouldn't be this hard. Really, it shouldn't. It appears that as with nearly everything else in the modern web, the HTML5 media tags will be buried behind APIs in our Javascript frameworks to work around browser differences.