Monday, June 28, 2010

Toxoplasmowhosit?

Toxoplasmosis has long been known to modify the behavior of its host mice, influencing them in ways which make it more likely they will be eaten by a cat. The microbe reproduces in the intestines of felines. The chemicals it produces to effect this appear to also have an impact on other mammals, including humans. As cats do not normally eat humans, the effects on humans are just an accident.

Natural selection will eventually produce a toxoplasmosis microbe which forces humans to take felines into their homes and care for them, producing a much larger population of suitable environments for their reproduction. When this happens, when large numbers of humans willingly share their homes with cats, then we will know that the microbes have taken over.

Thursday, June 24, 2010

Virtual Trouble

After many years of working in plain C, I'm back to writing C++. I feel like an unfrozen caveman, confused by the flashing lights of the big city. Here is something I ran into recently.

#include <stdio.h>

class BaseClass {
 public:
  BaseClass() { InitName(); }
  virtual void InitName() { name_ = "BaseClass"; }
  const char *name_;
};

class SubClass : public BaseClass {
 public:
  virtual void InitName() { name_ = "SubClass"; }
};

int main(int argc, char** argv) {
  BaseClass base;
  SubClass sub;

  printf("BaseClass name_ = %s\n", base.name_);
  printf("SubClass  name_ = %s\n", sub.name_);
}

A base class provides a virtual InitName() method, and calls it from the constructor. A subclass overrides InitName(), yet the overridden method is not called during construction. The BaseClass InitName() is used instead.

$ ./a.out
BaseClass name_ = BaseClass
SubClass  name_ = BaseClass

Why?


A Maze of Twisty Little Passages

Objects are constructed from the most basic class outward. When the BaseClass() constructor runs, the SubClass methods and member variables have not yet been initialized, so at that point the object's type is BaseClass and virtual calls dispatch to the BaseClass definitions. Only when BaseClass::BaseClass() returns is the object re-marked as a SubClass object, and only then will virtual calls reach its overridden methods. Destructors work similarly, in reverse: the most derived class is destroyed first, and by the time BaseClass::~BaseClass() runs the object is once again of BaseClass type. Any virtual methods called from ~BaseClass() will call the BaseClass definition.
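
The destructor half can be seen with a minimal sketch (the class names and the Describe() method here are mine, invented for illustration):

#include <stdio.h>

class Base {
 public:
  virtual ~Base() { Describe(); }  // by now the Derived part is already gone
  virtual void Describe() { printf("Base\n"); }
};

class Derived : public Base {
 public:
  virtual void Describe() { printf("Derived\n"); }
};

int main() {
  Base *obj = new Derived;
  delete obj;  // prints "Base", not "Derived"
  return 0;
}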

Scott Meyers' Effective C++, 3rd Edition (*) devotes an item ("Never call virtual functions during construction or destruction") to this topic, with considerably more detail. That item happens to be available online in an excerpt by the publisher.

For my specific issue, my object already had an Init() method to be called after object construction. It was straightforward to move the functionality from the constructor to Init(), with some checks to make it do something sensible if the caller neglects to call Init().
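
In sketch form, the two-phase pattern looks something like this (the names and the fallback behavior are illustrative, not my actual code):

#include <stdio.h>

class BaseClass {
 public:
  BaseClass() : name_(NULL) {}  // the constructor makes no virtual calls
  virtual ~BaseClass() {}
  virtual void Init() { name_ = "BaseClass"; }

  const char *name() const {
    // Do something sensible if the caller neglected to call Init().
    return name_ ? name_ : "(uninitialized)";
  }

 protected:
  const char *name_;
};

class SubClass : public BaseClass {
 public:
  virtual void Init() { name_ = "SubClass"; }
};

int main() {
  SubClass sub;
  sub.Init();  // after construction, virtual dispatch reaches SubClass::Init()
  printf("SubClass name_ = %s\n", sub.name());
  return 0;
}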

(*) - affiliate link

Monday, June 21, 2010

Stackoverflow Maintenance Page

I love how the stackoverflow.com maintenance page shows a binary dump of a Windows process where the stack overflow error is defined. Clever.

Stackoverflow is down for maintenance.

Thursday, June 17, 2010

A Moon of Endor

At the time of this writing, 455 planets outside of our own solar system have been discovered. Nearly all are gas giants like Jupiter and Saturn, and even the smallest are several times the mass of Earth. This doesn't mean smaller planets are uncommon; it means our current techniques, optical occlusion and gravitational deflection, are far better at detecting massive planets.

When I read reports on these discoveries they have already been "dumbed down" for a mainstream audience. Invariably the lack of Earth-like planets is mentioned, followed by a reference to extraterrestrial life on those Earth-like planets. Yet I suspect that if we're really interested in finding planets where life is likely to have evolved, gas giants are what we should be looking for.

Mars as seen by the Hubble space telescope.

In our own solar system there are three "Earth-like" planets: Venus, Earth, and Mars. Of the three, only Earth is tectonically active with a strong magnetic field. Tectonics and vulcanism lead to temperature variation in the environment, which on Earth appears to spur evolution. A strong magnetic field protects the planet's surface from solar flares.

Something about Earth is different, resulting in it having a highly dynamic molten core where Mars and Venus are far more settled. One possibility is the Moon: Earth has a relatively enormous moon compared to Mars. The tidal force of lunar gravity exerts considerable strain on the planet, and perhaps that keeps the inner dynamo from settling. Another possibility is related to the formation of the Moon: if indeed it formed when a massive impact on the early Earth ejected a huge volume of material into space, then perhaps the planet simply hasn't settled down yet.

The moons of a gas giant have some of the same properties as Earth. The massive gravity of their neighbor exerts considerable force, making a dynamic molten core more likely. If their orbit is close enough they also sit inside their host's magnetic field, protecting them from solar wind and flares. There is a considerable amount of radiation near a gas giant, but it's a constant level which becomes part of the environment. On Earth life seems able to evolve in extremely harsh environments, so perhaps life can evolve on a gas giant moon in spite of the radiation. In our own solar system it is possible that life exists on Titan, which would be incredibly exciting.

Advancement in our ability to detect Earth-like exoplanets is interesting, but to me it will be far more interesting when we can detect moons orbiting gas giants.

Monday, June 14, 2010

Expect the Unexpected Error

Sonos.com Unexpected error occurred.

Well, sure. If it were an expected error, you would have done something more useful.


Wednesday, June 9, 2010

4k Sectors Approacheth

hard drive magnetic head

It's amazing that hard drives work at all. A tiny little drive head flies just above a metallic tundra, manipulating minuscule dots of magnetism flying by at high speed. The dots have gotten small enough that advances in materials science are required to reliably detect the field.

As an industry, drive manufacturers have done a remarkable job of advancing the technology without breaking compatibility. For example, when drives added LBA48 to support capacities larger than 128 gigabytes, the older LBA28 commands were retained without modification. New drives could be put into existing LBA28 controllers without trouble in the common cases: no more than 128 GB would be used, but older controllers did not stop working the instant LBA48 came out. It allowed an orderly transition to newer designs.
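
The 128 GB limit falls directly out of the address width. A quick back-of-the-envelope check, assuming 512 byte sectors:

#include <stdio.h>

int main() {
  const unsigned long long kSectorSize = 512;

  // LBA28: 2^28 addressable sectors of 512 bytes = 128 GiB.
  printf("LBA28 max: %llu bytes\n", (1ULL << 28) * kSectorSize);
  // LBA48: 2^48 addressable sectors of 512 bytes = 128 PiB.
  printf("LBA48 max: %llu bytes\n", (1ULL << 48) * kSectorSize);
  return 0;
}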

We're on the verge of the next big transition: 4K sectors. For 30 years hard disk drives have used a 512 byte sector. I'm not sure of the original motivation for that specific size, though I suspect the VAX page size of 512 bytes was a factor. The drive industry began preparing for a transition to 4 kilobyte sectors nearly ten years ago, and the first products are now on the market.


Anatomy of a Disk Sector

Disks with 512 byte sectors currently allocate about 40 bytes of additional space per sector for ECC, so error correction occupies roughly 8% of the raw capacity of the disk. The density of bits on the platter continues to increase, while imperfections in the drive media tend to remain the same size. As more bits are packed into the same area a media flaw will affect a larger span, and require more ECC to recover. If drives stick with 512 byte sectors, one can see the day coming when ECC will consume unacceptable fractions of the disk: 20%, 30%, etc. Therefore the drive industry is moving to 4 kilobyte sectors, which amortize the ECC across larger swaths of data. Where a 512 byte sector uses 40 bytes of ECC, a 4096 byte sector requires about 100 bytes: eight times more data is covered with only 2.5x more ECC.
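
Using the figures above, the overhead comparison works out like this:

#include <stdio.h>

int main() {
  // Figures from above: ~40 bytes of ECC per 512 byte sector,
  // ~100 bytes of ECC per 4096 byte sector.
  printf("512B sectors: %.1f%% of raw capacity is ECC\n",
         100.0 * 40 / (512 + 40));     // ~7.2%
  printf("4KB sectors:  %.1f%% of raw capacity is ECC\n",
         100.0 * 100 / (4096 + 100));  // ~2.4%
  return 0;
}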

There are several other sources of overhead for each sector, including a synchronization region at the beginning (to prepare the read head to deserialize the data) and a gap between sectors. I do not know the size of these, but they should remain the same even as they amortize over 8x more data. These are a smaller win, but worth mentioning.

As with previous technology transitions the drive will continue to accept the older commands for 512 byte sector accesses, transparently performing a read-modify-write to the enclosing 4096 byte sector. The first such write to a sector will be relatively expensive: the drive head cannot read and write simultaneously, so it must first read in the full 4096 bytes and then wait for a complete rotation of the platter before it can write the modification back. All modern drives contain 32 or 64 MBytes of cache, so subsequent sub-sector writes can be merged in cache and written to the platter directly.
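
A sketch of the emulation logic, assuming eight 512 byte logical sectors per 4096 byte physical sector (the function name is mine):

#include <stdio.h>

// For a write of 'count' 512B logical sectors starting at 'lba', report
// which 4K physical sectors are touched and which need read-modify-write.
void AnalyzeWrite(unsigned long long lba, unsigned long long count) {
  const unsigned long long kLogicalPerPhysical = 8;  // 4096 / 512
  unsigned long long first_phys = lba / kLogicalPerPhysical;
  unsigned long long last_phys = (lba + count - 1) / kLogicalPerPhysical;

  for (unsigned long long p = first_phys; p <= last_phys; p++) {
    // A physical sector can be written directly only if the write
    // covers all eight of its logical sectors.
    unsigned long long phys_start = p * kLogicalPerPhysical;
    int fully_covered = (lba <= phys_start) &&
                        (lba + count >= phys_start + kLogicalPerPhysical);
    printf("physical sector %llu: %s\n", p,
           fully_covered ? "direct write" : "read-modify-write");
  }
}

int main() {
  AnalyzeWrite(63, 8);  // misaligned 4K write: straddles two physical sectors
  AnalyzeWrite(64, 8);  // aligned 4K write: one physical sector, no RMW
  return 0;
}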

Most processor architectures and OS implementations use a page size of 4K or larger, and almost always write full pages to disk. No read-modify-write is needed if the entire 4K sector is being written. There is a caveat to this happy outcome: the OS page needs to start at a 4K sector boundary, which really means the disk partition needs to start at a 4K-aligned boundary. If it doesn't, then even 4K writes will turn into read-modify-write cycles.

According to Western Digital, Windows versions starting with Vista and all recent versions of MacOS X and Linux align their partitions to a multiple of 4K Bytes. Windows XP and earlier generally did not: the Master Boot Record ended at sector 63, and all subsequent partitions would be laid out one sector off from 4K alignment. If a subsequent partition was itself not a multiple of 4K, it would throw off the alignment of the partitions which follow it.
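
Checking a partition for the problem is simple arithmetic; a quick sketch using the two historical starting LBAs:

#include <stdio.h>

// A partition is 4K-aligned if its starting 512B LBA is a multiple of 8.
int IsAligned4K(unsigned long long start_lba) {
  return (start_lba % 8) == 0;
}

int main() {
  // Windows XP era: first partition starts at LBA 63.
  printf("LBA 63:   %s\n", IsAligned4K(63) ? "aligned" : "misaligned");
  // Vista and later: first partition starts at LBA 2048 (1 MiB offset).
  printf("LBA 2048: %s\n", IsAligned4K(2048) ? "aligned" : "misaligned");
  return 0;
}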

Embedded systems should also be on the list of potential problem areas for the 4K sector size: DVRs, security camera monitoring systems, various consumer electronics, etc. It's quite common to design a product, use existing tools like fdisk.exe to create a "golden software image," and bit-copy that image to the hard drive. If the fdisk of the day did not align its partitions, then the image won't have them aligned. Periodically, as components are discontinued, new substitutes have to be qualified. Qualification of new commodity components is often left to the contract manufacturer; the engineering team may not be involved at all. As hard drive models come and go relatively frequently, a design will see several different drive models through its production lifetime.

In this particular case, it's worthwhile for the engineering team to be proactive and not leave it up to the CM. A 4K sector drive will work: the software will boot and operate; only the performance is impacted. It's quite conceivable for the CM to finish a change order for a new drive and ship a significant amount of product before the performance issues are noticed, if the problem is subtle.

WD has two solutions if unaligned writes are a problem:

  1. A jumper on the drive can add one to all 512B sector numbers.
  2. WDAlign.exe can re-image an existing installation to align the partitions.

If your existing product happens to have its partitions all off by one sector, presumably because an older Windows fdisk.exe was used to create it, the jumper is a potential solution. There is no telling how long drive manufacturers will keep the jumper in their products, of course. If the existing golden image has misaligned partitions, it's time to start working on a new image. This should be a matter of changing the partition table without having to touch the binaries; a QA cycle would be needed to check for regressions.

If the product skips the partition table entirely and accesses the raw disk device directly, to "improve performance" or for some other reason, and its hard-coded sector layout is misaligned... you're screwed. Start a project to update the design, and don't hard-code sector numbers next time. Native 512 byte drives will be available for a while, which may provide enough time to re-engineer.


Random Postscript

This kind of change in block size has happened once before, by my recollection. Many very early CD-ROM drives used a 512 byte sector size, matching that of hard drives. Sometime in the early 1990s CD drives changed to a 2048 byte sector, which they still use today. A number of drives had jumpers to switch between the two sizes, and I recall Sun workstations of the time being unable to boot from a 2048 byte sector.

Monday, June 7, 2010

We Still Have Unlimited SMS

AT&T recently announced changes to their data plans, eliminating the unlimited data plans and replacing them with tiered plans offering up to 2GB of data per month. Yet AT&T still offers unlimited messaging, including SMS and MMS. Thus, there is a mechanism to game the system.


IP via SMS

SMS relies on the GSM 03.38 character set, which is 7 bits wide with an escape code for additional characters. UTF-16 is used for non-Latin alphabets, and an 8 bit clean data alphabet is also supported (but not required).

For widest support we should rely on GSM 03.38, the only character encoding which handsets are required to implement. TCP/IP packets can be encoded into at most 160 characters of 7 bits each, for a total of 1120 bits or 140 bytes. An <ESC> can be followed by only a few valid characters, so an <ESC> followed by an arbitrary character might be rejected at some point along the delivery path. An HDLC-style character-stuffing technique can be used to eliminate <ESC> from the data stream, with details to be worked out later (a sketch follows).
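
As a sketch of the character-stuffing idea (the substitute control codes here are arbitrary choices of mine, details still to be worked out):

#include <stdio.h>

// GSM 03.38 reserves 0x1B as an escape. Replace it in the character stream
// with a two-character sequence, HDLC style, so it never appears raw.
const unsigned char kEsc = 0x1B;    // forbidden in the output
const unsigned char kStuff = 0x14;  // prefix for stuffed sequences

size_t Stuff(const unsigned char *in, size_t len, unsigned char *out) {
  size_t o = 0;
  for (size_t i = 0; i < len; i++) {
    if (in[i] == kEsc) {
      out[o++] = kStuff;
      out[o++] = 0x01;  // kStuff, 0x01 decodes back to 0x1B
    } else if (in[i] == kStuff) {
      out[o++] = kStuff;
      out[o++] = 0x02;  // kStuff, 0x02 decodes back to 0x14
    } else {
      out[o++] = in[i];
    }
  }
  return o;
}

int main() {
  unsigned char packet[] = {0x45, 0x1B, 0x14, 0x7F};
  unsigned char stuffed[8];
  size_t n = Stuff(packet, sizeof(packet), stuffed);
  for (size_t i = 0; i < n; i++) printf("%02X ", stuffed[i]);
  printf("\n");  // prints: 45 14 01 14 02 7F
  return 0;
}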

SMS does define a concatenation option to support longer messages, but it occupies 7 characters of the payload. As a compressed IP header would be smaller than 7 characters, it is recommended that SMS concatenation not be used: a 140 byte packet MTU will pack more efficiently than a larger MTU plus concatenation overhead.

With such a small MTU, header compression is a must. Robust Header Compression (ROHC) defines profiles for TCP/IP, UDP/IP, and RTP/UDP/IP compression, which can reduce the typical header stack to about 3 bytes. For deployment we'll need to define a new ROHC profile for IP fragments; an RFC will be drafted later.


IP via MMS QR code

MMS can be used to send small pictures and movies. Initially this seemed very promising: wrap the IP packet inside a JPEG header and send it. Unfortunately, to save cost, most mobile operators re-sample and re-encode any images sent via MMS to reduce their size. Raw data inside an image payload would be destroyed.

Instead the IP packet should be encoded as an image of a QR code, which can be robustly decoded even if the image is resampled. QR Code also includes error correction, helpful in this application.

A QR code can hold over 2 KBytes of data in a single barcode, easily enough for the maximum 1.5K Ethernet frame size. ROHC header compression is not required.


Closing Notes

Yes, this is a joke. Material posted on Mondays is not intended to be taken seriously. After all, nothing productive happens on Monday.