Wednesday, June 9, 2010

4k Sectors Approacheth

hard drive magnetic headIts amazing that hard drives work at all. A tiny little drive head flies just above a metallic tundra, manipulating miniscule dots of magnetism flying by at high speed. The dots have gotten small enough that advances in materials science are required to reliably detect the field.

As an industry, drive manufacturers have done a remarkable job in advancing the technology without breaking compatibility. For example when drives added LBA48 to support larger than 128 Gigabytes, the older LBA28 commands were retained without modification. New drives could be put into existing LBA28 controllers without trouble in the common cases. No more than 128 GB would be used, but older controllers did not stop working the instant LBA48 came out. It allowed an orderly transition to newer designs.

We're on the verge of the next big transition: 4k sectors. For 30 years hard disk drives have used a 512 byte sector. I'm not sure of the original motivation for that specific size, though I suspect the VAX page size of 512 bytes was a factor. The drive industry begin preparing for a transition to 4 Kilobyte pages nearly ten years ago, and the first products are now on the market.


Anatomy of a Disk Sector

Disks with 512b sectors currently allocate about 40 bytes of additional space for ECC. Thus the error correction occupies ~8% of the raw capacity of the disk. The density of bits on the platter continues to increase, while imperfections in the drive media tend to remain the same size. As more bits are packed into the same area a media flaw will affect a larger span, and require more ECC to recover. If drives stick with 512 sectors, one can see the day coming when ECC will consume unacceptable fractions of the disk: 20%, 30%, etc. Therefore the drive industry is moving to 4 kilobyte sectors, which amortize the ECC across larger swaths of data. Where a 512 byte sector uses 40 bytes of ECC, a 4096 byte sector requires about 100 bytes. Eight times more data is covered with only 2.5x more ECC.

There are several other sources of overhead for each sector, including a synchronization region at the beginning (to prepare the read head to deserialize the data) and a gap between sectors. I do not know the size of these, but they should remain the same even as they amortize over 8x more data. These are a smaller win, but worth mentioning.

As with previous technology transitions the drive will continue to accept the older commands for 512 byte sector accesses, transparently performing a read-modify-write to the enclosing 4096 byte sector. The first time such a sector is accessed will be relatively expensive: the drive head cannot read and write simultaneously, it must first read in the full 4096 bytes and then allow a complete rotation of the platter before it can write the modification back. All modern drives contain 32 or 64 MBytes of cache, subsequent sub-sector writes can merge from cache to write directly to the platter.

Most processor architectures and OS implementations use a page size of 4K or larger, and almost always write full pages to disk. No read+modify is needed if the entire 4K sector is being written. There is a caveat to this happy outcome: the OS page needs to start at the 4K sector boundary, which really means the disk partition needs to start at a 4K aligned boundary. If it doesn't, then even 4k writes will still turn into read-modify-write cycles.

According to Western Digital, Windows versions starting with Vista and all recent versions of MacOS X and Linux align their partitions to a multiple of 4K Bytes. Windows XP and earlier generally did not: the Master Boot Record ended at sector 63, and all subsequent partitions would be laid out one sector off from 4K alignment. If a subsequent partition was itself not a multiple of 4K, it would throw off the alignment of the partitions which follow it.

Embedded systems should also be on the list of potential problem areas for the 4k sector size: DVRs, security camera monitoring systems, various consumer electronics, etc. Its quite common to design a product, use existing tools like fdisk.exe to create a "golden software image," and bit-copy that image to the hard drive. If the fdisk of the day did not align its partitions, then the image won't have them aligned. Periodically as components are discontinued new substitutes have to be qualified. Qualification of new commodity components is often left to the contract manufacturer, the engineering team may not be involved at all. As hard drive models come and go relatively frequently, a design will see several different drive models through its production lifetime.

In this particular case, its worthwhile for the engineering team to be proactive and not leave it up to the CM. A 4K sector drive will work: the software will boot and operate. Only the performance is impacted. Its quite conceivable for the CM to finish a change order for a new drive and ship a significant amount of product before the performance issues are noticed, if the problem is subtle.

WD has two solutions if unaligned writes are a problem:

  1. A jumper on the drive can add one to all 512B sector numbers.
  2. WDAlign.exe can re-image an existing installation to align the partitions.

If your existing product happens to have its partitions all off by one sector, presumably because an older Windows fdisk.exe was used to create it, the jumper is a potential solution. There is no telling how long drive manufacturers will keep the jumper in their products, of course. If the existing golden image has misaligned partitions, its time to start working on a new image. This should be a matter of changing the partition table without having to touch the binaries. A QA cycle would be needed, checking for regression.

If the partitions are misaligned, the product accesses the raw disk devices, and it avoids using a partition table to "improve performance" or some other reason... you're screwed. Start a project to update the design, and don't hard-code sector numbers next time. Native 512b drives will be available for a while, which may provide enough time to re-engineer.


Random Postscript

This kind of change in block size has happened once before, by my recollection. Many very early CD-ROM drives used a 512 byte sector size, matching that of hard drives. Sometime in the early 1990s CD drives changed to a 2048 byte sector, which they still use today. A number of drives had jumpers to switch between the two sizes, and I recall Sun workstations of the time being unable to boot from a 2048 byte sector.