Sunday, April 1, 2012

BangIP Option

We're almost out of IPv4 addresses, yet IPv6 deployment is still very, very slow. This is a recipe for disaster. I'm talking End of Internet predicted, film at 11 scale disaster. Something must be done. Steps must be taken.

I humbly suggest that the solution is really quite simple: re-use IPv4 addresses. I don't mean reclaiming IP addresses which have been allocated but are not actually being used. That is the kind of solution which sounds good and righteous to say but is completely unworkable in practice. Instead, I mean to simply issue the same IP address to multiple people.

"How could that possibly work?" people might ask. Go ahead, ask it... I'm glad that you asked, because I have a ready-made explanation waiting. We will leverage a solution to a similar problem which has scaled tremendously well over the last several decades: email addresses.

On early email systems like AOL, addresses had to be unique. This led to such ridiculous conventions as appending a number to the end of a popular address, like CompuGuy112673 (because really, everybody wants to have an email address like CompuGuy). The beauty of Internet email addressing is in breaking them up into a federated system, so and can simultaneously exist.

Therefore I propose to use this same solution for IP addressing. We will allow each Autonomous System to maintain its own IP address space. The same IP address can simultaneously exist within multiple ASNs. We are effectively adding a level of indirection to the destination in the IP header. Disambiguating the actual destination will be done via a new IP option.

IP Option field appended to frameBinary encoding of protocol headers is now passé. Therefore our new IP option is ASCII encoded, which also allows us to avoid the discussion about how Little Endian CPUs have won and all headers should be LE instead of BE. The content of the option is the destination IP address, followed by the '@' sign, followed by the decimal Autonomous System Number where the recipient can be reached. For example, a destination of within the Level 3 network space would be "" using Level3's primary ASN of 3356.

By my reading of the IP spec the option field can be up to 40 bytes, while an IP@ASN encoding could require up to 15 bytes for the IP address, 1 byte for the @ symbol, and could handle billions of IP namespaces using only 13 more bytes to encode a 32 bit ASN. So, we should be good to go.

Updates to the DNS A record and all the application code which looks up IP addresses is left as an exercise for the reader.

For historical reasons, this new option is referred to as the BangIP option. An earlier version of this system used bang-separated IP addresses as a source routing path. That portion of the proposal has been deprecated, but we retain the name for its sentimental value.

Monday, March 26, 2012

On Octal

An article in the March 8 issue of the journal PLoS Computational Biology (as reported by Science Daily) states:

Upon pre-synaptic excitation, calcium ions entering post-synaptic neurons cause the snowflake-shaped CaMKII to transform, extending sets of 6 leg-like kinase domains above and below a central domain, the activated CaMKII resembling a double-sided insect. Each kinase domain can phosphorylate a substrate, and thus encode one bit of synaptic information. Ordered arrays of bits are termed bytes, and 6 kinase domains on one side of each CaMKII can thus phosphorylate and encode calcium-mediated synaptic inputs as 6-bit bytes.

From this we can derive one inescapable conclusion: DEC was right about octal all along.

(Thanks to Sean Hafeez for posting a link to the Science Daily article on Google+)

Thursday, February 9, 2012

ATA Commands in Python

In software development sometimes you spend time on an implementation which you are unreasonably proud of, but ultimately decide not to use in the product. This is one such story.

I needed to retrieve information from an attached disk, such as its model and serial number. There are commands which can do this, like hdparm, sdparm, and smartctl, but initially I tried to avoid building in a dependency on any such tools by interrogating it from the hard drive directly. In pure Python.

Snicker all you want, it did work. My first implementation used an older Linux API to retrieve this information, the HDIO_GET_IDENTITY ioctl. This ioctl maps more or less directly to an ATA IDENTIFY or SCSI INQUIRY from the drive. The implementation uses the Python struct module to define the data structure sent along with the ioctl.

def GetDriveId(dev):
  """Return information from interrogating the drive.

  This routine issues a HDIO_GET_IDENTITY ioctl to a block device,
  which only root can do.

    dev: name of the device, such as 'sda' or '/dev/sda'

    (serial_number, fw_version, model) as strings
  # from /usr/include/linux/hdreg.h, struct hd_driveid
  # 10H = misc stuff, mostly deprecated
  # 20s = serial_no
  # 3H  = misc stuff
  # 8s  = fw_rev
  # 40s = model
  # ... plus a bunch more stuff we don't care about.
  struct_hd_driveid = '@ 10H 20s 3H 8s 40s'
  if dev[0] != '/':
    dev = '/dev/' + dev
  with open(dev, 'r') as fd:
    buf = fcntl.ioctl(fd, HDIO_GET_IDENTITY, ' ' * 512)
    fields = struct.unpack_from(struct_hd_driveid, buf)
    serial_no = fields[10].strip()
    fw_rev = fields[14].strip()
    model = fields[15].strip()
    return (serial_no, fw_rev, model)

No no wait, stop snickering, it does work! It has to run as root, which is one reason why I eventually abandoned this approach.

$ sudo python
('5RY0N6BD', '3.ADA', 'ST3250310AS')

HDIO_GET_IDENTITY is deprecated in Linux 2.6, and logs a message saying that sg_io should be used instead. sg_io is an API to send SCSI commands to a device. sg_io also didn't require my entire Python process to run as root, I'd "only" have to give it CAP_SYS_RAWIO. So of course I changed the implementation... still in Python. Stop snickering.

class AtaCmd(ctypes.Structure):
  """ATA Command Pass-Through"""

  _fields_ = [
      ('opcode', ctypes.c_ubyte),
      ('protocol', ctypes.c_ubyte),
      ('flags', ctypes.c_ubyte),
      ('features', ctypes.c_ubyte),
      ('sector_count', ctypes.c_ubyte),
      ('lba_low', ctypes.c_ubyte),
      ('lba_mid', ctypes.c_ubyte),
      ('lba_high', ctypes.c_ubyte),
      ('device', ctypes.c_ubyte),
      ('command', ctypes.c_ubyte),
      ('reserved', ctypes.c_ubyte),
      ('control', ctypes.c_ubyte) ]

class SgioHdr(ctypes.Structure):
  """<scsi/sg.h> sg_io_hdr_t."""

  _fields_ = [
      ('interface_id', ctypes.c_int),
      ('dxfer_direction', ctypes.c_int),
      ('cmd_len', ctypes.c_ubyte),
      ('mx_sb_len', ctypes.c_ubyte),
      ('iovec_count', ctypes.c_ushort),
      ('dxfer_len', ctypes.c_uint),
      ('dxferp', ctypes.c_void_p),
      ('cmdp', ctypes.c_void_p),
      ('sbp', ctypes.c_void_p),
      ('timeout', ctypes.c_uint),
      ('flags', ctypes.c_uint),
      ('pack_id', ctypes.c_int),
      ('usr_ptr', ctypes.c_void_p),
      ('status', ctypes.c_ubyte),
      ('masked_status', ctypes.c_ubyte),
      ('msg_status', ctypes.c_ubyte),
      ('sb_len_wr', ctypes.c_ubyte),
      ('host_status', ctypes.c_ushort),
      ('driver_status', ctypes.c_ushort),
      ('resid', ctypes.c_int),
      ('duration', ctypes.c_uint),
      ('info', ctypes.c_uint)]

def SwapString(str):
  """Swap 16 bit words within a string.

  String data from an ATA IDENTIFY appears byteswapped, even on little-endian
  achitectures. I don't know why. Other disk utilities I've looked at also
  byte-swap strings, and contain comments that this needs to be done on all
  platforms not just big-endian ones. So... yeah.
  s = []
  for x in range(0, len(str) - 1, 2):
  return ''.join(s).strip()

def GetDriveIdSgIo(dev):
  """Return information from interrogating the drive.

  This routine issues a SG_IO ioctl to a block device, which
  requires either root privileges or the CAP_SYS_RAWIO capability.

    dev: name of the device, such as 'sda' or '/dev/sda'

    (serial_number, fw_version, model) as strings

  if dev[0] != '/':
    dev = '/dev/' + dev
  ata_cmd = AtaCmd(opcode=0xa1,  # ATA PASS-THROUGH (12)
                   protocol=4<<1,  # PIO Data-In
                   # flags field
                   # OFF_LINE = 0 (0 seconds offline)
                   # CK_COND = 1 (copy sense data in response)
                   # T_DIR = 1 (transfer from the ATA device)
                   # BYT_BLOK = 1 (length is in blocks, not bytes)
                   # T_LENGTH = 2 (transfer length in the SECTOR_COUNT field)
                   features=0, sector_count=0,
                   lba_low=0, lba_mid=0, lba_high=0,
                   command=0xec,  # IDENTIFY
                   reserved=0, control=0)
  ASCII_S = 83
  sense = ctypes.c_buffer(64)
  identify = ctypes.c_buffer(512)
  sgio = SgioHdr(interface_id=ASCII_S, dxfer_direction=SG_DXFER_FROM_DEV,
                 mx_sb_len=ctypes.sizeof(sense), iovec_count=0,
                 dxferp=ctypes.cast(identify, ctypes.c_void_p),
                 sbp=ctypes.cast(sense, ctypes.c_void_p), timeout=3000,
                 flags=0, pack_id=0, usr_ptr=None, status=0, masked_status=0,
                 msg_status=0, sb_len_wr=0, host_status=0, driver_status=0,
                 resid=0, duration=0, info=0)
  SG_IO = 0x2285  # <scsi/sg.h>
  with open(dev, 'r') as fd:
    if fcntl.ioctl(fd, SG_IO, ctypes.addressof(sgio)) != 0:
      print "fcntl failed"
      return None
    if ord(sense[0]) != 0x72 or ord(sense[8]) != 0x9 or ord(sense[9]) != 0xc:
      return None
    # IDENTIFY format as defined on pg 91 of
    serial_no = SwapString(identify[20:40])
    fw_rev = SwapString(identify[46:53])
    model = SwapString(identify[54:93])
    return (serial_no, fw_rev, model)

For the unbelievers out there, this one works too.

$ sudo python
('5RY0N6BD', '3.ADA', 'ST3250310AS')

So, there you go. HDIO_GET_IDENTITY and SG_IO implemented in pure Python. They do work, but in the process of working on this and reading existing code it became clear that low level ATA handling is fraught with peril. The disk industry has been iterating this interface for decades, and there is a ton of gear out there that made questionable choices in how to interpret the spec. Most of the code in existing utilities is not to implement the base operations but instead to handle the quirks from various manufacturers. I decided that I didn't want to go down that path, and will instead rely on forking sdparm and smartctl as needed.

I'll just leave this post here for search engines to find. I'm sure there is a ton of demand for this information.

Stop snickering.

Thursday, February 2, 2012


In addition to its role in assigning IPv4 addresses, DHCP has an options mechanism to send other bits of data between client and server. For example there are options to provide the DNS and NTP server addresses along with the client's assigned IP address. In its request, the client lists the options it would like to receive. The server fills in whichever ones it can.

DHCP has always allowed for vendor extensions of the available options, inheriting this support from the earlier BOOTP protocol. The original vendor extension mechanism was very simple: use option 43, and put whatever you like there. A mechanism was provided to encode multiple sub-options within the option 43 data, but made no attempt to coordinate between different vendors use of the space. Vendors immediately began creating conflicts, using the same numeric codes for different means simultaneously. This led to a variety of heroic hacks in which the client would populate its request with magic values which the server would use to figure out the set of vendor options to supply.

DHCP6 defined a more complex encoding, where each vendor includes their unique IANA Enterprise Number as part of its option. Options from different vendors can be accommodated simultaneously. This Vendor-Identifying Vendor Options (VIVO) encoding was also added back to DHCP4 as options 124 and 125. DHCP4 thus has two separate vendor option mechanisms in common use.


The ISC DHCP server can support VIVO options using several different mechanisms. The first, and so far as I can tell most common, is to specify the byte-level payload. The administrator pores over RFCs and vendor documentation to come up with the magic string of bytes to send and types it into dhcpd.conf, where it immediately becomes magic voodoo that everyone is afraid to touch for fear of breaking something.

Avoiding magic byte strings by specifying the format of the options is more difficult to get working, but easier to maintain and understand. We'll consider an example here.

  • Vendor: Frobozzco
  • IANA Enterprise Number (IEN): 12345
  • Code #1: a text string containing the location within the maze.
  • Code #2: an integer describing the percentage likelihood of being eaten by a grue.
    In practice this is always 100%, which many clients simply hard-code.

To implement this in the DHCP config, we define an option space for frobozzco. This just creates a namespace; we bind that namespace to the numeric IEN later. We have to tell DHCP how wide to make the code and length fields. DHCP4 usually uses 1 byte fields, while DHCP6 generally uses 2 bytes. Most of the time vendors don't specify the width they use, and if so you should assume the normal sizes for the protocol. The example below comes from a DHCP6 config, so the code and length are both declared as two bytes. After we've declared all of the option codes, we bind the frobozzco option space to its numeric IANA Enterprise Number. Use of IEN for DHCP is called the Vendor Specific Information Option, so the syntax in the DHCP configuration labels this vsio.{option space name}

option space frobozzco code width 2 length width 2;
option frobozzco.maze-location code 1 = text;
option frobozzco.grue-probability code 2 = uint8;
option vsio.frobozzco code 12345 = encapsulate frobozzco;

Owing to the long and sordid history of numbering conflicts, most vendor extensions define a secret handshake. The client inserts a specific value into a field in its request to trigger the server to respond with the options for that vendor. Frobozzco has decreed that clients should send the string "look north" as a vendor-class option in the request. A DHCP6 vendor-class consists of the vendor's IEN followed by the content. In our case the content consists of another two byte length field, followed by the string. ISC DHCP 4.x doesn't define a type for handling vendor-class in the config but we can construct one using a record, which is a collection of fields defined inside brackets.

option dhcp6.vendor-class code 16 = {integer 32, integer 16, string};

# length=14 bytes, Frobozzco IEN, content=look north
send dhcp6.vendor-class 12345 14 "look north";

Finally, we have to provide a script for dhclient to run to handle the received options. We'll get to the client script a bit later, for now just assume it is in /usr/local/sbin/dhclient-script. Putting it all together, the dhclient6.conf should look like this.

script "/usr/local/sbin/dhclient-script";

option space frobozzco code width 2 length width 2;
option frobozzco.maze-location code 1 = text;
option frobozzco.grue-probability code 2 = uint8;
option vsio.frobozzco code 12345 = encapsulate frobozzco;

option dhcp6.vendor-class code 16 = {integer 32, integer 16, string};

interface "eth0" {
    also request dhcp6.vendor-opts;
    send dhcp6.vendor-class 12345 10 "look north";


On the client we also must provide the script for dhclient to run. The OS vendor will have provided one, often in /sbin or /usr/sbin. We'll copy it, and add handling.

dhclient passes in environment variables for each DHCP option. The name of the variable is "new_<option space name>_<option name>" For the example config above, we'd define a shell script function to write our two options to files in /tmp.

make_frobozzco_files() {                                                       
  mkdir /tmp/frobozzco                                            
  if [ "x${new_frobozzco_maze_location}" != x ] ; then         
    echo ${new_frobozzco_maze_location} > /tmp/frobozzco/maze_location                 
  if [ "x${new_frobozzco_grue_probability}" != x ] ; then
    echo ${new_frobozzco_grue_probability} > /tmp/frobozzco/grue_probability

The dhclient-script provided with the OS will have handling for DNS nameservers. Adding a call to make_frobozzco_files at the same points in the script which handle /etc/resolv.conf is a reasonable approach to take.

I'm mostly blogging this for my own future use, to be able to find how to do something I remember doing before. There you go, future me.

Tuesday, January 31, 2012


When building the software for a new device there are a ton of things which need to go into the filesystem: binaries, libraries, device nodes, a bunch of configuration files, etc. One of the chunks of essential data is a list of trusted certificate authorities for libssl, commonly stored in /etc/ssl/ca_certificates.crt. Its common practice to grab a list of certificates from, and there are various Perl and Python scripts floating around in the search engines to assemble this list into a PEM file suitable for libssl.

2011 was not a good year for certificate authorities. DigiNotar was seized by the Dutch government after it became clear they had been thoroughly breached and generated fraudulent certificates for many large domains. Several Comodo resellers were similarly compromised and generated bogus certs for some of the same sites. Browser makers responded by encoding a list of toxic certificates into the browser, to reject any certificate signed by them.

Encoding a list of toxic certificates is the key phrase in that paragraph. As of 2011, Mozilla's certdata.txt contains both trusted CAs and absolutely untrustworthy, revoked CAs. There is metadata in the entry describing how it should be treated, but several of the scripts floating around grab everything listed in certdata.txt and put it in the PEM file. This is disastrous.

The code to search for is CKT_NSS_NOT_TRUSTED. The utility you are using should check for that type, and skip it. If there is no handling for CKT_NSS_NOT_TRUSTED, then the utility you are using is absolutely broken. Don't use it. I know of at least two which handle this correctly: