Wednesday, June 8, 2011

Ternary Hashing

This is the third in a series of articles discussing hash engines for network switch chips. It is best to start with the first and second installments.


CAMs for networking applications support ternary masking, the ability to mark bits of the key as don't care. This is useful for a number of applications. Consider ACLs: the switch can drop packets which match a particular IP address, L4 port, ingress port, etc. Most of the possible match fields will be marked as don't care, and only those fields for the specific ACL will be populated.

Hash engines can support masking, with some limitations. Once computed, you don't mask off bits of the hash value. Instead you mask bits of the original key and compute a new hash. The possible mask values must be known in advance, and each mask requires its own lookup cycle in the table. If a mask excludes the majority of bits in the key it can result in collision problems (too little variance in the keys), but masking off so many bits usually means there are few possible matches anyway.

Lookup #Mask Notes
1. 0xffffffff0000L3 ACL
2. 0x00000000ffffL4 ACL
3. 0xffffffffffffL3+L4 ACL

This isn't the way CAMs work. The unmodified key is presented to the CAM, containing all fields, and each entry in the CAM performs its own masking of what bits to ignore. However even CAM vendors are rethinking the utility of having a unique mask for every entry in the CAM. That scheme consumes half of the bits in the CAM to store masks, and in practice this tremendous flexibility is mostly unused. Generally all CAM entries use one of a relatively small set of masks for various prefix lengths and ACL configurations. Some recent CAM designs instead supply a generous pool of global mask registers, and have each entry to reference one of the global masks. This leaves more of the CAM bits available to store keys.


 
Longest Prefix Match

Routers frequently have multiple, overlapping routes for a destination. The router is supposed to choose the most specific route, meaning the one with the longest prefix. For example if 10.0.0.0/26 and 10.0.0.0/8 both match, it should choose the /26. CAMs handle this using ternary masking to ignore the most specific bits of the destination IP. The most specific routes are placed earlier in the CAM, which takes the first match.

It is possible to handle IP prefix matching in a hash engine by spending multiple lookup cycles. Each lookup masks off one more bit of the IP address before computing the hash function, and thus searches for progressively less specific routes until it finds one.

lookupmask
1. 255.255.255.255
2. 255.255.255.254
3. 255.255.255.252
4. 255.255.255.248
5. 255.255.255.240
6. 255.255.255.224
... etc, etc ...
31. 128.0.0.0

Unfortunately the switch is unlikely to have a cycle budget sufficient for so many lookups. Fortunately, there is a further optimization.


 
LPM: Make the Target Bigger

/22 route expands to four /24 routesWe can reduce the number of lookup cycles by trading off an increased number of entries in the table. The hardware will be designed to only search for a limited number of prefix lengths (the specific masks would be programmable). Software expands all entries of prefix lengths between those the hardware will lookup. In the sample diagram the hardware can search with a /24 mask, and it will find the /22 route because it has been expanded into four /24 entries. Any incoming IP address in the range which should match the /22 will find one of the four entries in the table.

/23 route overwrites two of the entries added by the /22This technique also has to deal with overlapping routes. When a more specific /23 route is added, it should overwrite two of the entries from the less specific /22 route. Addition and deletion of entries in the hash table is complex in this scheme, particularly because it has to be done while packets continue to flow through the device. Nonetheless its always preferable to tolerate complexity in software than to make the hardware handle it. A bug in production software can be fixed. A bug in production hardware turns into a software bug to work around it.

The prefix expansion doesn't need to be linear. It doesn't have to be every 2 bits, as implied above. Analyzing the distribution of prefix lengths in routing tables shows they tend to clump at various lengths, not be uniformly distributed. Hardware used in a campus switch would be set for the prefixes common in that environment. It might apply masks for prefix length 32 (for a local station), 24, 19, 12, and 8. The hardware only needs to budget for 5 lookups, and software expands any other prefixes to match.


 
Why CAMs for Prefixes?

With the ability to do longest prefix matching in an inexpensive hash engine, why use CAMs at all? There are two main downsides to using hashing for this:

  1. A CAM with N entries will hold N routes. Due to prefix expansion, its really hard to say how many routes a hash table with N entries will hold.
  2. IPv6. Only the upper 64 bits of an IPv6 addresses need to deal with prefix lengths, but thats a bunch more lookup cycles.

In the end, a switch which deals with a lot of prefix routing will use a CAM. CAMs are simply better at it. That same switch might still use hash engines for L2 lookups, for ACLs, and anything 5-tuple based. Edge switches which primarily deal with directly attached stations and only a few prefix routes can get by using their hash engine for everything.


The next article in this series will look to the future direction of networking.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.