Coding Relic: Hash Tables Versus CAMs

Nowadays when discussing switching and routing, we talk about CAMs (Content Addressable Memory). Yet CAMs came into widespread use in networking gear less than 10 years ago, and the Internet is older than that. Before CAMs, switch ASICs used hash-and-match engines to route packets.

This is the first in a series on articles focussing on hash-and-match for networking applications. Why write this? I believe we rely too much on CAMs in networking today, resulting in equipment which is more expensive than it needs to be. Hash engines can fill many roles with aplomb, yet we plop down a CAM and move on. Furthermore we're in danger of defining new protocols in such a way that effectively mandates a CAM, blocking the ability to use hash tables at all. That would be unfortunate.

This first post provides definitions for the discussion to follow.

Hash tables have been used more or less forever in software. To find data in the table a function is computed over the key. The hash function reduces the key down to a narrow index, which offsets into the table. There is a good description of hash tables at Wikipedia. Hash tables implemented in hardware are very similar to software, differing mainly in the optimizations applicable (which will be covered in a future article in this series).

Hash table performance is very sensitive to the quality of the hash function. Having multiple keys hash to the same index is called a collision. A bad hash function leads to many items colliding in the same small set of entries while leaving most of the table unoccupied.

MAC addresses hashing to various locations

Content Addressable Memory is a completely different type of memory. It differs from RAM in how data is fetched: for RAM you supply an address, for a CAM you supply a key to look for. The CAM returns any data previously associated with that key.

CAMs are often described as being the hardware equivalent of Associative Arrays from software, but I reject this comparison as more misleading than it is helpful. Associative arrays are a high level abstraction implemented on top of underlying datastructures; they are generally implemented using hash tables. Hardware CAMs are not implemented using hash tables. They are more akin to magic, where a key is presented and every entry in the CAM simultaneously compares itself to the input.

Ternary CAM adds the ability to selectively mask bits from the input key when searching, and are generally the type used in network devices. For example consider a routing table with increasingly specific routes for a destination, and how this will be handled in the CAM:

The CAM returns the first match it finds, so the most specific routes need to appear earlier. Consider an example where we look for 10.1.1.2. The CAM will apply the 0xFFFFFFFF mask of the first entry and find that it does not match 10.1.1.1. It will apply the second mask 0xFFFFFFF0 and find a match, so this result is returned. The rest of the entries actually are compared (everything happens in parallel inside the CAM) but their results are suppressed by prioritization logic.

The next article in this series will focus on hardware hash table implementation issues.

footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Ethernet label.

Monday, June 6, 2011

Hash Tables Versus CAMs