Thursday, April 15, 2010

Code Snippet: hash_map

hash_map is not part of the current C++ STL, but is universally adopted as an extension. Using hash_map is simple when using built-in primitives like int or char*. Attempting to use user-defined classes as a key is somewhat more difficult, as the following example demonstrates:

class example1 {
 public:
  example1() {};
  ~example1() {};

  uint8_t name_[8];
};

typedef __gnu_cxx::hash_map<example1,int> Example1HashType;
Example1HashType hash_map1;

int main(int argc, char **argv) {
  example1 e1;
  hash_map1.insert(std::make_pair(e1, 1));
}

When compiled with gcc 4.2 this results in:

/usr/include/c++/4.2/bits/stl_function.h:200: error: no match for ‘operator==’ in ‘__x == __y’

/usr/include/c++/4.2/ext/hashtable.h:595: error: no match for call to ‘(const __gnu_cxx::hash<example1>) (const example1&)’

hash_map requires two things: a hash function, and an ability to compare equality. To use a class as a key you need to provide a hash<> template specialized for the class in question. You also need to implement an equality operator.

class example1 {
 public:
  example1() {};
  ~example1() {};

  bool operator==(const example1 &other) const {
    return (memcmp(name_, other.name_, sizeof(name_)) == 0);
  };

  uint8_t name_[8];
};

namespace __gnu_cxx {
template<> struct hash<example1> {
  size_t operator()(const example1& k) const {
    size_t hashval = 0;
    for (int i = 0; i < sizeof(k.name_); ++i) {
      hashval = 5 * hashval + k.name_[i];
    }
    return hashval;
  }
};
}  // namespace __gnu_cxx

typedef __gnu_cxx::hash_map<example1,int> Example1HashType;

int main(int argc, char **argv) {
  example1 e1;
  Example1HashType hash_map1;

  hash_map1.insert(std::make_pair(e1, 1));
}

This works, but what if we cannot add an operator method? For example, perhaps the class is in a library which we cannot modify, or is created by a code generator. Consider this case where key is a simple struct with no member functions. This code fails to compile, due to lack of "__x == __y"

struct example2 {
  uint8_t name_[8];
};

namespace __gnu_cxx {
template<> struct hash<example2> {
  size_t operator()(const example2& k) const {
    size_t hashval = 0;
    for (int i = 0; i < sizeof(k.name_); ++i) {
      hashval = 5 * hashval + k.name_[i];
    }
    return hashval;
  }
};
}  // namespace __gnu_cxx

typedef __gnu_cxx::hash_map<example2,int> Example2HashType;
Example2HashType hash_map2;

int main(int argc, char **argv) {
  example2 e2;
  hash_map2.insert(std::make_pair(e2, 1));
}

As this is C++, the solution will of course involve more templates. hash_map does not directly invoke "x == y," it uses an equal_to<> template. The "__x == __y" compiler error is from the template. We can provide an equal_to<> specialization instead.

struct example2 {
  uint8_t name_[8];
};

namespace std {
template<> struct equal_to<example2> {
  bool operator()(const example2& x, const example2& y) const {
    return (memcmp(x.name_, y.name_, sizeof(x.name_)) == 0);
  }
};
}  // namespace std

namespace __gnu_cxx {
template<> struct hash<example2> {
  size_t operator()(const example2& k) const {
    size_t hashval = 0;
    for (int i = 0; i < sizeof(k.name_); ++i) {
      hashval = 5 * hashval + k.name_[i];
    }
    return hashval;
  }
};
}  // namespace __gnu_cxx

typedef __gnu_cxx::hash_map<example2,int> Example2HashType;
Example2HashType hash_map2;

int main(int argc, char **argv) {
  example2 e2;
  hash_map2.insert(std::make_pair(e2, 1));
}

Extending hash_map to custom key types makes it useful in far more situations.