To paraphrase Knuth, one should not pick a hash function at random and expect it to produce a good hash table. As with any hashing task, there are three classical issues to consider:
- The size of the hash: the number of output bits needed to meet your collision goals (a collision being two distinct keys hashing to the same value) while remaining within your storage constraints
- The distribution of hash values over your input data, and the related problem of collisions
- Computation time
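
To make the first issue concrete, here is a rough sketch of how hash size trades off against collisions. It uses the standard birthday approximation and assumes the hash spreads keys uniformly over all 2^bits outputs, which a real function on real data may not do. The function names are made up for illustration and do not come from the article.

```python
import math


def collision_probability(num_keys: int, bits: int) -> float:
    """Approximate chance of at least one collision among num_keys hashes.

    Assumption: hashes are uniform over 2**bits values (birthday bound).
    """
    buckets = 2.0 ** bits
    exponent = num_keys * (num_keys - 1) / (2.0 * buckets)
    # 1 - exp(-x), written with expm1 so tiny probabilities keep precision.
    return -math.expm1(-exponent)


def expected_colliding_pairs(num_keys: int, bits: int) -> float:
    """Expected number of key pairs that share a hash, same assumption."""
    return num_keys * (num_keys - 1) / 2.0 / (2.0 ** bits)


if __name__ == "__main__":
    # Compare a few hash widths for one million keys.
    for bits in (32, 64, 128):
        p = collision_probability(1_000_000, bits)
        print(f"{bits:3d} bits: P(collision) ~ {p:.3g}")
```

Under this idealized model, a 32-bit hash is almost certain to collide somewhere among a million keys, while wider hashes buy safety at the cost of storage. A poorly distributed hash will do worse than these estimates, which is why the distribution on your own data needs to be measured separately.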
Over the next several posts, I will be putting a number of hash functions through the wringer in an effort to identify a handful that perform well on our data.
Read full article from Choosing a Good Hash Function, Part 1 – Research