MurmurHash3 Java 32 bit, 128 bit hash
I needed a really good hash function for the distributed indexing in SolrCloud. Since it is used for partitioning documents, it needed to be high quality (well distributed), because we don't want uneven shards. It also needed to be cross-platform, so that a client could calculate the hash value themselves if desired and work out which partition a given document belongs to.
MurmurHash3
MurmurHash3 is one of the most popular newer hash functions these days, being both really fast and of high quality. Unfortunately it's written in C++, and a quick Google search did not turn up a suitable high-quality Java port (this was back in 2011). So I took 15 minutes (it's small!) to port the 32-bit version, since it should be faster than the other versions for small keys like document ids. It works in 32-bit chunks and produces a 32-bit hash, which is more than enough for partitioning documents by hash code.
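To give a feel for how small the 32-bit variant is, here is a minimal sketch of MurmurHash3 x86_32 in Java. It follows the public reference algorithm, but the class name `Murmur3Sketch`, the method signature, and the `shardFor` helper are assumptions made for illustration; this is not necessarily the exact code that ended up in Solr, and the modulo-based shard mapping is only one simple way to turn a hash into a partition number.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: names and signatures are hypothetical.
final class Murmur3Sketch {

    // MurmurHash3 x86_32 over data[offset .. offset+len), little-endian blocks.
    static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;

        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // round down to a 4-byte boundary

        // Body: mix one 32-bit chunk at a time.
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                   | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16)
                   | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;

            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 0 to 3 bytes (cases fall through on purpose).
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: force all bits of the hash to avalanche.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Hypothetical helper: map a hash to one of numShards partitions.
    // floorMod keeps the result non-negative for negative hash values.
    static int shardFor(int hash, int numShards) {
        return Math.floorMod(hash, numShards);
    }

    public static void main(String[] args) {
        byte[] id = "doc-42".getBytes(StandardCharsets.UTF_8); // example document id
        int hash = murmurhash3_x86_32(id, 0, id.length, 0);
        System.out.println("hash=" + Integer.toHexString(hash)
                + " shard=" + shardFor(hash, 4));
    }
}
```

Because the hash works on raw bytes, any client that wants to compute the same value has to agree on the byte encoding of the key (UTF-8 in this sketch) and on the seed.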