Thoughts




Given a set of m keys, a minimal perfect hash function maps each key to an integer from 0 to m-1, and (most importantly) each key maps to a different integer. This means you can use the "perfect hash" number as an index into an array (i.e. use it as a hashmap) for guaranteed O(1) insertions & lookups. I'm going to explain the BMZ algorithm, roughly following the author's C implementation, as it creates perfect hashes in O(m) space and time. I'll end up with an implementation of Google Guava's Equivalence, as you can then use wrappers and standard Java HashMaps to create an efficient Collection with a minimum of wheel-reinventing.

But first I'll start with a simple example. Working in Java is useful as we can re-use our key Objects' hashCode methods to do most of the work. However, it's unlikely that the numbers that hashCode returns are "perfect", so we'll have to modify them deterministically. I'll use an idea I got from the Jenkins hash algorithm: choose a seed integer and mix it with the hashCodes of the keys. As we want the resulting hashCode to lie between 0 and m-1, we'll just take the result mod m after mixing in the seed. That leaves one problem: choosing a seed that makes each object map to a different number.
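The seed search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the original article: the class name SeededPerfectHash, the methods mix, index and findSeed, and the maxTries limit are all hypothetical, and the mixing step is a stand-in borrowed from the MurmurHash3 32-bit finalizer rather than anything from the Jenkins hash. It simply tries seeds 0, 1, 2, ... until every key lands in its own slot.

```java
import java.util.Arrays;
import java.util.List;

final class SeededPerfectHash {

    // Mix a seed into a key's hashCode. ASSUMPTION: this mixer is the
    // MurmurHash3 32-bit finalizer, used here only as a stand-in; any
    // well-distributed integer mixer would do.
    static int mix(int seed, int hashCode) {
        int h = seed ^ hashCode;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Map a key to a slot in [0, m). Math.floorMod keeps the result
    // non-negative even when the mixed value is negative.
    static int index(Object key, int seed, int m) {
        return Math.floorMod(mix(seed, key.hashCode()), m);
    }

    // Try seeds 0..maxTries-1 and return the first one that sends every
    // key to a different slot, or -1 if none is found.
    static int findSeed(List<?> keys, int maxTries) {
        int m = keys.size();
        boolean[] used = new boolean[m];
        for (int seed = 0; seed < maxTries; seed++) {
            Arrays.fill(used, false);
            boolean collision = false;
            for (Object key : keys) {
                int i = index(key, seed, m);
                if (used[i]) {
                    collision = true;
                    break;
                }
                used[i] = true;
            }
            if (!collision) {
                return seed;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date");
        int seed = findSeed(keys, 1_000_000);
        for (String key : keys) {
            System.out.println(key + " -> " + index(key, seed, keys.size()));
        }
    }
}
```

Two caveats on this sketch. If two keys share the same hashCode, no seed can separate them, so findSeed will return -1. And if the mixer behaves like a random function, a given seed works with probability about m!/m^m, which shrinks very quickly as m grows, so a brute-force search like this is only practical for small key sets.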



