Hash tables with O(1) worst-case lookup and space efficiency [pdf] | Hacker News




Mitzenmacher, co-author of Probability and Computing, has an interesting survey on cuckoo hashing. It includes a number of open research problems that are worth looking over for those of you with an interest. The 7th is particularly interesting to me; it concerns optimal ways to maintain a hash table in a parallel computing environment.

http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pd...
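For readers who haven't met cuckoo hashing: each key has exactly two candidate slots, one per table, so a lookup probes at most two places. That is where the O(1) worst-case lookup comes from. Insertion may evict an occupant to its alternate slot, which may evict another, and so on. The Python below is a minimal sketch, not code from the survey or the linked paper; the class name, the `max_kicks` eviction limit, and faking two hash functions by salting Python's built-in `hash()` are all illustrative assumptions.

```python
class CuckooHashTable:
    """Minimal cuckoo hash table: each key lives in one of two possible slots."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks  # hypothetical limit on eviction chain length
        # Two tables; each slot holds None or a (key, value) pair.
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Stand-in for two independent hash functions: salt the built-in hash().
        return hash((which, key)) % self.capacity

    def get(self, key):
        # At most two probes, one per table, regardless of how full the table is.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def put(self, key, value):
        # If the key is already present, overwrite it in place.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Place the entry and pick up whatever was evicted (maybe None).
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            # The evicted entry's only other home is in the other table.
            which = 1 - which
        # Eviction chain too long (likely a cycle): grow, rehash, retry.
        self._rehash(2 * self.capacity)
        self.put(*entry)

    def _rehash(self, new_capacity):
        old = [e for table in self.tables for e in table if e is not None]
        self.capacity = new_capacity
        self.tables = [[None] * new_capacity, [None] * new_capacity]
        for k, v in old:
            self.put(k, v)
```

Doubling the capacity whenever an eviction chain runs too long is just one simple policy choice for this sketch.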

-----


Every time I've tried comparing cuckoo hashing against traditional hash table schemes in practice, the time taken to compute the additional hash functions has outweighed any gains in performance.
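One common way to keep the cost of the extra hash function down is to compute a single 64-bit hash and split it into two 32-bit halves, one per table. The sketch below does that for integer keys using the splitmix64 finalizer as a cheap mixer; the choice of mixer and the function names are assumptions for illustration, not what the commenter benchmarked. Note that the two halves are not truly independent hash functions, so this trades some of the theory's independence assumptions for speed.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """splitmix64 finalizer: cheap, non-cryptographic, a bijection on 64-bit values."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def two_slots(key, capacity):
    """Derive both cuckoo slot indices from one hash of an integer key."""
    h = mix64(key)
    lo = h & 0xFFFFFFFF  # low 32 bits -> index into table 0
    hi = h >> 32         # high 32 bits -> index into table 1
    return lo % capacity, hi % capacity
```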

Counter-intuitively, I've also noticed in many cases that using binary search over sorted elements in contiguous memory is actually faster than using a hash table at all.
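The sorted-array alternative is easy to sketch with Python's `bisect` module over an `array`, which stores the integers inline in one contiguous buffer rather than as pointers to separate objects. This is a minimal illustration assuming 64-bit integer keys, and the helper names are hypothetical; in CPython the interpreter overhead dominates, so it shows the data layout rather than reproducing the timing described above.

```python
from array import array
from bisect import bisect_left


def build(keys):
    # Sort once; array('q') stores signed 64-bit ints contiguously.
    return array('q', sorted(keys))


def contains(sorted_keys, key):
    # O(log n) comparisons over one contiguous block, no hashing at all.
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key
```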

Has anyone else found this?

-----


> the time taken to compute the additional hash functions outweighs any gains in performance.

Sounds like you're using the wrong hash functions.

Are you using cryptographic hash functions (such as MD5 or SHA) perchance? They're not intended for use in data structures.

Most data structure algorithms just require a hash function with good avalanche behaviour and a statistically even bit dispersion. The FNV hash gives you this with just one MUL and one XOR per byte, which is (rough guess) at least 100 times faster than SHA. See the FNV hash page (http://www.isthe.com/chongo/tech/comp/fnv/); it's super-effective!
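A minimal Python sketch of 32-bit FNV-1a, using the standard 32-bit offset basis and prime published for FNV. FNV-1a does the XOR before the multiply; plain FNV-1 does them in the opposite order. The function name is just for illustration.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: one XOR and one multiply per input byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep arithmetic modulo 2**32
    return h
```

To pick a bucket, reduce the result modulo the table size.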

-----


It very much depends on your hash table implementation, but I'm betting that your hash table stores pointers to objects allocated elsewhere. In that case, an array of elements in contiguous memory will be much faster because of locality. It's sometimes surprising how poorly most of the data structures we reason about make use of caches.
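To illustrate the layout being described, here is a minimal open-addressing (linear probing) map that keeps keys and values inline in two flat arrays instead of pointing at separately allocated objects. It's a sketch under stated assumptions: non-negative integer keys, signed 64-bit integer values, a power-of-two capacity, and -1 as the empty-slot sentinel; the class name is hypothetical. As with any pure-Python code, interpreter overhead hides most of the cache benefit; the point is the structure.

```python
from array import array


class FlatIntMap:
    """Linear-probing map with keys and values stored inline in flat arrays."""

    EMPTY = -1  # sentinel for an unused slot, so keys must be >= 0

    def __init__(self, capacity=8):
        # capacity must be a power of two so we can mask instead of using %.
        self.keys = array('q', [self.EMPTY]) * capacity
        self.values = array('q', [0]) * capacity
        self.size = 0

    def _find(self, key):
        mask = len(self.keys) - 1
        i = hash(key) & mask
        # Probe consecutive slots, which sit next to each other in memory.
        while self.keys[i] != self.EMPTY and self.keys[i] != key:
            i = (i + 1) & mask
        return i

    def put(self, key, value):
        if key < 0:
            raise ValueError("keys must be non-negative")
        # Keep the load factor <= 1/2 so probe sequences stay short.
        if 2 * (self.size + 1) > len(self.keys):
            self._grow()
        i = self._find(key)
        if self.keys[i] == self.EMPTY:
            self.keys[i] = key
            self.size += 1
        self.values[i] = value

    def get(self, key, default=None):
        if key < 0:
            return default
        i = self._find(key)
        return self.values[i] if self.keys[i] == key else default

    def _grow(self):
        old_keys, old_values = self.keys, self.values
        n = 2 * len(old_keys)
        self.keys = array('q', [self.EMPTY]) * n
        self.values = array('q', [0]) * n
        self.size = 0
        for k, v in zip(old_keys, old_values):
            if k != self.EMPTY:
                self.put(k, v)
```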

-----


I almost always prefer some kind of balanced tree to a hash table, because no matter the original specifications, sometime during the development of a program I usually need the elements in order.
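For the ordered case, a balanced search tree gives sorted iteration for free, which a plain hash table does not. Python's standard library doesn't ship one, so below is a minimal AVL tree sketch supporting insert, lookup, and in-order traversal; the names are illustrative and deletion is omitted for brevity.

```python
class _Node:
    __slots__ = ("key", "value", "left", "right", "height")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1


def _height(n):
    return n.height if n else 0


def _fix_height(n):
    n.height = 1 + max(_height(n.left), _height(n.right))


def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix_height(y)
    _fix_height(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix_height(x)
    _fix_height(y)
    return y


def _insert(n, key, value):
    """Insert into the subtree rooted at n; return the new subtree root."""
    if n is None:
        return _Node(key, value)
    if key < n.key:
        n.left = _insert(n.left, key, value)
    elif key > n.key:
        n.right = _insert(n.right, key, value)
    else:
        n.value = value  # existing key: update, shape unchanged
        return n
    _fix_height(n)
    balance = _height(n.left) - _height(n.right)
    if balance > 1:  # left-heavy
        if key > n.left.key:  # left-right case: straighten first
            n.left = _rotate_left(n.left)
        return _rotate_right(n)
    if balance < -1:  # right-heavy
        if key < n.right.key:  # right-left case: straighten first
            n.right = _rotate_right(n.right)
        return _rotate_left(n)
    return n


class AVLMap:
    """Balanced BST map: O(log n) insert and lookup, sorted iteration."""

    def __init__(self):
        self.root = None

    def put(self, key, value):
        self.root = _insert(self.root, key, value)

    def get(self, key, default=None):
        n = self.root
        while n:
            if key < n.key:
                n = n.left
            elif key > n.key:
                n = n.right
            else:
                return n.value
        return default

    def items(self):
        """Yield (key, value) pairs in ascending key order (iterative in-order)."""
        stack, n = [], self.root
        while stack or n:
            while n:
                stack.append(n)
                n = n.left
            n = stack.pop()
            yield n.key, n.value
            n = n.right
```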
