All About Programming: Hash Tables with Worst-Case O(1) Access -- Hopscotch Hashing , Universal Hashing and Extendible Hashing

Hash Tables with Worst-Case O(1) Access — Hopscotch Hashing , Universal Hashing and Extendible Hashing | Lei Jiang Coding

Hopscotch hashing is a new algorithm to improve the linear probing algorithm. Because of primary and secondary clustering, this sequence can be long on average as the table gets loaded, thus many improvements such as quadratic probing, double hashing, and so forth, have been proposed to reduce the number of collisions.

However, on some modern architectures, the locality produced by probing adjacent cells is a more signiﬁcant factor than the extra probes, and linear probing can still be practical or even a best choice.

Hopscotch hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table using open addressing.

Hopscotch hashing. Here, H is 4. Gray entries are occupied. In part (a), the item x is added with a hash value of 6. A linear probe finds that entry 13 is empty. Because 13 is more than 4 entries away from 6, the algorithm looks for an earlier entry to swap with 13. The first place to look in is H-1 = 3 entries before, at entry 10. That entry's hop information bit-map indicates that d, the item at entry 11, can be displaced to 13. After displacing d, Entry 11 is still too far from entry 6, so the algorithm examines entry 8. The hop information bit-map indicates that item c at entry 9 can be moved to entry 11. Finally, a is moved to entry 9. Part (b) shows the table state just after adding x.

The idea of hopscotch hashing is to bound the maximal length of the probe sequence by a predetermined constant that is optimized to the underlying computer's architecture. Doing so would give constant-time lookups in the worst case,and like cuckoo hashing,the lookup could be parallelized to simultaneously check the bounded set of possible locations. If an insertion would place a new item too far from its hash location, then efﬁciently go backward toward the hash location, evicting potential items. the evictions can be done quickly and guarantee that those evicted are not placed too far from their hash locations.

Universal Hashing Using universal hashing (in a randomized algorithm or data structure) refers to selecting a hash function at random from a family of hash functions with a certain mathematical property . This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary.

Extendible hashing is a type of hash system which treats a hash as a bit string, and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). This means that time-sensitive applications are less affected by table growth than by standard full-table rehashes.

If either probing hashing or separate chaining hashing is used, the major problem is that collisions could cause several blocks to be examined during a search, even for a well-distributed hash table. Furthermore, when the table gets too full, an extremely expensive rehashing step must be performed, which requires O(N) disk accesses. A clever alternative, known as extendible hashing, allows a search to be performed in two disk accesses. Insertions also require few disk accesses.

Read full article from Hash Tables with Worst-Case O(1) Access — Hopscotch Hashing , Universal Hashing and Extendible Hashing | Lei Jiang Coding

Hash Tables with Worst-Case O(1) Access -- Hopscotch Hashing , Universal Hashing and Extendible Hashing | Lei Jiang Coding

No comments:

Post a Comment

Labels

Popular Posts