To and Froh: Immutable Hash Trie Maps in Java
Immutable Hash Trie Maps in Java
In this post, I aim to introduce another immutable data structure, following up on my posts on immutable binary trees ([1] and [2]) and immutable lists. We'll be looking at hash tries, which provide an efficient way of implementing an immutable HashMap
-like structure. Like a HashMap
, we can provide more or less constant time operations in the absence of hash collisions, and linear time when collisions occur. The catch is that, in practice, our constants will be considerably larger. That said, we'll have a data structure that requires no synchronization when accessed from multiple threads, since it cannot be modified in-place.
Our hash trie map will be constructed as a tree with a high branching factor. If you haven't already read the posts on immutable binary trees, I suggest you do so, especially the second one. In particular, we're going to be relying heavily on specialized node types to save memory and split our logic into different cases based on polymorphic behaviour.
What is a trie?
Tries are a kind of tree where the path from the root to a given node encodes some earlier information. A nice use for tries is to store dictionaries in a relatively efficient way. Your root node represents the empty string, and it has 26 children, for the letters of the alphabet. Each node could be implemented as an array of 26 (possibly null) elements, plus a boolean to indicate whether that node represents a valid word. The word 'cat' would be encoded by the third-level node reached by going to 'c', then 'a', then 't', from the root. Since 'cat' is a word, the boolean value at that node would be true. By comparison, the node for 'coug' would be false, but would need to exist as a parent for 'cough'.
Our hash trie map will work on a somewhat similar principle, except that our valid "words" will be hash codes for the keys contained in the map.
An overview of what our tree will look like
To implement our trie structure, we'll have nodes that branch over part of the hash code of the elements (BranchedArrayHashNode
), a singleton EmptyHashNode
that represents a lack of children for a given partial hash, an EntryHashNode
that represents a single key/value pair, and a ListHashNode
that contains a linked list of key/value pairs whose keys' hashes collide. As a useful optimization, we'll implement a specialized SingleArrayHashNode
for internal nodes who only have one child, so we don't unnecessarily allocate an entire array.
Like with the immutable binary trees, an add or remove operation will descend the tree until it reaches a (possibly empty) leaf node, which will provide its own replacement. Then, as the recursive calls return, the parents will provide their replacements (which will reuse the other children), until we get a new root node. In this way, we can guarantee (in the absence of a collision) running time that is logarithmic over the possible hash codes. Since the space of possible hash codes is fixed, the maximum depth of the tree is log(2^32) (which would be 32 with a branching factor of 2, but we'll use a branching factor of 2^5 = 32, guaranteeing a maximum depth of 7). This is where we get the claim of constant running time in the absence of collisions. (In the event of a collision, we hit a ListHashNode
and we add a linear traversal of the list.)
Read full article from To and Froh: Immutable Hash Trie Maps in Java
No comments:
Post a Comment