Bit Indexed Tree (Fenwick Tree) | Gautam's personal blog



Bit Indexed Tree or Fenwick Tree is a complex data structure that is used to calculate the cumulative frequency of a data set.  It was invented by Peter Fenwick, hence the name Fenwick Tree.

To understand what is the purpose for this type of a data structure, first we have to first understand what is the problem it is trying to solve. Let us take up the example of web statistics of a blog or web site. Let us say that there are 8 pages and and in a array we update number of visits for each page. We make two queries on this data,

1. Update or Add new frequency to a page.

add(int index, int freq)

2. Get the the cumulative sum of frequencies from 1 to a given index,

sum(int index)

First naive approach could be to just store the frequencies as shown below,

INDEX 1 2 3 4 5 6 7 8
FREQUENCY 4 2 0 5 1 3 8 1

In this approach the add query would be O(1) and sum query would be O(n).

Another naive approach could be to keep track of the cumulative sum for each add or update,

INDEX 1 2 3 4 5 6 7 8
FREQUENCY 4 2 0 5 1 3 8 1
CUM SUM 4 6 6 11 12 15 23 24

In the table above the first row shows the page indexes (total 8 pages). The 2nd row shows the number of hits on each page (frequency) at a particular time and the 3rd row shows the cumulative hit (sum2=feq1+feq2, sum3=sum2+feq3 etc).

Now if we need to update the frequency of the 6th page, index 6 from 3 to 9, we just have to update the 2nd row 6th index to 9. But updating the 3rd row is not easy, we have to update all the cells in the 3rd row after 6th.

In this case add would be O(n) and sum would be O(1). Not good enough.

Basic Concept

Each integer can be represented as sum of powers of two.

5 = 2^2+2^1

In the same way, cumulative frequency can be represented as sum of sets of subfrequencies. In our case, each set contains some successive number of non-overlapping frequencies.

We can create a bit indexed tree in a one dimensional array from the original array which contains the frequency of each index.

Original ARRAY

 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
frequency 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0 2

TRANSFORM TO FENWICK TREE

 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Fenwick tree 1 1 2 4 1 4 0 12 2 7 2 11 3 4 0 29

The table below shows the concept in detail,

First row is the index, 2nd row shows the frequency and 3rd row shows the cumulative sum.  The fourth row shows the values for each of the index of the Fenwick Tree. The fifth row explains the concept of the tree, that the element in the first index is responsible for 1 position. Element in 2nd index is responsible for two positions (1 and 2), Element in the 3rd position is responsible for 1 position (3) only. Like that the element in 4th position is responsible for 4 positions (1 to 4).

 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
frequency 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0 2
cumulative sum 1 1 3 4 5 8 8 12 14 19 21 23 26 27 27 29
Fenwick tree 1 1 2 4 1 4 0 12 2 7 2 11 3 4 0 29
responsible for 1 1..2 3 1..4 5 5..6 7 1..8 9 9..10 11 9..12 13 13..14 15 1..16

The diagram below show the responsibility of each index in the Fenwick Tree,

Suppose we are looking for cumulative frequency of index 13 (for the first 13 elements). In binary notation, 13 is equal to 1101. Accordingly, we will calculate c[1101] = tree[1101] + tree[1100] + tree[1000].

So basically we need to  find out the last non zero digit from the binary number of the index. A very detailed explanation is provided in the original article in top coder here, this is a must read.

Here is java code to build the Fenwick Tree from a frequency set,

    public static int[] buildFenwickTree(int[] A){          int[] fenwickTree = new int[A.length];            for (int i = 1; i < = A.length; i++) {              int idx=i;              do{                  fenwickTree[idx-1] += A[i-1];                  idx += (idx & -idx);              }while (idx < = A.length && idx>0);          }          return fenwickTree;      }

Read cumulative frequency
If we want to read cumulative frequency for some integer idx, we add to sum tree[idx], substract last bit of idx from itself (also we can write – remove the last digit; change the last digit to zero) and repeat this while idx is greater than zero.

    public static int read(int idx){          int sum = 0;          while (idx > 0){              sum = sum + tree[idx];              int x = idx & -idx;              idx = idx - x;          }          return sum;      }

Similarly to update a frequency following method could be used,

    public static void update(int idx ,int val){          while (idx < tree.length){              tree[idx] += val;              idx += (idx & -idx);          }      }

If we want to find the actual frequency in a index, we could preserve the original array and return the value in the given index. But in this case we have compromise on space. We could calculate the actual frequency efficiently from the Fenwick Tree itself. The detailed explanation is provided in the original tutorial in the top coder site could be found here.


Read full article from Bit Indexed Tree (Fenwick Tree) | Gautam's personal blog


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts