这是用户在 2024-8-7 22:44 为 https://csacademy.com/lesson/fenwick_trees 保存的双语快照页面，由 沉浸式翻译 提供双语支持。了解如何保存？

- Connected

Chat

.. Loading ..

Activity

Sign in

Suppose we have an array $A$ and we want to support the following two operations:

*Update*: change the value of an element $A_i$*Query*: find the value of a certain partial sum $A_1 + A_2 + ... + A_i$

A **Fenwick tree** or a **binary indexed tree** is a data structure that handles both of these efficiently. It should be noted that if we have such a data structure, we can also find the sum over an interval $[i, j]$ by just calculating $sum(j) - sum(i-1)$.

There are two naive solutions for this problem.

The first one is to simply maintain the array $A$. For an update we need to change an element. For a query we go through all the elements of the prefix and compute their sum.

1

2

3

4

5

6

7

8

9

10

11

12

void update(int pos, int val) {

A[pos] = val;

}

int query(int pos) {

int sum = 0;

for (int i = 1; i <= pos; ++i) {

sum += A[i];

}

return sum;

}

הההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההה

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This solution handles the update in $O(1)$, but the query is quite slow as it works in $O(N)$.

The second naive solution is to maintain all the partial sums. Let's say we have an array $S$, where $S_i = A_1 + A_2 + ... + A_i$. If we update element $A_i$ we need to update all the partial sums $S_j$, where $j \geq i$. In order to answer a query we only need to check a single element of $S$.

1

2

3

4

5

6

7

8

9

10

11

void update(int pos, int val) {

int currentVal = S[pos] - S[pos - 1];

for (int i = pos; i <= N; ++i) {

S[i] += val - currentVal;

}

}

int query(int pos) {

return S[pos];

}

הההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההה

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This second naive solution handles the update in $O(N)$, but the query is fast and works in $O(1)$.

In a way, these two solutions complement each other. One is performing the update fast and the query slow, while the other one does the opposite. The goal of a faster data structure is to find some sort of middle ground.

It's helpful to consider that in each of these naive solutions we maintain for each position $i$ an interval that ends in that position. In the first solution we maintain the interval $[i, i]$, of length $1$, while in the seconds solution we maintain the interval $[1, i]$, of length $i$.

When we want to calculate the sum of all values up to a certain index, in both solutions we keep adding the sum for the interval ending at our current index, and then skip to the first value that's not in our interval.

When we update the value at a certain index, we need to update all intervals that contain that index.

To improve on the previous solutions, we need to have a balance between long and short intervals, so we can skip ever increasing intervals and we don't need to update too many intervals when a value changes. If all intervals have the same length for instance, the best balance is when we use $\left \lfloor{\sqrt N}\right \rfloor$ as the length of our intervals. This is a common technique when you know one of your operations takes $B$ steps, and the other one $N/B$ steps, we would pick $B=\left \lfloor{\sqrt N}\right \rfloor$, to minimize the worst case.

But we can get even better results if we consider the length of the interval ending at $i$ to be the largest power of $2$ that's a divisor of $i$.

Below you can see a graphical representation of the intervals, for $N=16$:

These intervals may seem a bit chaotic at a first glance.

Let's say $F_i$ is the length of the interval ending at index $i$. Basically are dealing with intervals of the form $[i-F_i+1, i]$. For the example above $F=[1, 2, 1, 4, 1, 2, 1, 8, 1, 2, 1, 4, 1, 2, 1, 16]$.

We can show how to build $F$ incrementally over lengths that are powers of $2$.

- If $N = 1$, it only makes sense to maintain an interval of length $1$, so $F=[1]$:

- The next step is for $N=2$. We will keep track of the first element and the sum of the first two, $F=[1, 2]$:

- To find $F$ for the next power of $2$, we take the previous $F$, append a copy of it, and change the last element to be equal to $N$. So in the case of $N=4$, appending a copy to the previous $F$ means we get $[1, 2, 1, 2]$, but then we change the last element, so the new $F=[1, 2, 1, 4]$

- For $N=8$, we have $F=[1, 2, 1, 4, 1, 2, 1, 8]$

- And for $N=16$, we get our result $F=[1, 2, 1, 4, 1, 2, 1, 8, 1, 2, 1, 4, 1, 2, 1, 16]$.

So far, this data structures seems to be just about maintaining some interval sums, there's no explicit tree involved. But at a closer look, we can see that any two intervals are either disjoint, or one of them is completely contained by the other. This means we can interpret the set of intervals as a rooted tree (actually a forest), where the father of an interval is the shortest interval that contains it.

For example, the father of $2$ (the interval ending at index $2$) is $4$, the father of $7$ is $8$, and the father of $12$ is $16$. Below, you can hover over an element to highlight its interval and its father interval, in case it exists.

Now that we identified the hidden tree structure, there are two important details we need to figure out.

First, given an index $i$, how do you calculate the length of the interval ending at $i$? We defined it as the largest power of $2$ that's a divisor of $i$. It would be nice to have a formula though. This value is know as the *least significant bit*, abbreviated *lsb*, of $i$'s representation. We can write the following function:

1

2

3

4

5

6

7

8

int length(int pos) {

int bit = 0;

while ((pos >> bit) & 1 == 0) {

++bit;

}

return (1 << bit);

}

הההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההההה

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This can take $O(log N)$ steps, but fortunately though, there is a faster way of finding this value, and it's all got to do with binary representation of negative numbers. To represent a negative value $-x$, we take the binary representation of $x$, complement it, and then add $1$ (ignoring the final carry). Let's take an example where the numbers are represented on $8$ bits for simplicity:

1

2

3

4

x = 01101100

~x = 10010011

-x = 10010100

Notice the bits more significant than the *lsb* are different for $x$ and $-x$, while the *lsb* and the following $0$'s are the same. Therefore, $x \& {-x}$ gives the wanted answer, in our case:

1

2

x & -x = 00000100

Now we have a method of finding the length of any interval in $O(1)$:

1

2

3

4

int lsb(int pos) {

return pos & -pos;

}

Since we've seen that intervals are either disjoint or fully contained one inside another, this part is equivalent with repeatedly just finding the next smallest interval that contains our own. Notice the father of an interval is always identified by a higher index, so we can say we need to find the *difference* between $i$ and its father.

If $i$ is a power of $2$, the difference is equal to $i$, because the father is the next power of $2$. Otherwise, we can again make use of the inductive way the Fenwick tree is build, and conclude that the difference for $i$ is equal to the difference for $i-p$ ($p$ is the largest power of $2$, smaller than $i$).

You can already guess where this is going: the difference between an interval $i$ and its father is equal to the interval's length, which we already know how to compute really fast.

When we change to updating the interval ending at $pos$, its father, its father's father, and so on.

In the function below we consider the first parameter to be the updated position, and the second one the difference between the new value and the old value being updated:

1

2

3

4

5

6

7

void update(int pos, int val) {

while (pos <= N) {

fenwick[pos] += val;

pos += lsb(pos);

}

}

As for the query, we start at the index being queried, add the sum of its interval, and move to the left. This is where knowing the length of the intervals comes in handy:

1

2

3

4

5

6

7

8

9

int query(int pos) {

int sum = 0;

while (pos > 0) {

sum += fenwick[pos];

pos -= lsb(pos);

}

return sum;

}

There are two important properties that help us:

- The father of an interval is at least double in size
- The interval to the left of any interval is at least double in size

This means we will perform at most $\log N$ steps in each of the two operations, so the complexity is $O(\log N)$. It seems we found the middle ground we were looking for.

leloyHi! I think you forgot to return the sum in the query function

JamVery helpful post.

alex.velea@

Fury73, you can go to the archive and select the Fenwick Tree tag from the left, if you want to see all problems related to a specific data structureFury73can u put some problems need this data structure and not just sum problem

ekzhangIn fact, if you index the nodes of a binomial heap in postorder traversal, and store the array value of each index at the node, then the fenwick tree is really just storing the subtree sums in an array.

ekzhangOne interesting sidenote: The tree structure generated by the nested set of intervals in a fenwick tree is actually the same structure as a binomial heap!

eliyanovThese lessons are fantastic. I've been looking for something like this for 2 years.