Jump to content

Van Emde Boas tree: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Dcoetzee (talk | contribs)
References: Linking second part of MIT course notes
Line 14: Line 14:
As one might expect from the constrained key set, we will rely fundamentally on the representation of integers to achieve our time bounds. The idea is to take the ''m''-bit key and divide it into ''a'', its ''m''/2 most significant bits, and ''b'', its ''m''/2 least significant bits. Use ''a'' to index into an [[array]] of 2<sup>''m''/2</sup> vEB trees, each capable of holding ''m''/2-bit numbers, and recursively search for ''b'' in the ''a''th one. The effect is to reduce the number of bits in the key by half for each recursive call.
As one might expect from the constrained key set, we will rely fundamentally on the representation of integers to achieve our time bounds. The idea is to take the ''m''-bit key and divide it into ''a'', its ''m''/2 most significant bits, and ''b'', its ''m''/2 least significant bits. Use ''a'' to index into an [[array]] of 2<sup>''m''/2</sup> vEB trees, each capable of holding ''m''/2-bit numbers, and recursively search for ''b'' in the ''a''th one. The effect is to reduce the number of bits in the key by half for each recursive call.


In addition to its speed, the trees are can be quite compact when they contain many elements. Because we don't create any subtrees until we need to add something to them. Initially, each element we add creates about log(''m'') new trees containing about ''m/2'' pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of 2<sup>''m''</sup> elements, only O(2<sup>''m''</sup>) space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions or quintillions of elements, the pointers in a full vEB tree number in the thousands.
In addition to its speed, the trees are can be quite compact when they contain many elements, because we don't create any subtrees until we need to add something to them. Initially, each element we add creates about log(''m'') new trees containing about ''m/2'' pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of 2<sup>''m''</sup> elements, only O(2<sup>''m''</sup>) space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions or quintillions of elements, the pointers in a full vEB tree number in the thousands.


However, for small trees the overhead associated with vEB trees is enormous: on the order of 2<sup>''m''/2</sup>. This is one reason why they are not popular in practice. One way of addressing this limitation is to use only a fixed number of bits per level, which results in a [[trie]].
However, for small trees the overhead associated with vEB trees is enormous: on the order of 2<sup>''m''/2</sup>. This is one reason why they are not popular in practice. One way of addressing this limitation is to use only a fixed number of bits per level, which results in a [[trie]].

Revision as of 05:28, 7 June 2006


A van Emde Boas tree (or van Emde Boas priority queue), also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time. Notice that m is the size of the keys — therefore O(log m) is O(log log n) in a full tree, exponentially better than a self-balancing binary search tree. They also have good space efficiency when they contain a large number of elements, as discussed below. They were invented by a team led by Peter van Emde Boas in 1977[1].

Supported operations

The operations supported by a vEB tree are those of an ordered associative array, which includes the usual associative array operations along with two more order operations, FindNext and FindPrevious[2]:

  • Insert: insert a key/value pair with an m-bit key
  • Delete: remove the key/value pair with a given key
  • Lookup: find the value associated with a given key
  • FindNext: find the key/value pair with the smallest key at least a given k
  • FindPrev: find the key/value pair with the largest key at most a given k

How it works

As one might expect from the constrained key set, we will rely fundamentally on the representation of integers to achieve our time bounds. The idea is to take the m-bit key and divide it into a, its m/2 most significant bits, and b, its m/2 least significant bits. Use a to index into an array of 2m/2 vEB trees, each capable of holding m/2-bit numbers, and recursively search for b in the ath one. The effect is to reduce the number of bits in the key by half for each recursive call.

In addition to its speed, the trees are can be quite compact when they contain many elements, because we don't create any subtrees until we need to add something to them. Initially, each element we add creates about log(m) new trees containing about m/2 pointers all together. As the tree grows, more and more subtrees are reused, especially the larger ones. In a full tree of 2m elements, only O(2m) space is used. Moreover, unlike a binary search tree, most of this space is being used to store data: even for billions or quintillions of elements, the pointers in a full vEB tree number in the thousands.

However, for small trees the overhead associated with vEB trees is enormous: on the order of 2m/2. This is one reason why they are not popular in practice. One way of addressing this limitation is to use only a fixed number of bits per level, which results in a trie.

The order operations are slightly more complicated. Suppose we add the following to each tree, including all subtrees:

  • a flag to tell us whether it is empty,
  • a field giving the maximum value in the tree,
  • a field giving the minimum value in the tree.

Then we can perform FindNext as follows: let a be the top half and b the bottom half of the bits of k, the argument to FindNext. If b lies below the maximum value of subtree a, then our result is in that subtree, so we invoke FindNext on it recursively with b. Otherwise, we find the first nonempty subtree with index > a, and return its minimum value.

This usually works, except for one small problem: the search could require as long as m/2 time. To speed it up, instead of storing flags, we add one more vEB tree able to hold numbers up to 2m/2 called top which contains the indexes of all nonempty trees in the array. We can then invoke FindNext recursively on top to identify the first index > a with a nonempty tree, and get its minimum element. FindPrev is similar.

Unfortunately, this makes things difficult, because now we have to maintain the top tree properly. Doing this the naive way, by adding and removing when trees become empty and nonempty, results in a double recursion that could take O(m) time. To fix this, we first add a size field. Next, we switch from storing the minimum element in the tree itself to storing it in the minimum field. Now, adding an element to an empty tree is constant time, so we have time left to make a recursion on top to add the index. Likewise, removing the last element from a tree is constant time, leaving time to remove the tree's index from top. All operations are, finally, O(log m).

In practical implementations, especially on machines with shift-by-k and find first zero instructions, we can further improve performance by switching to a bit array once we reach m equal to the word size or a small multiple thereof. Since all operations on a single word are constant time, this does not affect the asymptotic performance, but it does avoid the majority of the pointer storage and several pointer dereferences, allowing us to achieve a significant practical savings in time and space with this trick.

References