Jump to content

X-fast trie: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Lucaske (talk | contribs)
m Fix typo
 
(24 intermediate revisions by 12 users not shown)
Line 1: Line 1:
{{Short description|Data structure for storing integers from a bounded domain}}
{| class="infobox" style="width: 20em"
{{Infobox data structure
! colspan="2" style="font-size: 125%; text-align: center" | X-fast trie
| name = X-fast trie
|-
| type = [[Trie]]
! [[List of data structures|Type]]
| invented_by = [[Dan Willard]]
| [[Trie]]
| invented_year = 1982
|-
| space_avg = ''O''(''n'' log ''M'')
! Invented
| search_avg = ''O''(log log ''M'')
| 1982
| insert_avg = ''O''(log ''M'') [[Amortized analysis|amortized]]
|-
| delete_avg = ''O''(log ''M'') amortized
! Invented by
| space_worst = ''O''(''n'' log ''M'')
| [[Dan Willard]]
| search_worst = ''O''(log log ''M'')
|-
}}
! colspan="2" style="text-align: center; background-color: #CCCCFF; color: #000000;" | Asymptotic complexity<br />in [[big O notation]]
|-
! Space
| ''O''(''n'' log ''M'')
|-
! Search
| ''O''(log log ''M'')
|-
! Insert
| ''O''(log ''M'') [[Amortized analysis|amortized]] [[Best, worst and average case|average]]
|-
! Delete
| ''O''(log ''M'') amortized average
|}


In [[computer science]], an '''x-fast trie''' is a [[data structure]] for storing [[integer]]s from a bounded domain. It supports exact and predecessor or successor queries in time [[Big-O notation|''O'']](log&nbsp;log&nbsp;''M''), using ''O''(''n''&nbsp;log&nbsp;''M'') space, where ''n'' is the number of stored values and ''M'' is the maximum value in the domain. The structure was proposed by [[Dan Willard]] in 1982,<ref name="PaperWillard" /> along with the more complicated [[y-fast trie]], as a way to improve the space usage of [[van Emde Boas tree]]s, while retaining the ''O''(log&nbsp;log&nbsp;''M'') query time.
In [[computer science]], an '''x-fast trie''' is a [[data structure]] for storing [[integer]]s from a bounded domain. It supports exact and predecessor or successor queries in time [[Big-O notation|''O'']](log&nbsp;log&nbsp;''M''), using ''O''(''n''&nbsp;log&nbsp;''M'') space, where ''n'' is the number of stored values and ''M'' is the maximum value in the domain. The structure was proposed by [[Dan Willard]] in 1982,<ref name="PaperWillard" /> along with the more complicated [[y-fast trie]], as a way to improve the space usage of [[van Emde Boas tree]]s, while retaining the ''O''(log&nbsp;log&nbsp;''M'') query time.
Line 30: Line 17:
==Structure==
==Structure==


[[File:Xfast trie example.svg|thumb|upright=1.7|alt=A binary tree with 4 levels. The nodes on each level are: 3: (), 2: (0) and (1), 1: (00) and (10), 0: (001), (100) and (101). The unlabeled node is the root. There are directed edges between the folllowing nodes: ()->(0), ()->(1), (0)->(00), (0)->(001) in blue, (1)->(10), (1)->(101) in blue, (00)->(001) twice, once in blue, (10)->(100), (10)->(101), (001)<->(100), (100)<->(101). The nodes on each level are contained in a box, labeled with LSS(<level>).|An x-fast trie containing the integers 1 (001<sub>2</sub>), 4 (100<sub>2</sub>) and 5 (101<sub>2</sub>). Blue edges indicate descendant pointers.]]
[[File:Xfast trie example.svg|thumb|upright=1.7|alt=A binary tree with 4 levels. The nodes on each level are: 3: (), 2: (0) and (1), 1: (00) and (10), 0: (001), (100) and (101). The unlabeled node is the root. There are directed edges between the following nodes: ()->(0), ()->(1), (0)->(00), (0)->(100) in blue, (1)->(10), (1)->(101) in blue, (00)->(001) twice, once in blue, (10)->(100), (10)->(101), (001)<->(100), (100)<->(101). The nodes on each level are contained in a box, labeled with LSS(<level>).|An x-fast trie containing the integers 1 (001<sub>2</sub>), 4 (100<sub>2</sub>) and 5 (101<sub>2</sub>). Blue edges indicate descendant pointers.]]


An x-fast trie is a [[Trie#Bitwise tries|bitwise trie]]: a [[binary tree]] where each subtree stores values whose [[Binary numeral system|binary representations]] start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and ''M''&nbsp;&minus;&nbsp;1 uses ⌈log<sub>2</sub>&nbsp;''M''⌉ bits, so the height of the trie is ''O''(log&nbsp;''M'').
An x-fast trie is a [[Trie#Bitwise tries|bitwise trie]]: a [[binary tree]] where each subtree stores values whose [[Binary numeral system|binary representations]] start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and ''M''&nbsp;&minus;&nbsp;1 uses ⌈log<sub>2</sub>&nbsp;''M''⌉ bits, so the height of the trie is ''O''(log&nbsp;''M'').
Line 51: Line 38:
Finding the value associated with a key ''k'' that is in the data structure can be done in constant time by looking up ''k'' in ''LSS''[0], which is a hash table on all the leaves.<ref name="PaperBose" />
Finding the value associated with a key ''k'' that is in the data structure can be done in constant time by looking up ''k'' in ''LSS''[0], which is a hash table on all the leaves.<ref name="PaperBose" />


For example, if we are looking for 4 in the above graph, we will implement the following steps:
===Successor and Predecessor===

* Step 1: Convert the decimal 4 to binary, which is 100.
* Step 2: Start from the root, and try to follow the path to each level. The first digit of 100 is 1, so follow the right path (1) of the root to the node "1".
* Step 3: Repeat the Step 2, the 2nd digit of 100 is 0, so follow the left path (0) of the node "1" to the node "10".
* Step 4: Repeat the Step 3, the 3rd digits of 100 is 0, so follow the left path (0) of the node "10" to the node "100".

===Successor and predecessor===


To find the successor or predecessor of a key ''k'', we first find ''A''<sub>''k''</sub>, the lowest ancestor of ''k''. This is the node in the trie that has the longest common prefix with ''k''. To find ''A''<sub>''k''</sub>, we perform a [[binary search]] on the levels. We start at level ''h''/2, where ''h'' is the height of the trie. On each level, we query the corresponding hash table in the level-search structure with the prefix of ''k'' of the right length. If a node with that prefix does not exist, we know that ''A''<sub>''k''</sub> must be at a higher level and we restrict our search to those. If a node with that prefix does exist, ''A''<sub>''k''</sub> can not be at a higher level, so we restrict our search to the current and lower levels.
To find the successor or predecessor of a key ''k'', we first find ''A''<sub>''k''</sub>, the lowest ancestor of ''k''. This is the node in the trie that has the longest common prefix with ''k''. To find ''A''<sub>''k''</sub>, we perform a [[binary search]] on the levels. We start at level ''h''/2, where ''h'' is the height of the trie. On each level, we query the corresponding hash table in the level-search structure with the prefix of ''k'' of the right length. If a node with that prefix does not exist, we know that ''A''<sub>''k''</sub> must be at a higher level and we restrict our search to those. If a node with that prefix does exist, ''A''<sub>''k''</sub> can not be at a higher level, so we restrict our search to the current and lower levels.


Once we find the lowest ancestor of ''k'', we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and ''k'' should be in the other subtree. Therefore the descendant pointer points to the successor or the predecessor of ''k''. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf.
Once we find the lowest ancestor of ''k'', we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and ''k'' should be in the other subtree. Therefore, the descendant pointer points to the successor or the predecessor of ''k''. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf.


Since the trie has height ''O''(log&nbsp;''M''), the binary search for the lowest ancestor takes ''O''(log&nbsp;log&nbsp;''M'') time. After that, the successor or predecessor can be found in constant time, so the total query time is ''O''(log&nbsp;log&nbsp;''M'').<ref name="PaperWillard" />
Since the trie has height ''O''(log&nbsp;''M''), the binary search for the lowest ancestor takes ''O''(log&nbsp;log&nbsp;''M'') time. After that, the successor or predecessor can be found in constant time, so the total query time is ''O''(log&nbsp;log&nbsp;''M'').<ref name="PaperWillard" />

For example, if we are looking for the predecessor of 3 in the above graph, we will implement the following steps:

* Step 1: Convert the decimal 4 to binary, which is 011.
* Step 2: Start from the root, and try to follow the path to each level. The first digit of 011 is 0, so follow the left path (0) of the root to the node "0".
* Step 3: Repeat the Step 2, the 2nd digit of 011 is 1, so try to follow the right path (1) . However, the node "0" has no right path, so follow the pointer to the node "001".
* Step 4: 001 is smaller than 011, so it represents the predecessor of 011. Therefore, the predecessor of 3 is 1 (001).


===Insert===
===Insert===
Line 77: Line 78:
A compression technique similar to [[patricia trie]]s can be used to significantly reduce the space usage of x-fast tries in practice.<ref name="PaperKementsietsidis" />
A compression technique similar to [[patricia trie]]s can be used to significantly reduce the space usage of x-fast tries in practice.<ref name="PaperKementsietsidis" />


By using an [[Binary search#one-sided search|exponential search]] before the binary search over the levels and by querying not only the current prefix ''x'', but also its successor ''x''&nbsp;+&nbsp;1, x-fast tries can answer predecessor and successor queries in time ''O''(log&nbsp;log&nbsp;''&Delta;''), where ''&Delta;'' is the difference between the query value and its predecessor or successor.<ref name="PaperBose" />
By using an [[Binary search#one-sided search|exponential search]] before the binary search over the levels and by querying not only the current prefix ''x'', but also its successor ''x''&nbsp;+&nbsp;1, x-fast tries can answer predecessor and successor queries in time ''O''(log&nbsp;log&nbsp;''Δ''), where ''Δ'' is the difference between the query value and its predecessor or successor.<ref name="PaperBose" />


== References ==
== References ==
Line 86: Line 87:
| last = Willard
| last = Willard
| first = Dan E.
| first = Dan E.
| title = Log-logarithmic worst-case range queries are possible in space &Theta;(''N'')
| title = Log-logarithmic worst-case range queries are possible in space Θ(''N'')
| year = 1983
| year = 1983
| journal = Information Processing Letters
| journal = Information Processing Letters
Line 103: Line 104:
| last2 = Wang
| last2 = Wang
| first2 = Min
| first2 = Min
| title = Provenance Query Evaluation: What’s so special about it?
| title = Provenance Query Evaluation: What's so special about it?
| series = Proceedings of the 18th ACM conference on Information and knowledge management
| series = Proceedings of the 18th ACM conference on Information and knowledge management
| year = 2009
| year = 2009
Line 116: Line 117:
| first2 = Karim
| first2 = Karim
| last3 = Dujmović
| last3 = Dujmović
| first3 = Vida
| first3 = Vida | author3-link = Vida Dujmović
| last4 = Howat
| last4 = Howat
| first4 = John
| first4 = John
| last5 = Morin
| last5 = Morin
| first5 = Pat
| first5 = Pat | author5-link = Pat Morin
| title = Fast Local Searches and Updates in Bounded Universes
| title = Fast Local Searches and Updates in Bounded Universes
| series = Proceedings of the 22nd Canadian Conference on Computational Geometry (CCCG2010)
| series = Proceedings of the 22nd Canadian Conference on Computational Geometry (CCCG2010)
Line 131: Line 132:
{{cite web
{{cite web
| url = http://courses.csail.mit.edu/6.851/spring10/scribe/lec09.pdf
| url = http://courses.csail.mit.edu/6.851/spring10/scribe/lec09.pdf
| accessdate = 2011-04-13
| access-date = 2011-04-13
| title = Lecture Notes from Lecture 9 of Advanced Data Structures (Spring '10, 6.851)
| title = Lecture Notes from Lecture 9 of Advanced Data Structures (Spring '10, 6.851)
| first1 = André
| first1 = André
Line 145: Line 146:
*[http://opendatastructures.org/versions/edition-0.1c/ods-java/node64.html Open Data Structure - Chapter 13 - Data Structures for Integers]
*[http://opendatastructures.org/versions/edition-0.1c/ods-java/node64.html Open Data Structure - Chapter 13 - Data Structures for Integers]


<!--- Categories --->
{{CS-Trees}}
{{CS-Trees}}


<!--- Categories --->
[[Category:Trees (data structures)]]
[[Category:Trees (data structures)]]
[[Category:Associative arrays]]

Latest revision as of 20:41, 9 December 2024

X-fast trie
TypeTrie
Invented1982
Invented byDan Willard
Time complexity in big O notation
Operation Average Worst case
Search O(log log M) O(log log M)
Insert O(log M) amortized
Delete O(log M) amortized
Space complexity
Space O(n log M) O(n log M)

In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982,[1] along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

Structure

[edit]
A binary tree with 4 levels. The nodes on each level are: 3: (), 2: (0) and (1), 1: (00) and (10), 0: (001), (100) and (101). The unlabeled node is the root. There are directed edges between the following nodes: ()->(0), ()->(1), (0)->(00), (0)->(100) in blue, (1)->(10), (1)->(101) in blue, (00)->(001) twice, once in blue, (10)->(100), (10)->(101), (001)<->(100), (100)<->(101). The nodes on each level are contained in a box, labeled with LSS(<level>).
An x-fast trie containing the integers 1 (0012), 4 (1002) and 5 (1012). Blue edges indicate descendant pointers.

An x-fast trie is a bitwise trie: a binary tree where each subtree stores values whose binary representations start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and M − 1 uses ⌈log2 M⌉ bits, so the height of the trie is O(log M).

All values in the x-fast trie are stored at the leaves. Internal nodes are stored only if they have leaves in their subtree. If an internal node would have no left child, it stores a pointer to the smallest leaf in its right subtree instead, called a descendant pointer. Likewise, if it would have no right child, it stores a pointer to the largest leaf in its left subtree. Each leaf stores a pointer to its predecessor and successor, thereby forming a doubly linked list. Finally, there is a hash table for each level that contains all the nodes on that level. Together, these hash tables form the level-search structure (LSS). To guarantee the worst-case query times, these hash tables should use dynamic perfect hashing or cuckoo hashing.

The total space usage is O(n log M), since each element has a root-to-leaf path of length O(log M).

Operations

[edit]

Like van Emde Boas trees, x-fast tries support the operations of an ordered associative array. This includes the usual associative array operations, along with two more order operations, Successor and Predecessor:

  • Find(k): find the value associated with the given key
  • Successor(k): find the key/value pair with the smallest key larger than or equal to the given key
  • Predecessor(k): find the key/value pair with the largest key less than or equal to the given key
  • Insert(k, v): insert the given key/value pair
  • Delete(k): remove the key/value pair with the given key

Find

[edit]

Finding the value associated with a key k that is in the data structure can be done in constant time by looking up k in LSS[0], which is a hash table on all the leaves.[2]

For example, if we are looking for 4 in the above graph, we will implement the following steps:

  • Step 1: Convert the decimal 4 to binary, which is 100.
  • Step 2: Start from the root, and try to follow the path to each level. The first digit of 100 is 1, so follow the right path (1) of the root to the node "1".
  • Step 3: Repeat the Step 2, the 2nd digit of 100 is 0, so follow the left path (0) of the node "1" to the node "10".
  • Step 4: Repeat the Step 3, the 3rd digits of 100 is 0, so follow the left path (0) of the node "10" to the node "100".

Successor and predecessor

[edit]

To find the successor or predecessor of a key k, we first find Ak, the lowest ancestor of k. This is the node in the trie that has the longest common prefix with k. To find Ak, we perform a binary search on the levels. We start at level h/2, where h is the height of the trie. On each level, we query the corresponding hash table in the level-search structure with the prefix of k of the right length. If a node with that prefix does not exist, we know that Ak must be at a higher level and we restrict our search to those. If a node with that prefix does exist, Ak can not be at a higher level, so we restrict our search to the current and lower levels.

Once we find the lowest ancestor of k, we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and k should be in the other subtree. Therefore, the descendant pointer points to the successor or the predecessor of k. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf.

Since the trie has height O(log M), the binary search for the lowest ancestor takes O(log log M) time. After that, the successor or predecessor can be found in constant time, so the total query time is O(log log M).[1]

For example, if we are looking for the predecessor of 3 in the above graph, we will implement the following steps:

  • Step 1: Convert the decimal 4 to binary, which is 011.
  • Step 2: Start from the root, and try to follow the path to each level. The first digit of 011 is 0, so follow the left path (0) of the root to the node "0".
  • Step 3: Repeat the Step 2, the 2nd digit of 011 is 1, so try to follow the right path (1) . However, the node "0" has no right path, so follow the pointer to the node "001".
  • Step 4: 001 is smaller than 011, so it represents the predecessor of 011. Therefore, the predecessor of 3 is 1 (001).

Insert

[edit]

To insert a key-value pair (k, v), we first find the predecessor and successor of k. Then we create a new leaf for k, insert it in the linked list of leaves between the successor and predecessor, and give it a pointer to v. Next, we walk from the root to the new leaf, creating the necessary nodes on the way down, inserting them into the respective hash tables and updating descendant pointers where necessary.

Since we have to walk down the entire height of the trie, this process takes O(log M) time.[3]

Delete

[edit]

To delete a key k, we find its leaf using the hash table on the leaves. We remove it from the linked list, but remember which were the successor and predecessor. Then we walk from the leaf to the root of the trie, removing all nodes whose subtree only contained k and updating the descendant pointers where necessary. Descendant pointers that used to point to k will now point to either the successor or predecessor of k, depending on which subtree is missing.

Like insertion, this takes O(log M) time, as we have to walk through every level of the trie.[3]

Discussion

[edit]

Willard introduced x-fast tries largely as an introduction to y-fast tries, which provide the same query time, while using only O(n) space and allowing insertions and deletions in O(log log M) time.[1]

A compression technique similar to patricia tries can be used to significantly reduce the space usage of x-fast tries in practice.[4]

By using an exponential search before the binary search over the levels and by querying not only the current prefix x, but also its successor x + 1, x-fast tries can answer predecessor and successor queries in time O(log log Δ), where Δ is the difference between the query value and its predecessor or successor.[2]

References

[edit]
  1. ^ a b c Willard, Dan E. (1983). "Log-logarithmic worst-case range queries are possible in space Θ(N)". Information Processing Letters. 17 (2). Elsevier: 81–84. doi:10.1016/0020-0190(83)90075-3. ISSN 0020-0190.
  2. ^ a b Bose, Prosenjit; Douïeb, Karim; Dujmović, Vida; Howat, John; Morin, Pat (2010), Fast Local Searches and Updates in Bounded Universes (PDF), Proceedings of the 22nd Canadian Conference on Computational Geometry (CCCG2010), pp. 261–264
  3. ^ a b Schulz, André; Christiano, Paul (2010-03-04). "Lecture Notes from Lecture 9 of Advanced Data Structures (Spring '10, 6.851)" (PDF). Retrieved 2011-04-13.
  4. ^ Kementsietsidis, Anastasios; Wang, Min (2009), Provenance Query Evaluation: What's so special about it?, Proceedings of the 18th ACM conference on Information and knowledge management, pp. 681–690
[edit]