Jump to content

Exponential search: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
No edit summary
 
(42 intermediate revisions by 27 users not shown)
Line 1: Line 1:
{{Short description|Algorithm for searching sorted, infinite lists}}
{{multiple issues|
{{Orphan|date=May 2014}}
{{format footnotes|date=May 2014}}
}}

{{Infobox algorithm
{{Infobox algorithm
|class=[[Search algorithm]]
|class=[[Search algorithm]]
Line 15: Line 11:
}}
}}


In [[computer science]], an '''exponential search''' (also called '''doubling search''' or '''galloping search''')<ref>{{citation|contribution=Fast intersection algorithms for sorted sequences|first1=Ricardo|last1=Baeza-Yates|first2=Alejandro|last2=Salinger|title=Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday|volume=6060|series=Lecture Notes in Computer Science|editor1-first=Tapio|editor1-last=Elomaa|editor2-first=Heikki|editor2-last=Mannila|editor3-first=Pekka|editor3-last=Orponen|publisher=Springer|year=2010|isbn=9783642124754|pages=45–61|doi=10.1007/978-3-642-12476-1_3}}.</ref> is an [[algorithm]], created by [[Jon Bentley]] and [[Andrew Chi-Chih Yao]] in 1976, for searching sorted, unbounded/infinite lists.<ref name=PaperBentley/> There are numerous ways to implement this with the most common being to determine a range that the search key resides in and performing a [[binary search]] within that range. This takes [[Big-O notation|''O'']](log&nbsp;''i'') where ''i'' is the position of the search key in the list, if the search key is in the list, or the position where the search key should be, if the search key is not in the list.
In [[computer science]], an '''exponential search''' (also called '''doubling search''' or '''galloping search''' or '''Struzik search''')<ref name=Baeza-Yates/> is an [[algorithm]], created by [[Jon Bentley (computer scientist)|Jon Bentley]] and [[Andrew Chi-Chih Yao]] in 1976, for searching sorted, unbounded/infinite lists.<ref name=PaperBentley/> There are numerous ways to implement this, with the most common being to determine a range that the search key resides in and performing a [[binary search]] within that range. This takes [[Big-O notation|''O'']](log&nbsp;''i'') where ''i'' is the position of the search key in the list, if the search key is in the list, or the position where the search key should be, if the search key is not in the list.


Exponential search can also be used to search in bounded lists. Exponential search can even out-perform more traditional searches for bounded lists, such as binary search, when the element being searched for is near the beginning of the array. This is because exponential search will run in ''O''(log&nbsp;''i'') time, where ''i'' is the index of the element being searched for in the list, whereas binary search would run in ''O''(log&nbsp;''n'') time, where ''n'' is the number of elements in the list.
Exponential search can also be used to search in bounded lists. Exponential search can even out-perform more traditional searches for bounded lists, such as binary search, when the element being searched for is near the beginning of the array. This is because exponential search will run in ''O''(log&nbsp;''i'') time, where ''i'' is the index of the element being searched for in the list, whereas binary search would run in ''O''(log&nbsp;''n'') time, where ''n'' is the number of elements in the list.
Line 23: Line 19:
Exponential search allows for searching through a sorted, unbounded list for a specified input value (the search "key"). The algorithm consists of two stages. The first stage determines a range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on this range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first [[exponentiation|exponent]], ''j'', where the value 2{{sup|j}} is greater than the search key. This value, 2{{sup|j}} becomes the upper bound for the binary search with the previous power of 2, 2{{sup|j - 1}}, being the lower bound for the binary search.<ref name="NotesJonsson"/>
Exponential search allows for searching through a sorted, unbounded list for a specified input value (the search "key"). The algorithm consists of two stages. The first stage determines a range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on this range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first [[exponentiation|exponent]], ''j'', where the value 2{{sup|j}} is greater than the search key. This value, 2{{sup|j}} becomes the upper bound for the binary search with the previous power of 2, 2{{sup|j - 1}}, being the lower bound for the binary search.<ref name="NotesJonsson"/>


<source lang="cpp">
<syntaxhighlight lang="cpp">
// Returns the position of key in the array arr of length size.
template <typename T>
template <typename T>
int exponential_search (T * arr, int size, T key)
int exponential_search(T arr[], int size, T key)
{
{
if ( size < 0 || ! arr )
if (size == 0) {
return -1 ;
return NOT_FOUND;
}


if ( arr [0] == key )
int bound = 1;
while (bound < size && arr[bound] < key) {
return 0 ;
bound *= 2;
}


if ( size && arr [1] == key ) // if (size) <=> if (size > 0) since for size < 0
return binary_search(arr, key, bound/2, min(bound, size));
return 1 ; // we would not have gotten this far.

// for less than 2 elements it makes little sense
// to even search, thus we simply check upfront.

int upper_bound = 4 ;
while ( upper_bound * 2 < size && arr [upper_bound] < key )
end *= 2 ;

return binary_search (arr, key, upper_bound/2, upper_bound);
}
}
</syntaxhighlight>
</source>


In each step, the algorithm compares the search key value with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, calculating the next power of 2.<ref name="NotesJonsson"/> If the element at the current index is larger than the search key, the algorithm now knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2{{sup|j - 1}}, and the current search index, 2{{sup|j}}. The binary search is then performed with the result of either a failure, if the search key is not in the list, or the position of the search key in the list.
In each step, the algorithm compares the search key value with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, calculating the next power of 2.<ref name="NotesJonsson"/> If the element at the current index is larger than the search key, the algorithm now knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2{{sup|j - 1}}, and the current search index, 2{{sup|j}}. The binary search is then performed with the result of either a failure, if the search key is not in the list, or the position of the search key in the list.
Line 55: Line 43:
The first stage of the algorithm takes ''O''(log&nbsp;''i'') time, where ''i'' is the index where the search key would be in the list. This is because, in determining the upper bound for the binary search, the while loop is executed exactly <math>\lceil \log(i) \rceil</math> times. Since the list is sorted, after doubling the search index <math>\lceil \log(i) \rceil</math> times, the algorithm will be at a search index that is greater than or equal to ''i'' as <math>2^{\lceil \log(i) \rceil} \ge i</math>. As such, the first stage of the algorithm takes ''O''(log&nbsp;''i'') time.
The first stage of the algorithm takes ''O''(log&nbsp;''i'') time, where ''i'' is the index where the search key would be in the list. This is because, in determining the upper bound for the binary search, the while loop is executed exactly <math>\lceil \log(i) \rceil</math> times. Since the list is sorted, after doubling the search index <math>\lceil \log(i) \rceil</math> times, the algorithm will be at a search index that is greater than or equal to ''i'' as <math>2^{\lceil \log(i) \rceil} \ge i</math>. As such, the first stage of the algorithm takes ''O''(log&nbsp;''i'') time.


The second part of the algorithm also takes ''O''(log&nbsp;''i'') time. As the second stage is simply a binary search, it takes ''O''(log&nbsp;''n'') where ''n'' is the size of the interval being searched. The size of this interval would be 2{{sup|''j''}} - 2{{sup|''j'' - 1}} where, as seen above, ''j'' = log&nbsp;''i''. This means that the size of the interval being searched is 2{{sup|log ''i''}} - 2{{sup|log ''i'' - 1}} = 2{{sup|log ''i'' - 1}}. This gives us a run time of log&nbsp;(2{{sup|log ''i'' - 1}}) = log&nbsp;(''i'' - 1) = ''O''(log&nbsp;''i'').
The second part of the algorithm also takes ''O''(log&nbsp;''i'') time. As the second stage is simply a binary search, it takes ''O''(log&nbsp;''n'') where ''n'' is the size of the interval being searched. The size of this interval would be 2{{sup|''j''}} - 2{{sup|''j'' - 1}} where, as seen above, ''j'' = log&nbsp;''i''. This means that the size of the interval being searched is 2{{sup|log ''i''}} - 2{{sup|log ''i'' - 1}} = 2{{sup|log ''i'' - 1}}. This gives us a run time of log&nbsp;(2{{sup|log ''i'' - 1}}) = log&nbsp;(''i'') - 1 = ''O''(log&nbsp;''i'').


This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages, of ''O''(log&nbsp;''i'') + ''O''(log&nbsp;''i'') = 2 ''O''(log&nbsp;''i'') = ''O''(log&nbsp;''i'').
This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages, of ''O''(log&nbsp;''i'') + ''O''(log&nbsp;''i'') = 2 ''O''(log&nbsp;''i'') = ''O''(log&nbsp;''i'').


== Alternatives ==
== Alternatives ==

Bentley and Yao suggested several variations for exponential search.<ref name="PaperBentley"/> These variations consist of performing a binary search, as opposed to a unary search, when determining the upper bound for the binary search in the second stage of the algorithm. This splits the first stage of the algorithm into two parts, making the algorithm a three-stage algorithm overall. The new first stage determines a value ''<math>j'</math>'', much like before, such that <math>2^{j'}</math> is larger than the search key and <math>2^{j'/2}</math> is lower than the search key. Previously, ''<math>j'</math>'' was determined in a unary fashion by calculating the next power of 2 (i.e., adding 1 to ''j''). In the variation, it is proposed that <math>j'</math> is doubled instead (e.g., jumping from 2{{sup|2}} to 2{{sup|4}} as opposed to 2{{sup|3}}). The first ''<math>j'</math>'' such that <math>2^{j'}</math> is greater than the search key forms a much rougher upper bound than before. Once this ''<math>j'</math>'' is found, the algorithm moves to its second stage and a binary search is performed on the interval formed by <math>j'/2</math> and <math>j'</math>, giving the more accurate upper bound exponent ''j''. From here, the third stage of the algorithm performs the binary search on the interval 2{{sup|''j'' - 1}} and 2{{sup|''j''}}, as before. The performance of this variation is <math>\lfloor \log i \rfloor + 2 \lfloor \log(\lfloor \log i \rfloor + 1)\rfloor + 1</math> = ''O''(log&nbsp;''i'').
Bentley and Yao suggested several variations for exponential search.<ref name="PaperBentley"/> These variations consist of performing a binary search, as opposed to a unary search, when determining the upper bound for the binary search in the second stage of the algorithm. This splits the first stage of the algorithm into two parts, making the algorithm a three-stage algorithm overall. The new first stage determines a value ''<math>j'</math>'', much like before, such that <math>2^{j'}</math> is larger than the search key and <math>2^{j'/2}</math> is lower than the search key. Previously, ''<math>j'</math>'' was determined in a unary fashion by calculating the next power of 2 (i.e., adding 1 to ''j''). In the variation, it is proposed that <math>j'</math> is doubled instead (e.g., jumping from 2{{sup|2}} to 2{{sup|4}} as opposed to 2{{sup|3}}). The first ''<math>j'</math>'' such that <math>2^{j'}</math> is greater than the search key forms a much rougher upper bound than before. Once this ''<math>j'</math>'' is found, the algorithm moves to its second stage and a binary search is performed on the interval formed by <math>j'/2</math> and <math>j'</math>, giving the more accurate upper bound exponent ''j''. From here, the third stage of the algorithm performs the binary search on the interval 2{{sup|''j'' - 1}} and 2{{sup|''j''}}, as before. The performance of this variation is <math>\lfloor \log i \rfloor + 2 \lfloor \log(\lfloor \log i \rfloor + 1)\rfloor + 1</math> = ''O''(log&nbsp;''i'').


Bentley and Yao generalize this variation into one where any number, ''k'', of binary searches is performed during the first stage of the algorithm, giving the ''k''-nested binary search variation. The asymptotic runtime does not change for the variations, running in ''O''(log&nbsp;''i'') time, as with the original exponential search algorithm.
Bentley and Yao generalize this variation into one where any number, ''k'', of binary searches are performed during the first stage of the algorithm, giving the ''k''-nested binary search variation. The asymptotic runtime does not change for the variations, running in ''O''(log&nbsp;''i'') time, as with the original exponential search algorithm.


Also, a data structure with a tight version of the [[splay tree|dynamic finger property]] can be given when the above result of the ''k''-nested binary search is used on a sorted array.<ref name="PaperAndersson"/> Using this, the number of comparisons done during a search is log&nbsp;(''d'') + log&nbsp;log&nbsp;(''d'') + ... + ''O''(log&nbsp;{{sup|*}}''d''), where ''d'' is the difference in rank between the last element that was accessed and the current element being accessed.
Also, a data structure with a tight version of the [[splay tree|dynamic finger property]] can be given when the above result of the ''k''-nested binary search is used on a sorted array.<ref name="PaperAndersson"/> Using this, the number of comparisons done during a search is log&nbsp;(''d'') + log&nbsp;log&nbsp;(''d'') + ... + ''O''(log&nbsp;{{sup|*}}''d''), where ''d'' is the difference in rank between the last element that was accessed and the current element being accessed.

== Applications ==
An algorithm based on exponentially increasing the search band solves [[Sequence alignment|global pairwise alignment]] for ''O(ns)'', where ''n'' is the length of the sequences and ''s'' is the [[edit distance]] between them.<ref>{{Cite journal |last=Ukkonen |first=Esko |date=March 1985 |title=Finding approximate patterns in strings |url=http://dx.doi.org/10.1016/0196-6774(85)90023-9 |journal=Journal of Algorithms |volume=6 |issue=1 |pages=132–137 |doi=10.1016/0196-6774(85)90023-9 |issn=0196-6774}}</ref><ref>{{Cite web |last1=Šošić |first1=Martin |last2=Šikić |first2=Mile |date=2016-08-23 |title=Edlib: a C/C++ library for fast, exact sequence alignment using edit distance |doi=10.1101/070649 |s2cid=3818517 |url=http://dx.doi.org/10.1101/070649 }}</ref>

==See also==
*[[Linear search]]
*[[Binary search]]
*[[Interpolation search]]
*[[Ternary search]]
*[[Hash table]]


== References ==
== References ==

<!--- See [[Wikipedia:Footnotes]] on how to create references using<ref></ref> tags which will then appear here automatically -->
<!--- See [[Wikipedia:Footnotes]] on how to create references using<ref></ref> tags which will then appear here automatically -->
{{Reflist|refs=
{{Reflist|refs=
<ref name=Baeza-Yates>{{citation|contribution=Fast intersection algorithms for sorted sequences|first1=Ricardo|last1=Baeza-Yates|author1-link= Ricardo Baeza-Yates |first2=Alejandro|last2=Salinger|title=Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday|volume=6060|series=Lecture Notes in Computer Science|editor1-first=Tapio|editor1-last=Elomaa|editor2-first=Heikki|editor2-last=Mannila|editor2-link=Heikki Mannila|editor3-first=Pekka|editor3-last=Orponen|publisher=Springer|year=2010|isbn=9783642124754|pages=45–61|doi=10.1007/978-3-642-12476-1_3|bibcode=2010LNCS.6060...45B}}.</ref>
<ref name="PaperBentley">
<ref name="PaperBentley">
{{cite journal
{{cite journal
| last1 = Bentley
| last1 = Bentley
| first1 = Jon L.
| first1 = Jon L. | author1-link = Jon Bentley (computer scientist)
| last2 = Yao
| last2 = Yao
| first2 = Andrew C.
| first2 = Andrew C. | author2-link = Andrew Yao
| title = An Almost Optimal Algorithm For Unbounded Searching
| title = An almost optimal algorithm for unbounded searching
| year = 1976
| year = 1976
| journal = Information Processing Letters
| journal = [[Information Processing Letters]]
| volume = 5
| volume = 5
| issue = 3
| issue = 3
Line 85: Line 82:
| issn = 0020-0190
| issn = 0020-0190
| doi = 10.1016/0020-0190(76)90071-5
| doi = 10.1016/0020-0190(76)90071-5
}}</ref>
| publisher = Elsevier
}}</ref><ref name="NotesJonsson">
<ref name="NotesJonsson">{{cite web
| url = https://sites.google.com/site/d7013e11/announcements/exponentialbinarysearch
{{cite web
| accessdate = 2014-03-24
| url = https://sites.google.com/site/d7013e11/announcements/exponentialbinarysearch
| accessdate = 2014-03-24
| title = Exponential Binary Search
| title = Exponential Binary Search
| first = Håkan
| first = Håkan
| last = Jonsson
| last = Jonsson
| date = 2011-04-19
| date = 2011-04-19
| archive-date = 2020-06-01
}}</ref><ref name="PaperAndersson">
| archive-url = https://web.archive.org/web/20200601234125/https://sites.google.com/site/d7013e11/announcements/exponentialbinarysearch
| url-status = dead
}}</ref>
<ref name="PaperAndersson">
{{cite journal
{{cite journal
| last1 = Andersson
| last1 = Andersson
| first1 = Arne
| first1 = Arne
| last2 = Thorup
| last2 = Thorup
| first2 = Mikkel
| first2 = Mikkel | author2-link = Mikkel Thorup
| title = Dynamic ordered sets with exponential search trees
| title = Dynamic ordered sets with exponential search trees
| year = 2007
| year = 2007
| journal = Journal of the ACM (JACM)
| journal = [[Journal of the ACM]]
| volume = 54
| volume = 54
| issue = 3
| issue = 3
Line 108: Line 108:
| issn = 0004-5411
| issn = 0004-5411
| doi = 10.1145/1236457.1236460
| doi = 10.1145/1236457.1236460
| arxiv= cs/0210006| s2cid = 8175703
| publisher = ACM
}}
}}
</ref>
</ref>
}}
}}

{{improve categories|date=May 2014}}


[[Category:Search algorithms]]
[[Category:Search algorithms]]

Latest revision as of 07:13, 5 July 2024

Exponential search
ClassSearch algorithm
Data structureArray
Worst-case performanceO(log i)
Best-case performanceO(1)
Average performanceO(log i)
Worst-case space complexityO(1)
OptimalYes

In computer science, an exponential search (also called doubling search or galloping search or Struzik search)[1] is an algorithm, created by Jon Bentley and Andrew Chi-Chih Yao in 1976, for searching sorted, unbounded/infinite lists.[2] There are numerous ways to implement this, with the most common being to determine a range that the search key resides in and performing a binary search within that range. This takes O(log i) where i is the position of the search key in the list, if the search key is in the list, or the position where the search key should be, if the search key is not in the list.

Exponential search can also be used to search in bounded lists. Exponential search can even out-perform more traditional searches for bounded lists, such as binary search, when the element being searched for is near the beginning of the array. This is because exponential search will run in O(log i) time, where i is the index of the element being searched for in the list, whereas binary search would run in O(log n) time, where n is the number of elements in the list.

Algorithm

[edit]

Exponential search allows for searching through a sorted, unbounded list for a specified input value (the search "key"). The algorithm consists of two stages. The first stage determines a range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on this range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first exponent, j, where the value 2j is greater than the search key. This value, 2j becomes the upper bound for the binary search with the previous power of 2, 2j - 1, being the lower bound for the binary search.[3]

// Returns the position of key in the array arr of length size.
template <typename T>
int exponential_search(T arr[], int size, T key)
{
    if (size == 0) {
        return NOT_FOUND;
    }

    int bound = 1;
    while (bound < size && arr[bound] < key) {
        bound *= 2;
    }

    return binary_search(arr, key, bound/2, min(bound, size));
}

In each step, the algorithm compares the search key value with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, calculating the next power of 2.[3] If the element at the current index is larger than the search key, the algorithm now knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2j - 1, and the current search index, 2j. The binary search is then performed with the result of either a failure, if the search key is not in the list, or the position of the search key in the list.

Performance

[edit]

The first stage of the algorithm takes O(log i) time, where i is the index where the search key would be in the list. This is because, in determining the upper bound for the binary search, the while loop is executed exactly times. Since the list is sorted, after doubling the search index times, the algorithm will be at a search index that is greater than or equal to i as . As such, the first stage of the algorithm takes O(log i) time.

The second part of the algorithm also takes O(log i) time. As the second stage is simply a binary search, it takes O(log n) where n is the size of the interval being searched. The size of this interval would be 2j - 2j - 1 where, as seen above, j = log i. This means that the size of the interval being searched is 2log i - 2log i - 1 = 2log i - 1. This gives us a run time of log (2log i - 1) = log (i) - 1 = O(log i).

This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages, of O(log i) + O(log i) = 2 O(log i) = O(log i).

Alternatives

[edit]

Bentley and Yao suggested several variations for exponential search.[2] These variations consist of performing a binary search, as opposed to a unary search, when determining the upper bound for the binary search in the second stage of the algorithm. This splits the first stage of the algorithm into two parts, making the algorithm a three-stage algorithm overall. The new first stage determines a value , much like before, such that is larger than the search key and is lower than the search key. Previously, was determined in a unary fashion by calculating the next power of 2 (i.e., adding 1 to j). In the variation, it is proposed that is doubled instead (e.g., jumping from 22 to 24 as opposed to 23). The first such that is greater than the search key forms a much rougher upper bound than before. Once this is found, the algorithm moves to its second stage and a binary search is performed on the interval formed by and , giving the more accurate upper bound exponent j. From here, the third stage of the algorithm performs the binary search on the interval 2j - 1 and 2j, as before. The performance of this variation is = O(log i).

Bentley and Yao generalize this variation into one where any number, k, of binary searches are performed during the first stage of the algorithm, giving the k-nested binary search variation. The asymptotic runtime does not change for the variations, running in O(log i) time, as with the original exponential search algorithm.

Also, a data structure with a tight version of the dynamic finger property can be given when the above result of the k-nested binary search is used on a sorted array.[4] Using this, the number of comparisons done during a search is log (d) + log log (d) + ... + O(log *d), where d is the difference in rank between the last element that was accessed and the current element being accessed.

Applications

[edit]

An algorithm based on exponentially increasing the search band solves global pairwise alignment for O(ns), where n is the length of the sequences and s is the edit distance between them.[5][6]

See also

[edit]

References

[edit]
  1. ^ Baeza-Yates, Ricardo; Salinger, Alejandro (2010), "Fast intersection algorithms for sorted sequences", in Elomaa, Tapio; Mannila, Heikki; Orponen, Pekka (eds.), Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday, Lecture Notes in Computer Science, vol. 6060, Springer, pp. 45–61, Bibcode:2010LNCS.6060...45B, doi:10.1007/978-3-642-12476-1_3, ISBN 9783642124754.
  2. ^ a b Bentley, Jon L.; Yao, Andrew C. (1976). "An almost optimal algorithm for unbounded searching". Information Processing Letters. 5 (3): 82–87. doi:10.1016/0020-0190(76)90071-5. ISSN 0020-0190.
  3. ^ a b Jonsson, Håkan (2011-04-19). "Exponential Binary Search". Archived from the original on 2020-06-01. Retrieved 2014-03-24.
  4. ^ Andersson, Arne; Thorup, Mikkel (2007). "Dynamic ordered sets with exponential search trees". Journal of the ACM. 54 (3): 13. arXiv:cs/0210006. doi:10.1145/1236457.1236460. ISSN 0004-5411. S2CID 8175703.
  5. ^ Ukkonen, Esko (March 1985). "Finding approximate patterns in strings". Journal of Algorithms. 6 (1): 132–137. doi:10.1016/0196-6774(85)90023-9. ISSN 0196-6774.
  6. ^ Šošić, Martin; Šikić, Mile (2016-08-23). "Edlib: a C/C++ library for fast, exact sequence alignment using edit distance". doi:10.1101/070649. S2CID 3818517.