Approximate counting algorithm

The '''approximate counting algorithm''' allows the counting of a large number of events using a small amount of memory. Invented in 1977 by [[Robert Morris (cryptographer)|Robert Morris]] of [[Bell Labs]], it uses [[randomized algorithm|probabilistic techniques]] to increment the [[Counter (digital)|counter]]. It was fully analyzed in the early 1980s by [[Philippe Flajolet]] of [[INRIA]] Rocquencourt, who coined the name '''approximate counting''', and strongly contributed to its recognition among the research community. When focused on high quality of approximation and low probability of failure, [[Jelani Nelson|Nelson]] and Yu showed that a very slight modification to the Morris Counter is asymptotically optimal among all algorithms for the problem.<ref name=ny2020>{{cite arXiv | last1=Nelson | first1=Jelani | last2=Yu | first2=Huacheng | year=2020 | title=Optimal bounds for approximate counting | eprint=2010.02116 | class=cs.DS }}</ref> The algorithm is considered one of the precursors of [[Streaming algorithm|streaming algorithms]], and the more general problem of determining the frequency moments of a data stream has been central to the field.


==Theory of operation==
Using Morris' algorithm, the counter represents an "order of magnitude estimate" of the actual count. The approximation is mathematically unbiased.

To increment the counter, a [[pseudo-random]] event is used, such that the incrementing is a probabilistic event. To save space, only the exponent is kept. For example, in base 2, the counter can estimate the count to be 1, 2, 4, 8, 16, 32, and all of the [[powers of two]]. The memory requirement is simply to hold the [[exponent]].


As an example, to increment from 4 to 8, a pseudo-random number would be generated such that the probability the counter is increased is 0.25. Otherwise, the counter remains at 4.


The table below illustrates some of the potential values of the counter:
{| class="wikitable"
! Stored binary value of counter !! Approximation !! Range of possible values for the actual count !! Expectation (sufficiently large ''n'', uniform distribution)
|-
| 0 || 1 || 0, or initial value || 0
|-
| 1 || 2 || 1 or more || 2
|-
| 10 || 4 || 2 or more || 6
|-
| 11 || 8 || 3 or more || 14
|-
| 100 || 16 || 4 or more || 30
|-
| 101 || 32 || 5 or more || 62
|}


If the counter holds the value of 101, which equates to an exponent of 5 (the decimal equivalent of 101), then the estimated count is <math>2^5</math>, or 32. There is a fairly low probability that the actual count of increment events was 5 (<math>\frac{1}{1024} = 1 \times \frac{1}{2} \times \frac{1}{4} \times \frac{1}{8} \times \frac{1}{16}</math>). The actual count of increment events is likely to be "around 32", but it could be arbitrarily high (with decreasing probabilities for actual counts above 39).
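That <math>\frac{1}{1024}</math> figure can be checked numerically. The sketch below multiplies out the chain of per-step success probabilities described above (the first increment succeeds with probability <math>2^0 = 1</math>, each later one with probability <math>2^{-c}</math> for the current exponent <math>c</math>):

```python
from functools import reduce

# Probability that exactly 5 increment events drive the counter
# from 0 all the way to 5: 1 * 1/2 * 1/4 * 1/8 * 1/16.
step_probs = [2.0 ** -c for c in range(5)]
p = reduce(lambda a, b: a * b, step_probs, 1.0)
print(p)  # 0.0009765625, i.e. 1/1024
```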


===Selecting counter values===
While using powers of 2 as counter values is memory efficient, arbitrary values tend to create a dynamic error range: smaller values will have a greater error ratio than larger values. Other methods of selecting counter values consider parameters such as memory availability, desired error ratio, or counting range to provide an optimal set of values.<ref>Tsidon, Erez; Hanniel, Iddo; Keslassy, Isaac. "Estimators also need shared values to grow together." INFOCOM, 2012 Proceedings IEEE. IEEE, 2012.</ref>


However, when several counters share the same values, values are optimized according to the counter with the largest counting range, and produce sub-optimal accuracy for smaller counters. Mitigation is achieved by maintaining Independent Counter Estimation buckets,<ref>{{Cite book|last1=Einziger|first1=G.|last2=Fellman|first2=B.|last3=Kassner|first3=Y.|title=2015 IEEE Conference on Computer Communications (INFOCOM) |chapter=Independent counter estimation buckets |date=April 2015|pages=2560–2568|doi=10.1109/INFOCOM.2015.7218646|isbn=978-1-4799-8381-0|s2cid=15673730 }}</ref> which restrict the effect of a larger counter to the other counters in the bucket.


==Algorithm==
The algorithm can be implemented by hand. When incrementing the counter, flip a coin a number of times corresponding to the counter's current value. If it comes up heads each time, then increment the counter. Otherwise, do not increment it.


This can be easily achieved on a computer. Let <math>c</math> be the current value of the counter. Generate <math>c</math> pseudo-random bits, take the [[Logical conjunction|logical AND]] of all of them, and add the result to the counter. The result is zero if any of the pseudo-random bits are zero, and one only if they are all ones, so the counter is incremented with probability <math>2^{-c}</math>. This procedure is executed each time a request is made to increment the counter.
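The procedure above can be sketched in Python. This is an illustrative sketch, not a reference implementation; the estimator <math>2^c - 1</math> relies on the standard fact that it is an unbiased estimate of the number of increment events:

```python
import random

class MorrisCounter:
    """Illustrative base-2 Morris counter (a sketch, not a
    reference implementation)."""

    def __init__(self):
        self.c = 0  # only the exponent is stored

    def increment(self):
        # Generate c pseudo-random bits and AND them together:
        # the result is 1 only if all bits are 1, which happens
        # with probability 2^-c.
        bit = 1
        for _ in range(self.c):
            bit &= random.getrandbits(1)
        self.c += bit  # add the result (0 or 1) to the counter

    def estimate(self):
        # 2^c - 1 is an unbiased estimate of the event count.
        return 2 ** self.c - 1
```

Note that the estimate for any single run has high variance; the counter is useful where an order-of-magnitude figure in very little memory is sufficient, or where many estimates can be averaged.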


==Applications==
The algorithm is useful in examining large data streams for patterns. This is particularly useful in applications of data compression, sight and sound recognition, and other artificial intelligence applications.

==References==
{{Reflist}}


==Sources==
* Morris, R. ''Counting large numbers of events in small registers''. Communications of the ACM 21, 10 (1978), 840–842
* Flajolet, P. ''Approximate Counting: A Detailed Analysis''. BIT 25, (1985), 113–134 [http://algo.inria.fr/flajolet/Publications/Flajolet85c.pdf]
* Fuchs, M., Lee, C.-K., Prodinger, H., ''Approximate Counting via the Poisson-Laplace-Mellin Method'' [https://web.archive.org/web/20160304042036/http://jupiter.math.nctu.edu.tw/~mfuchs/approx_count_3.pdf]


[[Category:Randomized algorithms]]
