Jump to content

Hashlife: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
See also: removed overly generic link
Satopen (talk | contribs)
m Update URL of Life Lexicon
 
(15 intermediate revisions by 14 users not shown)
Line 1: Line 1:
{{short description|Algorithm for speeding up cellular automaton simulations}}
{{short description|Algorithm for speeding up cellular automaton simulations}}
{{No footnotes|date=September 2016}}
{{More footnotes|date=September 2016}}
[[File:Turing Machine in Golly.png|thumb|379px|right|The 6,366,548,773,467,669,985,195,496,000 (6 octillionth) generation of a [[Turing machine]] in [[Conway's Game of Life|Life]] computed in less than 30 seconds on an [[Intel Core]] Duo 2GHz CPU using Hashlife in [[Golly (program)|Golly]]. Computed by detecting a repeating cycle in the pattern, and skipping ahead to any requested generation.]]
[[File:Turing Machine in Golly.png|thumb|379px|right|The 6,366,548,773,467,669,985,195,496,000 (6 [[Names of large numbers|octillionth]]) generation of a [[Turing machine]] in [[Conway's Game of Life|Life]] computed in less than 30 seconds on an [[Intel Core]] Duo 2GHz CPU using Hashlife in [[Golly (program)|Golly]]. Computed by detecting a repeating cycle in the pattern, and skipping ahead to any requested generation.]]
'''Hashlife''' is a [[Memoization|memoized]] [[algorithm]] for computing the long-term fate of a given starting configuration in [[Conway's Game of Life]] and related [[cellular automaton|cellular automata]], much more quickly than would be possible using alternative algorithms that simulate each time step of each cell of the automaton. The algorithm was first described by [[Bill Gosper]] in the early 1980s while he was engaged in research at the [[Xerox Palo Alto Research Center]]. Hashlife was originally implemented on [[Symbolics]] [[Lisp machine]]s with the aid of the [[Flavors (programming language)|Flavors]] extension.
'''Hashlife''' is a [[Memoization|memoized]] [[algorithm]] for computing the long-term fate of a given starting configuration in [[Conway's Game of Life]] and related [[cellular automaton|cellular automata]], much more quickly than would be possible using alternative algorithms that simulate each time step of each cell of the automaton. The algorithm was first described by [[Bill Gosper]] in the early 1980s while he was engaged in research at the [[Xerox Palo Alto Research Center]]. Hashlife was originally implemented on [[Symbolics]] [[Lisp machine]]s with the aid of the [[Flavors (programming language)|Flavors]] extension.


==Hashlife==
==Hashlife==
Hashlife is designed to exploit large amounts of spatial and temporal [[redundancy (information theory)|redundancy]] in most Life rules. For example, in [[Conway's Game of Life|Conway's Life]], many seemingly random patterns end up as collections of simple [[still life (CA)|still lifes]] and [[oscillator (CA)|oscillators]].
Hashlife is designed to exploit large amounts of spatial and temporal [[redundancy (information theory)|redundancy]] in most Life rules. For example, in [[Conway's Game of Life|Conway's Life]], many seemingly random patterns end up as collections of simple [[still life (CA)|still lifes]] and [[oscillator (CA)|oscillators]]. Hashlife does however not depend on patterns remaining in the same position; it is more about exploiting that large patterns tend to have subpatterns that appear in several places, possibly at different times.


===Representation===
===Representation===
The field is typically treated as a theoretically [[infinity|infinite]] grid, with the [[pattern (CA)|pattern]] in question centered near the [[origin (mathematics)|origin]]. A [[quadtree]] is used to represent the field. Given a square of 2<sup>2''k''</sup> cells, 2<sup>''k''</sup> on a side, at the ''k''th level of the tree, the hash table stores the 2<sup>''k''-1</sup>-by-2<sup>''k''-1</sup> square of cells in the center, 2<sup>''k''-2</sup> generations in the future. For example, for a 4x4 square it stores the 2x2 center, one generation forward; and for an 8x8 square it stores the 4x4 center, two generations forward.
The field is typically treated as a theoretically [[infinity|infinite]] grid, with the [[pattern (CA)|pattern]] in question centered near the [[origin (mathematics)|origin]]. A [[quadtree]] (with [[hash consing|sharing]] of nodes) is used to represent the field. A node at the ''k''th level of the tree represents a square of 2<sup>2''k''</sup> cells, 2<sup>''k''</sup> on a side, by referencing the four ''k''–1 level nodes that represent the four <math>2^{k-1} \times 2^{k-1}</math> quadrants of that level ''k'' square. For example, a level 3 node represents an 8×8 square, which decomposes into four 4×4 squares. Explicit cell contents are only stored at level 0. The root node has to be at a high enough level that all live cells are found within the square it represents.

While a quadtree naively seems to require far more [[computational overhead|overhead]] than simpler representations (such as using a [[matrix (mathematics)|matrix]] of [[bit]]s), it allows for various optimizations. Since each cell is either live or dead, there are only two possibilities for a node at level 0, so if nodes are allowed to be shared between parents, there is never a need for having more than 2 level 0 nodes in total. Likewise the 4 cells of a 2×2 square can only exhibit <math> 2^4 = 16 </math> different combinations, so no more than that many level 1 nodes are needed either. Going to higher levels, the number of possible ''k''th level squares grows as <math> 2^{2^k \cdot 2^k} = 2^{2^{2k}} </math>, but the number of distinct ''k''th level squares occurring in any particular run is much lower, and very often the same square contents appears in several places. For maximal sharing of nodes in the quadtree (which is not so much a [[tree (graph theory)|tree]] as a [[directed acyclic graph]]), we only want to use one node to represent all squares with the same content.


===Hashing===
===Hashing===
A [[hash table]], or more generally any kind of [[associative array]], may be used to map square contents to an already existing node representing those contents, so that one through the technique of [[hash consing]] may avoid creating a duplicate node representing those contents. If this is applied consistently then it is sufficient to hash the four pointers to component nodes, as a bottom–up hashing of the square contents would always find those four nodes at the level below. It turns out several operations on higher level nodes can be carried out without explicitly producing the contents of those nodes, instead it suffices to work with pointers to nodes a fixed number of levels down.
While a quadtree trivially has far more [[computational overhead|overhead]] than other simpler representations (such as using a [[matrix (mathematics)|matrix]] of [[bit]]s), it allows for various optimizations. As the name suggests, the algorithm uses [[hash table]]s to store the nodes of the quadtree. Many subpatterns in the tree are usually identical to each other; for example the pattern being studied may contain many copies of the same [[spaceship (CA)|spaceship]], or even large swathes of empty space. These subpatterns will all [[hash function|hash]] to the same position in the hash table, and thus many copies of the same subpattern can be stored using the same hash table entry. In addition, these subpatterns only need to be evaluated once, not once per copy as in other Life algorithms.


===Caching and superspeed===
This itself leads to significant improvements in resource requirements; for example a generation of the various [[breeder (CA)|breeders]] and [[spacefiller (CA)|spacefillers]], which grow at [[polynomial]] speeds, can be evaluated in Hashlife using [[logarithmic growth|logarithmic]] space and time.
The quadtree can be augmented to in a node also [[memoization|cache]] the result of an update on the contents of that node. There is not enough information in a <math> 2^k \times 2^k </math> square to determine the next timestep contents on the whole of that square, but the contents of a <math> 2^{k+1} \times 2^{k+1} </math> square centered at the same point determine the next timestep contents of the <math> 2^k \times 2^k </math> square. This level ''k'' node for that next timestep is offset by <math>2^{k-1}</math> cells in both the horizontal and vertical directions, so even in the case of [[still life (cellular automaton)|still life]] it would likely not be among the level ''k'' nodes that combine into the <math> 2^{k+1} \times 2^{k+1} </math> square, but at level ''k''–1 the squares are again in the same positions and will be shared if unchanged.


Practically, computing the next timestep contents is a recursive operation that bottom–up populates the cache field of each level ''k'' node with a level ''k''–1 node representing the contents of the updated center <math> 2^{k-1} \times 2^{k-1} </math> square. Sharing of nodes can bring a significant speed-up to this operation, since the work required is proportional to the number of nodes, not to the number of cells as in a simpler representation. If nodes are being shared between quadtrees representing different timesteps, then only those nodes which were newly created during the previous timestep will need to have a cached value computed at all.
===Superspeed and caching===
A further speedup for many patterns can be achieved by evolving different nodes at different speeds. For example, one could compute twice the number of generations forward for a node at the (''k''+1)-th level compared to one at the ''k''th. For sparse or repetitive patterns such as the classical [[gun (CA)|glider gun]], this can result in tremendous speedups, allowing one to compute ''bigger'' patterns at ''higher'' generations ''faster'', sometimes [[exponential growth|exponentially]]. To take full advantage of this feature, subpatterns from past generations should be [[cache (computing)|saved]] as well.


'''Superspeed''' goes further, using the observation that the contents of a <math> 2^{k+1} \times 2^{k+1} </math> square actually determine the contents of its central <math> 2^k \times 2^k </math> square for the next <math> 2^{k-1} </math> timesteps. Instead of having a level ''k'' node cache a level ''k''–1 node for the contents 1 step ahead, we can have it cache one for the contents <math>2^{k-2}</math> steps ahead. Because updates at level ''k'' are computed from updates at level ''k''–1, and since at level ''k''–1 there are cached results for advancing <math>2^{k-3}</math> timesteps, a mere two rounds of advancing at level ''k''–1 suffice for advancing by <math>2^{k-2}</math> steps at level ''k''.
Since different patterns are allowed to run at different speeds, some implementations, like Gosper's own <code>hlife</code> program, do not have an interactive display, but simply compute a preset end result for a starting pattern, usually run from the [[command line]]. More recent programs such as [[Golly (program)|Golly]], however, have a graphical interface that can drive a Hashlife-based engine.

In the worst case 2 rounds at level ''k''–1 may have to do 4 full rounds at level ''k''–2, in turn calling for 8 full rounds at level ''k''–3, etc., but in practice many subpatterns in the tree are identical to each other and most branches of the recursion are short. For example the pattern being studied may contain many copies of the same [[spaceship (CA)|spaceship]], and often large swathes of empty space. Each instance of these subpatterns will [[hash function|hash]] to the same quadtree node, and thus only need to be stored once. In addition, these subpatterns only need to be evaluated once, not once per copy as in other Life algorithms.

For sparse or repetitive patterns such as the classical [[gun (CA)|glider gun]], this can result in tremendous speedups, allowing one to compute ''bigger'' patterns at ''higher'' generations ''faster'', sometimes [[exponential growth|exponentially]]. A generation of the various [[breeder (CA)|breeders]] and [[spacefiller (CA)|spacefillers]], which grow at [[polynomial]] speeds, can be evaluated in Hashlife using [[logarithmic growth|logarithmic]] space and time.

Since subpatterns of different sizes are effectively run at different speeds, some implementations, like Gosper's own hlife program, do not have an interactive display; they simply advance the starting pattern a given number of steps, and are usually run from the [[command line]]. More recent programs such as [[Golly (program)|Golly]], however, have a graphical interface that can drive a Hashlife-based engine.


The typical behavior of a Hashlife program on a conducive pattern is as follows: first the algorithm runs slower compared to other algorithms because of the constant overhead associated with [[hash function|hashing]] and building the [[quadtree|tree]]; but later, enough data will be gathered and its speed will increase tremendously – the rapid increase in speed is often described as "[[exponential growth|exploding]]".
The typical behavior of a Hashlife program on a conducive pattern is as follows: first the algorithm runs slower compared to other algorithms because of the constant overhead associated with [[hash function|hashing]] and building the [[quadtree|tree]]; but later, enough data will be gathered and its speed will increase tremendously – the rapid increase in speed is often described as "[[exponential growth|exploding]]".
Line 26: Line 34:


Hashlife is also significantly more complex to [[computer programming|implement]]. For example, it needs a dedicated [[garbage collection (computer science)|garbage collector]] to remove unused nodes from the cache.
Hashlife is also significantly more complex to [[computer programming|implement]]. For example, it needs a dedicated [[garbage collection (computer science)|garbage collector]] to remove unused nodes from the cache.

Due to being designed for processing generally predictable patterns, chaotic and explosive rules generally operate much more poorly under Hashlife than they would under other implementations.<ref>HashLife algorithm description in Golly: "Note that HashLife performs very poorly on highly chaotic patterns, so in those cases you are better off switching to QuickLife."</ref>


==See also==
==See also==
* [[Purely functional data structure]], of which the hashed quadtree is one
* [[Purely functional data structure]], of which the hashed quadtree is one
* [[Hash consing]], which was the key strategy used in the original implementation of Hashlife.


==References==
==References==
{{Reflist}}
* {{cite journal
* {{cite journal
| last=Gosper | first=Bill
| last=Gosper | first=Bill
Line 46: Line 58:
* [http://www.ericweisstein.com/encyclopedias/life/HashLife.html HashLife from Eric Weisstein's Treasure Trove of Life]
* [http://www.ericweisstein.com/encyclopedias/life/HashLife.html HashLife from Eric Weisstein's Treasure Trove of Life]
* [http://tomas.rokicki.com/hlife/ Tomas Rokicki's implementation of hashlife]
* [http://tomas.rokicki.com/hlife/ Tomas Rokicki's implementation of hashlife]
* [http://www.argentum.freeserve.co.uk/lex_h.htm#hashlife Entry] in the [https://web.archive.org/web/20160712171724/http://www.argentum.freeserve.co.uk/lex_home.htm Life Lexicon]
* [https://conwaylife.com/ref/lexicon/lex_h.htm#hashlife Entry] in the [https://conwaylife.com/ref/lexicon/lex_home.htm Life Lexicon]
* [http://www.ddj.com/dept/ai/184406478 Explanation of the algorithm] from ''[[Dr. Dobb's Journal]]''
* [http://www.ddj.com/dept/ai/184406478 Explanation of the algorithm] from ''[[Dr. Dobb's Journal]]''
* [https://jennyhasahat.github.io/hashlife.html Gosper's Algorithm (Hashlife) Explained]
* [https://web.archive.org/web/20220131050938/https://jennyhasahat.github.io/hashlife.html Gosper's Algorithm (Hashlife) Explained] Archived from the [https://jennyhasahat.github.io/hashlife.html original]


{{Conway's Game of Life}}
{{Conway's Game of Life}}

Latest revision as of 04:04, 7 May 2024

The 6,366,548,773,467,669,985,195,496,000 (6 octillionth) generation of a Turing machine in Life computed in less than 30 seconds on an Intel Core Duo 2GHz CPU using Hashlife in Golly. Computed by detecting a repeating cycle in the pattern, and skipping ahead to any requested generation.

Hashlife is a memoized algorithm for computing the long-term fate of a given starting configuration in Conway's Game of Life and related cellular automata, much more quickly than would be possible using alternative algorithms that simulate each time step of each cell of the automaton. The algorithm was first described by Bill Gosper in the early 1980s while he was engaged in research at the Xerox Palo Alto Research Center. Hashlife was originally implemented on Symbolics Lisp machines with the aid of the Flavors extension.

Hashlife

[edit]

Hashlife is designed to exploit large amounts of spatial and temporal redundancy in most Life rules. For example, in Conway's Life, many seemingly random patterns end up as collections of simple still lifes and oscillators. Hashlife does however not depend on patterns remaining in the same position; it is more about exploiting that large patterns tend to have subpatterns that appear in several places, possibly at different times.

Representation

[edit]

The field is typically treated as a theoretically infinite grid, with the pattern in question centered near the origin. A quadtree (with sharing of nodes) is used to represent the field. A node at the kth level of the tree represents a square of 22k cells, 2k on a side, by referencing the four k–1 level nodes that represent the four quadrants of that level k square. For example, a level 3 node represents an 8×8 square, which decomposes into four 4×4 squares. Explicit cell contents are only stored at level 0. The root node has to be at a high enough level that all live cells are found within the square it represents.

While a quadtree naively seems to require far more overhead than simpler representations (such as using a matrix of bits), it allows for various optimizations. Since each cell is either live or dead, there are only two possibilities for a node at level 0, so if nodes are allowed to be shared between parents, there is never a need for having more than 2 level 0 nodes in total. Likewise the 4 cells of a 2×2 square can only exhibit different combinations, so no more than that many level 1 nodes are needed either. Going to higher levels, the number of possible kth level squares grows as , but the number of distinct kth level squares occurring in any particular run is much lower, and very often the same square contents appears in several places. For maximal sharing of nodes in the quadtree (which is not so much a tree as a directed acyclic graph), we only want to use one node to represent all squares with the same content.

Hashing

[edit]

A hash table, or more generally any kind of associative array, may be used to map square contents to an already existing node representing those contents, so that one through the technique of hash consing may avoid creating a duplicate node representing those contents. If this is applied consistently then it is sufficient to hash the four pointers to component nodes, as a bottom–up hashing of the square contents would always find those four nodes at the level below. It turns out several operations on higher level nodes can be carried out without explicitly producing the contents of those nodes, instead it suffices to work with pointers to nodes a fixed number of levels down.

Caching and superspeed

[edit]

The quadtree can be augmented to in a node also cache the result of an update on the contents of that node. There is not enough information in a square to determine the next timestep contents on the whole of that square, but the contents of a square centered at the same point determine the next timestep contents of the square. This level k node for that next timestep is offset by cells in both the horizontal and vertical directions, so even in the case of still life it would likely not be among the level k nodes that combine into the square, but at level k–1 the squares are again in the same positions and will be shared if unchanged.

Practically, computing the next timestep contents is a recursive operation that bottom–up populates the cache field of each level k node with a level k–1 node representing the contents of the updated center square. Sharing of nodes can bring a significant speed-up to this operation, since the work required is proportional to the number of nodes, not to the number of cells as in a simpler representation. If nodes are being shared between quadtrees representing different timesteps, then only those nodes which were newly created during the previous timestep will need to have a cached value computed at all.

Superspeed goes further, using the observation that the contents of a square actually determine the contents of its central square for the next timesteps. Instead of having a level k node cache a level k–1 node for the contents 1 step ahead, we can have it cache one for the contents steps ahead. Because updates at level k are computed from updates at level k–1, and since at level k–1 there are cached results for advancing timesteps, a mere two rounds of advancing at level k–1 suffice for advancing by steps at level k.

In the worst case 2 rounds at level k–1 may have to do 4 full rounds at level k–2, in turn calling for 8 full rounds at level k–3, etc., but in practice many subpatterns in the tree are identical to each other and most branches of the recursion are short. For example the pattern being studied may contain many copies of the same spaceship, and often large swathes of empty space. Each instance of these subpatterns will hash to the same quadtree node, and thus only need to be stored once. In addition, these subpatterns only need to be evaluated once, not once per copy as in other Life algorithms.

For sparse or repetitive patterns such as the classical glider gun, this can result in tremendous speedups, allowing one to compute bigger patterns at higher generations faster, sometimes exponentially. A generation of the various breeders and spacefillers, which grow at polynomial speeds, can be evaluated in Hashlife using logarithmic space and time.

Since subpatterns of different sizes are effectively run at different speeds, some implementations, like Gosper's own hlife program, do not have an interactive display; they simply advance the starting pattern a given number of steps, and are usually run from the command line. More recent programs such as Golly, however, have a graphical interface that can drive a Hashlife-based engine.

The typical behavior of a Hashlife program on a conducive pattern is as follows: first the algorithm runs slower compared to other algorithms because of the constant overhead associated with hashing and building the tree; but later, enough data will be gathered and its speed will increase tremendously – the rapid increase in speed is often described as "exploding".

Drawbacks

[edit]

Like many memoized codes, Hashlife can consume significantly more memory than other algorithms, especially on moderate-sized patterns with a lot of entropy, or which contain subpatterns poorly aligned to the bounds of the quadtree nodes (i.e. power-of-two sizes); the cache is a vulnerable component. It can also consume more time than other algorithms on these patterns. Golly, among other Life simulators, has options for toggling between Hashlife and conventional algorithms.

Hashlife is also significantly more complex to implement. For example, it needs a dedicated garbage collector to remove unused nodes from the cache.

Due to being designed for processing generally predictable patterns, chaotic and explosive rules generally operate much more poorly under Hashlife than they would under other implementations.[1]

See also

[edit]

References

[edit]
  1. ^ HashLife algorithm description in Golly: "Note that HashLife performs very poorly on highly chaotic patterns, so in those cases you are better off switching to QuickLife."
  • Gosper, Bill (1984). "Exploiting Regularities in Large Cellular Spaces". Physica D. 10 (1–2). Elsevier: 75–80. doi:10.1016/0167-2789(84)90251-3.
[edit]