De Bruijn graph: Difference between revisions
→See also: remove two of dubious relevance |
Citation bot (talk | contribs) Added volume. | Use this bot. Report bugs. | #UCB_CommandLine |
||
(41 intermediate revisions by 22 users not shown) | |||
Line 1: | Line 1: | ||
{{short description|Directed graph representing overlaps between sequences of symbols}} |
|||
In [[graph theory]], an ''n''-dimensional '''De Bruijn graph''' of ''m'' symbols is a [[directed graph]] representing overlaps between sequences of symbols. It has ''m''<sup>''n''</sup> [[vertex (graph theory)|vertices]], consisting of all possible length-''n'' sequences of the given symbols; the same symbol may appear multiple times in a sequence. If we have the set of ''m'' symbols <math>S:=\{s_1,\dots,s_m\}</math> then the set of vertices is: |
|||
In [[graph theory]], an {{mvar|n}}-dimensional '''De Bruijn graph''' of {{mvar|m}} symbols is a [[directed graph]] representing overlaps between sequences of symbols. It has {{mvar|m{{sup|n}}}} [[vertex (graph theory)|vertices]], consisting of all possible {{nowrap|length-{{mvar|n}}}} sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a [[Set (mathematics)|set]] of {{mvar|m}} symbols {{math|1=''S'' = {''s''{{sub|1}}, …, ''s''{{sub|''m''}}},}} the set of vertices is: |
|||
: <math>V=S^n=\{(s_1,\dots,s_1,s_1),(s_1,\dots,s_1,s_2),\dots,(s_1,\dots,s_1,s_m),(s_1,\dots,s_2,s_1),\dots,(s_m,\dots,s_m,s_m)\}.</math> |
|||
:<math>V=S^n=\{(s_1,\dots,s_1,s_1),(s_1,\dots,s_1,s_2),\dots,(s_1,\dots,s_1,s_m),(s_1,\dots,s_2,s_1),\dots,(s_m,\dots,s_m,s_m)\}.</math> |
|||
If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs ( |
If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs (that is, directed edges) is |
||
:<math>E=\{((t_1,t_2,\dots,t_n),(t_2,\dots,t_n,s_j)) : t_i \in S, 1\le i\le n, 1\le j\le m \}.</math> |
|||
Although De Bruijn graphs are named after [[Nicolaas Govert de Bruijn]], they were invented independently by both de Bruijn<ref name="Bruijn1946"/> and [[I. J. Good]].<ref name="Good1946"/> Much earlier, Camille Flye Sainte-Marie<ref name="Flye1894"/> implicitly used their properties. |
|||
: <math>E=\{((v_1,v_2,\dots,v_n),(v_2,\dots,v_n,s_i)) : i=1,\dots,m \}.</math> |
|||
Although De Bruijn graphs are named after [[Nicolaas Govert de Bruijn]], they were discovered independently by both De Bruijn<ref name="Bruijn1946"/> and [[I. J. Good]].<ref name="Good1946"/> Much earlier, Camille Flye Sainte-Marie<ref name="Flye1894"/> implicitly used their properties. |
|||
== Properties == |
== Properties == |
||
* If |
* If {{math|1=''n'' = 1}}, then the condition for any two vertices forming an edge holds [[vacuously]], and hence all the vertices are connected, forming a total of {{math|''m''{{sup|2}}}} edges. |
||
* Each vertex has exactly |
* Each vertex has exactly {{mvar|m}} incoming and {{mvar|m}} outgoing edges. |
||
* Each |
* Each {{mvar|n}}-dimensional De Bruijn graph is the [[line graph|line digraph]] of the {{nowrap|{{math|(''n'' – 1)}}-dimensional}} De Bruijn graph with the same set of symbols.<ref name="Zhang1987"/> |
||
* Each De Bruijn graph is [[Euler cycle|Eulerian]] and [[Hamiltonian graph|Hamiltonian]]. The Euler cycles and Hamiltonian cycles of these graphs (equivalent to each other via the line graph construction) are [[De Bruijn sequence]]s. |
* Each De Bruijn graph is [[Euler cycle|Eulerian]] and [[Hamiltonian graph|Hamiltonian]]. The Euler cycles and Hamiltonian cycles of these graphs (equivalent to each other via the line graph construction) are [[De Bruijn sequence]]s. |
||
The [[line graph]] construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the |
The [[line graph]] construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the {{mvar|n}}-dimensional De Bruijn graph corresponds to an edge of the {{nowrap|{{math|(''n'' – 1)}}-dimensional}} De Bruijn graph, and each edge in the {{mvar|n}}-dimensional De Bruijn graph corresponds to a two-edge path in the {{nowrap|{{math|(''n'' – 1)}}-dimensional}} De Bruijn graph. |
||
[[Image:DeBruijn-as-line-digraph.svg|center|600px]] |
|||
{{Wide image|DeBruijn-as-line-digraph.svg|600px|alt=Image showing multiple De Bruijn graphs, each the line graph of the previous one|align=center|border=no}} |
|||
== Dynamical systems == |
== Dynamical systems == |
||
Binary De Bruijn graphs can be drawn |
Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of [[dynamical system]]s, such as the [[Lorenz attractor]]: |
||
{{multiple image|total_width=600px|image1=DeBruijn-3-2.svg|caption1=Binary De Bruijn graph|image2=Lorenz attractor2.svg|caption2=Lorenz attractor|align=center}} |
|||
:::[[Image:DeBruijn-3-2.svg|360px]] [[Image:Lorenz attractor yb.svg|200px]] |
|||
This analogy can be made rigorous: the |
This analogy can be made rigorous: the {{mvar|n}}-dimensional {{mvar|m}}-symbol De Bruijn graph is a model of the [[Bernoulli map]] |
||
:<math>x\mapsto mx\ \bmod\ 1.</math> |
|||
The Bernoulli map (also called the {{math|[[2x mod 1 map|2''x'' mod 1]]}} map for {{math|1=''m'' = 2}}) is an [[ergodic]] dynamical system, which can be understood to be a single [[shift operator|shift]] of a [[p-adic|{{mvar|m}}-adic number]].<ref name="Leroux2002"/> The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real {{mvar|x}} in the interval {{math|[0,1)}} to the vertex corresponding to the first {{mvar|n}} digits in the [[Radix|base]]-{{mvar|m}} representation of {{mvar|x}}. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided [[subshift of finite type]]. |
|||
[[File:de_Bruijn_binary_graph.svg|thumb|upright=1.35|Directed graphs of two {{math|''B''(2,3)}} [[de Bruijn sequence]]s and a {{math|''B''(2,4)}} sequence. In {{math|''B''(2,3)}}, each vertex is visited once, whereas in {{math|''B''(2,4)}}, each edge is traversed once.]] |
|||
Embeddings resembling this one can be used to show that the binary De Bruijn graphs have [[queue number]] 2<ref name=HeathRosenberg/> and that they have [[book thickness]] at most 5.<ref name=Obrenic/> |
|||
:<math>x\mapsto mx\ \bmod\ 1</math> |
|||
The Bernoulli map (also called the [[2x mod 1 map]] for ''m'' = 2) is an [[ergodic]] dynamical system, which can be understood to be a single [[shift operator|shift]] of a [[p-adic|m-adic number]].<ref name="Leroux2002"/> The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real ''x'' in the interval [0,1) to the vertex corresponding to the first ''n'' digits in the [[Radix|base]]-''m'' representation of ''x''. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided [[subshift of finite type]]. |
|||
== Uses == |
== Uses == |
||
* Some [[grid network]] topologies are De Bruijn graphs. |
* Some [[grid network]] topologies are De Bruijn graphs. |
||
* The [[distributed hash table]] protocol [[Koorde]] uses a De Bruijn graph. |
* The [[distributed hash table]] protocol [[Koorde]] uses a De Bruijn graph. |
||
* In [[bioinformatics]], De Bruijn graphs are used for ''de novo'' assembly of |
* In [[bioinformatics]], De Bruijn graphs are used for [[De novo sequence assemblers|''de novo'' assembly]] of sequencing reads into a [[genome]].<ref name="Pevzner2001a"/><ref name="Pevzner2001b"/><ref name="zerbino2008"/><ref name="chikhi2014representation"/><ref name="ukmss-37901"/> |
||
== See also == |
== See also == |
||
* [[De Bruijn sequence]] |
|||
* [[De Bruijn torus]] |
* [[De Bruijn torus]] |
||
* [[Hamming graph]] |
* [[Hamming graph]] |
||
Line 38: | Line 42: | ||
== References == |
== References == |
||
{{reflist|refs= |
|||
<references> |
|||
<ref name="Bruijn1946"> |
|||
{{cite journal |
<ref name="Bruijn1946">{{cite journal |
||
| |
|last=de Bruijn |first=N. G. | authorlink = Nicolaas Govert de Bruijn |
||
| title = A combinatorial problem |
|||
| authorlink = Nicolaas Govert de Bruijn |
|||
| journal = [[Indagationes Mathematicae]] |
|||
| title = A Combinatorial Problem |
|||
| journal = Koninklijke Nederlandse Akademie v. Wetenschappen |
|||
| volume = 49 |
| volume = 49 |
||
| pages = 758–764 |
| pages = 758–764 |
||
| year = 1946 |
| year = 1946 |
||
</ref> |
| mr = 0018142}}</ref> |
||
<ref name="Flye1894"> |
<ref name="Flye1894"> |
||
{{cite journal |
{{cite journal |
||
| |
| last = Flye Sainte-Marie | first = C. |
||
| title = Question 48 |
| title = Question 48 |
||
| journal = L'Intermédiaire |
| journal = [[L'Intermédiaire des Mathématiciens]] |
||
| volume = 1 |
| volume = 1 |
||
| year = 1894 |
| year = 1894 |
||
| pages = 107–110}} |
| pages = 107–110}} |
||
</ref> |
</ref> |
||
<ref name="Good1946"> |
<ref name="Good1946"> |
||
{{cite journal |
{{cite journal |
||
| |
| last = Good | first = I. J. |
||
| authorlink = I. J. Good |
| authorlink = I. J. Good |
||
| title = Normal recurring decimals |
| title = Normal recurring decimals |
||
| journal = Journal of the London Mathematical Society |
| journal = [[Journal of the London Mathematical Society]] |
||
| volume = 21 |
| volume = 21 |
||
| issue = 3 |
| issue = 3 |
||
Line 70: | Line 75: | ||
| doi = 10.1112/jlms/s1-21.3.167}} |
| doi = 10.1112/jlms/s1-21.3.167}} |
||
</ref> |
</ref> |
||
<ref name= |
<ref name=HeathRosenberg> |
||
{{cite journal |
|||
{{citation |
|||
| last1 = Heath | first1 = Lenwood S. |
|||
| last2 = Rosenberg | first2 = Arnold L. | author2-link = Arnold L. Rosenberg |
|||
| year = 2002 |
|||
| doi = 10.1137/0221055 |
|||
| bibcode = 2002quant.ph..9100P |
|||
| issue = 5 |
|||
| title = Coassociative grammar, periodic orbits and quantum random walk over Z |
|||
| journal = [[SIAM Journal on Computing]] |
|||
| arxiv = quant-ph/0209100}} |
|||
| mr = 1181408 |
|||
| pages = 927–958 |
|||
| title = Laying out graphs using queues |
|||
| volume = 21 |
|||
| year = 1992}} |
|||
</ref> |
</ref> |
||
<ref name="Leroux2002">{{cite journal |
|||
| last = Leroux | first = Philippe |
|||
| arxiv = quant-ph/0209100 |
|||
| doi = 10.1155/IJMMS.2005.3979 |
|||
| issue = 24 |
|||
| journal = International Journal of Mathematics and Mathematical Sciences |
|||
| mr = 2204471 |
|||
| pages = 3979–3996 |
|||
| title = Coassociative grammar, periodic orbits, and quantum random walk over <math>\Z</math> |
|||
| year = 2005| volume = 2005 |
|||
| doi-access = free |
|||
}}</ref> |
|||
<ref name="Zhang1987"> |
<ref name="Zhang1987"> |
||
{{cite journal |
{{cite journal |
||
| |
|last1=Zhang | first1 = Fu Ji |
||
|last2=Lin | first2 = Guo Ning |
|||
| title = On the de Bruijn-Good graphs |
|||
| title = On the de Bruijn–Good graphs |
|||
| journal = Acta Math. Sinica |
|||
| journal = [[Acta Mathematica Sinica]] |
|||
| volume = 30 |
| volume = 30 |
||
| year = 1987 |
| year = 1987 |
||
Line 90: | Line 115: | ||
<ref name ="Pevzner2001a"> |
<ref name ="Pevzner2001a"> |
||
{{cite journal |
{{cite journal |
||
| |
|last1=Pevzner | first1 = Pavel A. | author1-link= Pavel A. Pevzner |
||
|last2=Tang|first2= Haixu |
|||
|last3=Waterman|first3= Michael S. | author3-link = Michael Waterman |
|||
| title = An Eulerian path approach to DNA fragment assembly |
| title = An Eulerian path approach to DNA fragment assembly |
||
| journal = [[Proceedings of the National Academy of Sciences of the United States of America|Proceedings of the National Academy of Sciences]] |
|||
| journal = PNAS |
|||
| volume = 98 |
| volume = 98 |
||
| issue = 17 |
| issue = 17 |
||
Line 100: | Line 127: | ||
| pmc = 55524 |
| pmc = 55524 |
||
| doi = 10.1073/pnas.171285098 |
| doi = 10.1073/pnas.171285098 |
||
| bibcode = 2001PNAS...98.9748P}} |
| bibcode = 2001PNAS...98.9748P| doi-access = free }} |
||
</ref> |
</ref> |
||
<ref name ="Pevzner2001b"> |
<ref name ="Pevzner2001b"> |
||
{{cite journal |
{{cite journal |
||
| |
|last1=Pevzner | first1 = Pavel A. | author1-link= Pavel A. Pevzner |
||
|last2=Tang|first2= Haixu |
|||
| title = Fragment Assembly with Double-Barreled Data |
| title = Fragment Assembly with Double-Barreled Data |
||
| journal = Bioinformatics |
| journal = [[Bioinformatics (journal)|Bioinformatics]] |
||
| volume = |
| volume = 17 |
||
| issue=Suppl 1 |
|||
| year = 2001 |
| year = 2001 |
||
| pages = |
| pages = S225–S233 |
||
| doi = 10.1093/bioinformatics/17.suppl_1.S225| pmid=11473013 |
|||
</ref> |
|||
| doi-access = free |
|||
}}</ref> |
|||
<ref name="zerbino2008"> |
<ref name="zerbino2008"> |
||
{{cite journal |
{{cite journal |
||
| |
|last1=Zerbino|first1= Daniel R. |last2=Birney|first2= Ewan | title = Velvet: algorithms for de novo short read assembly using de Bruijn graphs |
||
| journal = [[Genome Research]] |
|||
| title = Velvet: algorithms for de novo short read assembly using de Bruijn graphs |
|||
| journal = Genome Research |
|||
| volume = 18 |
| volume = 18 |
||
| issue = 5 |
| issue = 5 |
||
Line 124: | Line 155: | ||
| pmc = 2336801}} |
| pmc = 2336801}} |
||
</ref> |
</ref> |
||
<ref name="chikhi2014representation"> |
<ref name="chikhi2014representation"> |
||
{{cite journal |
{{cite journal |
||
| |
| last1 = Chikhi | first1 = R. |
||
| last2 = Limasset | first2 = A. |
|||
| last3 = Jackman | first3 = S. |
|||
| last4 = Simpson | first4 = J. |
|||
| last5= Medvedev | first5 = P. |
|||
| title = On the representation of de Bruijn graphs |
| title = On the representation of de Bruijn graphs |
||
| journal = |
| journal = [[Journal of Computational Biology]] |
||
| year = 2014 |
|||
| volume = 22 |
|||
| issue = 5 |
|||
| pages = 336–52 |
|||
| doi = 10.1089/cmb.2014.0160 |
|||
| pmid = 25629448 |
|||
| arxiv = 1401.5383 |
|||
| s2cid = 9502231 |
|||
}} |
|||
</ref> |
|||
<ref name=Obrenic> |
|||
{{cite journal |
|||
| last = Obrenić | first = Bojana |
|||
| doi = 10.1137/0406049 |
|||
| issue = 4 |
|||
| journal = [[SIAM Journal on Discrete Mathematics]] |
|||
| mr = 1241401 |
|||
| pages = 642–654 |
|||
| title = Embedding de Bruijn and shuffle-exchange graphs in five pages |
|||
| volume = 6 |
|||
| year = 1993}} |
|||
</ref> |
|||
<ref name="ukmss-37901"> |
|||
{{cite journal |
|||
|last1= Iqbal | first1 = Zamin |
|||
|last2= Caccamo | first2 = Mario |
|||
|last3= Turner | first3 = Isaac |
|||
|last4= Flicek | first4 = Paul |
|||
|last5=McVean | first5 = Gil |
|||
| title = De novo assembly and genotyping of variants using colored de Bruijn graphs |
|||
| journal = [[Nature Genetics]] |
|||
| volume = 44 |
|||
| issue = 2 |
|||
| year = 2012 |
|||
| pages = 226–32 |
|||
| doi = 10.1038/ng.1028 |
|||
| pmc = 3272472 |
|||
| pmid = 22231483}} |
|||
</ref> |
</ref> |
||
}} |
|||
</references> |
|||
==External links== |
==External links== |
||
* {{MathWorld|title=De Bruijn Graph|id=deBruijnGraph}} |
* {{MathWorld|title=De Bruijn Graph|id=deBruijnGraph}} |
||
* [http://www.homolog.us/Tutorials/index.php?p=2.1&s=1 Tutorial on using De Bruijn Graphs in Bioinformatics] by Homolog.us |
* [https://web.archive.org/web/20141030124239/http://www.homolog.us/Tutorials/index.php?p=2.1&s=1 Tutorial on using De Bruijn Graphs in Bioinformatics] by Homolog.us |
||
[[Category:Dynamical systems]] |
[[Category:Dynamical systems]] |
||
[[Category:Automata |
[[Category:Automata (computation)]] |
||
[[Category:Parametric families of graphs]] |
[[Category:Parametric families of graphs]] |
||
[[Category:Directed graphs]] |
[[Category:Directed graphs]] |
Latest revision as of 14:29, 26 April 2024
In graph theory, an n-dimensional De Bruijn graph of m symbols is a directed graph representing overlaps between sequences of symbols. It has mn vertices, consisting of all possible length-n sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a set of m symbols S = {s1, …, sm}, the set of vertices is:
If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs (that is, directed edges) is
Although De Bruijn graphs are named after Nicolaas Govert de Bruijn, they were invented independently by both de Bruijn[1] and I. J. Good.[2] Much earlier, Camille Flye Sainte-Marie[3] implicitly used their properties.
Properties
[edit]- If n = 1, then the condition for any two vertices forming an edge holds vacuously, and hence all the vertices are connected, forming a total of m2 edges.
- Each vertex has exactly m incoming and m outgoing edges.
- Each n-dimensional De Bruijn graph is the line digraph of the (n – 1)-dimensional De Bruijn graph with the same set of symbols.[4]
- Each De Bruijn graph is Eulerian and Hamiltonian. The Euler cycles and Hamiltonian cycles of these graphs (equivalent to each other via the line graph construction) are De Bruijn sequences.
The line graph construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the n-dimensional De Bruijn graph corresponds to an edge of the (n – 1)-dimensional De Bruijn graph, and each edge in the n-dimensional De Bruijn graph corresponds to a two-edge path in the (n – 1)-dimensional De Bruijn graph.
Dynamical systems
[edit]Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of dynamical systems, such as the Lorenz attractor:
This analogy can be made rigorous: the n-dimensional m-symbol De Bruijn graph is a model of the Bernoulli map
The Bernoulli map (also called the 2x mod 1 map for m = 2) is an ergodic dynamical system, which can be understood to be a single shift of a m-adic number.[5] The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real x in the interval [0,1) to the vertex corresponding to the first n digits in the base-m representation of x. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided subshift of finite type.
Embeddings resembling this one can be used to show that the binary De Bruijn graphs have queue number 2[6] and that they have book thickness at most 5.[7]
Uses
[edit]- Some grid network topologies are De Bruijn graphs.
- The distributed hash table protocol Koorde uses a De Bruijn graph.
- In bioinformatics, De Bruijn graphs are used for de novo assembly of sequencing reads into a genome.[8][9][10][11][12]
See also
[edit]References
[edit]- ^ de Bruijn, N. G. (1946). "A combinatorial problem". Indagationes Mathematicae. 49: 758–764. MR 0018142.
- ^ Good, I. J. (1946). "Normal recurring decimals". Journal of the London Mathematical Society. 21 (3): 167–169. doi:10.1112/jlms/s1-21.3.167.
- ^ Flye Sainte-Marie, C. (1894). "Question 48". L'Intermédiaire des Mathématiciens. 1: 107–110.
- ^ Zhang, Fu Ji; Lin, Guo Ning (1987). "On the de Bruijn–Good graphs". Acta Mathematica Sinica. 30 (2): 195–205.
- ^ Leroux, Philippe (2005). "Coassociative grammar, periodic orbits, and quantum random walk over ". International Journal of Mathematics and Mathematical Sciences. 2005 (24): 3979–3996. arXiv:quant-ph/0209100. doi:10.1155/IJMMS.2005.3979. MR 2204471.
- ^ Heath, Lenwood S.; Rosenberg, Arnold L. (1992). "Laying out graphs using queues". SIAM Journal on Computing. 21 (5): 927–958. doi:10.1137/0221055. MR 1181408.
- ^ Obrenić, Bojana (1993). "Embedding de Bruijn and shuffle-exchange graphs in five pages". SIAM Journal on Discrete Mathematics. 6 (4): 642–654. doi:10.1137/0406049. MR 1241401.
- ^ Pevzner, Pavel A.; Tang, Haixu; Waterman, Michael S. (2001). "An Eulerian path approach to DNA fragment assembly". Proceedings of the National Academy of Sciences. 98 (17): 9748–9753. Bibcode:2001PNAS...98.9748P. doi:10.1073/pnas.171285098. PMC 55524. PMID 11504945.
- ^ Pevzner, Pavel A.; Tang, Haixu (2001). "Fragment Assembly with Double-Barreled Data". Bioinformatics. 17 (Suppl 1): S225–S233. doi:10.1093/bioinformatics/17.suppl_1.S225. PMID 11473013.
- ^ Zerbino, Daniel R.; Birney, Ewan (2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs". Genome Research. 18 (5): 821–829. doi:10.1101/gr.074492.107. PMC 2336801. PMID 18349386.
- ^ Chikhi, R.; Limasset, A.; Jackman, S.; Simpson, J.; Medvedev, P. (2014). "On the representation of de Bruijn graphs". Journal of Computational Biology. 22 (5): 336–52. arXiv:1401.5383. doi:10.1089/cmb.2014.0160. PMID 25629448. S2CID 9502231.
- ^ Iqbal, Zamin; Caccamo, Mario; Turner, Isaac; Flicek, Paul; McVean, Gil (2012). "De novo assembly and genotyping of variants using colored de Bruijn graphs". Nature Genetics. 44 (2): 226–32. doi:10.1038/ng.1028. PMC 3272472. PMID 22231483.