
User:Citing/sandbox3: Difference between revisions

From Wikipedia, the free encyclopedia
== Measures ==
Population structure is a complex phenomenon and no single measure captures it entirely. Understanding a population's structure requires a combination of methods and measures.<ref name="Lawsonvan Dorp2018"/><ref name="MeirmansHedrick2010"/>
The basic cause of population structure in [[sexually reproducing]] species is [[random mating|non-random mating]] between groups: if all individuals within a population mate randomly, then the [[allele frequencies]] should be similar between groups. Population structure commonly arises from physical separation by distance or barriers, like mountains and rivers, followed by [[genetic drift]]. Other causes include [[gene flow]] from migrations, [[population bottleneck]]s and expansions, [[founder effect]]s, [[evolutionary pressure]], random chance, and (in humans) cultural factors. Even in the absence of these factors, individuals tend to stay close to where they were born, so alleles are not distributed at random across the full range of the species.<ref>{{cite journal | vauthors = Cardon LR, Palmer LJ | title = Population stratification and spurious allelic association | journal = Lancet | volume = 361 | issue = 9357 | pages = 598–604 | date = February 2003 | pmid = 12598158 | doi = 10.1016/S0140-6736(03)12520-2 | s2cid = 14255234 }}</ref><ref>{{cite web | vauthors = McVean G | author-link1 = Gil McVean | url = http://www.stats.ox.ac.uk/~mcvean/notes7.pdf | archive-url = https://web.archive.org/web/20181123141715/http://www.stats.ox.ac.uk/~mcvean/notes7.pdf | archive-date = 2018-11-23| title = Population Structure | year = 2001 | access-date = 2020-11-14}}</ref>
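
The effect of drift in separated subpopulations can be illustrated with a short simulation. This is a minimal sketch that assumes an idealized Wright–Fisher model (constant size ''N'', no migration, mutation, or selection), a simplification not stated in the text; the function name and parameter values are hypothetical. Under that model, the expected heterozygosity within each isolated subpopulation ("deme") decays as <math>H_t = H_0 \left(1 - \tfrac{1}{2N}\right)^t</math>, while the demes drift apart in allele frequency.

```python
import random


def drift_in_isolated_demes(p0, n_diploid, generations, n_demes, seed=1):
    """Final allele frequency in each of n_demes isolated demes under Wright-Fisher drift.

    Every generation, all 2N gene copies are redrawn independently with
    probability equal to the current allele frequency (binomial sampling).
    There is no migration, mutation, or selection in this sketch.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    final = []
    for _ in range(n_demes):
        p = p0
        for _ in range(generations):
            # each comparison is True (counted as 1 by sum) with probability p
            p = sum(rng.random() < p for _ in range(copies)) / copies
        final.append(p)
    return final


p0, n_diploid, generations = 0.5, 25, 40
freqs = drift_in_isolated_demes(p0, n_diploid, generations, n_demes=400)

h0 = 2 * p0 * (1 - p0)                                        # starting heterozygosity
h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-deme value
p_bar = sum(freqs) / len(freqs)
h_pooled = 2 * p_bar * (1 - p_bar)                            # treating all demes as one
n_fixed = sum(p in (0.0, 1.0) for p in freqs)                 # demes that lost an allele

expected_ratio = (1 - 1 / (2 * n_diploid)) ** generations    # E[H_t] / H_0
print(f"within-deme H_t / H_0: {h_within / h0:.3f} (expected {expected_ratio:.3f})")
print(f"pooled heterozygosity: {h_pooled:.3f}")
print(f"demes fixed for one allele: {n_fixed} of {len(freqs)}")
```

Each deme loses heterozygosity and some become fixed, while the heterozygosity computed from the pooled frequency stays close to its starting value; the gap between the two is what the heterozygosity-based measures in this section quantify.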


=== Heterozygosity ===
[[File:Loss of heterozygosity over time in a bottlenecking population with label.png|right|thumb|400px|A [[population bottleneck]] can result in a loss of heterozygosity. In this hypothetical population, an allele has become fixed after the population repeatedly dropped from 10 to 3.]]
One of the results of population structure is a reduction in [[heterozygosity]]. When populations split, alleles have a higher chance of reaching [[Fixation (population genetics)|fixation]] within subpopulations, especially if the subpopulations are small or have been isolated for long periods. This reduction in heterozygosity can be thought of as an extension of [[inbreeding]], with individuals in subpopulations being more likely to share a [[recent common ancestor]].<ref name="HartlClark">{{Cite book|last1=Hartl|first1=Daniel L.|last2=Clark|first2=Andrew G.|url=https://www.worldcat.org/oclc/37481398|title=Principles of population genetics|date=1997|publisher=Sinauer Associates|isbn=0-87893-306-9|edition=3rd|location=Sunderland, MA|oclc=37481398|pages=111–163}}</ref> The scale matters: an individual with both parents born in the United Kingdom is not inbred relative to that country's population, but is more inbred than the offspring of two people chosen at random from the entire world. This motivates the derivation of Wright's [[F-statistics|''F''-statistics]] (also called "fixation indices"), which measure inbreeding by comparing observed and expected heterozygosity.<ref name="Wright1949">{{cite journal|last1=Wright|first1=Sewall|title=The genetical structure of populations|journal=Annals of Eugenics|volume=15|issue=1|year=1949|pages=323–354|issn=20501420|doi=10.1111/j.1469-1809.1949.tb02451.x}}</ref> For example, <math>F_{IS}</math> measures the inbreeding coefficient at a single locus for an individual <math>I</math> relative to some subpopulation <math>S</math>:<ref name="Coop2019">{{cite book | title = Population and Quantitative Genetics | last = Coop | first = Graham | year = 2019 | pages=22–44}}</ref>


<math>F_{IS} = 1 - \frac{H_I}{H_S}</math>
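
where <math>H_I</math> is the observed heterozygosity (the proportion of heterozygous individuals) and <math>H_S</math> is the heterozygosity expected in subpopulation <math>S</math> under Hardy–Weinberg equilibrium. At a locus with two alleles at frequencies <math>p_S</math> and <math>q_S = 1 - p_S</math>, <math>H_S = 2 p_S q_S</math>, so:
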
<math>F_{IS} = 1 - \frac{H_I}{2 p_S q_S}</math>
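
A worked example of this formula, computing <math>F_{IS}</math> from genotype counts at one biallelic locus, is sketched below. The function name and the genotype counts are hypothetical, chosen only to show a heterozygote deficit (positive <math>F_{IS}</math>), exact Hardy–Weinberg proportions (zero), and a heterozygote excess (negative); <math>H_I</math> is taken as the observed proportion of heterozygotes.

```python
def f_is(n_11, n_12, n_22):
    """F_IS = 1 - H_I / (2 p_S q_S) at one biallelic locus, from counts of
    A1A1 (n_11), A1A2 (n_12) and A2A2 (n_22) individuals in subpopulation S."""
    n = n_11 + n_12 + n_22
    p_s = (2 * n_11 + n_12) / (2 * n)  # frequency of A1 in S
    q_s = 1 - p_s
    h_i = n_12 / n                    # observed proportion of heterozygotes
    h_s = 2 * p_s * q_s               # heterozygosity expected under Hardy-Weinberg
    if h_s == 0:
        raise ValueError("locus is monomorphic in this subpopulation; F_IS is undefined")
    return 1 - h_i / h_s


print(round(f_is(30, 20, 50), 4))  # 0.5833: fewer heterozygotes than expected
print(round(f_is(16, 48, 36), 4))  # 0.0: Hardy-Weinberg proportions for p = 0.4
print(round(f_is(10, 80, 10), 4))  # -0.6: more heterozygotes than expected
```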


Similarly, for the total population <math>T</math>, we can define <math>H_T = 2 p_T q_T</math>, where <math>p_T</math> and <math>q_T</math> are the allele frequencies in <math>T</math>. This lets us compare the expected heterozygosity of subpopulation <math>S</math> with that of the total population and define <math>F_{ST}</math> as:<ref name = "Coop2019"/>




<math>F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{2p_S q_S}{2 p_T q_T}</math>


If ''F<sub>ST</sub>'' is 0, then the allele frequencies in the subpopulations are identical, suggesting no structure. The theoretical maximum value of 1 is attained when the subpopulations are fixed for different alleles, but observed values are usually far lower.<ref name="HartlClark"/> ''F<sub>ST</sub>'' is one of the most common measures of population structure, and there are several formulations depending on the number of populations and the alleles of interest. Although it is sometimes used as a [[genetic distance]] between populations, it does not always satisfy the [[triangle inequality]] and thus is not a [[Metric (mathematics)|metric]].<ref name="ArbisserRosenberg2020">{{cite journal|last1=Arbisser|first1=Ilana M.|last2=Rosenberg|first2=Noah A.|title=FST and the triangle inequality for biallelic markers|journal=Theoretical Population Biology|volume=133|year=2020|pages=117–129|issn=00405809|doi=10.1016/j.tpb.2019.05.003}}</ref> It also depends on within-population diversity, which makes interpretation and comparison difficult.<ref name="MeirmansHedrick2010">{{cite journal|last1=Meirmans|first1=Patrick G.|last2=Hedrick|first2=Philip W.|title=Assessing population structure: FST and related measures|journal=Molecular Ecology Resources|volume=11|issue=1|year=2010|pages=5–18|issn=1755-098X|doi=10.1111/j.1755-0998.2010.02927.x}}</ref>
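
The maximum value, the zero value, and the failure of the triangle inequality can all be checked with three hypothetical subpopulations at one biallelic locus. This sketch assumes a common two-subpopulation form of <math>F_{ST} = 1 - H_S/H_T</math> in which <math>H_S</math> is the unweighted mean of the two subpopulations' expected heterozygosities and <math>H_T</math> uses the pooled frequency; the formula in the text is written for a single subpopulation, so this averaging is an assumption of the example, and the function name is illustrative.

```python
from itertools import combinations


def pairwise_fst(p1, p2):
    """F_ST = 1 - H_S / H_T for two equally weighted subpopulations at one
    biallelic locus. H_S averages 2pq over the two subpopulations; H_T uses
    the pooled allele frequency."""
    p_t = (p1 + p2) / 2
    h_t = 2 * p_t * (1 - p_t)
    if h_t == 0:
        raise ValueError("the same allele is fixed in both subpopulations; F_ST is undefined")
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 1 - h_s / h_t


# Hypothetical allele frequencies in three subpopulations
freqs = {"X": 0.0, "Y": 0.5, "Z": 1.0}
fst = {(a, b): pairwise_fst(freqs[a], freqs[b]) for a, b in combinations(freqs, 2)}
for pair, value in fst.items():
    print(pair, round(value, 4))

# A metric would require F_ST(X, Z) <= F_ST(X, Y) + F_ST(Y, Z)
print(fst[("X", "Z")] <= fst[("X", "Y")] + fst[("Y", "Z")])  # False
print(pairwise_fst(0.3, 0.3))  # 0.0: identical frequencies, no differentiation
```

X and Z are fixed for different alleles and reach the maximum of 1, while each is only 1/3 away from Y, so the direct "distance" exceeds the two-step path.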


=== Admixture inference ===
An individual's genotype can be modelled as an [[genetic admixture|admixture]] of ''K'' discrete clusters of populations.<ref name="Coop2019"/> Each cluster is characterized by its own allele frequencies, and the contribution of each cluster to an individual's genotype is estimated statistically. In 2000, [[Jonathan K. Pritchard]] and colleagues introduced the STRUCTURE algorithm, which estimates these proportions via [[Markov chain Monte Carlo]].<ref name="PritchardStephens2000">{{cite journal|last1=Pritchard|first1=Jonathan K|last2=Stephens|first2=Matthew|last3=Donnelly|first3=Peter|title=Inference of Population Structure Using Multilocus Genotype Data|journal=Genetics|volume=155|issue=2|year=2000|pages=945–959|issn=1943-2631|doi=10.1093/genetics/155.2.945|doi-access=free}}</ref> Since then, other algorithms, such as ADMIXTURE, have been developed using different estimation techniques.<ref name="AlexanderNovembre2009">{{cite journal|last1=Alexander|first1=D. H.|last2=Novembre|first2=J.|last3=Lange|first3=K.|title=Fast model-based estimation of ancestry in unrelated individuals|journal=Genome Research|volume=19|issue=9|year=2009|pages=1655–1664|issn=1088-9051|doi=10.1101/gr.094052.109|pmc=2752134}}</ref><ref name="NovembreRamachandran2011">{{cite journal|last1=Novembre|first1=John|last2=Ramachandran|first2=Sohini|title=Perspectives on Human Population Structure at the Cusp of the Sequencing Era|journal=Annual Review of Genomics and Human Genetics|volume=12|issue=1|year=2011|pages=245–274|issn=1527-8204|doi=10.1146/annurev-genom-090810-183123}}</ref> Estimated proportions can be visualized using bar plots: each bar represents an individual and is subdivided to show the proportion of that individual's genetic ancestry from each of the ''K'' clusters.<ref name="Coop2019"/>
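
The admixture model can be sketched for its simplest case: estimating one individual's ancestry proportions when the cluster allele frequencies are already known. In this model each allele copy comes from cluster ''k'' with probability <math>q_k</math>, so a genotype is binomial with allele probability <math>\textstyle\sum_k q_k f_{k,l}</math>. This is an illustration under stated assumptions, not the STRUCTURE or ADMIXTURE algorithm: those programs also estimate the cluster frequencies, whereas here they are fixed, and the estimation uses a plain expectation–maximization (EM) update. All names and numbers are hypothetical.

```python
import random


def simulate_individual(q, cluster_freqs, rng):
    """Genotypes (copies of allele 1: 0, 1 or 2) of one admixed individual.

    Each allele copy comes from cluster k with probability q[k] and is allele 1
    with that cluster's frequency at the locus, so the genotype at locus l is
    Binomial(2, sum_k q[k] * f[k][l]).
    """
    genotypes = []
    for l in range(len(cluster_freqs[0])):
        p = sum(q_k * f[l] for q_k, f in zip(q, cluster_freqs))
        genotypes.append((rng.random() < p) + (rng.random() < p))
    return genotypes


def estimate_proportions(genotypes, cluster_freqs, n_iter=150):
    """EM estimate of ancestry proportions with cluster allele frequencies held fixed."""
    k_clusters = len(cluster_freqs)
    q = [1.0 / k_clusters] * k_clusters
    for _ in range(n_iter):
        expected = [0.0] * k_clusters  # expected allele copies attributed to each cluster
        for l, g in enumerate(genotypes):
            f = [freqs[l] for freqs in cluster_freqs]
            p1 = sum(q_k * f_k for q_k, f_k in zip(q, f))  # P(allele 1) at this locus
            for k in range(k_clusters):
                # E-step: posterior share of the g copies of allele 1 and the
                # 2 - g copies of allele 2 that trace back to cluster k
                expected[k] += g * q[k] * f[k] / p1 + (2 - g) * q[k] * (1 - f[k]) / (1 - p1)
        # M-step: new proportions are the expected shares of all 2L copies
        q = [e / (2 * len(genotypes)) for e in expected]
    return q


rng = random.Random(42)
n_loci = 2000
cluster_freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_loci)] for _ in range(2)]
true_q = [0.3, 0.7]
genotypes = simulate_individual(true_q, cluster_freqs, rng)
q_hat = estimate_proportions(genotypes, cluster_freqs)
print("estimated proportions:", [round(x, 3) for x in q_hat])
```

With thousands of loci the estimate lands close to the simulated proportions of 0.3 and 0.7; with few loci, or clusters with similar frequencies, it becomes much noisier. Real analyses must also estimate the cluster frequencies and choose ''K''.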


{{wide image|File:Map_of_samples_and_population_structure_of_North_Africa_and_neighboring_populations.png|800px|caption=A study of population structure of humans in Northern Africa and neighboring populations modelled using ADMIXTURE and assuming ''K''=2, 4, 6, and 8 populations (Figure B, top to bottom). Varying ''K'' changes the scale of clustering. At ''K''=2, 80% of the inferred ancestry for most North Africans is assigned to a cluster that is common to Basque, Tuscan, and Qatari Arab individuals (in purple). At ''K''=4, clines of North African ancestry appear (in light blue). At ''K''=6, opposite clines of Near Eastern (Qatari) ancestry appear (in green). At ''K''=8, Tunisian Berbers appear as a cluster (in dark blue).<ref name="Henn2012">{{cite journal |vauthors=Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, Byrnes JK, Fadhlaoui-Zid K, Zalloua PA, Moreno-Estrada A, Bertranpetit J, Bustamante CD, Comas D |title=Genomic ancestry of North Africans supports back-to-Africa migrations |journal=PLoS Genet |volume=8 |issue=1 |pages=e1002397 |date=January 2012 |pmid=22253600 |pmc=3257290 |doi=10.1371/journal.pgen.1002397 |url=}}</ref>}}


Varying ''K'' can illustrate different scales of population structure; using a small ''K'' for the entire human population will subdivide people roughly by continent, while using large ''K'' will partition populations into finer subgroups.<ref name="Coop2019"/> Though clustering methods are popular, they are open to misinterpretation: for non-simulated data, there is never a true value of ''K'', but rather an approximation considered useful for a given question.<ref name="Lawsonvan Dorp2018">{{cite journal|last1=Lawson|first1=Daniel J.|last2=van Dorp|first2=Lucy|last3=Falush|first3=Daniel|title=A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots|journal=Nature Communications|volume=9|issue=1|year=2018|issn=2041-1723|doi=10.1038/s41467-018-05257-7|pmc=6092366}}</ref> They are sensitive to sampling strategies, sample size, and close relatives in data sets; there may be no discrete populations at all; and there may be hierarchical structure where subpopulations are nested.<ref name="Lawsonvan Dorp2018"/> Clusters may be admixed themselves,<ref name="Coop2019"/> and may not have a useful interpretation as source populations.<ref name="Novembre2016">{{cite journal|last1=Novembre|first1=John|title=Pritchard, Stephens, and Donnelly on Population Structure|journal=Genetics|volume=204|issue=2|year=2016|pages=391–393|issn=1943-2631|doi=10.1534/genetics.116.195164}}</ref>
=== Hardy–Weinberg equilibrium ===
Under [[Hardy-Weinberg equilibrium]], the genotype frequencies of a population that mates completely at random can be derived from its allele frequencies. If there are two alleles, <math>A_1</math> and <math>A_2</math>, at frequencies <math>p</math> and <math>q</math> with <math>p+q=1</math>, then the genotype frequencies are:

{| class="wikitable"
|-
! Genotype
| style="text-align:center;" | <math>A_1 A_1</math>
| style="text-align:center;" | <math>A_1 A_2</math>
| style="text-align:center;" | <math>A_2 A_2</math>
|-
! Frequency
| style="text-align:center;" | <math>f_{11} = p^2</math>
| style="text-align:center;" | <math>f_{12} = 2pq</math>
| style="text-align:center;" | <math>f_{22} = q^2</math>
|}

=== Dimensionality reduction ===
[[File:Procrustes-transformed PCA plot of genetic variation of Sub-Saharan African populations.png|right|400px|thumb|A map of the locations of genetic samples of several African populations (left) and principal components 1 and 2 of the data superimposed on the map (right). The principal coordinate plane has been rotated 16.11° to align with the map. It corresponds to the east-west and north-south distributions of the populations.<ref name="Wang2012">{{cite journal |vauthors=Wang C, Zöllner S, Rosenberg NA |title=A quantitative comparison of the similarity between genes and geography in worldwide human populations |journal=PLoS Genet |volume=8 |issue=8 |pages=e1002886 |date=August 2012 |pmid=22927824 |pmc=3426559 |doi=10.1371/journal.pgen.1002886 |url=}}</ref>]]

Genetic data are [[high-dimensional statistics|high dimensional]], and [[dimensionality reduction]] techniques can capture population structure in a few dimensions. [[Principal component analysis]] (PCA) was first applied in population genetics in 1978 by [[Cavalli-Sforza]] and colleagues and saw renewed use with the advent of [[high-throughput sequencing]].<ref name="Coop2019"/><ref name="MenozziPiazza1978">{{cite journal|last1=Menozzi|first1=P|last2=Piazza|first2=A|last3=Cavalli-Sforza|first3=L|title=Synthetic maps of human gene frequencies in Europeans|journal=Science|volume=201|issue=4358|year=1978|pages=786–792|issn=0036-8075|doi=10.1126/science.356262}}</ref>

Initially, PCA was applied to the allele frequencies of populations at known [[genetic markers]]; later it was found that by coding SNPs as integers (for example, as the number of [[reference genome|non-reference alleles]]) and normalizing the values, PCA could be applied at the level of individuals.<ref name="NovembreRamachandran2011"/> One formulation considers <math>N</math> individuals and <math>S</math> bi-allelic SNPs. For each individual <math>i</math>, the value at locus <math>l</math>, <math>g_{i,l}</math>, is the number of non-reference alleles (one of <math>0, 1, 2</math>). If the frequency of the non-reference allele at <math>l</math> is <math>p_{l}</math>, then the resulting <math>N \times S</math> matrix of normalized genotypes has entries:<ref name="Coop2019"/>

<math>\frac{g_{i,l} - 2p_{l}}{\sqrt{2p_{l} (1-p_{l})}}</math>

PCA transforms the data to maximize variance; given enough data, when each individual is plotted as a point, discrete clusters can form.<ref name="NovembreRamachandran2011"/> Individuals with admixed ancestries tend to fall between clusters, and when there is homogeneous [[isolation by distance]] in the data, the top principal components reflect geographic variation.<ref name="NovembreRamachandran2011"/><ref name="NovembreJohnson2008">{{cite journal |vauthors=Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD |title=Genes mirror geography within Europe |journal=Nature |volume=456 |issue=7218 |pages=98–101 |date=November 2008 |pmid=18758442 |pmc=2735096 |doi=10.1038/nature07331 |url=}}</ref> The [[eigenvector]]s generated by PCA can be written explicitly in terms of the mean [[coalescent theory|coalescent times]] for pairs of individuals, making PCA useful for interpreting the population histories of groups in a given sample. PCA cannot, however, distinguish between different processes that lead to the same mean coalescent times.<ref name="McVean2009">{{cite journal |vauthors=McVean G |title=A genealogical interpretation of principal components analysis |journal=PLoS Genet |volume=5 |issue=10 |pages=e1000686 |date=October 2009 |pmid=19834557 |pmc=2757795 |doi=10.1371/journal.pgen.1000686 |url=}}</ref>

[[Multidimensional scaling]] and [[discriminant analysis]] have been used to study differentiation and population assignment and to analyze genetic distances.<ref name="jombart2009">{{cite journal |vauthors=Jombart T, Pontier D, Dufour AB |title=Genetic markers in the playground of multivariate analysis |journal=Heredity (Edinb) |volume=102 |issue=4 |pages=330–41 |date=April 2009 |pmid=19156164 |doi=10.1038/hdy.2008.130 |url=}}</ref> [[Neighbourhood (graph theory)|Neighborhood graph]] approaches like [[t-SNE|t-distributed stochastic neighbor embedding]] (t-SNE) and [[uniform manifold approximation and projection]] (UMAP) can visualize continental and subcontinental structure in human data.<ref name="LiCerise2017">{{cite journal |vauthors=Li W, Cerise JE, Yang Y, Han H |title=Application of t-SNE to human genetic data |journal=J Bioinform Comput Biol |volume=15 |issue=4 |pages=1750017 |date=August 2017 |pmid=28718343 |doi=10.1142/S0219720017500172 |url=}}</ref><ref name="diazpapkovich2019">{{cite journal |vauthors=Diaz-Papkovich A, Anderson-Trocmé L, Ben-Eghan C, Gravel S |title=UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts |journal=PLoS Genet |volume=15 |issue=11 |pages=e1008432 |date=November 2019 |pmid=31675358 |pmc=6853336 |doi=10.1371/journal.pgen.1008432 }}</ref> In larger datasets, UMAP captures multiple scales of population structure better than other methods, which can hide or split fine-scale patterns; such patterns are of interest when there are many diverse or admixed populations, or when examining relationships between genotypes, phenotypes, and geography.<ref name="diazpapkovich2019"/><ref name="sakaue2020">{{cite journal |vauthors=Sakaue S, Hirata J, Kanai M, Suzuki K, Akiyama M, Lai Too C, Arayssi T, Hammoudeh M, Al Emadi S, Masri BK, Halabi H, Badsha H, Uthman IW, Saxena R, Padyukov L, Hirata M, Matsuda K, Murakami Y, Kamatani Y, Okada Y |title=Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction |journal=Nat Commun |volume=11 |issue=1 |pages=1569 |date=March 2020 |pmid=32218440 |doi=10.1038/s41467-020-15194-z}}</ref> [[Variational autoencoder]]s can generate artificial genotypes with structure representative of the input data.<ref name="Battey2021">{{cite journal |vauthors=Battey CJ, Coffing GC, Kern AD |title=Visualizing population structure with variational autoencoders |journal=G3 (Bethesda) |volume=11 |issue=1 |pages= |date=January 2021 |pmid=33561250 |pmc=8022710 |doi=10.1093/g3journal/jkaa036 |url=}}</ref>

== In humans ==
* Analysis of structure can reconstruct the histories of populations
* History has been shaped by migrations, population bottlenecks, and admixture. Models that recreate the structure from such events are useful.
* Commercial testing and genetic ancestry?
<ref name="wang2020">{{cite journal |vauthors=Wang K, Mathieson I, O'Connell J, Schiffels S |title=Tracking human population structure through time from whole genome sequences |journal=PLoS Genet |volume=16 |issue=3 |pages=e1008552 |date=March 2020 |pmid=32150539 |pmc=7082067 |doi=10.1371/journal.pgen.1008552 |url=}}</ref>
<ref name="skiglund2017">{{cite journal |vauthors=Skoglund P, Thompson JC, Prendergast ME, Mittnik A, Sirak K, Hajdinjak M, Salie T, Rohland N, Mallick S, Peltzer A, Heinze A, Olalde I, Ferry M, Harney E, Michel M, Stewardson K, Cerezo-Román JI, Chiumia C, Crowther A, Gomani-Chindebvu E, Gidna AO, Grillo KM, Helenius IT, Hellenthal G, Helm R, Horton M, López S, Mabulla AZP, Parkington J, Shipton C, Thomas MG, Tibesasa R, Welling M, Hayes VM, Kennett DJ, Ramesar R, Meyer M, Pääbo S, Patterson N, Morris AG, Boivin N, Pinhasi R, Krause J, Reich D |title=Reconstructing Prehistoric African Population Structure |journal=Cell |volume=171 |issue=1 |pages=59–71.e21 |date=September 2017 |pmid=28938123 |pmc=5679310 |doi=10.1016/j.cell.2017.08.049 |url=}}</ref>



* Ancient stuff
* Medical
* Population histories
* Descriptive
* Genetic ancestry

=== Population history ===
==== Ancient/archaic ====
* [https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008204 Models of archaic admixture and recent history from two-locus statistics Archaic refs 43-48]
* [https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007349 Outstanding questions in the study of archaic hominin admixture]
* [https://doi.org/10.1016/j.cell.2017.08.049 Reconstructing Prehistoric African Population Structure]
* [https://doi.org/10.1093/molbev/mss117 Ancient Structure in Africa Unlikely to Explain Neanderthal and Non-African Genetic Similarity]
* [https://www.pnas.org/content/109/35/13956.short Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins]
* [https://doi.org/10.1038/s41437-021-00414 The IICR and the non-stationary structured coalescent: towards demographic inference with arbitrary changes in population structure]
* [https://doi.org/10.1093/gbe/evx018 Distinguishing Recent Admixture from Ancestral Population Structure]
* [https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008552 Tracking human population structure through time from whole genome sequences]

* [https://www.sciencedirect.com/science/article/pii/S009286742030619X Population Structure, Stratification, and Introgression of Human Structural Variation]

* [https://doi.org/10.1126/science.aay5012 Insights into human genetic variation and population history from 929 diverse genomes]
* [https://doi.org/10.1016/j.cell.2019.02.035 Multiple Deeply Divergent Denisovan Ancestries in Papuans]

* [https://www.sciencedirect.com/science/article/pii/S0092867418301752 Analysis of Human Sequence Data Reveals Two Pulses of Archaic Denisovan Admixture]
* [https://doi.org/10.1038/s41586-021-03244-5 Origins of modern human ancestry]
* [https://advances.sciencemag.org/content/6/8/eaay5483 Ref 27 on inferred N and structure]

==== Not explicitly ancient/archaic ====
* [https://advances.sciencemag.org/content/5/9/eaaw3492/ Population structure of modern-day Italians reveals patterns of ancient and archaic ancestries in Southern Europe]
* [https://www.sciencedirect.com/science/article/pii/S0002929720302007 Genetic Consequences of the Transatlantic Slave Trade in the Americas]
* [https://doi.org/10.1038/ncomms14238 Clustering of 770,000 genomes reveals post-colonial population structure of North America]
* [https://doi.org/10.1038/s41598-018-29851-3 Exploring Cuba’s population structure and demographic history using genome-wide data]
* [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196325 Population structure in Argentina]
* [https://www.sciencedirect.com/science/article/pii/S0092867421003652 Toward a fine-scale population health monitoring system]
* [https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008624 What is ancestry?]

=== Genetic epidemiology ===
Population structure can be a problem for [[genome wide association study|association studies]], such as [[case-control studies]], because an apparent association between a trait of interest and a [[genetic locus|locus]] may reflect population structure rather than a genetic effect on the trait. As an example, in a study population of Europeans and East Asians, an association study of [[chopstick]] usage may "discover" a gene in the Asian individuals that leads to chopstick use. However, this is a [[correlation does not imply causation|spurious relationship]] as the genetic variant is simply more common in Asians than in Europeans.<ref>{{cite journal | vauthors = Hamer D, Sirota L | title = Beware the chopsticks gene | journal = Molecular Psychiatry | volume = 5 | issue = 1 | pages = 11–3 | date = January 2000 | pmid = 10673763 | doi = 10.1038/sj.mp.4000662 }}</ref> Also, real genetic associations may be overlooked if the locus is less prevalent in the population from which the case subjects are drawn. For this reason, it was common in the 1990s to use family-based data, where the effect of population structure can easily be controlled for using methods such as the [[transmission disequilibrium test]] (TDT).<ref>{{cite journal | vauthors = Pritchard JK, Rosenberg NA | title = Use of unlinked genetic markers to detect population stratification in association studies | journal = American Journal of Human Genetics | volume = 65 | issue = 1 | pages = 220–8 | date = July 1999 | pmid = 10364535 | pmc = 1378093 | doi = 10.1086/302449 }}</ref>
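
The chopstick example can be made concrete with a small two-population calculation. The numbers below are hypothetical and use expected allele counts rather than sampled data: the variant has no association with case status inside either population, yet pooling the populations yields a large allelic odds ratio because population A contributes both more cases and more copies of the variant. The sketch also computes the TDT statistic in its usual form, <math>(b-c)^2/(b+c)</math>, where <math>b</math> and <math>c</math> count how often heterozygous parents do and do not transmit the allele; because it compares alleles within families, it is not affected by this kind of stratification. The function names are illustrative.

```python
def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio from a 2x2 table of allele counts in cases and controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def expected_alleles(n_people, alt_freq):
    """Expected (alt, ref) allele counts among n diploid people."""
    alt = round(2 * n_people * alt_freq)
    return alt, 2 * n_people - alt


# Hypothetical populations: within each, cases and controls share the same
# variant frequency, but population A has both more cases and more variant copies.
populations = {
    "A": {"alt_freq": 0.8, "cases": 500, "controls": 100},
    "B": {"alt_freq": 0.2, "cases": 100, "controls": 500},
}

pooled = [0, 0, 0, 0]  # case_alt, case_ref, ctrl_alt, ctrl_ref
for name, pop in populations.items():
    case = expected_alleles(pop["cases"], pop["alt_freq"])
    ctrl = expected_alleles(pop["controls"], pop["alt_freq"])
    print(name, allelic_odds_ratio(*case, *ctrl))  # 1.0 within each population
    for i, count in enumerate(case + ctrl):
        pooled[i] += count

print("pooled", round(allelic_odds_ratio(*pooled), 2))  # 5.44: a spurious association


def tdt(transmitted, untransmitted):
    """TDT statistic (chi-square, 1 df): transmissions vs non-transmissions of
    the allele from heterozygous parents to affected offspring."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


print(tdt(60, 40))  # 4.0
```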

[[Phenotype]]s (measurable traits), such as height or risk for heart disease, are the product of some combination of [[Gene-environment interplay|genes and environment]]. These traits can be predicted using [[polygenic score]]s, which seek to isolate and estimate the contribution of genetics to a trait by summing the effects of many individual genetic variants. To construct a score, researchers first enrol participants in an association study to estimate the contribution of each genetic variant. Then, they can use the estimated contributions of each genetic variant to calculate a score for the trait for an individual who was not in the original association study. If structure in the study population is correlated with environmental variation, then the polygenic score is no longer measuring the genetic component alone.<ref name="blanc2020">{{cite journal | vauthors = Blanc J, Berg JJ | title = How well can we separate genetics from the environment? | journal = eLife | volume = 9 | pages = e64948 | date = December 2020 | pmid = 33355092 | doi = 10.7554/eLife.64948 | pmc = 7758058 }}</ref>
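
The confounding described above can be reproduced in a toy simulation. It is a sketch under deliberately extreme, hypothetical assumptions: two populations with somewhat different allele frequencies, variants with no effect on the trait at all, an environment that raises the trait in one population, and per-variant least-squares effect estimates with no correction for structure. The polygenic score of a new individual is then the sum of the estimated effects times that individual's allele counts. All function names and parameters are illustrative.

```python
import random


def simulate(n_per_pop, freqs_by_pop, env_shift, rng):
    """(population, genotypes, trait) records. No SNP affects the trait;
    population B's environment raises the trait by env_shift."""
    people = []
    for pop, freqs in freqs_by_pop.items():
        for _ in range(n_per_pop):
            g = [(rng.random() < p) + (rng.random() < p) for p in freqs]
            y = (env_shift if pop == "B" else 0.0) + rng.gauss(0.0, 1.0)
            people.append((pop, g, y))
    return people


def marginal_effects(people, n_snps):
    """Per-SNP least-squares slope of trait on genotype (no structure correction)."""
    n = len(people)
    y_mean = sum(y for _, _, y in people) / n
    betas = []
    for j in range(n_snps):
        g_mean = sum(g[j] for _, g, _ in people) / n
        cov = sum((g[j] - g_mean) * (y - y_mean) for _, g, y in people)
        var = sum((g[j] - g_mean) ** 2 for _, g, _ in people)
        betas.append(cov / var)
    return betas


def polygenic_score(genotypes, betas):
    """Sum of estimated per-variant effects times allele counts."""
    return sum(b * x for b, x in zip(betas, genotypes))


def score_gap(env_shift, n_snps=100, seed=7):
    """Mean score of new population-B individuals minus that of population A."""
    rng = random.Random(seed)
    freqs_a = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    freqs_b = [min(0.95, max(0.05, p + rng.gauss(0.0, 0.15))) for p in freqs_a]
    freqs = {"A": freqs_a, "B": freqs_b}
    betas = marginal_effects(simulate(500, freqs, env_shift, rng), n_snps)
    new_people = simulate(200, freqs, env_shift, rng)  # not in the association study
    totals = {"A": 0.0, "B": 0.0}
    for pop, g, _ in new_people:
        totals[pop] += polygenic_score(g, betas)
    return (totals["B"] - totals["A"]) / 200


print("environment differs:", round(score_gap(env_shift=1.0), 2))
print("environment shared: ", round(score_gap(env_shift=0.0), 2))
```

When the environment differs, population B receives systematically higher scores even though no variant affects the trait; when the environmental difference is removed, the gap essentially disappears.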

Several methods can at least partially control for this confounding effect. The [[genomic control]] method was introduced in 1999 and is a relatively [[nonparametric statistics|nonparametric]] method for controlling the inflation of [[test statistic]]s.<ref name="devlin_roeder1999">{{cite journal | vauthors = Devlin B, Roeder K | title = Genomic control for association studies | journal = Biometrics | volume = 55 | issue = 4 | pages = 997–1004 | date = December 1999 | pmid = 11315092 | doi = 10.1111/j.0006-341X.1999.00997.x }}</ref> It is also possible to use [[linkage disequilibrium|unlinked]] [[genetic marker]]s to estimate each individual's ancestry proportions from some ''K'' subpopulations, which are assumed to be unstructured.<ref>{{cite journal | vauthors = Pritchard JK, Stephens M, Rosenberg NA, Donnelly P | title = Association mapping in structured populations | journal = American Journal of Human Genetics | volume = 67 | issue = 1 | pages = 170–81 | date = July 2000 | pmid = 10827107 | pmc = 1287075 | doi = 10.1086/302959 }}</ref> More recent approaches use [[principal component analysis]] (PCA), as demonstrated by [[Alkes Price]] and colleagues,<ref>{{cite journal | vauthors = Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D | title = Principal components analysis corrects for stratification in genome-wide association studies | journal = Nature Genetics | volume = 38 | issue = 8 | pages = 904–9 | date = August 2006 | pmid = 16862161 | doi = 10.1038/ng1847 | s2cid = 8127858 }}</ref> or derive a [[Covariance#In_genetics_and_molecular_biology|genetic relationship matrix]] (also called a kinship matrix) and include it in a linear [[mixed model]] (LMM).<ref>{{cite journal | vauthors = Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES | display-authors = 6 | title = A unified mixed-model method for association mapping that accounts for multiple levels of relatedness | journal = Nature Genetics | volume = 38 | issue = 2 | pages = 203–8 | date = February 2006 | pmid = 16380716 | doi = 10.1038/ng1702 | s2cid = 8507433 }}</ref><ref>{{cite journal | vauthors = Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, Chasman DI, Ridker PM, Neale BM, Berger B, Patterson N, Price AL | display-authors = 6 | title = Efficient Bayesian mixed-model analysis increases association power in large cohorts | journal = Nature Genetics | volume = 47 | issue = 3 | pages = 284–90 | date = March 2015 | pmid = 25642633 | pmc = 4342297 | doi = 10.1038/ng.3190 | author-link5 = Hilary Finucane }}</ref>
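
The PCA-based correction can be sketched in a few dozen lines. This is a minimal illustration in the spirit of the approach of Price and colleagues, not their software, and it rests on several assumptions: two simulated populations, no causal variants, a trait shifted by the environment in one population, genotypes normalized as <math>(g_{i,l} - 2p_l)/\sqrt{2p_l(1-p_l)}</math>, a single top principal component found by power iteration, and an association statistic of <math>(N-K-1)\,r^2</math> with <math>K=1</math> after regressing that component out of both trait and genotype. It also computes the genomic-control inflation factor λ (the median association statistic divided by the median of a χ² distribution with one degree of freedom, about 0.455) before and after the correction. All names are hypothetical.

```python
import math
import random
import statistics

CHI2_1_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 degree of freedom


def simulate(n_per_pop, n_snps, rng):
    """Two populations with different allele frequencies. The trait is 1 unit
    higher in population B for non-genetic reasons; no SNP is causal."""
    freqs_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    freqs_b = [min(0.95, max(0.05, p + rng.gauss(0.0, 0.2))) for p in freqs_a]
    genotypes, trait, in_b = [], [], []
    for is_b, freqs in ((0, freqs_a), (1, freqs_b)):
        for _ in range(n_per_pop):
            genotypes.append([(rng.random() < p) + (rng.random() < p) for p in freqs])
            trait.append(is_b + rng.gauss(0.0, 1.0))
            in_b.append(is_b)
    return genotypes, trait, in_b


def normalized_columns(genotypes):
    """SNP columns of the normalized matrix, (g - 2p) / sqrt(2p(1 - p))."""
    n = len(genotypes)
    cols = []
    for col in zip(*genotypes):
        p = sum(col) / (2 * n)
        if 0 < p < 1:  # monomorphic SNPs carry no information
            scale = math.sqrt(2 * p * (1 - p))
            cols.append([(g - 2 * p) / scale for g in col])
    return cols


def top_pc(cols, n_iter=20, seed=0):
    """Individual scores on the first PC: leading eigenvector of Z Z^T via power iteration."""
    rng = random.Random(seed)
    n = len(cols[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(z * x for z, x in zip(col, v)) for col in cols]              # Z^T v
        v = [sum(col[i] * wj for col, wj in zip(cols, w)) for i in range(n)]  # Z w
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def residualize(x, pc):
    """Remove the mean and the projection onto a unit-length, mean-zero vector."""
    mean = sum(x) / len(x)
    xc = [a - mean for a in x]
    proj = sum(a * b for a, b in zip(xc, pc))
    return [a - proj * b for a, b in zip(xc, pc)]


def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ac, bc = [x - ma for x in a], [y - mb for y in b]
    num = sum(x * y for x, y in zip(ac, bc))
    return num / math.sqrt(sum(x * x for x in ac) * sum(y * y for y in bc))


def genomic_control_lambda(chi2_stats):
    """Inflation factor: median test statistic over its null median."""
    return statistics.median(chi2_stats) / CHI2_1_MEDIAN


rng = random.Random(3)
genotypes, trait, in_b = simulate(150, 400, rng)
cols = normalized_columns(genotypes)
n = len(trait)

# Uncorrected trend-test statistic for each SNP: N * r^2
raw = [n * corr(col, trait) ** 2 for col in cols]

# Regress the first PC out of the trait and every SNP, then retest with (N - K - 1) * r^2, K = 1
pc1 = top_pc(cols)
trait_res = residualize(trait, pc1)
adjusted = [(n - 2) * corr(residualize(col, pc1), trait_res) ** 2 for col in cols]

print("lambda, uncorrected:", round(genomic_control_lambda(raw), 2))
print("lambda, PC1-corrected:", round(genomic_control_lambda(adjusted), 2))
print("|corr(PC1, population)|:", round(abs(corr(pc1, in_b)), 3))
```

Without correction, λ is well above 1 because almost every variant differs in frequency between the populations; after regressing out the first principal component, which here tracks population membership, λ falls back toward 1. More complex structure would need more components or a mixed model.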

PCA and LMMs have become the most common methods to control for confounding from population structure. Though they are likely sufficient for avoiding false positives in association studies, they are still vulnerable to overestimating effect sizes of marginally associated variants and can substantially bias estimates of polygenic scores and trait [[heritability]].<ref name="zaidi2020">{{cite journal | vauthors = Zaidi AA, Mathieson I | title = Demographic history mediates the effect of stratification on polygenic scores | journal = eLife | volume = 9 | pages = e61548 | date = November 2020 | pmid = 33200985 | doi = 10.7554/eLife.61548 | pmc = 7758063 | veditors = Perry GH, Turchin MC, Martin P }}</ref><ref name="sohail2019">{{cite journal | vauthors = Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CW, Hirschhorn J, Daly MJ, Patterson N, Neale B, Mathieson I, Reich D, Sunyaev SR | display-authors = 6 | title = Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies | journal = eLife | volume = 8 | pages = e39702 | date = March 2019 | pmid = 30895926 | doi = 10.7554/eLife.39702 | pmc = 6428571 | veditors = Nordborg M, McCarthy MI, Barton NH, Hermisson J }}</ref> If environmental effects are related to a variant that exists in only one specific region (for example, a pollutant is found in only one city), it may not be possible to correct for this population structure effect at all.<ref name="blanc2020"/> For many traits, the role of structure is complex and not fully understood, and incorporating it into genetic studies remains a challenge and is an active area of research.<ref name="lawson2020">{{cite journal | vauthors = Lawson DJ, Davies NM, Haworth S, Ashraf B, Howe L, Crawford A, Hemani G, Davey Smith G, Timpson NJ | display-authors = 6 | title = Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity? | journal = Human Genetics | volume = 139 | issue = 1 | pages = 23–41 | date = January 2020 | pmid = 31030318 | pmc = 6942007 | doi = 10.1007/s00439-019-02014-8 }}</ref>

== In other organisms ==
In non-human organisms, population structure is used to study diversity in crops, which can reveal potential vulnerabilities to disease, and to reconstruct human population histories by tracing the genetic history of cultivars. It can also be used to examine the evolution of microscopic organisms and pathogens. In animals, population structure is a useful tool for tracing the origins of disease vectors like mosquitoes, or for studying the origins and distributions of endangered animals.

* In non-human animals, plants, bacteria, etc
* Conservation
* Fighting disease (pests, vectors, agriculture)

* Mosquito <ref name="KittayapongCarvajal2020">{{cite journal |vauthors=Carvajal TM, Ogishi K, Yaegeshi S, Hernandez LF, Viacrusis KM, Ho HT, Amalin DM, Watanabe K |title=Fine-scale population genetic structure of dengue mosquito vector, Aedes aegypti, in Metropolitan Manila, Philippines |journal=PLoS Negl Trop Dis |volume=14 |issue=5 |pages=e0008279 |date=May 2020 |pmid=32365059 |pmc=7224578 |doi=10.1371/journal.pntd.0008279 |url=}}</ref>

* Rhino <ref name="TunstallKock2018">{{cite journal |vauthors=Tunstall T, Kock R, Vahala J, Diekhans M, Fiddes I, Armstrong J, Paten B, Ryder OA, Steiner CC |title=Evaluating recovery potential of the northern white rhinoceros from cryopreserved somatic cells |journal=Genome Res |volume=28 |issue=6 |pages=780–788 |date=June 2018 |pmid=29798851 |pmc=5991516 |doi=10.1101/gr.227603.117 |url=}}</ref>

* {{cite journal |vauthors=Becquet C, Patterson N, Stone AC, Przeworski M, Reich D |title=Genetic structure of chimpanzee populations |journal=PLoS Genet |volume=3 |issue=4 |pages=e66 |date=April 2007 |pmid=17447846 |pmc=1853122 |doi=10.1371/journal.pgen.0030066 |url=}}

* {{cite journal |vauthors=Didelot X, Bowden R, Street T, Golubchik T, Spencer C, McVean G, Sangal V, Anjum MF, Achtman M, Falush D, Donnelly P |title=Recombination and population structure in Salmonella enterica |journal=PLoS Genet |volume=7 |issue=7 |pages=e1002191 |date=July 2011 |pmid=21829375 |pmc=3145606 |doi=10.1371/journal.pgen.1002191 |url=}}

* {{cite journal |vauthors=Islam MZ, Khalequzzaman M, Prince MF, Siddique MA, Rashid ES, Ahmed MS, Pittendrigh BR, Ali MP |title=Diversity and population structure of red rice germplasm in Bangladesh |journal=PLoS One |volume=13 |issue=5 |pages=e0196096 |date=2018 |pmid=29718936 |pmc=5931645 |doi=10.1371/journal.pone.0196096 |url=}}

* {{cite journal |vauthors=Gur L, Reuveni M, Cohen Y, Cadle-Davidson L, Kisselstein B, Ovadia S, Frenkel O |title=Population structure of Erysiphe necator on domesticated and wild vines in the Middle East raises questions on the origin of the grapevine powdery mildew pathogen |journal=Environ Microbiol |volume= |issue= |pages= |date=January 2021 |pmid=33459475 |doi=10.1111/1462-2920.15401 |url=}}

* {{cite journal |vauthors=Cornwell BH, Hernández L |title=Genetic structure in the endosymbiont Breviolum 'muscatinei' is correlated with geographical location, environment and host species |journal=Proc Biol Sci |volume=288 |issue=1946 |pages=20202896 |date=March 2021 |pmid=33715441 |doi=10.1098/rspb.2020.2896 |url=}}

* {{cite journal |vauthors=Henry P, Miquelle D, Sugimoto T, McCullough DR, Caccone A, Russello MA |title=In situ population structure and ex situ representation of the endangered Amur tiger |journal=Mol Ecol |volume=18 |issue=15 |pages=3173–84 |date=August 2009 |pmid=19555412 |doi=10.1111/j.1365-294X.2009.04266.x |url=}}

* {{cite journal |vauthors=Wu FQ, Shen SK, Zhang XJ, Wang YH, Sun WB |title=Genetic diversity and population structure of an extremely endangered species: the world's largest Rhododendron |journal=AoB Plants |volume=7 |issue= |pages= |date=December 2014 |pmid=25477251 |pmc=4294443 |doi=10.1093/aobpla/plu082 |url=}}

* {{cite journal |vauthors=Hatmaker EA, Staton ME, Dattilo AJ, Hadziabdic Ð, Rinehart TA, Schilling EE, Trigiano RN, Wadl PA |title=Population Structure and Genetic Diversity Within the Endangered Species Pityopsis ruthii (Asteraceae) |journal=Front Plant Sci |volume=9 |issue= |pages=943 |date=2018 |pmid=30050545 |pmc=6050971 |doi=10.3389/fpls.2018.00943 |url=}}

* {{cite journal |vauthors=Kleinhans C, Willows-Munro S |title=Low genetic diversity and shallow population structure in the endangered vulture, Gyps coprotheres |journal=Sci Rep |volume=9 |issue=1 |pages=5536 |date=April 2019 |pmid=30940898 |pmc=6445149 |doi=10.1038/s41598-019-41755-4 |url=}}

* {{cite journal |vauthors=Dalén L, Kvaløy K, Linnell JD, Elmhagen B, Strand O, Tannerfeldt M, Henttonen H, Fuglei E, Landa A, Angerbjörn A |title=Population structure in a critically endangered arctic fox population: does genetics matter? |journal=Mol Ecol |volume=15 |issue=10 |pages=2809–19 |date=September 2006 |pmid=16911202 |doi=10.1111/j.1365-294X.2006.02983.x |url=}}

* {{cite journal |vauthors=Barr KR, Lindsay DL, Athrey G, Lance RF, Hayden TJ, Tweddale SA, Leberg PL |title=Population structure in an endangered songbird: maintenance of genetic differentiation despite high vagility and significant population recovery |journal=Mol Ecol |volume=17 |issue=16 |pages=3628–39 |date=August 2008 |pmid=18643883 |doi=10.1111/j.1365-294X.2008.03868.x |url=}}

* {{cite journal |vauthors=Richmond JQ, Wood DA, Westphal MF, Vandergast AG, Leaché AD, Saslaw LR, Butterfield HS, Fisher RN |title=Persistence of historical population structure in an endangered species despite near-complete biome conversion in California's San Joaquin Desert |journal=Mol Ecol |volume=26 |issue=14 |pages=3618–3635 |date=July 2017 |pmid=28370723 |doi=10.1111/mec.14125 |url=}}


== Refs ==


* <ref name="HartlClark">{{Cite book|last1=Hartl|first1=Daniel L.|last2=Clark|first2=Andrew G.|url=https://www.worldcat.org/oclc/37481398|title=Principles of population genetics|date=1997|publisher=Sinauer Associates|isbn=0-87893-306-9|edition=3rd|location=Sunderland, MA|oclc=37481398|pages=111-163}}</ref>

== Abandoned refs ==
Might be useful:

*{{cite journal | vauthors = Frichot E, Mathieu F, Trouillon T, Bouchard G, François O | title = Fast and efficient estimation of individual ancestry coefficients | journal = Genetics | volume = 196 | issue = 4 | pages = 973–83 | date = April 2014 | pmid = 24496008 | pmc = 3982712 | doi = 10.1534/genetics.113.160572 }}

== actual refs ==
{{reflist}}

Latest revision as of 01:38, 8 September 2021

Measures

Population structure is a complex phenomenon and no single measure captures it entirely. Understanding a population's structure requires a combination of methods and measures.[1][2]

Heterozygosity

A population bottleneck can result in a loss of heterozygosity. In this hypothetical population, an allele has become fixed after the population repeatedly dropped from 10 to 3.

One of the results of population structure is a reduction in heterozygosity. When populations split, alleles have a higher chance of reaching fixation within subpopulations, especially if the subpopulations are small or have been isolated for long periods. This reduction in heterozygosity can be thought of as an extension of inbreeding, with individuals in subpopulations being more likely to share a recent common ancestor.[3] The scale is important: an individual with both parents born in the United Kingdom is not inbred relative to that country's population, but is more inbred than the offspring of two humans selected at random from the entire world. This motivates the derivation of Wright's F-statistics (also called "fixation indices"), which measure inbreeding through observed versus expected heterozygosity.[4] For example, $F_{IS}$ measures the inbreeding coefficient at a single locus for an individual $I$ relative to some subpopulation $S$:[5]

$$F_{IS} = 1 - \frac{H_I}{H_S}$$

Here, $H_I$ is the fraction of individuals in subpopulation $S$ that are heterozygous. Assuming there are two alleles, $A_1$ and $A_2$, that occur at respective frequencies $p_S$ and $q_S = 1 - p_S$, it is expected that under random mating the subpopulation $S$ will have a heterozygosity rate of $H_S = 2p_S q_S$. Then:

$$F_{IS} = 1 - \frac{H_I}{2p_S q_S}$$

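Similarly, for the total population $T$, we can define $H_T = 2p_T q_T$, allowing us to compare the expected heterozygosity of subpopulation $S$ with that of the total population and compute the value $F_{ST}$ as:[5]

$$F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{2p_S q_S}{2p_T q_T}$$
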
If $F_{ST}$ is 0, then the allele frequencies are identical across populations, suggesting no structure. The theoretical maximum value of 1 is attained when subpopulations are fixed for different alleles; most observed values are far lower.[3] $F_{ST}$ is one of the most common measures of population structure, and there are several different formulations depending on the number of populations and the alleles of interest. Although it is sometimes used as a genetic distance between populations, it does not always satisfy the triangle inequality and thus is not a metric.[6] It also depends on within-population diversity, which makes interpretation and comparison difficult.[2]
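
Wright's F_IS and F_ST, as defined in this section, can be computed directly from genotype counts and allele frequencies. The sketch below is a minimal illustration under stated assumptions: a single bi-allelic locus, and, for F_ST, equally sized subpopulations pooled into the total population, with H_S taken as the average expected heterozygosity within subpopulations (a common convention, assumed here). The function names are hypothetical.

```python
from typing import Sequence


def expected_het(p: float) -> float:
    """Expected heterozygosity 2pq at a bi-allelic locus under random mating."""
    return 2.0 * p * (1.0 - p)


def f_is(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """F_IS = 1 - H_I / H_S for one subpopulation, from its genotype counts."""
    n = n_hom1 + n_het + n_hom2
    p_s = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele A1 in the subpopulation
    h_s = expected_het(p_s)               # H_S = 2 p_S q_S
    if h_s == 0.0:
        raise ValueError("monomorphic locus: F_IS is undefined")
    h_i = n_het / n                       # observed fraction of heterozygotes
    return 1.0 - h_i / h_s


def f_st(subpop_freqs: Sequence[float]) -> float:
    """F_ST = 1 - H_S / H_T, assuming equally sized subpopulations.

    H_S is the mean expected heterozygosity within subpopulations and H_T is the
    expected heterozygosity at the pooled frequency p_T (equal weights assumed).
    """
    k = len(subpop_freqs)
    p_t = sum(subpop_freqs) / k
    h_t = expected_het(p_t)
    if h_t == 0.0:
        raise ValueError("monomorphic locus: F_ST is undefined")
    h_s = sum(expected_het(p) for p in subpop_freqs) / k
    return 1.0 - h_s / h_t


print(f_is(30, 40, 30))    # about 0.2: fewer heterozygotes than 2pq predicts
print(f_st([0.2, 0.8]))    # about 0.36
print(f_st([0.3, 0.3]))    # 0.0: identical subpopulations
print(f_st([0.0, 1.0]))    # 1.0: subpopulations fixed for different alleles
```

The last two calls bracket the range discussed above: identical subpopulations give 0, and subpopulations fixed for different alleles give the theoretical maximum of 1.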

Admixture inference

An individual's genotype can be modelled as an admixture between K discrete clusters of populations.[5] Each cluster is characterized by its allele frequencies, and the contribution of a cluster to an individual's genotypes is measured via an estimator. In 2000, Jonathan K. Pritchard and colleagues introduced the STRUCTURE algorithm to estimate these proportions via Markov chain Monte Carlo.[7] Since then, other algorithms, such as ADMIXTURE, have been developed using different estimation techniques.[8][9] Estimated proportions can be visualized using bar plots: each bar represents an individual and is subdivided to show the proportion of that individual's genetic ancestry from each of the K populations.[5]
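
To make the admixture model concrete, here is a minimal sketch assuming the binomial likelihood used by model-based tools such as ADMIXTURE: an individual with ancestry proportions q carries the counted allele at SNP j with probability sum_k q_k f_kj, where f_kj is that allele's frequency in cluster k. For simplicity the cluster frequencies are treated as known and only q is estimated, using a textbook expectation-maximization update. This is neither STRUCTURE's MCMC sampler nor ADMIXTURE's own optimizer, and the simulated data and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Freqs = Sequence[Sequence[float]]  # f[k][j]: frequency of the counted allele in cluster k at SNP j


def admixture_loglik(g: Sequence[int], q: Sequence[float], f: Freqs) -> float:
    """Binomial log-likelihood (up to a constant) of one individual's genotypes."""
    ll = 0.0
    for j, gj in enumerate(g):
        p = sum(qk * f[k][j] for k, qk in enumerate(q))  # individual-specific frequency
        ll += gj * math.log(p) + (2 - gj) * math.log(1.0 - p)
    return ll


def estimate_q(g: Sequence[int], f: Freqs, n_iter: int = 100) -> List[float]:
    """EM estimate of ancestry proportions q, holding cluster frequencies f fixed."""
    n_k = len(f)
    q = [1.0 / n_k] * n_k  # start from equal proportions
    for _ in range(n_iter):
        counts = [0.0] * n_k
        for j, gj in enumerate(g):
            a = [q[k] * f[k][j] for k in range(n_k)]          # counted-allele copies
            b = [q[k] * (1.0 - f[k][j]) for k in range(n_k)]  # other-allele copies
            sa, sb = sum(a), sum(b)
            for k in range(n_k):
                # Expected number of this individual's allele copies drawn from cluster k.
                counts[k] += gj * a[k] / sa + (2 - gj) * b[k] / sb
        q = [c / (2 * len(g)) for c in counts]  # each SNP contributes 2 allele copies
    return q


# Hypothetical example: two clusters with random allele frequencies and an
# individual whose true ancestry is 70% cluster 0 and 30% cluster 1.
rng = random.Random(1)
n_snps = 2000
f = [[rng.uniform(0.05, 0.95) for _ in range(n_snps)] for _ in range(2)]
true_q = [0.7, 0.3]
g = []
for j in range(n_snps):
    p = true_q[0] * f[0][j] + true_q[1] * f[1][j]
    g.append(int(rng.random() < p) + int(rng.random() < p))

q_hat = estimate_q(g, f)
print([round(x, 2) for x in q_hat])  # compare with true_q
```

The estimated proportions are what a single bar in an admixture bar plot would display for this individual.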

A study of population structure of humans in Northern Africa and neighboring populations modelled using ADMIXTURE and assuming K=2,4,6,8 populations (Figure B, top to bottom). Varying K changes the scale of clustering. At K=2, 80% of the inferred ancestry for most North Africans is assigned to a cluster that is common to Basque, Tuscan, and Qatari Arab individuals (in purple). At K=4, clines of North African ancestry appear (in light blue). At K=6, opposite clines of Near Eastern (Qatari) ancestry appear (in green). At K=8, Tunisian Berbers appear as a cluster (in dark blue).[10]

Varying K can illustrate different scales of population structure; using a small K for the entire human population will subdivide people roughly by continent, while using a large K will partition populations into finer subgroups.[5] Though clustering methods are popular, they are open to misinterpretation: for non-simulated data, there is never a true value of K, but rather an approximation considered useful for a given question.[1] They are sensitive to sampling strategies, sample size, and close relatives in data sets; there may be no discrete populations at all; and there may be hierarchical structure where subpopulations are nested.[1] Clusters may be admixed themselves,[5] and may not have a useful interpretation as source populations.[11]

Dimensionality reduction

A map of the locations of genetic samples of several African populations (left) and principal components 1 and 2 of the data superimposed on the map (right). The principal coordinate plane has been rotated 16.11° to align with the map. It corresponds to the east-west and north-south distributions of the populations.[12]

Genetic data are high-dimensional, and dimensionality reduction techniques can capture population structure. Principal component analysis (PCA) was first applied in population genetics in 1978 by Cavalli-Sforza and colleagues and saw a resurgence with high-throughput sequencing.[5][13]

Initially PCA was used on allele frequencies at known genetic markers for populations, though later it was found that by coding SNPs as integers (for example, as the number of non-reference alleles) and normalizing the values, PCA could be applied at the level of individuals.[9] One formulation considers $N$ individuals $i = 1, \dots, N$ and $S$ bi-allelic SNPs $s = 1, \dots, S$. For each individual $i$, the value at locus $s$ is $g_{i,s}$, the number of non-reference alleles (one of $0, 1, 2$). If the allele frequency at $s$ is $\epsilon_s$, then the resulting $N \times S$ matrix of normalized genotypes has entries:[5]

$$\frac{g_{i,s} - 2\epsilon_s}{\sqrt{2\epsilon_s(1 - \epsilon_s)}}$$
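
The normalization of genotype counts described here is short to compute. A minimal sketch in plain Python, assuming the allele frequency of each SNP is estimated from the sample itself, and choosing (as an assumption) to leave monomorphic SNPs as all-zero columns, since their variance is zero and the formula would divide by zero; many tools drop such SNPs instead. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def normalize_genotypes(g: Sequence[Sequence[int]]) -> List[List[float]]:
    """Map an N x S matrix of non-reference allele counts to normalized genotypes.

    Entry (i, s) becomes (g[i][s] - 2*eps_s) / sqrt(2*eps_s*(1 - eps_s)), where eps_s
    is estimated here as the sample allele frequency of SNP s.
    """
    n, n_snps = len(g), len(g[0])
    z = [[0.0] * n_snps for _ in range(n)]
    for s in range(n_snps):
        eps = sum(row[s] for row in g) / (2 * n)
        if eps in (0.0, 1.0):
            continue  # monomorphic SNP: the scale would be zero, so leave the column at 0
        scale = math.sqrt(2 * eps * (1 - eps))
        for i in range(n):
            z[i][s] = (g[i][s] - 2 * eps) / scale
    return z


genotypes = [
    [0, 1, 2],
    [2, 1, 2],
    [1, 1, 2],
]
z = normalize_genotypes(genotypes)
```

Because each allele frequency is estimated from the same sample, every normalized column sums to zero; the third SNP here is fixed in the sample and therefore contributes nothing.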

PCA transforms data to maximize variance; given enough data, when each individual is visualized as a point on a plot, discrete clusters can form.[9] Individuals with admixed ancestries will tend to fall between clusters, and when there is homogeneous isolation by distance in the data, the top PC vectors will reflect geographic variation.[9][14] The eigenvectors generated by PCA can be explicitly written in terms of the mean coalescent times for pairs of individuals, making PCA useful for interpreting the population histories of groups in a given sample. PCA cannot, however, distinguish between different processes that lead to the same mean coalescent times.[15]
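
A minimal sketch of individual-level PCA on genotypes, under stated assumptions: it normalizes each SNP as described above, builds the N x N matrix Z Z^T / S, and extracts only the first principal component by power iteration rather than a full eigendecomposition (a simplification chosen to stay within the standard library). The two simulated populations, their allele frequencies, and the function names are hypothetical; with data this differentiated, the two groups are expected to fall on opposite sides of zero on PC1.

```python
import math
import random
from typing import List, Sequence


def pc1_scores(g: Sequence[Sequence[int]], n_iter: int = 200, seed: int = 0) -> List[float]:
    """Leading principal-component coordinates of individuals (up to sign and scale)."""
    n = len(g)
    # Normalize each polymorphic SNP column as (g - 2*eps) / sqrt(2*eps*(1 - eps)).
    cols = []
    for s in range(len(g[0])):
        eps = sum(row[s] for row in g) / (2 * n)
        if 0.0 < eps < 1.0:
            scale = math.sqrt(2 * eps * (1 - eps))
            cols.append([(row[s] - 2 * eps) / scale for row in g])
    m = len(cols)
    # N x N matrix C = Z Z^T / m; its top eigenvector gives the PC1 coordinates.
    c = [[sum(col[i] * col[j] for col in cols) / m for j in range(n)] for i in range(n)]
    # Power iteration from a random start (an all-ones start would lie in C's null
    # space, because every normalized column sums to zero).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: two populations whose allele frequencies were drawn independently.
rng = random.Random(42)
n_snps, n_per_pop = 500, 20
freq_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freq_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]


def draw(freqs: Sequence[float]) -> List[int]:
    """One diploid genotype vector: two independent allele draws per SNP."""
    return [int(rng.random() < f) + int(rng.random() < f) for f in freqs]


genotypes = [draw(freq_a) for _ in range(n_per_pop)] + [draw(freq_b) for _ in range(n_per_pop)]
scores = pc1_scores(genotypes)
```

Plotting `scores` against a second component would give the familiar scatter in which discrete clusters form; admixed individuals would fall between them.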

Multidimensional scaling and discriminant analysis have been used to study differentiation and population assignment and to analyze genetic distances.[16] Neighborhood graph approaches like t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) can visualize continental and subcontinental structure in human data.[17][18] With larger datasets, UMAP better captures multiple scales of population structure; fine-scale patterns can be hidden or split apart by other methods, and these patterns are of interest when there are many diverse or admixed populations, or when examining relationships between genotypes, phenotypes, and/or geography.[18][19] Variational autoencoders can generate artificial genotypes with structure representative of the input data.[20]
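
Classical multidimensional scaling recovers low-dimensional coordinates whose Euclidean distances approximate a given distance matrix, such as pairwise genetic distances between populations. The sketch below is the textbook Torgerson version, assumed here for illustration rather than taken from any cited study: it double-centers the squared distances and extracts the top eigenvectors by power iteration with deflation. Distances that are not Euclidean (for example, measures that violate the triangle inequality) may produce negative eigenvalues, which this sketch simply drops. The example points are hypothetical and chosen so that their distances are exactly Euclidean.

```python
import math
import random
from typing import List, Sequence


def classical_mds(d: Sequence[Sequence[float]], k: int = 2,
                  n_iter: int = 500, seed: int = 0) -> List[List[float]]:
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, where J is the centering matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    axes: List[List[float]] = []
    for _ in range(k):
        # Power iteration for the leading eigenvector of the current B.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # drop non-positive eigenvalues
        axes.append([math.sqrt(lam) * x for x in v])
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, k=2)
```

The recovered coordinates match the originals up to rotation, reflection, and translation, so pairwise distances are preserved.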

In humans

  • Analysis of structure can reconstruct the histories of populations
  • History has been shaped by migrations, population bottlenecks, and admixture. Models that recreate the structure from such events are useful.
  • Commercial testing and genetic ancestry?

[21] Also, actual genetic findings may be overlooked if the locus is less prevalent in the population where the case subjects are chosen. For this reason, it was common in the 1990s to use family-based data, where the effect of population structure can easily be controlled for using methods such as the transmission disequilibrium test (TDT).[22]
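
The TDT avoids confounding by population structure because it compares, within families, which allele a heterozygous parent transmits to an affected child against the allele it does not transmit; population allele frequencies never enter the statistic. A minimal sketch of the standard statistic (b - c)^2 / (b + c), referred to a chi-square distribution with one degree of freedom; the function name and the transmission counts are hypothetical.

```python
import math
from typing import Tuple


def tdt(b: int, c: int) -> Tuple[float, float]:
    """Transmission disequilibrium test for one bi-allelic marker.

    b: transmissions of allele A1 from heterozygous A1/A2 parents to affected children
    c: transmissions of allele A2 from such parents
    Returns the statistic (b - c)^2 / (b + c) and its chi-square (1 df) p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = tdt(60, 40)
print(stat)  # 4.0
```

Only heterozygous parents are informative, which is why the counts b and c come exclusively from them.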

Phenotypes (measurable traits), such as height or risk for heart disease, are the product of some combination of genes and environment. These traits can be predicted using polygenic scores, which seek to isolate and estimate the contribution of genetics to a trait by summing the effects of many individual genetic variants. To construct a score, researchers first enrol participants in an association study to estimate the contribution of each genetic variant. Then, they can use the estimated contributions of each genetic variant to calculate a score for the trait for an individual who was not in the original association study. If structure in the study population is correlated with environmental variation, then the polygenic score is no longer measuring the genetic component alone.[23]
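
To illustrate the confounding described above, the hypothetical simulation below gives two populations different allele frequencies and adds an environmental effect to population B only; by construction, no variant affects the trait. Effect sizes estimated by a simple per-SNP regression, without any correction for structure, are then summed into polygenic scores for new individuals. Under these assumptions the scores differ systematically between the populations even though the genetic component is zero. All parameter values and names are arbitrary choices for illustration.

```python
import random
from typing import List, Sequence, Tuple

rng = random.Random(7)
N_SNPS = 200
# Hypothetical allele frequencies for two populations, A and B.
FREQ = {"A": [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)],
        "B": [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]}
ENV_SHIFT = {"A": 0.0, "B": 1.0}  # assumed environment raises the trait in B only


def simulate(n_per_pop: int) -> Tuple[List[List[int]], List[float], List[str]]:
    """Genotypes, phenotypes and population labels; no SNP affects the trait."""
    geno, pheno, label = [], [], []
    for pop in ("A", "B"):
        for _ in range(n_per_pop):
            geno.append([int(rng.random() < f) + int(rng.random() < f) for f in FREQ[pop]])
            pheno.append(ENV_SHIFT[pop] + rng.gauss(0.0, 1.0))
            label.append(pop)
    return geno, pheno, label


def marginal_effects(geno: Sequence[Sequence[int]], pheno: Sequence[float]) -> List[float]:
    """Per-SNP least-squares slope of phenotype on allele count (an unadjusted GWAS)."""
    y_bar = sum(pheno) / len(pheno)
    betas = []
    for s in range(len(geno[0])):
        x = [row[s] for row in geno]
        x_bar = sum(x) / len(x)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, pheno))
        betas.append(sxy / sxx if sxx > 0 else 0.0)
    return betas


def polygenic_score(genotype: Sequence[int], betas: Sequence[float]) -> float:
    """Sum over SNPs of estimated effect size times allele count."""
    return sum(b * g for b, g in zip(betas, genotype))


# Estimate effects in a discovery sample, then score individuals not in that sample.
train_geno, train_pheno, _ = simulate(500)
betas = marginal_effects(train_geno, train_pheno)
test_geno, _, test_label = simulate(200)
scores = [polygenic_score(g, betas) for g in test_geno]


def mean_score(pop: str) -> float:
    vals = [s for s, lab in zip(scores, test_label) if lab == pop]
    return sum(vals) / len(vals)


gap = mean_score("B") - mean_score("A")
print(round(gap, 2))  # positive, although no variant has a genetic effect
```

Setting both entries of `ENV_SHIFT` to 0 removes the environmental difference, and with it the systematic gap in scores.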

Several methods can at least partially control for this confounding effect. The genomic control method was introduced in 1999 and is a relatively nonparametric method for controlling the inflation of test statistics.[24] It is also possible to use unlinked genetic markers to estimate each individual's ancestry proportions from some K subpopulations, which are assumed to be unstructured.[25] More recent approaches make use of principal component analysis (PCA), as demonstrated by Alkes Price and colleagues,[26] or by deriving a genetic relationship matrix (also called a kinship matrix) and including it in a linear mixed model (LMM).[27][28]
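
Genomic control, as introduced by Devlin and Roeder, rescales association test statistics by an inflation factor lambda estimated from their median, on the premise that most tested variants are not associated with the trait. A minimal sketch for 1-degree-of-freedom chi-square statistics; clamping lambda at 1 is an assumed convention, and the inflated input statistics are simulated rather than taken from any study.

```python
import random
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats: Sequence[float]) -> Tuple[float, List[float]]:
    """Estimate the inflation factor lambda and return deflated test statistics."""
    lam = median(chi2_stats) / CHI2_1_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate statistics
    return lam, [x / lam for x in chi2_stats]


# Hypothetical null statistics inflated by a factor of 1.5 (e.g. by stratification).
rng = random.Random(3)
observed = [1.5 * rng.gauss(0.0, 1.0) ** 2 for _ in range(50_000)]
lam, corrected = genomic_control(observed)
print(round(lam, 2))  # close to 1.5
```

Because a single lambda is applied to every statistic, genomic control corrects uniform inflation but cannot adjust variants that are more strongly stratified than the genome-wide median.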

PCA and LMMs have become the most common methods to control for confounding from population structure. Though they are likely sufficient for avoiding false positives in association studies, they are still vulnerable to overestimating effect sizes of marginally associated variants and can substantially bias estimates of polygenic scores and trait heritability.[29][30] If environmental effects are related to a variant that exists in only one specific region (for example, a pollutant is found in only one city), it may not be possible to correct for this population structure effect at all.[23] For many traits, the role of structure is complex and not fully understood, and incorporating it into genetic studies remains a challenge and is an active area of research.[31]

In other organisms

In non-human organisms, population structure is used to study diversity in crops, which can reveal potential vulnerabilities to disease, and to trace human population histories through the genetic history of cultivars. It can also be used to examine the evolution of microorganisms and pathogens. In animals, population structure is a useful tool for tracing the origins of disease vectors like mosquitoes, or for studying the origins and distributions of endangered animals.

  • In non-human animals, plants, bacteria, etc
  • Conservation
  • Fighting disease (pests, vectors, agriculture)
  • Becquet C, Patterson N, Stone AC, Przeworski M, Reich D (April 2007). "Genetic structure of chimpanzee populations". PLoS Genet. 3 (4): e66. doi:10.1371/journal.pgen.0030066. PMC 1853122. PMID 17447846.
  • Gur L, Reuveni M, Cohen Y, Cadle-Davidson L, Kisselstein B, Ovadia S, Frenkel O (January 2021). "Population structure of Erysiphe necator on domesticated and wild vines in the Middle East raises questions on the origin of the grapevine powdery mildew pathogen". Environ Microbiol. doi:10.1111/1462-2920.15401. PMID 33459475.
  • Cornwell BH, Hernández L (March 2021). "Genetic structure in the endosymbiont Breviolum 'muscatinei' is correlated with geographical location, environment and host species". Proc Biol Sci. 288 (1946): 20202896. doi:10.1098/rspb.2020.2896. PMID 33715441.
  • Henry P, Miquelle D, Sugimoto T, McCullough DR, Caccone A, Russello MA (August 2009). "In situ population structure and ex situ representation of the endangered Amur tiger". Mol Ecol. 18 (15): 3173–84. doi:10.1111/j.1365-294X.2009.04266.x. PMID 19555412.
  • Dalén L, Kvaløy K, Linnell JD, Elmhagen B, Strand O, Tannerfeldt M, Henttonen H, Fuglei E, Landa A, Angerbjörn A (September 2006). "Population structure in a critically endangered arctic fox population: does genetics matter?". Mol Ecol. 15 (10): 2809–19. doi:10.1111/j.1365-294X.2006.02983.x. PMID 16911202.
  • Barr KR, Lindsay DL, Athrey G, Lance RF, Hayden TJ, Tweddale SA, Leberg PL (August 2008). "Population structure in an endangered songbird: maintenance of genetic differentiation despite high vagility and significant population recovery". Mol Ecol. 17 (16): 3628–39. doi:10.1111/j.1365-294X.2008.03868.x. PMID 18643883.
  • Richmond JQ, Wood DA, Westphal MF, Vandergast AG, Leaché AD, Saslaw LR, Butterfield HS, Fisher RN (July 2017). "Persistence of historical population structure in an endangered species despite near-complete biome conversion in California's San Joaquin Desert". Mol Ecol. 26 (14): 3618–3635. doi:10.1111/mec.14125. PMID 28370723.

Refs

Abandoned refs

Might be useful:

actual refs

  1. ^ a b c Lawson, Daniel J.; van Dorp, Lucy; Falush, Daniel (2018). "A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots". Nature Communications. 9 (1). doi:10.1038/s41467-018-05257-7. ISSN 2041-1723. PMC 6092366.
  2. ^ a b c Meirmans, Patrick G.; Hedrick, Philip W. (2010). "Assessing population structure: FST and related measures". Molecular Ecology Resources. 11 (1): 5–18. doi:10.1111/j.1755-0998.2010.02927.x. ISSN 1755-098X.
  3. ^ a b c Hartl, Daniel L.; Clark, Andrew G. (1997). Principles of population genetics (3rd ed.). Sunderland, MA: Sinauer Associates. pp. 111–163. ISBN 0-87893-306-9. OCLC 37481398.
  4. ^ Wright, Sewall (1949). "The Genetical Structure of Populations". Annals of Eugenics. 15 (1): 323–354. doi:10.1111/j.1469-1809.1949.tb02451.x. ISSN 2050-1420.
  5. ^ a b c d e f g h i Coop, Graham (2019). Population and Quantitative Genetics. pp. 22–44.
  6. ^ Arbisser, Ilana M.; Rosenberg, Noah A. (2020). "FST and the triangle inequality for biallelic markers". Theoretical Population Biology. 133: 117–129. doi:10.1016/j.tpb.2019.05.003. ISSN 0040-5809.
  7. ^ Pritchard, Jonathan K; Stephens, Matthew; Donnelly, Peter (2000). "Inference of Population Structure Using Multilocus Genotype Data". Genetics. 155 (2): 945–959. doi:10.1093/genetics/155.2.945. ISSN 1943-2631.
  8. ^ Alexander, D. H.; Novembre, J.; Lange, K. (2009). "Fast model-based estimation of ancestry in unrelated individuals". Genome Research. 19 (9): 1655–1664. doi:10.1101/gr.094052.109. ISSN 1088-9051. PMC 2752134.
  9. ^ a b c d Novembre, John; Ramachandran, Sohini (2011). "Perspectives on Human Population Structure at the Cusp of the Sequencing Era". Annual Review of Genomics and Human Genetics. 12 (1): 245–274. doi:10.1146/annurev-genom-090810-183123. ISSN 1527-8204.
  10. ^ Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, Byrnes JK, Fadhlaoui-Zid K, Zalloua PA, Moreno-Estrada A, Bertranpetit J, Bustamante CD, Comas D (January 2012). "Genomic ancestry of North Africans supports back-to-Africa migrations". PLoS Genet. 8 (1): e1002397. doi:10.1371/journal.pgen.1002397. PMC 3257290. PMID 22253600.
  11. ^ Novembre, John (2016). "Pritchard, Stephens, and Donnelly on Population Structure". Genetics. 204 (2): 391–393. doi:10.1534/genetics.116.195164. ISSN 1943-2631.
  12. ^ Wang C, Zöllner S, Rosenberg NA (August 2012). "A quantitative comparison of the similarity between genes and geography in worldwide human populations". PLoS Genet. 8 (8): e1002886. doi:10.1371/journal.pgen.1002886. PMC 3426559. PMID 22927824.
  13. ^ Menozzi, P; Piazza, A; Cavalli-Sforza, L (1978). "Synthetic maps of human gene frequencies in Europeans". Science. 201 (4358): 786–792. doi:10.1126/science.356262. ISSN 0036-8075.
  14. ^ Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD (November 2008). "Genes mirror geography within Europe". Nature. 456 (7218): 98–101. doi:10.1038/nature07331. PMC 2735096. PMID 18758442.
  15. ^ McVean G (October 2009). "A genealogical interpretation of principal components analysis". PLoS Genet. 5 (10): e1000686. doi:10.1371/journal.pgen.1000686. PMC 2757795. PMID 19834557.
  16. ^ Jombart T, Pontier D, Dufour AB (April 2009). "Genetic markers in the playground of multivariate analysis". Heredity (Edinb). 102 (4): 330–41. doi:10.1038/hdy.2008.130. PMID 19156164.
  17. ^ Li W, Cerise JE, Yang Y, Han H (August 2017). "Application of t-SNE to human genetic data". J Bioinform Comput Biol. 15 (4): 1750017. doi:10.1142/S0219720017500172. PMID 28718343.
  18. ^ a b Diaz-Papkovich A, Anderson-Trocmé L, Ben-Eghan C, Gravel S (November 2019). "UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts". PLoS Genet. 15 (11): e1008432. doi:10.1371/journal.pgen.1008432. PMC 6853336. PMID 31675358.
  19. ^ Sakaue S, Hirata J, Kanai M, Suzuki K, Akiyama M, Lai Too C, Arayssi T, Hammoudeh M, Al Emadi S, Masri BK, Halabi H, Badsha H, Uthman IW, Saxena R, Padyukov L, Hirata M, Matsuda K, Murakami Y, Kamatani Y, Okada Y (March 2020). "Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction". Nat Commun. 11 (1): 1569. doi:10.1038/s41467-020-15194-z. PMID 32218440.
  20. ^ Battey CJ, Coffing GC, Kern AD (January 2021). "Visualizing population structure with variational autoencoders". G3 (Bethesda). 11 (1). doi:10.1093/g3journal/jkaa036. PMC 8022710. PMID 33561250.
  21. ^ Wang K, Mathieson I, O'Connell J, Schiffels S (March 2020). "Tracking human population structure through time from whole genome sequences". PLoS Genet. 16 (3): e1008552. doi:10.1371/journal.pgen.1008552. PMC 7082067. PMID 32150539.
  22. ^ Pritchard JK, Rosenberg NA (July 1999). "Use of unlinked genetic markers to detect population stratification in association studies". American Journal of Human Genetics. 65 (1): 220–8. doi:10.1086/302449. PMC 1378093. PMID 10364535.
  23. ^ a b Blanc J, Berg JJ (December 2020). "How well can we separate genetics from the environment?". eLife. 9: e64948. doi:10.7554/eLife.64948. PMC 7758058. PMID 33355092.
  24. ^ Devlin B, Roeder K (December 1999). "Genomic control for association studies". Biometrics. 55 (4): 997–1004. doi:10.1111/j.0006-341X.1999.00997.x. PMID 11315092.
  25. ^ Pritchard JK, Stephens M, Rosenberg NA, Donnelly P (July 2000). "Association mapping in structured populations". American Journal of Human Genetics. 67 (1): 170–81. doi:10.1086/302959. PMC 1287075. PMID 10827107.
  26. ^ Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (August 2006). "Principal components analysis corrects for stratification in genome-wide association studies". Nature Genetics. 38 (8): 904–9. doi:10.1038/ng1847. PMID 16862161. S2CID 8127858.
  27. ^ Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. (February 2006). "A unified mixed-model method for association mapping that accounts for multiple levels of relatedness". Nature Genetics. 38 (2): 203–8. doi:10.1038/ng1702. PMID 16380716. S2CID 8507433.
  28. ^ Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. (March 2015). "Efficient Bayesian mixed-model analysis increases association power in large cohorts". Nature Genetics. 47 (3): 284–90. doi:10.1038/ng.3190. PMC 4342297. PMID 25642633.
  29. ^ Zaidi AA, Mathieson I (November 2020). Perry GH, Turchin MC, Martin P (eds.). "Demographic history mediates the effect of stratification on polygenic scores". eLife. 9: e61548. doi:10.7554/eLife.61548. PMC 7758063. PMID 33200985.
  30. ^ Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. (March 2019). Nordborg M, McCarthy MI, Barton NH, Hermisson J (eds.). "Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies". eLife. 8: e39702. doi:10.7554/eLife.39702. PMC 6428571. PMID 30895926.
  31. ^ Lawson DJ, Davies NM, Haworth S, Ashraf B, Howe L, Crawford A, et al. (January 2020). "Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity?". Human Genetics. 139 (1): 23–41. doi:10.1007/s00439-019-02014-8. PMC 6942007. PMID 31030318.
  32. ^ Carvajal TM, Ogishi K, Yaegeshi S, Hernandez LF, Viacrusis KM, Ho HT, Amalin DM, Watanabe K (May 2020). "Fine-scale population genetic structure of dengue mosquito vector, Aedes aegypti, in Metropolitan Manila, Philippines". PLoS Negl Trop Dis. 14 (5): e0008279. doi:10.1371/journal.pntd.0008279. PMC 7224578. PMID 32365059.
  33. ^ Tunstall T, Kock R, Vahala J, Diekhans M, Fiddes I, Armstrong J, Paten B, Ryder OA, Steiner CC (June 2018). "Evaluating recovery potential of the northern white rhinoceros from cryopreserved somatic cells". Genome Res. 28 (6): 780–788. doi:10.1101/gr.227603.117. PMC 5991516. PMID 29798851.
  34. ^ Barroso, Gustavo V.; Moutinho, Ana Filipa; Dutheil, Julien Y. (2020), Dutheil, Julien Y. (ed.), "A Population Genomics Lexicon", Statistical Population Genomics, vol. 2090, New York, NY: Springer US, pp. 3–17, doi:10.1007/978-1-0716-0199-0_1, ISBN 978-1-0716-0198-3, retrieved 2021-05-31
  35. ^ Liu, Chi-Chun; Shringarpure, Suyash; Lange, Kenneth; Novembre, John (2020), Dutheil, Julien Y. (ed.), "Exploring Population Structure with Admixture Models and Principal Component Analysis", Statistical Population Genomics, vol. 2090, New York, NY: Springer US, pp. 67–86, doi:10.1007/978-1-0716-0199-0_4, ISBN 978-1-0716-0198-3, retrieved 2021-05-31
  36. ^ Gillespie, John H. (1998). "4". Population genetics: a concise guide. Baltimore, Md: The Johns Hopkins University Press. ISBN 0-8018-5754-6. OCLC 36817311.