
Pathway analysis: Difference between revisions

From Wikipedia, the free encyclopedia
The description of PT is made more systematic and general.
Citation bot (talk | contribs)
Removed parameters. | Use this bot. Report bugs. | #UCB_CommandLine
 
(45 intermediate revisions by 20 users not shown)
{{cs1 config|name-list-style=vanc|display-authors=6}}
In bioinformatics research, '''pathway analysis software''' is used to identify related [[proteins]] within a pathway or to build a pathway ''de novo'' from the proteins of interest. This is helpful when studying [[differential expression]] of a [[gene]] in a disease or analyzing any [[omics]] dataset with a large number of proteins. By examining the changes in [[gene expression]] in a pathway, their biological causes can be explored.
[[File:Fgene-10-01203-g002.jpg|thumb|400x400px|Pathway resources and types of pathway analysis using databases like [[KEGG]], [[Reactome]] and [[WikiPathways]].<ref>{{cite journal | vauthors = Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D | title = The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling | journal = Frontiers in Genetics | volume = 10 | pages = 1203 | date = 2019 | pmid = 31824580 | pmc = 6883970 | doi = 10.3389/fgene.2019.01203 | doi-access = free }}</ref>]]
[[Metabolic pathway|Pathway]] is a term from molecular biology for an artificial, simplified model of a process within a cell or tissue. A typical pathway model starts with an extracellular [[signaling molecule]] that activates a specific [[Receptor (biochemistry)|receptor]], thus triggering a chain of protein-protein or protein-small molecule interactions.<ref>Berg J. M., Tymoczko J. L., Stryer L. Biochemistry, 5th edition, New York: W. H. Freeman; 2002</ref>
'''Pathway''' is the term from molecular biology for a curated schematic representation of a well characterized segment of the molecular physiological machinery, such as a [[metabolic pathway]] describing an enzymatic process within a cell or tissue or a [[signaling pathway]] model representing a regulatory process that might, in its turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular [[signaling molecule]] that activates a specific [[Receptor (biochemistry)|receptor]], thus triggering a chain of molecular interactions.<ref>{{cite book | vauthors = Berg JM, Tymoczko JL, Stryer L |title=Biochemistry |date=2002 |publisher=W.H. Freeman |location=New York |isbn=978-0-7167-3051-4 |edition=5th}}</ref> A pathway is most often represented as a relatively small [[Graph (discrete mathematics)|graph]] with gene, protein, and/or small molecule [[Vertex (graph theory)|nodes]] connected by [[Edge (geometry)|edges]] of known functional relations. While a simpler pathway might appear as a chain,<ref>{{cite journal | vauthors = Ohlrogge J, Browse J | title = Lipid biosynthesis | journal = The Plant Cell | volume = 7 | issue = 7 | pages = 957–70 | date = July 1995 | pmid = 7640528 | doi = 10.1105/tpc.7.7.957 | pmc = 160893 | s2cid = 219201001 | doi-access = free }}</ref> complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation.<ref>{{cite web |title=Main Page - SBML.caltech.edu |url=http://sbml.org/Main_Page |website=sbml.org |language=en}}</ref><ref>{{cite web |title=KGML (KEGG Markup Language) |url=https://www.genome.jp/kegg/xml/ |website=www.genome.jp}}</ref> In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. 
Such a representation, generally called a Functional Gene Set (FGS), can also describe other functionally characterised groups such as [[protein families]] or [[Gene Ontology]] (GO) and [[Disease Ontology]] (DO) terms.
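The two representations described above, a graph with typed edges and a flat gene list, can be contrasted in a short sketch. This is only an illustration: the node names, the relation labels, and the helper names `to_adjacency` and `to_gene_set` are hypothetical and do not come from any pathway database or exchange format.

```python
# Hypothetical toy pathway; node names and relation labels are illustrative only.
pathway_edges = [
    ("Ligand", "Receptor", "activates"),
    ("Receptor", "KinaseA", "activates"),
    ("KinaseA", "KinaseB", "activates"),
    ("KinaseB", "TranscriptionFactor", "activates"),
    ("TranscriptionFactor", "Receptor", "inhibits"),  # feedback loop, so not a simple chain
]


def to_adjacency(edges):
    """Build a directed graph: node -> list of (target, relation) pairs."""
    adjacency = {}
    for source, target, relation in edges:
        adjacency.setdefault(source, []).append((target, relation))
        adjacency.setdefault(target, [])  # make sure nodes without outgoing edges exist
    return adjacency


def to_gene_set(edges):
    """Flatten the pathway into a functional gene set: members only, no order or relations."""
    return {node for source, target, _ in edges for node in (source, target)}


pathway_graph = to_adjacency(pathway_edges)
functional_gene_set = to_gene_set(pathway_edges)
```

Flattening discards the direction and type of every interaction, which is why methods that work on gene sets alone cannot distinguish an activator from an inhibitor.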
'''Pathway analysis''' helps to understand or interpret [[omics]] data from the point of view of canonical prior knowledge structured in the form of pathway diagrams. It allows finding distinct cell processes ([[:Category:Cellular processes|Cellular processes]]), diseases or [[signaling pathways]] that are statistically associated with a selection of genes differentially expressed between two samples.<ref>{{cite journal|last1=García-Campos|first1=Miguel Angel|last2=Espinal-Enríquez|first2=Jesús|last3=Hernández-Lemus|first3=Enrique|title=Pathway analysis: State of the art|journal=Frontiers in Physiology|date=2015|volume=6|pages=383|doi=10.3389/fphys.2015.00383|pmid=26733877|pmc=4681784}}</ref> Pathway analysis is often, but erroneously, used as a synonym for [[Network theory|network analysis]] (functional enrichment analysis and gene set analysis).<ref>[http://software.broadinstitute.org/gsea/index.jsp GSEA]</ref>
In bioinformatics, methods of pathway analysis might be used to identify key [[genes]]/[[proteins]] within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway ''de novo'' from proteins that have been identified as key affected elements. By examining changes in, for example, [[gene expression]] in a pathway, its biological activity can be explored.
Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental (or pathological) condition that was studied with [[omics]] tools or a [[genome-wide association study]].<ref name="Garcia-Campos">{{cite journal | vauthors = García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E | title = Pathway Analysis: State of the Art | journal = Frontiers in Physiology | volume = 6 | pages = 383 | date = 2015 | pmid = 26733877 | pmc = 4681784 | doi = 10.3389/fphys.2015.00383 | doi-access = free }}</ref> Such studies might identify long lists of altered genes. Visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions (with a large fraction of genes lacking any annotation). In such situations, the most productive way of exploring the list is to identify enrichment of specific {{abbr|FGS|Functional Gene Set}}s in it. The general approach of enrichment analyses is to identify FGSs whose members were most ''frequently'' or most ''strongly'' altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.


==Uses==
==Use==
The data for pathway analysis come from [[high throughput biology]]. This includes high throughput [[sequencing]] data and [[microarray]] data. Before pathway analysis can be done, the [[omics]] data should be normalized, and genes should be ranked by differential expression, usually with the help of [[Student's t-test]], [[ANOVA]] or other statistics. In general, any list of statistically ranked genes can be analyzed by pathway analysis. For example, the functional activity of proteins can often be inferred using network enrichment analysis of genes differentially expressed in the experiment. Such functional activity scores can then be used for pathway analysis to find pathways responsible for the observed differential expression. When ranking is not available, a simple unranked list of genes can be analyzed. It is also possible to integrate multiple [[microarray]] data sets from different research groups by meta-analysis and cross-platform normalization.<ref>{{cite journal | last1 = Walsh | first1 = Christopher | last2 = Hu | first2 = Pingzhao | last3 = Batt | first3 = Jane | last4 = Santos | first4 = Claudia | year = 2015 | title = Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery | url = | journal = Microarrays | volume = 4 | issue = 3| pages = 389–406 | doi = 10.3390/microarrays4030389 | pmid = 27600230 | pmc = 4996376 }}</ref> By using pathway analysis software, researchers can determine which gene groups, such as [[Genetic pathway|pathway]]s, cell processes or diseases, are enriched in genes that are over- or under-expressed in the experimental data. They can also infer associated upstream and downstream regulators, [[proteins]], [[small molecules]], [[drugs]], etc.<ref>{{cite journal | last1 = Subramanian | first1 = Aravind | last2 = Tamayo | first2 = Pablo | last3 = Mootha | first3 = Vamsi K. | last4 = Mukherjee | first4 = Sayan | last5 = Ebert | first5 = Benjamin L. | last6 = Gillette | first6 = Michael A. 
| last7 = Paulovich | first7 = Amanda |display-authors=etal | year = 2005 | title = Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 102 | issue = 43| pages = 15545–50 | doi = 10.1073/pnas.0506580102 | pmid=16199517 | pmc=1239896 | bibcode = 2005PNAS..10215545S}}</ref> For example, pathway analysis of several independent microarray experiments ([[meta-analysis]]) helped to discover potential [[biomarkers]] in a single pathway important for the fast-to-slow fiber type transition in [[Duchenne muscular dystrophy]].<ref>{{cite journal | last1 = Kotelnikova | first1 = Ekaterina | last2 = Shkrob | first2 = Maria A. | last3 = Pyatnitskiy | first3 = Mikhail A. | last4 = Ferlini | first4 = Alessandra | last5 = Daraselia | first5 = Nikolai | year = 2012 | title = Novel Approach to Meta-Analysis of Microarray Datasets Reveals Muscle Remodeling-Related Drug Targets and Biomarkers in Duchenne Muscular Dystrophy | url = | journal = PLoS Computational Biology | volume = 8 | issue = 2| page = e1002365 | doi = 10.1371/journal.pcbi.1002365 | pmid = 22319435 | pmc = 3271016 | bibcode = 2012PLSCB...8E2365K }}</ref> In another study, a [[meta-analysis]] identified two [[biomarkers]] in the blood of patients with [[Parkinson's disease]], which can be useful for monitoring the disease.<ref>{{cite journal | last1 = Santiago | first1 = Jose A. | last2 = Potashkin | first2 = Judith A. | year = 2015 | title = Network-Based Metaanalysis Identifies HNF4A and PTBP1 as Longitudinally Dynamic Biomarkers for Parkinson's Disease | url = | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 112 | issue = 7| pages = 2257–62 | doi = 10.1073/pnas.1423573112 | pmid = 25646437 | pmc = 4343174 | bibcode = 2015PNAS..112.2257S }}</ref>
The data for pathway analysis come from [[high throughput biology]]. This includes high throughput [[sequencing]] data and [[microarray]] data. Before pathway analysis can be done, each gene's alteration should be evaluated from the [[omics]] dataset either quantitatively ([[Gene expression profiling|differential expression analysis]]) or qualitatively (detection of somatic [[point mutations]] or mapping of neighboring genes to a disease-associated [[Single-nucleotide polymorphism|SNP]]). It is also possible to combine datasets from different research groups or multiple omics platforms with a meta-analysis and cross-platform normalization.<ref>{{cite journal | vauthors = Walsh CJ, Hu P, Batt J, Santos CC | title = Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery | journal = Microarrays | volume = 4 | issue = 3 | pages = 389–406 | date = August 2015 | pmid = 27600230 | pmc = 4996376 | doi = 10.3390/microarrays4030389 | doi-access = free }}</ref><ref name="Integration of somatic mutation, ex">{{cite journal | vauthors = Suo C, Hrydziuszko O, Lee D, Pramana S, Saputra D, Joshi H, Calza S, Pawitan Y | title = Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival | journal = Bioinformatics | volume = 31 | issue = 16 | pages = 2607–13 | date = August 2015 | pmid = 25810432 | doi = 10.1093/bioinformatics/btv164 | doi-access = free }}</ref> The resulting list, in which gene identifiers are accompanied by their alteration attributes, is then subjected to pathway analysis.
By using pathway analysis software, researchers can determine which {{abbr|FGS|Functional Gene Set}}s are enriched with the altered experimental genes.<ref>{{cite journal | vauthors = Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM | title = Systematic determination of genetic network architecture | journal = Nature Genetics | volume = 22 | issue = 3 | pages = 281–5 | date = July 1999 | pmid = 10391217 | doi = 10.1038/10343 | s2cid = 14688842 }}</ref><ref name="Subramanian">{{cite journal | vauthors = Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP | title = Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 102 | issue = 43 | pages = 15545–50 | date = October 2005 | pmid = 16199517 | pmc = 1239896 | doi = 10.1073/pnas.0506580102 | bibcode = 2005PNAS..10215545S | doi-access = free }}</ref> For example, pathway analysis of several independent microarray experiments ([[meta-analysis]]) helped to discover potential [[biomarkers]] in a single pathway important for the fast-to-slow fiber type transition in [[Duchenne muscular dystrophy]].<ref>{{cite journal | vauthors = Kotelnikova E, Shkrob MA, Pyatnitskiy MA, Ferlini A, Daraselia N | title = Novel approach to meta-analysis of microarray datasets reveals muscle remodeling-related drug targets and biomarkers in Duchenne muscular dystrophy | journal = PLOS Computational Biology | volume = 8 | issue = 2 | pages = e1002365 | date = February 2012 | pmid = 22319435 | pmc = 3271016 | doi = 10.1371/journal.pcbi.1002365 | bibcode = 2012PLSCB...8E2365K | doi-access = free }}</ref> In another study, a [[meta-analysis]] identified two [[biomarkers]] in the blood of patients with [[Parkinson's disease]], which can be useful for monitoring the disease.<ref>{{cite journal | vauthors = Santiago JA, 
Potashkin JA | author-link2=Judith Potashkin|title = Network-based metaanalysis identifies HNF4A and PTBP1 as longitudinally dynamic biomarkers for Parkinson's disease | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 112 | issue = 7 | pages = 2257–62 | date = February 2015 | pmid = 25646437 | pmc = 4343174 | doi = 10.1073/pnas.1423573112 | bibcode = 2015PNAS..112.2257S | doi-access = free }}</ref> Candidate gene alleles causative of Alzheimer's disease and elderly dementia were first discovered via a [[genome-wide association study]] and further validated with network enrichment analysis against an {{abbr|FGS|Functional Gene Set}} consisting of known Alzheimer's genes.<ref>{{cite journal | vauthors = Reynolds CA, Hong MG, Eriksson UK, Blennow K, Wiklund F, Johansson B, Malmberg B, Berg S, Alexeyenko A, Grönberg H, Gatz M, Pedersen NL, Prince JA | title = Analysis of lipid pathway genes indicates association of sequence variation near SREBF1/TOM1L2/ATPAF2 with dementia risk | journal = Human Molecular Genetics | volume = 19 | issue = 10 | pages = 2068–78 | date = May 2010 | pmid = 20167577 | pmc = 2860895 | doi = 10.1093/hmg/ddq079 }}</ref><ref>{{cite journal | vauthors = Bennet AM, Reynolds CA, Eriksson UK, Hong MG, Blennow K, Gatz M, Alexeyenko A, Pedersen NL, Prince JA | title = Genetic association of sequence variants near AGER/NOTCH4 and dementia | journal = Journal of Alzheimer's Disease | volume = 24 | issue = 3 | pages = 475–84 | date = 1 January 2011 | pmid = 21297263 | pmc = 3477600 | doi = 10.3233/jad-2011-101848 }}</ref>


===Pathways Databases===
===Databases===


Pathway analysis needs a [[knowledge base]] with pathway collections and interaction networks. The content, structure and functionality of pathway collections usually vary between sources. Examples of pathway collections are [[KEGG]],<ref>{{cite journal | last1 = Ogata | first1 = H. | last2 = Goto | first2 = S. | last3 = Sato | first3 = K. | last4 = Fujibuchi | first4 = W. | last5 = Bono | first5 = H. | last6 = Kanehisa | first6 = M. | year = 1999 | title = KEGG: Kyoto Encyclopedia of Genes and Genomes | journal = Nucleic Acids Research | volume = 27 | issue = 1| pages = 29–34 | doi=10.1093/nar/27.1.29 | pmid=9847135 | pmc=148090}}</ref> [[WikiPathways]], and [[Reactome]].<ref>{{cite journal | last1 = Vastrik | first1 = Imre | last2 = D'Eustachio | first2 = Peter | last3 = Schmidt | first3 = Esther | last4 = Joshi-Tope | first4 = Geeta | last5 = Gopinath | first5 = Gopal | last6 = Croft | first6 = David | last7 = de Bono | first7 = Bernard |display-authors=etal | year = 2007 | title = Reactome: A Knowledge Base of Biologic Pathways and Processes | url = | journal = Genome Biology | volume = 8 | issue = 3| page = R39 | doi = 10.1186/gb-2007-8-3-r39 | pmid = 17367534 | pmc = 1868929 }}</ref> There are also commercial pathway collections such as Pathway Studio pathways<ref>[https://mammalcedfx.pathwaystudio.com/app/search Pathway Studio Pathways]</ref> and IPA pathways.<ref>[https://www.qiagen.com/us/shop/genes-and-pathways/pathway-central/ Pathway Central]</ref>
Pathway collections and [[Interactome|interaction networks]] constitute the [[knowledge base]] required for a pathway analysis. Pathway content, structure, format, and functionality vary between different database resources such as [[KEGG]],<ref>{{cite journal | vauthors = Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M | title = KEGG: Kyoto Encyclopedia of Genes and Genomes | journal = Nucleic Acids Research | volume = 27 | issue = 1 | pages = 29–34 | date = January 1999 | pmid = 9847135 | pmc = 148090 | doi = 10.1093/nar/27.1.29 }}</ref> [[WikiPathways]], or [[Reactome]].<ref>{{cite journal | vauthors = Vastrik I, D'Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, Matthews L, Wu G, Birney E, Stein L | title = Reactome: a knowledge base of biologic pathways and processes | journal = Genome Biology | volume = 8 | issue = 3 | pages = R39 | year = 2007 | pmid = 17367534 | pmc = 1868929 | doi = 10.1186/gb-2007-8-3-r39 | doi-access = free }}</ref> Proprietary pathway collections also exist, used for example by the Pathway Studio<ref>[https://mammalcedfx.pathwaystudio.com/app/search Pathway Studio Pathways]</ref> and Ingenuity Pathway Analysis<ref>[https://www.qiagen.com/us/shop/genes-and-pathways/pathway-central/ Pathway Central]</ref> tools. Public online tools can provide pre-compiled and ready-to-go menus of pathways and [[Interactome|networks]] from different open sources (e.g. [https://www.evinet.org/ EviNet]).


===Methods and software===
===Methods and software===


Pathway analysis software can generally be divided into web-based applications, desktop programs and programming packages. Programming packages are mostly coded in the [[R (programming language)|R]] and [[Python (programming language)|Python]] languages, and are shared openly through the BioConductor<ref>{{cite journal | last1 = Gentleman | first1 = R. C. | last2 = Carey | first2 = V. J. | last3 = Bates | first3 = D. M. | last4 = Bolstad | first4 = B. | last5 = Dettling | first5 = M. | last6 = Dudoit | first6 = S.|author6-link=Sandrine Dudoit |display-authors=etal | year = 2004 | title = Bioconductor: open software development for computational biology and bioinformatics | journal = Genome Biol | volume = 5 | issue = 10| page = R80 | doi = 10.1186/gb-2004-5-10-r80 | pmid=15461798 | pmc=545600}}</ref> and GitHub<ref>Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). "[https://www.cs.cmu.edu/afs/cs/Web/People/xia/resources/Documents/cscw2012_Github-paper-FinalVersion-1.pdf Social coding in github: transparency and collaboration in an open software repository]," in Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (New York, NY: ACM), 1277–1286</ref> projects. Methods of pathway analysis evolve quickly, so their classification is still debated.<ref>Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)</ref><ref>Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ERB, Presson AP. Pathway analysis software: annotation errors and solutions. Mol Genet Metab. 2010 Nov;101(2–3):134–40</ref> There are three main groups of methods in pathway analysis that can be applied to any high-throughput data:<ref>Khatri P., Sirota M., Butte A. J. Ten years of pathway analysis: current approaches and outstanding challenges. Plos Comput Biol. 2012;8(2)</ref> ORA, FCS and PT.
Pathway analysis software can be found in the form of desktop programs, web-based applications, or packages coded in such languages as [[R (programming language)|R]] and [[Python (programming language)|Python]] and shared openly through the [[BioConductor]]<ref>{{cite journal | vauthors = Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J | title = Bioconductor: open software development for computational biology and bioinformatics | journal = Genome Biology | volume = 5 | issue = 10 | pages = R80 | year = 2004 | pmid = 15461798 | pmc = 545600 | doi = 10.1186/gb-2004-5-10-r80 | author6-link = Sandrine Dudoit | doi-access = free }}</ref> and [[GitHub]]<ref>{{cite book | vauthors = Dabbish L, Stuart C, Tsay J, Herbsleb J | chapter = Social coding in GitHub: transparency and collaboration in an open software repository. | title = Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work | date = February 2012 | pages = 1277–1286 | chapter-url = https://www.cs.cmu.edu/afs/cs/Web/People/xia/resources/Documents/cscw2012_Github-paper-FinalVersion-1.pdf | publisher = Association for Computing Machinery | location = New York | doi = 10.1145/2145204.21453 | doi-broken-date = 1 November 2024 }}</ref> projects. 
The methodology of pathway analysis evolves quickly and its classification is still debated,<ref name = "Khatri_2012" /><ref name="Henderson-Maclennan_2010">{{cite journal | vauthors = Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ER, Presson AP | title = Pathway analysis software: annotation errors and solutions | journal = Molecular Genetics and Metabolism | volume = 101 | issue = 2–3 | pages = 134–40 | date = 2010 | pmid = 20663702 | pmc = 2950253 | doi = 10.1016/j.ymgme.2010.06.005 }}</ref> with the following main categories of pathway enrichment analysis applicable to high-throughput data:<ref name="Khatri_2012" />


====Over-representation Analysis (ORA)====
====Over-representation analysis (ORA)====


This method measures the overlap between, on the one hand, a set of genes (or proteins) in a pathway or another functionally characterised group ([[gene ontology]] (GO) groups, [[protein families]], [[Genetic pathway|pathway]]s), generally called Functional Gene Set (FGS) and, on the other hand, a set of genes altered in an experimental (or pathological) condition, generally called Altered Gene Set (AGS). A typical example of AGS is a list of top ''N'' differentially expressed genes from [[RNA-Seq]] assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by excess of AGS genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways, judging by [[statistical significance]] of the overlap between FGS and AGS as determined either by an appropriate statistic, such as [[Jaccard index]] or by a statistical test producing p-values ([[Fisher's exact test]] or the test using [[hypergeometric distribution]]).
This method measures the overlap between, on the one hand, a set of genes (or proteins) in an {{abbr|FGS|Functional Gene Set}} and, on the other hand, a list of the most altered genes, generally called an Altered Gene Set (AGS). A typical AGS example is a list of the top ''N'' differentially expressed genes from an [[RNA-Seq]] assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by an excess of {{abbr|AGS|Altered Gene Set}} genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways, judging by the overlap between FGS and AGS as quantified either by a similarity statistic, such as the [[Jaccard index]], or by a test of [[statistical significance]] producing p-values ([[Fisher's exact test]] or the test using the [[hypergeometric distribution]]).
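The overlap test described above can be illustrated with a minimal sketch. It rests on stated assumptions: the gene identifiers, the background of 20 genes, and the function names `hypergeom_upper_tail` and `over_representation` are hypothetical, and the p-value is the upper tail of the hypergeometric distribution, which corresponds to a one-sided Fisher's exact test. Real tools additionally correct for testing many FGSs at once, which is omitted here.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def over_representation(ags, fgs, background):
    """Compare an altered gene set (AGS) with a functional gene set (FGS)."""
    background = set(background)
    ags = set(ags) & background  # only genes that were actually measured
    fgs = set(fgs) & background
    overlap = ags & fgs
    union = ags | fgs
    jaccard = len(overlap) / len(union) if union else 0.0
    p_value = hypergeom_upper_tail(len(overlap), len(background), len(fgs), len(ags))
    return {"overlap": sorted(overlap), "jaccard": jaccard, "p_value": p_value}


# Hypothetical data: 20 measured genes, a 5-gene FGS and a 4-gene AGS sharing 3 genes.
background = [f"g{i}" for i in range(20)]
fgs = ["g0", "g1", "g2", "g3", "g4"]
ags = ["g0", "g1", "g2", "g10"]
result = over_representation(ags, fgs, background)
```

The choice of background matters: restricting it to the genes actually measured in the experiment, as done here, avoids inflating the significance of the overlap.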


====Functional Class Scoring (FCS)====
====Functional class scoring (FCS)====


This method identifies enriched {{abbr|FGS|Functional Gene Set}}s by considering the relative positions of their member genes in the full list of genes studied in the experiment. This full list should therefore be ranked in advance by a statistic (such as [[mRNA]] expression fold-change or the [[Student's t-test]] statistic) or by a p-value, while keeping track of the direction of fold change, since p-values are non-directional. Thus FCS takes into account every FGS gene regardless of its statistical significance and does not require a pre-compiled {{abbr|AGS|Altered Gene Set}}. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA).<ref>{{cite journal | last1 = Subramanian | first1 = Aravind | last2 = Tamayo | first2 = Pablo | last3 = Mootha | first3 = Vamsi K. | last4 = Mukherjee | first4 = Sayan | last5 = Ebert | first5 = Benjamin L. | last6 = Gillette | first6 = Michael A. | last7 = Paulovich | first7 = Amanda |display-authors=etal | year = 2005 | title = Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 102 | issue = 43| pages = 15545–50 | doi = 10.1073/pnas.0506580102 | pmid=16199517 | pmc=1239896 | bibcode = 2005PNAS..10215545S}}</ref>
This method identifies enriched {{abbr|FGS|Functional Gene Set}}s by considering the relative positions of their member genes in the full list of genes studied in the experiment. This full list should therefore be ranked in advance by a statistic (such as [[mRNA]] expression fold-change or the [[Student's t-test]] statistic) or by a p-value, while keeping track of the direction of fold change, since p-values are non-directional. Thus FCS takes into account every FGS gene regardless of its statistical significance and does not require a pre-compiled {{abbr|AGS|Altered Gene Set}}. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA).<ref name="Subramanian" />
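The rank-based idea can be sketched with a simplified, unweighted running-sum score. This is not the published GSEA procedure: it ignores the gene-level weights and the permutation-based significance estimate, and the function name `enrichment_score` and the gene labels are hypothetical. It only shows how the position of FGS members in a ranked list translates into a score.

```python
def enrichment_score(ranked_genes, fgs):
    """Unweighted running-sum score over a ranked gene list.

    Walking down the list, the sum rises at every FGS member and falls at
    every other gene; the returned value is the largest deviation from zero,
    keeping its sign.
    """
    fgs = set(fgs)
    n_hits = sum(1 for gene in ranked_genes if gene in fgs)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0  # the score is undefined without both hits and misses
    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in fgs else -1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranking, most up-regulated gene first.
ranked = ["a", "b", "c", "d", "e", "f"]
top_score = enrichment_score(ranked, {"a", "b"})     # FGS concentrated at the top
bottom_score = enrichment_score(ranked, {"e", "f"})  # FGS concentrated at the bottom
```

A positive score indicates that FGS members cluster near the top of the ranking and a negative score that they cluster near the bottom, which is why the direction of fold change has to be preserved when ranking.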


====Pathway Topology Analysis (PTA)====
====Pathway topology analysis (PTA)====


Similarly to {{abbr|FCS|Functional Class Scoring}}, PTA accounts for high-throughput data for every {{abbr|FGS|Functional Gene Set}} gene <ref>{{cite journal | last1 = Emmert-Streib | first1 = F. | last2 = Dehmer | first2 = M. | year = 2011 | title = Networks for systems biology: conceptual connection of data and function | url = | journal = IET Systems Biology| volume = 5 | issue = 3| pages = 185–207 | doi = 10.1049/iet-syb.2010.0025 | pmid = 21639592 }}</ref>.
Similarly to {{abbr|FCS|Functional Class Scoring}}, PTA accounts for high-throughput data for every {{abbr|FGS|Functional Gene Set}} gene.<ref>{{cite journal | vauthors = Emmert-Streib F, Dehmer M | title = Networks for systems biology: conceptual connection of data and function | journal = IET Systems Biology | volume = 5 | issue = 3 | pages = 185–207 | date = May 2011 | pmid = 21639592 | doi = 10.1049/iet-syb.2010.0025 }}</ref>
In addition, specific topological information is used about role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language ([https://www.genome.jp/kegg/xml/ KGML]). Using this information, PTA estimates a pathway significance by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types can be used in parallel ([[Copy-number variation]], [[somatic mutation]] etc.) when available. <ref>{{cite journal|last1=Khatri|first1=Purvesh|last2=Sirota|first2=Marina|last3=Butte|first3=Atul J.|last4=Ouzounis|first4=Christos A.|title=Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges|journal=PLoS Computational Biology|date=23 February 2012|volume=8|issue=2|pages=e1002375|doi=10.1371/journal.pcbi.1002375|pmid=22383865|pmc=3285573|bibcode=2012PLSCB...8E2375K}}</ref> The set of PTA methods includes Signaling Pathway Impact Analysis (SPIA),<ref>{{cite journal|last1=Draghici|first1=S.|last2=Khatri|first2=P.|last3=Tarca|first3=A. L.|last4=Amin|first4=K.|last5=Done|first5=A.|last6=Voichita|first6=C.|last7=Georgescu|first7=C.|last8=Romero|first8=R.|title=A systems biology approach for pathway level analysis|journal=Genome Research|date=4 September 2007|volume=17|issue=10|pages=1537–1545|doi=10.1101/gr.6202607|pmid=17785539|pmc=1987343}}</ref><ref>{{cite journal|last1=Tarca|first1=A. L.|last2=Draghici|first2=S.|last3=Khatri|first3=P.|last4=Hassan|first4=S. S.|last5=Mittal|first5=P.|last6=Kim|first6=J.-s.|last7=Kim|first7=C. J.|last8=Kusanovic|first8=J. P.|last9=Romero|first9=R.|title=A novel signaling pathway impact analysis|journal=Bioinformatics|date=5 November 2008|volume=25|issue=1|pages=75–82|doi=10.1093/bioinformatics/btn577|pmid=18990722|pmc=2732297}}</ref> EnrichNet,<ref>{{cite journal|last1=Glaab|first1=E.|last2=Baudot|first2=A.|last3=Krasnogor|first3=N.|last4=Schneider|first4=R. 
S.|last5=Valencia|first5=A.|title=EnrichNet: Network-based gene set enrichment analysis|journal=Bioinformatics|date=15 September 2012|volume=28|issue=18|pages=i451–i457|doi=10.1093/bioinformatics/bts389|pmid=22962466|pmc=3436816}}</ref> GGEA,<ref>{{cite journal|last1=Geistlinger|first1=L.|last2=Csaba|first2=G.|last3=Küffner|first3=R.|last4=Mulder|first4=N.|last5=Zimmer|first5=R.|title=From sets to graphs: Towards a realistic enrichment analysis of transcriptomic systems|journal=Bioinformatics|date=2011|volume=27|issue=13|pages=i366–i373|pmid=21685094|pmc=3117393|doi=10.1093/bioinformatics/btr228}}</ref> and TopoGSA.<ref>{{cite journal|last1=Glaab|first1=E.|last2=Baudot|first2=A.|last3=Krasnogor|first3=N.|last4=Valencia|first4=A.|title=TopoGSA: Network topological gene set analysis|journal=Bioinformatics|date=2012|volume=26|issue=18|pages=1271–1272|doi=10.1093/bioinformatics/btq131|pmid=20335277|pmc=2859135}}</ref>
In addition, specific topological information is used about role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language ([https://www.genome.jp/kegg/xml/ KGML]). Using this information, PTA estimates a pathway significance by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types can be used in parallel (somatic [[copy-number variations]], [[point mutations]] etc.) when available.<ref name = "Khatri_2012">{{cite journal | vauthors = Khatri P, Sirota M, Butte AJ | title = Ten years of pathway analysis: current approaches and outstanding challenges | journal = PLOS Computational Biology | volume = 8 | issue = 2 | pages = e1002375 | date = 23 February 2012 | pmid = 22383865 | pmc = 3285573 | doi = 10.1371/journal.pcbi.1002375 | bibcode = 2012PLSCB...8E2375K | doi-access = free }}</ref> The set of PTA methods includes the Impact Analysis,<ref name="Draghici">{{cite journal | vauthors = Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, Georgescu C, Romero R | title = A systems biology approach for pathway level analysis | journal = Genome Research | volume = 17 | issue = 10 | pages = 1537–45 | date = October 2007 | pmid = 17785539 | pmc = 1987343 | doi = 10.1101/gr.6202607 }}</ref><ref name="Tarca">{{cite journal | vauthors = Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, Kim CJ, Kusanovic JP, Romero R | title = A novel signaling pathway impact analysis | journal = Bioinformatics | volume = 25 | issue = 1 | pages = 75–82 | date = January 2009 | pmid = 18990722 | pmc = 2732297 | doi = 10.1093/bioinformatics/btn577 }}</ref> EnrichNet,<ref>{{cite journal | vauthors = Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A | title = EnrichNet: network-based gene set enrichment analysis | journal = Bioinformatics | volume = 28 | issue = 18 | pages = i451–i457 | date = 
September 2012 | pmid = 22962466 | pmc = 3436816 | doi = 10.1093/bioinformatics/bts389 }}</ref> GGEA,<ref>{{cite journal | vauthors = Geistlinger L, Csaba G, Küffner R, Mulder N, Zimmer R | title = From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems | journal = Bioinformatics | volume = 27 | issue = 13 | pages = i366-73 | date = July 2011 | pmid = 21685094 | pmc = 3117393 | doi = 10.1093/bioinformatics/btr228 }}</ref> and TopoGSA.<ref>{{cite journal | vauthors = Glaab E, Baudot A, Krasnogor N, Valencia A | title = TopoGSA: network topological gene set analysis | journal = Bioinformatics | volume = 26 | issue = 9 | pages = 1271–2 | date = May 2010 | pmid = 20335277 | pmc = 2859135 | doi = 10.1093/bioinformatics/btq131 }}</ref>


====Network enrichment analysis (NEA)====
==Notable companies==


Network enrichment analysis (NEA) is an extension of gene-set enrichment analysis to the domain of [[Interactome|global gene networks]].<ref>{{cite journal | vauthors = Shojaie A, Michailidis G | title = Network enrichment analysis in complex experiments | journal = Statistical Applications in Genetics and Molecular Biology | volume = 9 | issue = 1 | pages = Article22 | date = 22 May 2010 | pmid = 20597848 | pmc = 2898649 | doi = 10.2202/1544-6115.1483 }}</ref><ref>{{cite journal | vauthors = Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG | title = Exploring the human genome with functional maps | journal = Genome Research | volume = 19 | issue = 6 | pages = 1093–106 | date = June 2009 | pmid = 19246570 | doi = 10.1101/gr.082214.108 | pmc = 2694471 | doi-access = free }}</ref><ref>{{cite journal | vauthors = Alexeyenko A, Lee W, Pernemalm M, Guegan J, Dessen P, Lazar V, Lehtiö J, Pawitan Y | title = Network enrichment analysis: extension of gene-set enrichment analysis to gene networks | journal = BMC Bioinformatics | volume = 13 | pages = 226 | date = September 2012 | pmid = 22966941 | pmc = 3505158 | doi = 10.1186/1471-2105-13-226 | doi-access = free }}</ref><ref>{{cite journal | vauthors = Signorelli M, Vinciotti V, Wit EC | title = NEAT: an efficient network enrichment analysis test | journal = BMC Bioinformatics | volume = 17 | issue = 1 | pages = 352 | date = September 2016 | pmid = 27597310 | doi = 10.1186/s12859-016-1203-6 | pmc = 5011912 | arxiv = 1604.01210 | s2cid = 2274758 | doi-access = free }}</ref> The major principle of NEA can be understood in comparison with {{abbr|ORA| Over-representation Analysis}}, where enrichment of {{abbr|FGS|Functional Gene Set}} in genes of the {{abbr|AGS|Altered Gene Set}} is determined by how many genes are directly shared by AGS and FGS. In NEA, by contrast, the global network is searched for network edges that connect any genes of AGS with any genes of FGS. 
Since enrichment significance is influenced by the highly variable node degrees of individual AGS and FGS genes, it should be determined by a dedicated statistical test, which compares the observed number of network edges to the number expected by chance in the same network context. Some valuable properties of NEA are that:
Several companies have licensed software to perform a number of analytic methods on gene set. Most of free software solutions provide only links to online pathway collections; rather commercial ones have their own collections. The choice of best software depends on user skills, cost and time which one could spend on pathways analysis.<ref>{{cite journal|last1=García-Campos|first1=Miguel Angel|last2=Espinal-Enríquez|first2=Jesús|last3=Hernández-Lemus|first3=Enrique|title=Pathway analysis: State of the art|journal=Frontiers in Physiology|date=2015|volume=6|pages=383|doi=10.3389/fphys.2015.00383|pmid=26733877|pmc=4681784}}</ref> [[Ingenuity]], for example, charges a fee for use of their software. Some software, like [[STRING]] or [[Cytoscape]] are an open-source. However, Ingenuity maintains a knowledge base to compare gene expression data to.<ref>"Ingenuity IPA - Integrate and Understand Complex 'omics Data." Ingenuity. Web. 8 Apr. 2015. <http://www.ingenuity.com/products/ipa#/?tab=features>.</ref> [[Pathways Studio]] <ref>[http://www.pathwaystudio.com/ Pathway Studio]</ref> is commercial software which allows to search biologically relevant facts, analyze experiments and create pathways. Pathways Studio Viewer <ref>[https://mammalcedfx.pathwaystudio.com/app/search Pathway Studio Viewer]</ref> is a free resource from that company for making acquaintance with Pathway Studio interactive pathway collection and database. 
Only two commercial applications are known to offer pathway topology (PT) based analyses, PathwayGuide from [http://www.Advaitabio.com Advaita Corporation] and MetaCore from Thomson Reuters.<ref>{{cite journal|last1=Mitrea|first1=Cristina|last2=Taghavi|first2=Zeinab|last3=Bokanizad|first3=Behzad|last4=Hanoudi|first4=Samer|last5=Tagett|first5=Rebecca|last6=Donato|first6=Michele|last7=Voichiţa|first7=Călin|last8=Drăghici|first8=Sorin|title=Methods and approaches in the topology-based analysis of biological pathways|journal=Frontiers in Physiology|date=2013|volume=4|pages=278|doi=10.3389/fphys.2013.00278|pmid=24133454|pmc=3794382}}</ref> Advaita uses the peer reviewed Signaling Pathway Impact Analysis (SPIA) method<ref>{{cite journal|last1=Draghici|first1=S.|last2=Khatri|first2=P.|last3=Tarca|first3=A. L.|last4=Amin|first4=K.|last5=Done|first5=A.|last6=Voichita|first6=C.|last7=Georgescu|first7=C.|last8=Romero|first8=R.|title=A systems biology approach for pathway level analysis|journal=Genome Research|date=4 September 2007|volume=17|issue=10|pages=1537–1545|doi=10.1101/gr.6202607|pmid=17785539|pmc=1987343}}</ref><ref>{{cite journal|last1=Tarca|first1=A. L.|last2=Draghici|first2=S.|last3=Khatri|first3=P.|last4=Hassan|first4=S. S.|last5=Mittal|first5=P.|last6=Kim|first6=J.-s.|last7=Kim|first7=C. J.|last8=Kusanovic|first8=J. 
P.|last9=Romero|first9=R.|title=A novel signaling pathway impact analysis|journal=Bioinformatics|date=5 November 2008|volume=25|issue=1|pages=75–82|doi=10.1093/bioinformatics/btn577|pmid=18990722|pmc=2732297}}</ref> while the MetaCore method is unpublished.<ref>{{cite journal|last1=Mitrea|first1=Cristina|last2=Taghavi|first2=Zeinab|last3=Bokanizad|first3=Behzad|last4=Hanoudi|first4=Samer|last5=Tagett|first5=Rebecca|last6=Donato|first6=Michele|last7=Voichiţa|first7=Călin|last8=Drăghici|first8=Sorin|title=Methods and approaches in the topology-based analysis of biological pathways|journal=Frontiers in Physiology|date=2013|volume=4|pages=278|doi=10.3389/fphys.2013.00278|pmid=24133454|pmc=3794382}}</ref>
# it is more robust to biological and technical variability between sample replicates;<ref name="Integration of somatic mutation, ex"/><ref>{{cite journal | vauthors = Jeggari A, Alexeyenko A | title = NEArender: an R package for functional interpretation of 'omics' data via network enrichment analysis | journal = BMC Bioinformatics | volume = 18 | issue = Suppl 5 | pages = 118 | date = March 2017 | pmid = 28361684 | pmc = 5374688 | doi = 10.1186/s12859-017-1534-y | doi-access = free }}</ref>
# {{abbr|AGS|Altered Gene Set}} genes may not necessarily be annotated as pathway members;<ref>{{cite journal | vauthors = Hong MG, Alexeyenko A, Lambert JC, Amouyel P, Prince JA | title = Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease | journal = Journal of Human Genetics | volume = 55 | issue = 10 | pages = 707–9 | date = October 2010 | pmid = 20668461 | doi = 10.1038/jhg.2010.92 | s2cid = 27020289 | doi-access = free }}</ref>
# {{abbr|FGS|Functional Gene Set}} members do not have to be altered themselves, but still are accounted for due to possessing network links to AGS genes.<ref>{{cite journal | vauthors = Jeggari A, Alekseenko Z, Petrov I, Dias JM, Ericson J, Alexeyenko A | title = EviNet: a web platform for network enrichment analysis with flexible definition of gene sets | journal = Nucleic Acids Research | volume = 46 | issue = W1 | pages = W163–W170 | date = July 2018 | pmid = 29893885 | pmc = 6030852 | doi = 10.1093/nar/gky485 }}</ref>


==Commercial solutions==
==Limits==


Beyond open-source tools, such as [[STRING]] or [[Cytoscape]], a number of companies sell licensed software products to analyse gene sets. While most of the publicly available solutions use online and public pathway collections, the commercial products mostly promote their own proprietary pathways and networks. The choice of such products might be driven by customers' skills, financial and time resources, and needs.<ref name="Garcia-Campos" /> [[Ingenuity Systems|Ingenuity]], for example, maintains a knowledge base for comparative analysis of gene expression data.<ref>{{cite web | title = Ingenuity IPA - Integrate and Understand Complex 'omics Data. | work = Ingenuity | date = 8 April 2015 | url = http://www.ingenuity.com/products/ipa#/?tab=features }}</ref> [[Pathways Studio|Pathway Studio]]<ref>[http://www.pathwaystudio.com/ Pathway Studio]</ref> is commercial software that allows users to search for biologically relevant facts, analyze experiments, and create pathways. Pathway Studio Viewer<ref>[https://mammalcedfx.pathwaystudio.com/app/search Pathway Studio Viewer]</ref> is a free resource from the same company for presenting the Pathway Studio interactive pathway collection and database. 
Two commercial solutions offer {{abbr|PTA|Pathway Topology Analysis}}: iPathwayGuide from [http://www.Advaitabio.com Advaita Corporation] and MetaCore from Thomson Reuters.<ref name="Mitrea">{{cite journal | vauthors = Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, Voichiţa C, Drăghici S | title = Methods and approaches in the topology-based analysis of biological pathways | journal = Frontiers in Physiology | volume = 4 | pages = 278 | date = October 2013 | pmid = 24133454 | pmc = 3794382 | doi = 10.3389/fphys.2013.00278 | doi-access = free }}</ref> Advaita uses the peer reviewed Impact Analysis method<ref name="Draghici" /><ref name="Tarca" /> while the MetaCore method is unpublished.<ref name="Mitrea" /> [https://www.illumina.com/products/by-type/informatics-products/connected-analytics/modules/correlation-engine.html Correlation Engine] uses the Running Fisher algorithm for gene set enrichment within its Pathway Enrichment application.<ref>{{cite journal | vauthors = Kupershmidt I, Su QJ, Grewal A, Sundaresh S, Halperin I, Flynn J, Shekar M, Wang H, Park J, Cui W, Wall GD, Wisotzkey R, Alag S, Akhtari S, Ronaghi M | title = Ontology-based meta-analysis of global collections of high-throughput public data | journal = PLOS ONE | volume = 5 | issue = 9 | pages = e13066 | date = September 2010 | pmid = 20927376 | pmc = 2947508 | doi = 10.1371/journal.pone.0013066 | doi-access = free | bibcode = 2010PLoSO...513066K | veditors = Aziz RK }}</ref>
===Missing annotations on cell types and conditions===


==Limitations==
Many current methods for pathway analysis depend on existing [[databases]]. The data used, however, is not always completely annotated. Many genes interactions in databases are relatively speculative as they are based on scientific facts, are pulled from a specific cell type or disease. Also most canonical pathways are built using the knowledge obtained from a limited number of experiments with narrow cell models. Therefore, interpretation of results of pathway analysis of [[omics]] data obtained from different tissues should be done with caution.<ref>Henderson-Maclennan, Nicole K., Jeanette C. Papp, C. Conover Talbot, Edward R. B. McCabe, and Angela P. Presson. "Pathway Analysis Software: Annotation Errors and Solutions."Molecular Genetics and Metabolism (2010): 134–40. PMC. Web. 8 April 2015.</ref>


===Lack of annotations===
==References==

<references/>
Application of pathway analysis methods depends on annotations found in existing [[databases]], such as gene set membership in pathways, pathway topology, presence of genes in the global network etc. These annotations, however, are far from complete and have highly variable degrees of confidence. In addition, such information is usually general, i.e. it lacks, for example, cell type, compartment, or developmental context. Therefore, interpretation of pathway analysis results for [[omics]] datasets should be done with caution.<ref name="Henderson-Maclennan_2010" /> The problem can be partially addressed by analysing gene sets against larger resources, such as big pathway collections or global interaction networks.<ref name="pmid30787419">{{cite journal | vauthors = Franco M, Jeggari A, Peuget S, Böttger F, Selivanova G, Alexeyenko A | title = Prediction of response to anti-cancer drugs becomes robust via network integration of molecular data | journal = Scientific Reports | volume = 9 | issue = 1 | pages = 2379 | date = February 2019 | pmid = 30787419 | pmc = 6382934 | doi = 10.1038/s41598-019-39019-2 | bibcode = 2019NatSR...9.2379F }}</ref>
==See also==
*[[Biological pathway]]

== References ==
{{reflist}}


[[Category:Bioinformatics software]]

Latest revision as of 16:21, 7 December 2024

Pathway resources and types of pathway analysis using databases like KEGG, Reactome and WikiPathways.[1]

Pathway is the term from molecular biology for a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue, or a signaling pathway model representing a regulatory process that might, in turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions.[2] A pathway is most often represented as a relatively small graph with gene, protein, and/or small molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain,[3] complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation.[4][5] In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a Functional Gene Set (FGS), can also refer to other functionally characterized groups such as protein families, Gene Ontology (GO) and Disease Ontology (DO) terms, etc. In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental (or pathological) condition that was studied with omics tools or a genome-wide association study.[6] Such studies might identify long lists of altered genes. 
A visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions (with a large fraction of genes lacking any annotation). In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs whose members were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge, structured in the form of FGSs, to the condition represented by the altered genes.

Use

The data for pathway analysis come from high-throughput biology. This includes high-throughput sequencing data and microarray data. Before pathway analysis can be done, each gene's alteration should be evaluated using the omics dataset, either quantitatively (differential expression analysis) or qualitatively (detection of somatic point mutations or mapping of neighbor genes to a disease-associated SNP). It is also possible to combine datasets from different research groups or multiple omics platforms with a meta-analysis and cross-platform normalization.[7][8] Next, a list in which gene identifiers are accompanied by the alteration attributes is subjected to a pathway analysis. By using pathway analysis software, researchers can determine which FGSs are enriched with the altered experimental genes.[9][10] For example, pathway analysis of several independent microarray experiments (meta-analysis) helped to discover potential biomarkers in a single pathway important for fast-to-slow switch fiber type transition in Duchenne muscular dystrophy.[11] In another study, meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease, which can be useful for monitoring the disease.[12] Candidate gene alleles causative of Alzheimer's disease and elderly dementia were first discovered via genome-wide association study and further validated with network enrichment analysis against an FGS consisting of known Alzheimer's genes.[13][14]

Databases

Pathway collections and interaction networks constitute the knowledge base required for a pathway analysis. Pathway content, structure, format, and functionality vary between different database resources such as KEGG,[15] WikiPathways, or Reactome.[16] Proprietary pathway collections also exist, used for example by the Pathway Studio[17] and Ingenuity Pathway Analysis[18] tools. Public online tools can provide pre-compiled and ready-to-go menus of pathways and networks from different open sources (e.g. EviNet).

Methods and software

Pathway analysis software can be found in the form of desktop programs, web-based applications, or packages coded in languages such as R and Python and shared openly through the Bioconductor[19] and GitHub[20] projects. The methodology of pathway analysis evolves fast and its classification is still under discussion,[21][22] with the following main categories of pathway enrichment analysis applicable to high-throughput data:[21]

Over-representation analysis (ORA)

This method measures the overlap between a set of genes (or proteins) in an FGS and a list of the most altered genes, generally called an Altered Gene Set (AGS). A typical AGS example is a list of the top N differentially expressed genes from an RNA-Seq assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by an excess of AGS genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways, judging by the overlap between FGS and AGS as quantified either by a similarity statistic, such as the Jaccard index, or by a statistical test producing p-values (Fisher's exact test or the hypergeometric test).
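The overlap statistics named above can be illustrated with a minimal Python sketch. The function names, the toy gene identifiers, and the choice of a 20-gene background are assumptions made here for illustration only; they are not taken from any particular ORA package.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Return P(X >= k) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` belong to the FGS."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def over_representation(ags, fgs, background):
    """Compute the shared genes, Jaccard index, and one-sided hypergeometric
    p-value for one AGS/FGS pair (hypothetical helper)."""
    background = set(background)
    ags = set(ags) & background  # only genes that could have been measured
    fgs = set(fgs) & background
    shared = ags & fgs
    union = ags | fgs
    jaccard = len(shared) / len(union) if union else 0.0
    p_value = hypergeom_tail(len(shared), len(background), len(fgs), len(ags))
    return {"shared": sorted(shared), "jaccard": jaccard, "p_value": p_value}


# Toy data: 20 background genes, a 5-gene FGS, a 4-gene AGS.
background = [f"g{i}" for i in range(20)]
fgs = ["g0", "g1", "g2", "g3", "g4"]
ags = ["g0", "g1", "g2", "g10"]
result = over_representation(ags, fgs, background)
print(result)
```

The choice of background strongly affects the p-value, so in practice it is usually restricted to the genes that the assay could actually detect. When many FGSs are tested, the resulting p-values would also need a multiple-testing correction, which this sketch omits.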

Functional class scoring (FCS)

This method identifies enriched FGSs by considering the relative positions of their member genes in the full list of genes studied in the experiment. This full list should therefore be ranked in advance by a statistic (such as mRNA expression fold change or a Student's t-test statistic) or by a p-value, while keeping track of the direction of fold change, since p-values are non-directional. Thus FCS takes into account every FGS gene regardless of its statistical significance and does not require a pre-compiled AGS. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA).[10]
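As an illustration of the rank-based idea, the following Python sketch computes an unweighted running-sum (Kolmogorov-Smirnov-type) enrichment score and a gene-label permutation p-value. It is a simplification written for this purpose and not the GSEA implementation: published GSEA weights each step by the ranking statistic. The function names and the six-gene toy ranking are assumptions.

```python
import random


def enrichment_score(ranked_genes, fgs):
    """Walk down the ranked list, stepping up at FGS members and down
    otherwise; return the most extreme deviation from zero (signed)."""
    fgs = set(fgs)
    n_hit = sum(1 for gene in ranked_genes if gene in fgs)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("FGS must cover some, but not all, ranked genes")
    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in fgs else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def permutation_p_value(ranked_genes, fgs, n_perm=1000, seed=0):
    """Estimate how often a random ordering gives a score at least as
    extreme as the observed one (two-sided, gene-label permutation)."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, fgs))
    shuffled = list(ranked_genes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, fgs)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy ranking, most up-regulated gene first.
ranking = ["a", "b", "c", "d", "e", "f"]
print(enrichment_score(ranking, {"a", "b"}))  # FGS concentrated at the top
print(enrichment_score(ranking, {"e", "f"}))  # FGS concentrated at the bottom
print(permutation_p_value(ranking, {"a", "b"}))
```

A positive score indicates that the FGS members cluster near the top of the ranking and a negative score that they cluster near the bottom, which is why the direction of change must be preserved when ranking.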

Pathway topology analysis (PTA)

Similarly to FCS, PTA accounts for high-throughput data for every FGS gene.[23] In addition, it uses specific topological information about the role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language (KGML). Using this information, PTA estimates pathway significance by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types can be used in parallel (somatic copy-number variations, point mutations etc.) when available.[21] PTA methods include Impact Analysis,[24][25] EnrichNet,[26] GGEA,[27] and TopoGSA.[28]
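To make the role of topology concrete, the Python sketch below propagates gene-level changes downstream along signed, directed edges of a small acyclic pathway. It is a deliberately simplified scheme assumed here for illustration and is not the published algorithm of any of the methods listed above. The edge list, the sign convention (+1 activation, -1 inhibition), and the rule of splitting an upstream score equally among its targets are all assumptions.

```python
from graphlib import TopologicalSorter


def propagate_perturbation(edges, log_fc):
    """edges: (upstream, downstream, sign) triples of an acyclic pathway.
    log_fc: measured log fold changes; unmeasured genes count as 0."""
    predecessors = {}
    out_degree = {}
    nodes = set(log_fc)
    for upstream, downstream, sign in edges:
        predecessors.setdefault(downstream, []).append((upstream, sign))
        out_degree[upstream] = out_degree.get(upstream, 0) + 1
        nodes.update((upstream, downstream))
    # Visit every gene only after all of its upstream regulators.
    graph = {n: [u for u, _ in predecessors.get(n, [])] for n in nodes}
    score = {}
    for gene in TopologicalSorter(graph).static_order():
        inherited = sum(
            sign * score[upstream] / out_degree[upstream]
            for upstream, sign in predecessors.get(gene, [])
        )
        score[gene] = log_fc.get(gene, 0.0) + inherited
    return score


# Hypothetical pathway: receptor R activates kinase K, which activates
# transcription factor TF and inhibits phosphatase P.
edges = [("R", "K", +1), ("K", "TF", +1), ("K", "P", -1)]
scores = propagate_perturbation(edges, {"R": 2.0, "TF": 0.5})
print(scores)
```

In this toy case the unmeasured kinase K inherits the receptor's change, so a gene near the top of the pathway influences every gene below it. Pathways with loops cannot be handled by this ordering (`graphlib` raises `CycleError`) and would need a different formulation, such as solving the propagation as a linear system.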

Network enrichment analysis (NEA)

Network enrichment analysis (NEA) is an extension of gene-set enrichment analysis to the domain of global gene networks.[29][30][31][32] The major principle of NEA can be understood in comparison with ORA, where enrichment of FGS in genes of the AGS is determined by how many genes are directly shared by AGS and FGS. In NEA, by contrast, the global network is searched for network edges that connect any genes of AGS with any genes of FGS. Since enrichment significance is influenced by the highly variable node degrees of individual AGS and FGS genes, it should be determined by a dedicated statistical test, which compares the observed number of network edges to the number expected by chance in the same network context. Some valuable properties of NEA are that:

  1. it is more robust to biological and technical variability between sample replicates;[8][33]
  2. AGS genes may not necessarily be annotated as pathway members;[34]
  3. FGS members do not have to be altered themselves, but still are accounted for due to possessing network links to AGS genes.[35]
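The comparison of observed and expected edge counts can be sketched in Python as follows. The expected value uses a simple degree-based approximation (the product of the summed node degrees of the two gene sets divided by twice the number of network edges), which is an assumption chosen here for illustration. The dedicated tests in the cited tools differ in detail, and the toy network and function name are hypothetical.

```python
def network_enrichment(edges, ags, fgs):
    """Count network edges linking AGS genes to FGS genes and compare the
    count with a degree-based expectation for an undirected network."""
    ags, fgs = set(ags), set(fgs)
    # Treat the network as undirected; drop self-loops and duplicates.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = {}
    observed = 0
    for edge in unique_edges:
        u, v = tuple(edge)
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if (u in ags and v in fgs) or (v in ags and u in fgs):
            observed += 1
    ags_degree = sum(degree.get(gene, 0) for gene in ags)
    fgs_degree = sum(degree.get(gene, 0) for gene in fgs)
    n_edges = len(unique_edges)
    expected = ags_degree * fgs_degree / (2 * n_edges) if n_edges else 0.0
    return {
        "observed": observed,
        "expected": expected,
        "shared_genes": sorted(ags & fgs),
    }


# Toy network: AGS = {a, b} and FGS = {x, y} share no genes,
# yet three of the six edges connect them.
network = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "d"), ("d", "e"), ("x", "y")]
nea = network_enrichment(network, ags={"a", "b"}, fgs={"x", "y"})
print(nea)
```

In this example ORA would report no overlap at all, whereas the edge count exceeds its expectation, which illustrates properties 2 and 3 in the list above. A significance estimate (for example from network randomization) is left out of the sketch.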

Commercial solutions

Beyond open-source tools, such as STRING or Cytoscape, a number of companies sell licensed software products to analyse gene sets. While most of the publicly available solutions use online and public pathway collections, the commercial products mostly promote their own proprietary pathways and networks. The choice of such products might be driven by customers' skills, financial and time resources, and needs.[6] Ingenuity, for example, maintains a knowledge base for comparative analysis of gene expression data.[36] Pathway Studio[37] is commercial software that allows users to search for biologically relevant facts, analyze experiments, and create pathways. Pathway Studio Viewer[38] is a free resource from the same company for presenting the Pathway Studio interactive pathway collection and database. Two commercial solutions offer PTA: iPathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters.[39] Advaita uses the peer-reviewed Impact Analysis method,[24][25] while the MetaCore method is unpublished.[39] Correlation Engine uses the Running Fisher algorithm for gene set enrichment within its Pathway Enrichment application.[40]

Limitations

Lack of annotations

Application of pathway analysis methods depends on annotations found in existing databases, such as gene set membership in pathways, pathway topology, presence of genes in the global network etc. These annotations, however, are far from complete and have highly variable degrees of confidence. In addition, such information is usually general, i.e. it lacks, for example, cell type, compartment, or developmental context. Therefore, interpretation of pathway analysis results for omics datasets should be done with caution.[22] The problem can be partially addressed by analysing gene sets against larger resources, such as big pathway collections or global interaction networks.[41]

See also

  * Biological pathway

References

  1. ^ Mubeen S, Hoyt CT, Gemünd A, Hofmann-Apitius M, Fröhlich H, Domingo-Fernández D (2019). "The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling". Frontiers in Genetics. 10: 1203. doi:10.3389/fgene.2019.01203. PMC 6883970. PMID 31824580.
  2. ^ Berg JM, Tymoczko JL, Stryer L (2002). Biochemistry (5th ed.). New York: W.H. Freeman. ISBN 978-0-7167-3051-4.
  3. ^ Ohlrogge J, Browse J (July 1995). "Lipid biosynthesis". The Plant Cell. 7 (7): 957–70. doi:10.1105/tpc.7.7.957. PMC 160893. PMID 7640528. S2CID 219201001.
  4. ^ "Main Page - SBML.caltech.edu". sbml.org.
  5. ^ "KGML (KEGG Markup Language)". www.genome.jp.
  6. ^ a b García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E (2015). "Pathway Analysis: State of the Art". Frontiers in Physiology. 6: 383. doi:10.3389/fphys.2015.00383. PMC 4681784. PMID 26733877.
  7. ^ Walsh CJ, Hu P, Batt J, Santos CC (August 2015). "Microarray Meta-Analysis and Cross-Platform Normalization: Integrative Genomics for Robust Biomarker Discovery". Microarrays. 4 (3): 389–406. doi:10.3390/microarrays4030389. PMC 4996376. PMID 27600230.
  8. ^ a b Suo C, Hrydziuszko O, Lee D, Pramana S, Saputra D, Joshi H, et al. (August 2015). "Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival". Bioinformatics. 31 (16): 2607–13. doi:10.1093/bioinformatics/btv164. PMID 25810432.
  9. ^ Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM (July 1999). "Systematic determination of genetic network architecture". Nature Genetics. 22 (3): 281–5. doi:10.1038/10343. PMID 10391217. S2CID 14688842.
  10. ^ a b Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. (October 2005). "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. Bibcode:2005PNAS..10215545S. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
  11. ^ Kotelnikova E, Shkrob MA, Pyatnitskiy MA, Ferlini A, Daraselia N (February 2012). "Novel approach to meta-analysis of microarray datasets reveals muscle remodeling-related drug targets and biomarkers in Duchenne muscular dystrophy". PLOS Computational Biology. 8 (2): e1002365. Bibcode:2012PLSCB...8E2365K. doi:10.1371/journal.pcbi.1002365. PMC 3271016. PMID 22319435.
  12. ^ Santiago JA, Potashkin JA (February 2015). "Network-based metaanalysis identifies HNF4A and PTBP1 as longitudinally dynamic biomarkers for Parkinson's disease". Proceedings of the National Academy of Sciences of the United States of America. 112 (7): 2257–62. Bibcode:2015PNAS..112.2257S. doi:10.1073/pnas.1423573112. PMC 4343174. PMID 25646437.
  13. ^ Reynolds CA, Hong MG, Eriksson UK, Blennow K, Wiklund F, Johansson B, et al. (May 2010). "Analysis of lipid pathway genes indicates association of sequence variation near SREBF1/TOM1L2/ATPAF2 with dementia risk". Human Molecular Genetics. 19 (10): 2068–78. doi:10.1093/hmg/ddq079. PMC 2860895. PMID 20167577.
  14. ^ Bennet AM, Reynolds CA, Eriksson UK, Hong MG, Blennow K, Gatz M, et al. (1 January 2011). "Genetic association of sequence variants near AGER/NOTCH4 and dementia". Journal of Alzheimer's Disease. 24 (3): 475–84. doi:10.3233/jad-2011-101848. PMC 3477600. PMID 21297263.
  15. ^ Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (January 1999). "KEGG: Kyoto Encyclopedia of Genes and Genomes". Nucleic Acids Research. 27 (1): 29–34. doi:10.1093/nar/27.1.29. PMC 148090. PMID 9847135.
  16. ^ Vastrik I, D'Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, et al. (2007). "Reactome: a knowledge base of biologic pathways and processes". Genome Biology. 8 (3): R39. doi:10.1186/gb-2007-8-3-r39. PMC 1868929. PMID 17367534.
  17. ^ Pathway Studio Pathways
  18. ^ Pathway Central
  19. ^ Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biology. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.
  20. ^ Dabbish L, Stuart C, Tsay J, Herbsleb J (February 2012). "Social coding in GitHub: transparency and collaboration in an open software repository" (PDF). Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. New York: Association for Computing Machinery. pp. 1277–1286. doi:10.1145/2145204.21453 (inactive 1 November 2024).
  21. ^ a b c Khatri P, Sirota M, Butte AJ (23 February 2012). "Ten years of pathway analysis: current approaches and outstanding challenges". PLOS Computational Biology. 8 (2): e1002375. Bibcode:2012PLSCB...8E2375K. doi:10.1371/journal.pcbi.1002375. PMC 3285573. PMID 22383865.
  22. ^ a b Henderson-Maclennan NK, Papp JC, Talbot CC, McCabe ER, Presson AP (2010). "Pathway analysis software: annotation errors and solutions". Molecular Genetics and Metabolism. 101 (2–3): 134–40. doi:10.1016/j.ymgme.2010.06.005. PMC 2950253. PMID 20663702.
  23. ^ Emmert-Streib F, Dehmer M (May 2011). "Networks for systems biology: conceptual connection of data and function". IET Systems Biology. 5 (3): 185–207. doi:10.1049/iet-syb.2010.0025. PMID 21639592.
  24. ^ a b Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, et al. (October 2007). "A systems biology approach for pathway level analysis". Genome Research. 17 (10): 1537–45. doi:10.1101/gr.6202607. PMC 1987343. PMID 17785539.
  25. ^ a b Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, Kim JS, et al. (January 2009). "A novel signaling pathway impact analysis". Bioinformatics. 25 (1): 75–82. doi:10.1093/bioinformatics/btn577. PMC 2732297. PMID 18990722.
  26. ^ Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A (September 2012). "EnrichNet: network-based gene set enrichment analysis". Bioinformatics. 28 (18): i451–i457. doi:10.1093/bioinformatics/bts389. PMC 3436816. PMID 22962466.
  27. ^ Geistlinger L, Csaba G, Küffner R, Mulder N, Zimmer R (July 2011). "From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems". Bioinformatics. 27 (13): i366-73. doi:10.1093/bioinformatics/btr228. PMC 3117393. PMID 21685094.
  28. ^ Glaab E, Baudot A, Krasnogor N, Valencia A (May 2010). "TopoGSA: network topological gene set analysis". Bioinformatics. 26 (9): 1271–2. doi:10.1093/bioinformatics/btq131. PMC 2859135. PMID 20335277.
  29. ^ Shojaie A, Michailidis G (22 May 2010). "Network enrichment analysis in complex experiments". Statistical Applications in Genetics and Molecular Biology. 9 (1): Article22. doi:10.2202/1544-6115.1483. PMC 2898649. PMID 20597848.
  30. ^ Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, et al. (June 2009). "Exploring the human genome with functional maps". Genome Research. 19 (6): 1093–106. doi:10.1101/gr.082214.108. PMC 2694471. PMID 19246570.
  31. ^ Alexeyenko A, Lee W, Pernemalm M, Guegan J, Dessen P, Lazar V, et al. (September 2012). "Network enrichment analysis: extension of gene-set enrichment analysis to gene networks". BMC Bioinformatics. 13: 226. doi:10.1186/1471-2105-13-226. PMC 3505158. PMID 22966941.
  32. ^ Signorelli M, Vinciotti V, Wit EC (September 2016). "NEAT: an efficient network enrichment analysis test". BMC Bioinformatics. 17 (1): 352. arXiv:1604.01210. doi:10.1186/s12859-016-1203-6. PMC 5011912. PMID 27597310. S2CID 2274758.
  33. ^ Jeggari A, Alexeyenko A (March 2017). "NEArender: an R package for functional interpretation of 'omics' data via network enrichment analysis". BMC Bioinformatics. 18 (Suppl 5): 118. doi:10.1186/s12859-017-1534-y. PMC 5374688. PMID 28361684.
  34. ^ Hong MG, Alexeyenko A, Lambert JC, Amouyel P, Prince JA (October 2010). "Genome-wide pathway analysis implicates intracellular transmembrane protein transport in Alzheimer disease". Journal of Human Genetics. 55 (10): 707–9. doi:10.1038/jhg.2010.92. PMID 20668461. S2CID 27020289.
  35. ^ Jeggari A, Alekseenko Z, Petrov I, Dias JM, Ericson J, Alexeyenko A (July 2018). "EviNet: a web platform for network enrichment analysis with flexible definition of gene sets". Nucleic Acids Research. 46 (W1): W163–W170. doi:10.1093/nar/gky485. PMC 6030852. PMID 29893885.
  36. ^ "Ingenuity IPA - Integrate and Understand Complex 'omics Data". Ingenuity. 8 April 2015.
  37. ^ Pathway Studio
  38. ^ Pathway Studio Viewer
  39. ^ a b Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, et al. (October 2013). "Methods and approaches in the topology-based analysis of biological pathways". Frontiers in Physiology. 4: 278. doi:10.3389/fphys.2013.00278. PMC 3794382. PMID 24133454.
  40. ^ Kupershmidt I, Su QJ, Grewal A, Sundaresh S, Halperin I, Flynn J, et al. (September 2010). Aziz RK (ed.). "Ontology-based meta-analysis of global collections of high-throughput public data". PLOS ONE. 5 (9): e13066. Bibcode:2010PLoSO...513066K. doi:10.1371/journal.pone.0013066. PMC 2947508. PMID 20927376.
  41. ^ Franco M, Jeggari A, Peuget S, Böttger F, Selivanova G, Alexeyenko A (February 2019). "Prediction of response to anti-cancer drugs becomes robust via network integration of molecular data". Scientific Reports. 9 (1): 2379. Bibcode:2019NatSR...9.2379F. doi:10.1038/s41598-019-39019-2. PMC 6382934. PMID 30787419.