GeneNetwork: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
fixed typos, reformatted lists
#suggestededit-add-desc 1.0
Tags: Mobile edit Mobile app edit Android app edit
 
(62 intermediate revisions by 39 users not shown)
Line 1: Line 1:
{{Short description|Database and analysis software for systems genetics}}
{{Infobox Software
{{External links|date=December 2021}}
{{Infobox software
| name = GeneNetwork
| name = GeneNetwork
| logo =
| screenshot =
| caption = GeneNetwork home page
| year = 2001 <ref>[http://www.genenetwork.org/pdf/webqtl.pdf A Brief History] R : Past and Future History, Ross Ihaka, Statistics Department, The University of Auckland, Auckland, New Zealand, available from the CRAN website</ref>
| designer = [[Kenneth F. Manly]] and [[Robert W. Williams]]
| developer = GeneNetwork Development Team, University of Tennessee
| developer = GeneNetwork Development Team, University of Tennessee
| released = {{Start date and age|1994|01|15|df=yes}}
| latest_release_version = 0.9
| latest release version = 2.0
| latest_release_date = 13 September 2010
| latest release date = {{Start date and age|2016|05|29|df=yes}}
| latest_test_version = Through [[Subversion (software)|Subversion]]
| repo = {{URL|https://github.com/genenetwork/genenetwork2}}
| latest_test_date =
| programming language = [[JavaScript]], [[HTML]], [[Python (programming language)|Python]], [[Cascading Style Sheets|CSS]], [[CoffeeScript]], [[PHP]]
| typing =
| implementations =
| dialects =
| influenced_by = Map Manager QT and QTX
| influenced =
| operating_system = [[Cross-platform]] web-based
| license = [[Affero General Public License]]
| license = [[Affero General Public License]]
| website = http://www.genenetwork.org/
| website = {{URL|www.genenetwork.org}}
}}
}}


'''GeneNetwork''' is a database and [[open source]] [[bioinformatics]] software resource for [[systems genetics]].<ref name="pmid 17534074">{{cite journal| author=Morahan G, Williams RW | title=Systems genetics: the next generation in genetics research? | journal=Novartis Found Symp | year= 2007 | volume= 281 | pages= 188-91 | pmid=17534074 | url=http://www.ncbi.nlm.nih.gov/pubmed/17534074 }} </ref> This resource is used to study [[gene regulatory network]]s that link DNA sequence variants to corresponding differences in gene and protein expression and to differences in traits such as health and disease risk. Data sets in GeneNetwork are typically are made up of large collections of genotypes (e.g., [[SNP]]s) and phenotypes that are obtained from groups of related individuals, including human families, experimental crosses of strains of mice and rats, and organisms as diverse as [[Drosophila melanogaster]], [[Arabidopsis thaliana]], and [[barley]].<ref name="pmid 19017390">{{cite journal| author=Druka A, Druka I, Centeno AG, Li H, Sun Z, Thomas WT, Bonar N, Steffenson BJ, Ullrich SE, Kleinhofs A, Wise RP, Close TJ, Potokina E, Luo Z, Wagner C, Schweizer GF, Marshall DF, Kearsey MJ, Williams RW, Waugh R. | title=Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork. | journal=BMC Genet | year= 2008 | volume= 9 | pages= 73 | pmid= 19017390 | url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2630324/ }} </ref> The inclusion of genotypes for all individuals makes it practical to carry out web-based [[gene mapping]] to discover those regions of the genome that contribute to differences in gene expression, cell function, anatomy, physiology, and behavior among individuals.
'''GeneNetwork''' is a combined database and [[open-source software|open-source]] [[bioinformatics]] data analysis software resource for [[systems genetics]].<ref name="pmid 17534074">{{cite book |last1=Morahan |first1=G |last2=Williams |first2=RW |chapter=Systems Genetics: The Next Generation in Genetics Research? |title=Decoding the Genomic Control of Immune Reactions |journal=Novartis Foundation Symposium |series=Novartis Foundation Symposia |volume=281 |pages=181–8; discussion 188–91, 208–9 |year=2007 |pmid=17534074 |doi=10.1002/9780470062128.ch15|isbn=9780470062128 }}</ref> This resource is used to study [[gene regulatory network]]s that link DNA sequence differences to corresponding differences in gene and protein expression and to variation in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes (e.g., [[single-nucleotide polymorphism|SNP]]s) and phenotypes from groups of individuals, including humans, strains of mice and rats, and organisms as diverse as [[Drosophila melanogaster]], [[Arabidopsis thaliana]], and [[barley]].<ref name="pmid 19017390">{{cite journal |last1=Druka |first1=A |last2=Druka |first2=I |last3=Centeno |first3=AG |last4=Li |first4=H |last5=Sun |first5=Z |last6=Thomas |first6=WT |last7=Bonar |first7=N |last8=Steffenson |first8=BJ |last9=Ullrich |first9=SE |last10=Kleinhofs |first10=Andris |last11=Wise |first11=Roger P |last12=Close |first12=Timothy J |last13=Potokina |first13=Elena |last14=Luo |first14=Zewei |last15=Wagner |first15=Carola |last16=Schweizer |first16=Günther F |last17=Marshall |first17=David F |last18=Kearsey |first18=Michael J |last19=Williams |first19=Robert W |last20=Waugh |first20=Robbie |title=Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork |journal=BMC Genetics |volume=9 |pages=73 |year=2008 |pmid=19017390 |pmc=2630324 |doi=10.1186/1471-2156-9-73 |doi-access=free }}</ref> The 
inclusion of genotypes makes it practical to carry out web-based [[gene mapping]] to discover those regions of genomes that contribute to differences among individuals in mRNA, protein, and metabolite levels, as well as differences in cell function, anatomy, physiology, and behavior.



==History==
==History==
GeneNetwork was originally created at the University of Tennessee in Memphis in 2000-2001. It was developed as a web-adapted version of Kenneth F. Manly's [http://mapmanager.org/ Map Manager] program and was initially called WebQTL.<ref name="pmid 15114364">{{cite journal| author=Chesler EJ, Lu L, Wang J, Williams RW, Manly KF | title=WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior. | journal=Nat Neurosci | year= 2004 | volume= 7 | pages= 485-86 | pmid= 15114364 | url=http://www.nature.com/index.html?file=/neuro/journal/v7/n5/full/nn0504-485.html }} </ref> Gene mapping data were incorporated for several mouse [[recombinant inbred strain]]s. By early 2003, the first large [[Affymetrix]] gene expression data sets (whole mouse brain mRNA and hematopoietic stem cells) were incorporated and the system was renamed.<ref name="pmid 15711545">{{cite journal| author=Chesler EJ, Lu L, Shou S, Qu Y, Gu J, Wang J, Hsu HC, Mountz JD, Baldwin NE, Langston MA, Threadgill DW, Manly KF, Williams RW | title=Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. | journal=Nat Genet | year= 2005 | volume= 37 | pages= 233-42 | pmid= 15711545 | url=http://www.ncbi.nlm.nih.gov/pubmed/15711545 }} </ref> <ref name="pmid 15711547">{{cite journal| author=Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang J, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke MP, de Haan G | title=Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'. | journal=Nat Genet | year= 2005 | volume= 37 | pages= 225-32 | pmid= 15711547 | url=http://www.ncbi.nlm.nih.gov/pubmed/15711547 }} </ref> GeneNetwork is now developed by an international group of open source developers and has mirror and development sites in Europe, Asia, and Australia.
Development of GeneNetwork started at the University of Tennessee Health Science Center in 1994 as a web-based version of the [http://www.nervenet.org/main/dictionary.html Portable Dictionary of the Mouse Genome (1994)].<ref name="pmid 8043953">{{cite journal |last1=Williams |first1=RW |title=The Portable Dictionary of the Mouse Genome: a personal database for gene mapping and molecular biology. |journal=Mammalian Genome |volume=5 |issue=6 |pages=372–5 |year=1994 |pmid=8043953 |doi=10.1007/bf00356557|s2cid=655396 }}</ref> GeneNetwork is both the first and the longest continuously operating web service in biomedical research [see https://en.wikipedia.org/wiki/List_of_websites_founded_before_1995]. In 1999 the Portable Gene Dictionary was combined with Kenneth F. Manly's [http://mapmanager.org/ Map Manager] QT mapping program to produce an online system for real-time genetic analysis.<ref name="pmid 15114364">{{cite journal |last1=Chesler |first1=EJ |last2=Lu |first2=L |last3=Wang |first3=J |last4=Williams |first4=RW |last5=Manly |first5=KF |title=WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior |journal=Nature Neuroscience |volume=7 |issue=5 |pages=485–6 |year=2004 |pmid=15114364 |doi=10.1038/nn0504-485|s2cid=20241963 }}</ref> In early 2003, the first large [[Affymetrix]] gene expression data sets (whole [[mouse brain]] mRNA and hematopoietic stem cells) were incorporated and the system was renamed WebQTL.<ref name="pmid 15711545">{{cite journal |last1=Chesler |first1=EJ |last2=Lu |first2=L |last3=Shou |first3=S |last4=Qu |first4=Y |last5=Gu |first5=J |last6=Wang |first6=J |last7=Hsu |first7=HC |last8=Mountz |first8=JD |last9=Baldwin |first9=NE |last10=Langston |first10=Michael A |last11=Threadgill |first11=David W |last12=Manly |first12=Kenneth F |last13=Williams |first13=Robert W |title=Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function 
|journal=Nature Genetics |volume=37 |issue=3 |pages=233–42 |year=2005 |pmid=15711545 |doi=10.1038/ng1518|s2cid=13189340 |display-authors=8 }}</ref><ref name="pmid 15711547">{{cite journal |last1=Bystrykh |first1=L |last2=Weersing |first2=E |last3=Dontje |first3=B |last4=Sutton |first4=S |last5=Pletcher |first5=MT |last6=Wiltshire |first6=T |last7=Su |first7=AI |last8=Vellenga |first8=E |last9=Wang |first9=J |last10=Manly |first10=Kenneth F |last11=Lu |first11=Lu |last12=Chesler |first12=Elissa J |last13=Alberts |first13=Rudi |last14=Jansen |first14=Ritsert C |last15=Williams |first15=Robert W |last16=Cooke |first16=Michael P |last17=De Haan |first17=Gerald |title=Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics' |journal=Nature Genetics |volume=37 |issue=3 |pages=225–32 |year=2005 |pmid=15711547 |doi=10.1038/ng1497|s2cid=5622506 |display-authors=8 }}</ref> GeneNetwork is now developed by an international group of developers and has mirror and development sites in Europe, Asia, and Australia. Production services are hosted on systems at [[University of Tennessee Health Science Center]] with a backup instance in Europe.


The current production version of GeneNetwork (also known as GN2) was released in 2016.<ref name="doi=10.21105/joss.00025">{{cite journal |last1=Sloan |first1=Z |title=GeneNetwork: framework for web-based genetics. |journal=The Journal of Open Source Software |volume=1 |issue=2 |year=2016 |doi=10.21105/joss.00025 |page=25|bibcode=2016JOSS....1...25S |doi-access=free }}</ref> The current version of GeneNetwork uses the same database as its predecessor, GN1, but has much more modular and maintainable open source code (available on [https://github.com/genenetwork/genenetwork2 GitHub]). GeneNetwork now also has significant new features including support for:
==Organization and Use==


* Genetically complex populations using linear mixed model implemented with an updated version of [http://www.genenetwork.org GEMMA ],<ref name="pmid 24531419">{{cite journal |last1=Zhou |first1=X |title=Efficient multivariate linear mixed model algorithms for genome-wide association studies. |journal=Nature Methods |volume=11 |issue=2 |pages=407–9 |year=2014 |doi=10.1038/nmeth.2848|pmc=4211878 |pmid=24531419}}</ref>
GeneNetwork consists of two major components:
* [http://www.rqtl.org R/qtl] modules with many mapping options, including mapping of 4-way intercrosses and heterogeneous stock
* [[Weighted correlation network analysis]], also known as WGCNA
* [[Cytoscape]] network display
* [http://joss.theoj.org/papers/440382846e5184c5ad875f8a53cf266d Correlated trait loci mapping]<ref name="do1 dx.doi.org/10.21105/joss.00087">{{cite journal |last1=Arends |first1=D |title=Correlation Trait Loci (CTL) mapping: phenotype network inference subject to genotype. |journal=The Journal of Open Source Software |volume=1 |issue=6 |year=2016 |doi=10.21105/joss.00087 |page=87|bibcode=2016JOSS....1...87A |doi-access=free }}</ref>
* A genome browser to display genetic and genomic data that is based on Biodalliance
* Linked modules to the [http://compbio.uthsc.edu/BNW Bayesian Network Webserver]<ref name="pmid 23969134">{{cite journal |last1=Ziebarth |first1=JD |title=Bayesian Network Webserver: a comprehensive tool for biological network modeling. |journal=Bioinformatics |volume=29 |issue=21 |pages=2803–4 |year=2013 |doi=10.1093/bioinformatics/btt472|pmid=23969134|doi-access=free }}</ref> for causal modeling


==Organization and use==
* Massive collections of genetic, genomic, and phenotype data for large families
* Sophisticated statistical analysis and gene mapping software that enable analysis of regulatory networks and genotype-to-phenotype relations


GeneNetwork consists of two major components:
Four levels of data are usually obtained for each family or population:

* Massive collections of genetic, genomic, and phenotype data for large cohorts of individuals
* Sophisticated statistical analysis and gene mapping software that enable analysis of molecular and cellular networks and genotype-to-phenotype relations

Four levels of data are usually obtained for each family or population:


# DNA sequences and [[genotype]]s
# DNA sequences and [[genotype]]s
# [[Gene expression]] values using microarray, [[RNA-seq]], or proteomic methods (molecular phenotypes)
# Molecular expression data often generated using [[Microarray|arrays]], [[RNA-seq]], epigenomic, proteomic, metabolomic, and metagenomic methods (molecular phenotypes)
# Standard [[phenotype]]s of the type that are part of a typical medical record (e.g., blood chemistry, body weight)
# Standard quantitative [[phenotype]]s that are often parts of a typical medical record (e.g., blood chemistry, body weight)
# Annotation files and [[metadata]].
# Annotation files and [[metadata]] for traits and data sets


The combined data types are housed together in a single relational database, but are conceptually organized and divided by species and family. The system is implemented as a [[LAMP (software bundle)]] stack.
The combined data types are housed together in a relational database and IPFS fileserver, and are conceptually organized and grouped by species, cohort, and family. The system is implemented as a [[LAMP (software bundle)|LAMP]] stack. Code and a simplified version of the [[MariaDB]] database are available on [https://github.com/genenetwork/genenetwork/ GitHub].


GeneNetwork is primarily used by researchers but has also been adopted successfully for undergraduate courses in genetics (see [http://www.youtube.com/watch?v=5UniEc_pzs0 YouTube example]), bioinformatics, physiology, and psychology.<ref name="pmid 20516355">{{cite journal| author=Grisham W, Schottler NA, Valli-Marill J, Beck L, Beatty J | title=Teaching bioinformatics and neuroinformatics by using free web-based tools. | journal=CBE Life Sci Educ | year= 2010 | volume= 9 | pages= 98-107 | pmid= 20516355 | url=http://www.ncbi.nlm.nih.gov/pubmed/20516355 }} </ref> Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:
GeneNetwork is primarily used by researchers, but has also been adopted successfully for undergraduate and graduate courses in genetics, bioinformatics (see [https://www.youtube.com/watch?v=5UniEc_pzs0 YouTube example]), physiology, and psychology.<ref name="pmid 20516355">{{cite journal |last1=Grisham |first1=W |last2=Schottler |first2=NA |last3=Valli-Marill |first3=J |last4=Beck |first4=L |last5=Beatty |first5=J |title=Teaching bioinformatics and neuroinformatics by using free web-based tools |journal=CBE: Life Sciences Education |volume=9 |issue=2 |pages=98–107 |year=2010 |pmid=20516355 |pmc=2879386 |doi=10.1187/cbe.09-11-0079}}</ref> Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:


# The range of variation of traits
# The range of variation of traits
# Covariation among traits (scatterplots and correlations)
# Covariation among traits (scatterplots and correlations, principal component analysis)
# Architecture of larger networks of traits
# Architecture of larger networks of traits
# [[Quantitative trait locus]] mapping and causal models of the linkage between sequence differences and phenotype differences
# [[Quantitative trait locus]] mapping and causal models of the linkage between sequence differences and phenotype differences


==Data Sources==
==Data sources==
Traits and molecular expression data sets are submitted by researchers directly or are extracted from repositories such as [[National Center for Biotechnology Information]] Gene Expression Omnibus. Data cover a variety of cells and tissues—from single cell populations of the immune system, specific tissues (retina, prefrontal cortex), to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, whole plant embryos). A typical data set covers hundreds of fully genotyped individuals and may also include technical and biological replicates. Genotypes and phenotypes are usually taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq and quantitative proteomic, metabolomic, epigenetics, and metagenomic data are also available for several species, including mouse and human.

Massive expression data sets are submitted by researchers directly or are extracted from repositories such as [[National Center for Biotechnology Information]] Gene Expression Omnibus. A wide variety of cells and tissues are included--from single cell populations of the immune system, specific tissues (retina, prefrontal cortex), to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, even whole plant embryos). A typical data set is often based on hundreds of fully genotyped individuals and may also include biological replicates. Genotypes and phenotypes are taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq data are also available for BXD recombinant inbred mice. Content and nomenclature are reviewed and edited by [[curators]]. Updates on coverage of species, families, tissues and measurement types are available at this site: [http://www.genenetwork.org/whats_new.html].

Topics of annotation include the following:

* [[DNA sequence]] (SNPs, CNVs, indels)
* [[transcriptome]]s (arrays, RNA-seq)
* [[gene regulatory network]]s
* [[phenome]]

==Tools and Features==


==Tools and features==
There are tools on the site for a wide range of functions that range from simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (Pearson or rank order), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits and several functions work with an entire [[transcriptome]].
There are tools on the site for a wide range of functions that range from simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (Pearson or rank order), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits and several functions work with an entire [[transcriptome]].


The database can be browsed and searched at the main [http://www.genenetwork.org/ search] page. An on-line [http://http://www.genenetwork.org/tutorial/WebQTLTour/ tutorial] is available. Users can also [http://www.genenetwork.org/share/data/ download] the primary data sets as text files, Excel, or in the case of network graphs, as [[SBML]].
The database can be browsed and searched at the main [http://www.genenetwork.org/ search] page. An on-line [http://www.genenetwork.org/tutorial/WebQTLTour/ tutorial] is available. Users can also [http://www.genenetwork.org/share/data/ download] the primary data sets as text files, Excel, or in the case of network graphs, as [[SBML]]. As of 2017, [http://gn2.genenetwork.org GN2] is available as a beta release.


==Code==
==Code==
GeneNetwork is an open source project released under the [[Affero General Public License]] (AGPLv3). Most of the code is written in Python, with modules and other code written in C, R, and JavaScript. The original GN1 code is mainly Python 2.4. GN2 is mainly written in Python 2.7 (in a [[Flask (programming)|Flask]] framework with [[Jinja (template engine)|Jinja]]2 HTML templates), with conversion to Python 3.X planned over the next few years. GN2 calls many statistical procedures written in the [[R (programming language)|R programming language]]. The original source code from 2010 along with a compact database are available on [http://sourceforge.net/projects/genenetwork/files/ SourceForge]. While [https://github.com/genenetwork/genenetwork/ GN1] was actively maintained on [[GitHub]] through 2019, as of 2020 all work is focused on [https://github.com/genenetwork/genenetwork2/ GN2].

GeneNetwork is an open source project released under the [[Affero General Public License]] (AGPLv3). The majority of code is written in Python, but includes modules and other code written in C and JavaScript. GeneNetwork also calls statistical procedures written in R.



==See also==
==See also==
* [[Computational genomics]]
* [[Computational genomics]]
* [[Cytoscape]]
* [[KEGG]] (The Kyoto Encyclopedia of Genes and Genomes)
* [[KEGG]] (The Kyoto Encyclopedia of Genes and Genomes)
* [[WikiPathways]]
* [[Reactome]]
* [[Reactome]]
* [[WikiPathways]]


==References==
==References==
Line 83: Line 74:
==External links==
==External links==
* [http://www.genenetwork.org/ GeneNetwork homepage]
* [http://www.genenetwork.org/ GeneNetwork homepage]
;Related resources


=== Related resources ===
Other systems genetics and network databases
Other systems genetics and network databases
* [http://biogps.gnf.org/ BioGPS]
* [https://web.archive.org/web/20091230141946/http://biogps.gnf.org/ BioGPS]
* [http://sagebase.org/ Sage Bionetworks]
* [http://sagebase.org/ Sage Bionetworks]
* [http://amigo.geneontology.org/cgi-bin/amigo/go.cgi/ AmiGo]
* [https://web.archive.org/web/20121112002432/http://amigo.geneontology.org/cgi-bin/amigo/go.cgi/ AmiGo]
* [http://www.wikipathways.org WikiPathways]
* [http://www.wikipathways.org WikiPathways]
* [http://www.cytoscape.org/ Cytoscape]
* [http://www.cytoscape.org/ Cytoscape]
* [http://www.esyn.org/ esyN]
* [https://web.archive.org/web/20110701002425/http://www.genenetwork.nl/wordpress/ GeneNetwork, Netherlands]


[[Category:Genetics databases]]

[[Category:Biological databases]]
[[Category:Systems biology]]
[[Category:Systems biology]]
[[Category:Mathematical and theorteical biology]]
[[Category:Bioinformatics software]]
[[Category:Software using the GNU AGPL license]]

Latest revision as of 20:58, 17 June 2024

GeneNetwork
Developer(s): GeneNetwork Development Team, University of Tennessee
Initial release: 15 January 1994; 30 years ago (1994-01-15)
Stable release: 2.0 / 29 May 2016; 8 years ago (2016-05-29)
Repository: github.com/genenetwork/genenetwork2
Written in: JavaScript, HTML, Python, CSS, CoffeeScript, PHP
License: Affero General Public License
Website: www.genenetwork.org

GeneNetwork is a combined database and open-source bioinformatics data analysis software resource for systems genetics.[1] This resource is used to study gene regulatory networks that link DNA sequence differences to corresponding differences in gene and protein expression and to variation in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes (e.g., SNPs) and phenotypes from groups of individuals, including humans, strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley.[2] The inclusion of genotypes makes it practical to carry out web-based gene mapping to discover those regions of genomes that contribute to differences among individuals in mRNA, protein, and metabolite levels, as well as differences in cell function, anatomy, physiology, and behavior.

History

Development of GeneNetwork started at the University of Tennessee Health Science Center in 1994 as a web-based version of the Portable Dictionary of the Mouse Genome (1994).[3] GeneNetwork is both the first and the longest continuously operating web service in biomedical research (see https://en.wikipedia.org/wiki/List_of_websites_founded_before_1995). In 1999 the Portable Gene Dictionary was combined with Kenneth F. Manly's Map Manager QT mapping program to produce an online system for real-time genetic analysis.[4] In early 2003, the first large Affymetrix gene expression data sets (whole mouse brain mRNA and hematopoietic stem cells) were incorporated and the system was renamed WebQTL.[5][6] GeneNetwork is now developed by an international group of developers and has mirror and development sites in Europe, Asia, and Australia. Production services are hosted on systems at the University of Tennessee Health Science Center, with a backup instance in Europe.

The current production version of GeneNetwork (also known as GN2) was released in 2016.[7] The current version of GeneNetwork uses the same database as its predecessor, GN1, but has much more modular and maintainable open source code (available on GitHub). GeneNetwork now also has significant new features including support for:

  • Genetically complex populations using linear mixed models implemented with an updated version of GEMMA[8]
  • R/qtl modules with many mapping options, including mapping of 4-way intercrosses and heterogeneous stock
  • Weighted correlation network analysis, also known as WGCNA
  • Cytoscape network display
  • Correlated trait loci mapping[9]
  • A genome browser to display genetic and genomic data that is based on Biodalliance
  • Linked modules to the Bayesian Network Webserver[10] for causal modeling

Organization and use

GeneNetwork consists of two major components:

  • Massive collections of genetic, genomic, and phenotype data for large cohorts of individuals
  • Sophisticated statistical analysis and gene mapping software that enable analysis of molecular and cellular networks and genotype-to-phenotype relations

Four levels of data are usually obtained for each family or population:

  1. DNA sequences and genotypes
  2. Molecular expression data often generated using arrays, RNA-seq, epigenomic, proteomic, metabolomic, and metagenomic methods (molecular phenotypes)
  3. Standard quantitative phenotypes that are often parts of a typical medical record (e.g., blood chemistry, body weight)
  4. Annotation files and metadata for traits and data sets

The combined data types are housed together in a relational database and IPFS fileserver, and are conceptually organized and grouped by species, cohort, and family. The system is implemented as a LAMP stack. Code and a simplified version of the MariaDB database are available on GitHub.

GeneNetwork is primarily used by researchers, but has also been adopted successfully for undergraduate and graduate courses in genetics, bioinformatics (see YouTube example), physiology, and psychology.[11] Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:

  1. The range of variation of traits
  2. Covariation among traits (scatterplots and correlations, principal component analysis)
  3. Architecture of larger networks of traits
  4. Quantitative trait locus mapping and causal models of the linkage between sequence differences and phenotype differences
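
The covariation step above can be illustrated with a minimal sketch in plain Python. This is not GeneNetwork's own code: the function names (`pearson`, `ranks`, `spearman`) and the trait values for six strains are hypothetical, chosen only to show how a Pearson and a rank-order (Spearman) correlation between two traits might be computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length trait vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("traits must have the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trait")
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank-order correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for six strains (illustrative only, not real data).
body_weight = [21.0, 24.5, 19.8, 30.2, 27.1, 22.4]
expression = [8.1, 8.9, 7.7, 10.4, 9.6, 8.3]

print("Pearson r:", round(pearson(body_weight, expression), 3))
print("Spearman rho:", round(spearman(body_weight, expression), 3))
```

The rank-order version is less sensitive to outliers, which is one reason a tool might offer both measures side by side.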

Data sources

Traits and molecular expression data sets are submitted by researchers directly or are extracted from repositories such as the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus. Data cover a variety of cells and tissues, ranging from single cell populations of the immune system and specific tissues (retina, prefrontal cortex) to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, whole plant embryos). A typical data set covers hundreds of fully genotyped individuals and may also include technical and biological replicates. Genotypes and phenotypes are usually taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq and quantitative proteomic, metabolomic, epigenetic, and metagenomic data are also available for several species, including mouse and human.

Tools and features

Tools on the site cover a wide range of functions: simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (Pearson or rank-order correlation), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, and QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits, and several functions work with an entire transcriptome.
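
As a rough illustration of marker regression, the sketch below regresses a trait on the genotype at each marker and converts the drop in residual sum of squares into a LOD score using the standard formula LOD = (n/2) · log10(RSS0/RSS1). It is not GeneNetwork's implementation: the function names (`marker_lod`, `scan`), the marker names, the 0/1 genotype coding, and the data for eight strains are all hypothetical.

```python
from math import log10


def marker_lod(genotypes, trait):
    """LOD score for a simple linear regression of a trait on one marker.

    Genotypes are coded numerically (here 0 and 1 for the two parental
    alleles, an assumption that suits recombinant inbred strains).
    """
    n = len(trait)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching genotype and trait vectors (at least 3 values)")
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    sgt = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    rss_null = sum((t - mean_t) ** 2 for t in trait)  # model with no marker effect
    if sgg == 0 or rss_null == 0:
        return 0.0  # uninformative marker or constant trait
    slope = sgt / sgg
    rss_marker = rss_null - slope * sgt  # residuals after fitting the marker
    if rss_marker <= 0:
        return float("inf")  # marker explains the trait perfectly
    return (n / 2) * log10(rss_null / rss_marker)


def scan(markers, trait):
    """Compute a LOD score at every marker; returns {marker_name: lod}."""
    return {name: marker_lod(genos, trait) for name, genos in markers.items()}


# Hypothetical trait values and genotypes for eight strains (illustrative only).
trait = [10.1, 12.3, 9.8, 12.0, 10.4, 11.9, 9.9, 12.2]
markers = {
    "M1": [0, 1, 0, 1, 0, 1, 0, 1],
    "M2": [0, 1, 0, 1, 1, 1, 0, 0],
    "M3": [0, 0, 1, 1, 0, 0, 1, 1],
}

lods = scan(markers, trait)
for name, lod in lods.items():
    print(name, round(lod, 2))
print("Peak marker:", max(lods, key=lods.get))
```

Interval mapping extends the same idea to positions between markers by estimating the unobserved genotypes there; that step is omitted from this sketch.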

The database can be browsed and searched at the main search page. An online tutorial is available. Users can also download the primary data sets as text files or Excel files or, in the case of network graphs, as SBML. As of 2017, GN2 is available as a beta release.

Code

GeneNetwork is an open source project released under the Affero General Public License (AGPLv3). Most of the code is written in Python, with modules and other code written in C, R, and JavaScript. The original GN1 code is mainly Python 2.4. GN2 is mainly written in Python 2.7 (in a Flask framework with Jinja2 HTML templates), with conversion to Python 3.X planned over the next few years. GN2 calls many statistical procedures written in the R programming language. The original source code from 2010 along with a compact database are available on SourceForge. While GN1 was actively maintained on GitHub through 2019, as of 2020 all work is focused on GN2.

See also

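  • Computational genomics
  • Cytoscape
  • KEGG (The Kyoto Encyclopedia of Genes and Genomes)
  • Reactome
  • WikiPathways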

References

  1. ^ Morahan, G; Williams, RW (2007). "Systems Genetics: The Next Generation in Genetics Research?". Decoding the Genomic Control of Immune Reactions. Novartis Foundation Symposia. Vol. 281. pp. 181–8, discussion 188–91, 208–9. doi:10.1002/9780470062128.ch15. ISBN 9780470062128. PMID 17534074.
  2. ^ Druka, A; Druka, I; Centeno, AG; Li, H; Sun, Z; Thomas, WT; Bonar, N; Steffenson, BJ; Ullrich, SE; Kleinhofs, Andris; Wise, Roger P; Close, Timothy J; Potokina, Elena; Luo, Zewei; Wagner, Carola; Schweizer, Günther F; Marshall, David F; Kearsey, Michael J; Williams, Robert W; Waugh, Robbie (2008). "Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork". BMC Genetics. 9: 73. doi:10.1186/1471-2156-9-73. PMC 2630324. PMID 19017390.
  3. ^ Williams, RW (1994). "The Portable Dictionary of the Mouse Genome: a personal database for gene mapping and molecular biology". Mammalian Genome. 5 (6): 372–5. doi:10.1007/bf00356557. PMID 8043953. S2CID 655396.
  4. ^ Chesler, EJ; Lu, L; Wang, J; Williams, RW; Manly, KF (2004). "WebQTL: rapid exploratory analysis of gene expression and genetic networks for brain and behavior". Nature Neuroscience. 7 (5): 485–6. doi:10.1038/nn0504-485. PMID 15114364. S2CID 20241963.
  5. ^ Chesler, EJ; Lu, L; Shou, S; Qu, Y; Gu, J; Wang, J; Hsu, HC; Mountz, JD; et al. (2005). "Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function". Nature Genetics. 37 (3): 233–42. doi:10.1038/ng1518. PMID 15711545. S2CID 13189340.
  6. ^ Bystrykh, L; Weersing, E; Dontje, B; Sutton, S; Pletcher, MT; Wiltshire, T; Su, AI; Vellenga, E; et al. (2005). "Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'". Nature Genetics. 37 (3): 225–32. doi:10.1038/ng1497. PMID 15711547. S2CID 5622506.
  7. ^ Sloan, Z (2016). "GeneNetwork: framework for web-based genetics". The Journal of Open Source Software. 1 (2): 25. Bibcode:2016JOSS....1...25S. doi:10.21105/joss.00025.
  8. ^ Zhou, X (2014). "Efficient multivariate linear mixed model algorithms for genome-wide association studies". Nature Methods. 11 (2): 407–9. doi:10.1038/nmeth.2848. PMC 4211878. PMID 24531419.
  9. ^ Arends, D (2016). "Correlation Trait Loci (CTL) mapping: phenotype network inference subject to genotype". The Journal of Open Source Software. 1 (6): 87. Bibcode:2016JOSS....1...87A. doi:10.21105/joss.00087.
  10. ^ Ziebarth, JD (2013). "Bayesian Network Webserver: a comprehensive tool for biological network modeling". Bioinformatics. 29 (21): 2803–4. doi:10.1093/bioinformatics/btt472. PMID 23969134.
  11. ^ Grisham, W; Schottler, NA; Valli-Marill, J; Beck, L; Beatty, J (2010). "Teaching bioinformatics and neuroinformatics by using free web-based tools". CBE: Life Sciences Education. 9 (2): 98–107. doi:10.1187/cbe.09-11-0079. PMC 2879386. PMID 20516355.
Related resources

Other systems genetics and network databases