Volcano plot (statistics): Difference between revisions
Figure: Volcano plot showing metabolomic data. The red arrows indicate points of interest that display both large-magnitude fold changes (x-axis) and high statistical significance (-log10 of the p-value, y-axis). The dashed red line marks p = 0.05, with points above the line having p < 0.05 and points below the line having p > 0.05. Points with a fold change of less than 2 (log2 fold change below 1) are shown in gray.
Revision as of 01:28, 23 June 2016
In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data.[1] It plots statistical significance on the y-axis against fold change on the x-axis. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points between two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.
A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear toward the top of the plot. The x-axis is the logarithm of the fold change between the two conditions; the logarithm is used so that changes in both directions appear equidistant from the center. Plotting points in this way produces two regions of interest: points toward the top of the plot that lie far to the left or far to the right. These represent values that display large-magnitude fold changes (hence being left or right of center) as well as high statistical significance (hence being toward the top).
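The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the feature names, and the example values are hypothetical, and the cutoffs (p < 0.05 and at least a two-fold change) are illustrative defaults rather than fixed rules.

```python
import math


def volcano_coordinates(mean_a, mean_b, p_value):
    """Return (x, y) for one feature.

    x is the log2 fold change of condition B relative to condition A,
    y is the negative base-10 logarithm of the p-value.
    """
    x = math.log2(mean_b / mean_a)
    y = -math.log10(p_value)
    return x, y


def is_point_of_interest(x, y, min_abs_log2_fc=1.0, max_p=0.05):
    """Flag points that are both far from the center and high on the plot.

    The default cutoffs are assumptions chosen for illustration.
    """
    return abs(x) >= min_abs_log2_fc and y > -math.log10(max_p)


# Hypothetical data: feature -> (mean in condition A, mean in condition B, p-value)
features = {
    "feature_up": (10.0, 40.0, 0.001),     # 4-fold increase, significant
    "feature_down": (40.0, 10.0, 0.001),   # 4-fold decrease, significant
    "feature_small": (10.0, 11.0, 0.0001), # significant, but small change
    "feature_noisy": (10.0, 80.0, 0.5),    # large change, not significant
}

points = {name: volcano_coordinates(*values) for name, values in features.items()}
selected = sorted(
    name for name, (x, y) in points.items() if is_point_of_interest(x, y)
)
```

In this sketch the four-fold increase and the four-fold decrease map to x-values of equal magnitude and opposite sign, which is the symmetry that the logarithm of the fold change provides; only those two features fall in the upper-left and upper-right regions of interest.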
Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed. Volcano plots are also used to graphically display a significance analysis of microarrays (SAM) gene selection criterion, an example of regularization.[2]
The concept of the volcano plot can be generalized to other applications, where the x-axis is related to a measure of the strength of a statistical signal and the y-axis is related to a measure of the statistical significance of that signal. For example, in a genetic association case-control study, such as a genome-wide association study, a point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio, and its y value can be -log10 of the p-value from a chi-square test or the chi-square test statistic itself.[3]
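As a rough illustration of the genetic association case, the sketch below computes the quantities for one single-nucleotide polymorphism from a 2×2 table of allele counts. The function name, the argument names, and the counts are hypothetical. It assumes a Pearson chi-square test with one degree of freedom and no continuity correction, and it also returns the base-2 logarithm of the odds ratio, which is a choice made here to mirror the log fold change and is not stated in the text.

```python
import math


def snp_volcano_point(case_alt, case_ref, control_alt, control_ref):
    """Return volcano-plot quantities for one SNP from a 2x2 allele-count table.

    Assumes all four counts are positive; a zero cell would cause a
    division by zero and needs separate handling.
    """
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    n = case_alt + case_ref + control_alt + control_ref
    numerator = n * (case_alt * control_ref - case_ref * control_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (control_alt + control_ref)
        * (case_alt + control_alt)
        * (case_ref + control_ref)
    )
    chi2 = numerator / denominator

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    return {
        "odds_ratio": odds_ratio,
        "log2_odds_ratio": math.log2(odds_ratio),  # assumption: symmetric x-axis
        "chi2": chi2,
        "neg_log10_p": -math.log10(p_value),
    }


# Hypothetical allele counts: 60/40 in cases, 40/60 in controls.
point = snp_volcano_point(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
```

Either `neg_log10_p` or `chi2` can serve as the y value, matching the two options mentioned above.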
References
1. Cui, X.; Churchill, G. A. (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biology. 4 (4): 210. doi:10.1186/gb-2003-4-4-210. PMC 154570. PMID 12702200.
2. Li, W. (2012). "Volcano plots in analyzing differential expressions with mRNA microarrays". Journal of Bioinformatics and Computational Biology. 10 (6): 1231003. doi:10.1142/S0219720012310038. PMID 23075208.
3. Li, W.; Freudenberg, J.; Suh, Y. J.; Yang, Y. (2014). "Using volcano plots and regularized-chi statistics in genetic association studies". Computational Biology and Chemistry. 48: 77–83. doi:10.1016/j.compbiolchem.2013.02.003. PMID 23602812.