Volcano plot (statistics): Difference between revisions

From Wikipedia, the free encyclopedia
Copy-editing to remove redundancy
Punctuation corrections
Line 1: Line 1:
[[Image:volcano eg.jpg|thumb|350px|Volcano plot showing [[metabolomic]] data. The red arrows indicate points-of-interest that display both large-magnitude [[Fold change|fold-changes]] (x-axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.]]
[[Image:volcano eg.jpg|thumb|350px|Volcano plot showing [[metabolomic]] data. The red arrows indicate points-of-interest that display both large-magnitude [[Fold change|fold-changes]] (x axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.]]


In statistics, a '''volcano plot''' is a type of [[scatter-plot]] that is used to quickly identify changes in large datasets composed of replicate data.<ref name="pmid12702200">{{Cite journal | doi = 10.1186/gb-2003-4-4-210| pmid = 12702200| year = 2003| last1 = Cui | first1 = X. | title = Statistical tests for differential expression in cDNA microarray experiments| journal = Genome Biology| volume = 4| issue = 4| pages = 210| last2 = Churchill | first2 = G. A. | pmc = 154570}}</ref> It plots significance versus [[Fold change|fold-change]] on the y- and x-axes, respectively. These plots are increasingly common in [[List of omics topics in biology|omic]] experiments such as [[genomics]], [[proteomics]], and [[metabolomics]] where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a [[p-value]] from an [[ANOVA]] model) with the magnitude of the change enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also [[Statistical significance|statistically significant]].
In statistics, a '''volcano plot''' is a type of [[scatter-plot]] that is used to quickly identify changes in large datasets composed of replicate data.<ref name="pmid12702200">{{Cite journal | doi = 10.1186/gb-2003-4-4-210| pmid = 12702200| year = 2003| last1 = Cui | first1 = X. | title = Statistical tests for differential expression in cDNA microarray experiments| journal = Genome Biology| volume = 4| issue = 4| pages = 210| last2 = Churchill | first2 = G. A. | pmc = 154570}}</ref> It plots significance versus [[Fold change|fold-change]] on the y and x axes, respectively. These plots are increasingly common in [[List of omics topics in biology|omic]] experiments such as [[genomics]], [[proteomics]], and [[metabolomics]] where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a [[p-value]] from an [[ANOVA]] model) with the magnitude of the change enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also [[Statistical significance|statistically significant]].


A volcano plot is constructed by plotting the negative log of the [[p-value]] on the [[y-axis]] (usually base 10). This results in datapoints with low p-values (highly significant) appearing toward the top of the plot. The [[x-axis]] is the log of the [[fold change]] between the two conditions. The log of the fold-change is used so that changes in both directions appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high [[statistical significance]] (hence being toward the top).
A volcano plot is constructed by plotting the negative log of the [[p-value]] on the [[y-axis]] (usually base 10). This results in datapoints with low p-values (highly significant) appearing toward the top of the plot. The [[x axis]] is the log of the [[fold change]] between the two conditions. The log of the fold-change is used so that changes in both directions appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high [[statistical significance]] (hence being toward the top).


Additional information can be added by coloring the points according to a third dimension of data (such as signal-intensity) but this is not uniformly employed. Volcano plot is also used to graphically display a [[significance analysis of microarrays]] (SAM) gene selection criterion, an example of [[Regularization (mathematics)|regularization]].<ref name="pmid23075208">{{Cite journal | doi = 10.1142/S0219720012310038| pmid = 23075208| title = Volcano plots in analyzing differential expressions with mRNA microarrays| journal = [[Journal of Bioinformatics and Computational Biology]]| volume = 10| issue = 6| pages = 1231003| year = 2012| last1 = Li | first1 = W. | authorlink1 = Wentian Li}}</ref>
Additional information can be added by coloring the points according to a third dimension of data (such as signal-intensity) but this is not uniformly employed. Volcano plot is also used to graphically display a [[significance analysis of microarrays]] (SAM) gene selection criterion, an example of [[Regularization (mathematics)|regularization]].<ref name="pmid23075208">{{Cite journal | doi = 10.1142/S0219720012310038| pmid = 23075208| title = Volcano plots in analyzing differential expressions with mRNA microarrays| journal = [[Journal of Bioinformatics and Computational Biology]]| volume = 10| issue = 6| pages = 1231003| year = 2012| last1 = Li | first1 = W. | authorlink1 = Wentian Li}}</ref>


The concept of volcano plot can be generalized to other applications, where the [[x-axis]] is related to a measure of
The concept of volcano plot can be generalized to other applications, where the [[x axis]] is related to a measure of
the strength of a statistical signal, and [[y-axis]] is related to a measure of the [[statistical significance]] of the signal.
the strength of a statistical signal, and [[y-axis]] is related to a measure of the [[statistical significance]] of the signal.
For example, in a [[genetic association]] [[case-control]] study, such as [[Genome-wide association study]],
For example, in a [[genetic association]] [[case-control]] study, such as [[Genome-wide association study]],

Revision as of 01:28, 23 June 2016

Volcano plot showing metabolomic data. The red arrows indicate points of interest that display both large-magnitude fold changes (x axis) and high statistical significance (-log10 of p-value, y-axis). The dashed red line shows where p = 0.05, with points above the line having p < 0.05 and points below the line having p > 0.05. Points with a fold change of less than 2 (log2 = 1) are shown in gray.

In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data.[1] It plots significance against fold change on the y and x axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points between two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear toward the top of the plot. The x axis is the logarithm of the fold change between the two conditions. The logarithm is used so that changes in both directions appear equidistant from the center: on a log2 scale, for example, a doubling and a halving lie at +1 and -1. Plotting points in this way yields two regions of interest: the points toward the top of the plot that lie far to either the left or the right. These points display large-magnitude fold changes (hence lying left or right of center) as well as high statistical significance (hence lying toward the top).
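The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of group means to form the fold change, and the default cutoffs (a fold change of 2 and p = 0.05, borrowed from the example figure) are assumptions and are not prescribed by the cited sources.

```python
import math


def volcano_point(mean_a, mean_b, p_value):
    """Return (x, y) coordinates for one feature on a volcano plot.

    x is the base-2 log of the fold change between conditions B and A
    (assumed here to be a ratio of positive group means); y is the
    negative base-10 log of the p-value.
    """
    log2_fold_change = math.log2(mean_b / mean_a)
    neg_log10_p = -math.log10(p_value)
    return log2_fold_change, neg_log10_p


def is_of_interest(log2_fold_change, neg_log10_p, fc_cutoff=2.0, p_cutoff=0.05):
    """Flag points that are both far from the center and near the top.

    The cutoffs are illustrative defaults, not fixed conventions.
    """
    large_change = abs(log2_fold_change) >= math.log2(fc_cutoff)
    significant = neg_log10_p > -math.log10(p_cutoff)
    return large_change and significant


# Hypothetical features: a four-fold increase and a four-fold decrease
# land at mirror-image x positions, which is the reason for the log scale.
up = volcano_point(mean_a=1.0, mean_b=4.0, p_value=0.001)
down = volcano_point(mean_a=4.0, mean_b=1.0, p_value=0.001)
```

The resulting pairs can be passed to any scatter-plot routine; the plotting step itself is omitted here.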

Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed. A volcano plot is also used to graphically display a significance analysis of microarrays (SAM) gene selection criterion, an example of regularization.[2]

The concept of a volcano plot can be generalized to other applications, where the x axis is related to a measure of the strength of a statistical signal and the y-axis is related to a measure of the statistical significance of the signal. For example, in a genetic association case-control study, such as a genome-wide association study, a point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio, and its y value can be -log10 of the p-value from a chi-square test, or a chi-square test statistic.[3]
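As a rough illustration of the genetic association case, the sketch below computes the odds ratio and the -log10 p-value for one single-nucleotide polymorphism from a 2×2 table of allele counts. The function name, the argument layout, and the choice of an uncorrected Pearson chi-square test with one degree of freedom are assumptions made for this example; the regularized statistic discussed in the cited paper is not reproduced here.

```python
import math


def snp_volcano_point(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, neg_log10_p, chi2) for one SNP.

    Inputs are allele counts from a hypothetical 2x2 case-control table;
    all four counts are assumed to be positive.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d

    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # For one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    # For very large chi2 this underflows to 0 and the log below fails.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    return odds_ratio, -math.log10(p_value), chi2


# Hypothetical counts: 60/40 alternate/reference alleles in cases,
# 40/60 in controls.
x, y, stat = snp_volcano_point(60, 40, 40, 60)
```

Either `y` or `stat` can serve as the vertical coordinate, matching the two options named in the text.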

References

  1. ^ Cui, X.; Churchill, G. A. (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biology. 4 (4): 210. doi:10.1186/gb-2003-4-4-210. PMC 154570. PMID 12702200.
  2. ^ Li, W. (2012). "Volcano plots in analyzing differential expressions with mRNA microarrays". Journal of Bioinformatics and Computational Biology. 10 (6): 1231003. doi:10.1142/S0219720012310038. PMID 23075208.
  3. ^ Li, W.; Freudenberg, J.; Suh, Y. J.; Yang, Y. (2014). "Using volcano plots and regularized-chi statistics in genetic association studies". Computational Biology and Chemistry. 48: 77–83. doi:10.1016/j.compbiolchem.2013.02.003. PMID 23602812.