
Volcano plot (statistics)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Roadnottaken (talk | contribs) at 23:03, 3 January 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Volcano plot showing metabolomic data. The red arrows indicate points of interest that display both large-magnitude fold-changes (x-axis) and high statistical significance (-log10 of the p-value, y-axis). The dashed red line marks p = 0.05: points above the line have p < 0.05 and points below it have p > 0.05. Points with a fold-change of less than 2 (an absolute log2 fold-change of less than 1) are shown in gray.

In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data [1]. It plots statistical significance on the y-axis against fold-change on the x-axis. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points measured under two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear towards the top of the plot. The x-axis is the logarithm of the fold-change between the two conditions (often base 2); the logarithm is used so that changes in both directions appear equidistant from the center. Plotting points in this way produces two regions of interest: the upper left and the upper right of the plot. Points there display large-magnitude fold-changes (hence lying to the left or right of center) as well as high statistical significance (hence lying towards the top).
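The construction described above can be illustrated with a minimal Python sketch. It assumes base-2 logarithms for the fold-change, and it borrows the cutoffs from the figure caption (p = 0.05 and a two-fold change) as default thresholds. The function names, feature names, and numerical values are hypothetical and serve only as an illustration.

```python
import math


def volcano_coordinates(mean_a, mean_b, p_value):
    """Return (x, y) for one feature.

    x is the log2 fold-change of condition B relative to condition A,
    y is -log10 of the p-value. Both inputs must be positive.
    """
    x = math.log2(mean_b / mean_a)
    y = -math.log10(p_value)
    return x, y


def is_point_of_interest(x, y, min_abs_log2_fc=1.0, max_p=0.05):
    """True if the point lies in the upper-left or upper-right region.

    The default thresholds (two-fold change, p < 0.05) are assumptions
    taken from the figure caption, not fixed rules.
    """
    return abs(x) >= min_abs_log2_fc and y > -math.log10(max_p)


# Hypothetical data: (mean in condition A, mean in condition B, p-value).
features = {
    "feature_up": (10.0, 40.0, 0.001),
    "feature_down": (40.0, 10.0, 0.001),
    "feature_small_change": (10.0, 11.0, 0.001),
    "feature_not_significant": (10.0, 40.0, 0.5),
}

coords = {name: volcano_coordinates(*v) for name, v in features.items()}
flagged = sorted(
    name for name, (x, y) in coords.items() if is_point_of_interest(x, y)
)
print(flagged)
```

In this sketch a four-fold increase and a four-fold decrease land at equal distances on either side of zero on the x-axis, which is why only the two features with both a large change and a low p-value are flagged.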

Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed.
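One possible way to encode such a third dimension is to map each value linearly onto a color gradient. The Python sketch below is an assumption-laden illustration: the function name, the gray-to-red gradient, and the intensity values are all hypothetical, and other color scales would serve equally well.

```python
def intensity_to_rgb(value, lo, hi,
                     low_rgb=(0.8, 0.8, 0.8), high_rgb=(0.8, 0.0, 0.0)):
    """Linearly interpolate between two RGB colors.

    Values at or below `lo` map to `low_rgb` (light gray by default),
    values at or above `hi` map to `high_rgb` (red by default).
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp to the [0, 1] range
    return tuple(a + t * (b - a) for a, b in zip(low_rgb, high_rgb))


# Hypothetical signal intensities for three points.
intensities = [100.0, 550.0, 1000.0]
lo, hi = min(intensities), max(intensities)
colors = [intensity_to_rgb(v, lo, hi) for v in intensities]
print(colors)
```

The resulting RGB tuples, with components between 0 and 1, could then be supplied as per-point colors to whichever plotting library is used to draw the scatter plot.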


References

  1. ^ Cui X, Churchill GA (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biol. 4 (4): 210. PMC 154570. PMID 12702200.