Volcano plot (statistics): Difference between revisions

Revision as of 01:28, 23 June 2016

In statistics, a volcano plot is a type of scatter plot that is used to quickly identify changes in large datasets composed of replicate data.^[1] It plots significance versus fold change on the y and x axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics, where one often has a list of many thousands of replicate data points between two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p-value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.

A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y axis. As a result, data points with low p-values (highly significant) appear toward the top of the plot. The x axis is the logarithm of the fold change between the two conditions. The logarithm is used so that changes of equal magnitude in opposite directions appear equidistant from the center: on a linear scale a doubling (fold change 2) and a halving (fold change 0.5) lie at unequal distances from 1, whereas their base-2 logarithms are +1 and -1. Plotting points in this way produces two regions of interest: points toward the top of the plot that lie far to the left or far to the right. These represent values that display large-magnitude fold changes (hence their position left or right of center) as well as high statistical significance (hence their position toward the top).
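The construction described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `volcano_coordinates` and `classify` are hypothetical, and the default cutoffs (p < 0.05 and at least a twofold change, i.e. |log2 fold change| ≥ 1) are borrowed from the example figure's caption as illustrative choices rather than a fixed standard.

```python
import math


def volcano_coordinates(fold_change, p_value):
    """Map one feature (gene, metabolite, ...) to volcano-plot coordinates.

    x = log2(fold change): up- and down-regulation of equal size are symmetric around 0.
    y = -log10(p-value): smaller p-values land higher on the plot.
    """
    if fold_change <= 0 or not (0 < p_value <= 1):
        raise ValueError("fold change must be > 0 and p-value must lie in (0, 1]")
    return math.log2(fold_change), -math.log10(p_value)


def classify(fold_change, p_value, fc_threshold=2.0, alpha=0.05):
    """Label a point 'up', 'down', or 'not significant'.

    Hypothetical helper: the cutoffs are illustrative defaults, not a standard.
    A point counts as a hit only if it lies above the significance line
    (p < alpha) AND outside the fold-change band (|log2 FC| >= log2(fc_threshold)).
    """
    x, y = volcano_coordinates(fold_change, p_value)
    above_line = y > -math.log10(alpha)
    outside_band = abs(x) >= math.log2(fc_threshold)
    if above_line and outside_band:
        return "up" if x > 0 else "down"
    return "not significant"


# Usage: (fold change, p-value) pairs for a few hypothetical features.
features = {"a": (4.0, 0.001), "b": (0.25, 0.01), "c": (1.1, 0.0001), "d": (8.0, 0.2)}
for name, (fc, p) in features.items():
    print(name, volcano_coordinates(fc, p), classify(fc, p))
```

With these defaults a fourfold increase at p = 0.001 is labeled "up" and a fourfold decrease at p = 0.01 is labeled "down". An eightfold increase at p = 0.2 is "not significant" because it falls below the significance line, and a 1.1-fold change at p = 0.0001 is "not significant" because it falls inside the fold-change band.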

Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not universally done. Volcano plots are also used to graphically display the gene selection criterion of significance analysis of microarrays (SAM), an example of regularization.^[2]
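The regularization idea behind the SAM criterion can be illustrated with a minimal Python sketch, assuming the commonly cited form of the SAM statistic d = r / (s + s0), where r is the difference in group means, s is its standard error, and s0 is a small positive constant (the "fudge factor"). With s0 = 0 the score reduces to an ordinary two-sample t statistic. A positive s0 shrinks the score of features whose replicates happen to have a very small variance, so a small fold change cannot look important merely because its replicates agree closely. The function name and example values are hypothetical. Real SAM implementations also choose s0 from the data and assess significance by permutation, and this sketch omits both steps.

```python
import math
from statistics import mean, stdev


def sam_statistic(group_a, group_b, s0):
    """Return (r, s, t, d) for two groups of replicate measurements.

    r: difference in means (group_b minus group_a)
    s: pooled standard error of that difference
    t: ordinary two-sample t statistic, r / s
    d: regularized SAM-style statistic, r / (s + s0)
    """
    na, nb = len(group_a), len(group_b)
    r = mean(group_b) - mean(group_a)
    # Pooled variance across both groups, then the standard error of the mean difference.
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    s = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return r, s, r / s, r / (s + s0)


# Feature 1: tiny change, tiny variance. Feature 2: large change, moderate variance.
tiny = sam_statistic([1.00, 1.01, 0.99], [1.05, 1.06, 1.04], s0=0.1)
large = sam_statistic([1.0, 1.2, 0.8], [3.0, 3.2, 2.8], s0=0.1)
print("tiny:  t = %.2f, d = %.2f" % (tiny[2], tiny[3]))
print("large: t = %.2f, d = %.2f" % (large[2], large[3]))
```

In this example the first feature has a large t (about 6) but a d below 1, while the second feature keeps a large d (about 7.6). The fudge factor therefore separates a genuinely large change from a small change with unusually tight replicates.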

The concept of the volcano plot can be generalized to other applications in which the x axis relates to a measure of the strength of a statistical signal and the y axis relates to a measure of the statistical significance of that signal. For example, in a genetic association case-control study, such as a genome-wide association study, each point in a volcano plot represents a single-nucleotide polymorphism. Its x value can be the odds ratio, and its y value can be -log10 of the p-value from a chi-squared test or the chi-squared test statistic itself.^[3]
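For the genetic association case, a minimal Python sketch follows. It assumes allele counts arranged in a 2x2 table (cases versus controls, one allele versus the other) and a Pearson chi-squared test without continuity correction. The table layout, the function name `allele_association`, and the omission of a continuity correction are assumptions made for illustration. The sketch relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so the p-value can be computed with `math.erfc` from the standard library.

```python
import math


def allele_association(a, b, c, d):
    """Return (odds_ratio, chi2, p_value) for a hypothetical 2x2 allele table.

    a, b: counts of allele 1 and allele 2 among cases
    c, d: counts of allele 1 and allele 2 among controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive in this simple sketch")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-squared statistic for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2))
    # = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Usage: one SNP becomes one point (odds ratio, -log10 p) on the volcano plot.
odds_ratio, chi2, p = allele_association(60, 40, 40, 60)
x, y = odds_ratio, -math.log10(p)
print(odds_ratio, chi2, p, y)
```

For counts a = 60, b = 40, c = 40, d = 60 the sketch gives an odds ratio of 2.25 and a chi-squared statistic of 8.0. This corresponds to p ≈ 0.0047, or roughly 2.33 on the -log10 scale.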

References

^ Cui, X.; Churchill, G. A. (2003). "Statistical tests for differential expression in cDNA microarray experiments". Genome Biology. 4 (4): 210. doi:10.1186/gb-2003-4-4-210. PMC 154570. PMID 12702200.
^ Li, W. (2012). "Volcano plots in analyzing differential expressions with mRNA microarrays". Journal of Bioinformatics and Computational Biology. 10 (6): 1231003. doi:10.1142/S0219720012310038. PMID 23075208.
^ Li, W.; Freudenberg, J.; Suh, Y. J.; Yang, Y. (2014). "Using volcano plots and regularized-chi statistics in genetic association studies". Computational Biology and Chemistry. 48: 77–83. doi:10.1016/j.compbiolchem.2013.02.003. PMID 23602812.


@@ Line 1: / Line 1: @@
-[[Image:volcano eg.jpg|thumb|350px|Volcano plot showing [[metabolomic]] data.  The red arrows indicate points-of-interest that display both large-magnitude [[Fold change|fold-changes]] (x-axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.]]
+[[Image:volcano eg.jpg|thumb|350px|Volcano plot showing [[metabolomic]] data.  The red arrows indicate points-of-interest that display both large-magnitude [[Fold change|fold-changes]] (x axis) as well as high statistical significance (-log10 of p-value, y-axis). The dashed red-line shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. This plot is colored such that those points having a fold-change less than 2 (log2 = 1) are shown in gray.]]
-In statistics, a '''volcano plot''' is a type of [[scatter-plot]] that is used to quickly identify changes in large datasets composed of replicate data.<ref name="pmid12702200">{{Cite journal | doi = 10.1186/gb-2003-4-4-210| pmid = 12702200| year = 2003| last1 = Cui | first1 = X. | title = Statistical tests for differential expression in cDNA microarray experiments| journal = Genome Biology| volume = 4| issue = 4| pages = 210| last2 = Churchill | first2 = G. A. | pmc = 154570}}</ref>  It plots significance versus [[Fold change|fold-change]] on the y- and x-axes, respectively.  These plots are increasingly common in [[List of omics topics in biology|omic]] experiments such as [[genomics]], [[proteomics]], and [[metabolomics]] where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes.  A volcano plot combines a measure of statistical significance from a statistical test (e.g., a [[p-value]] from an [[ANOVA]] model) with the magnitude of the change  enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also [[Statistical significance|statistically significant]].
+In statistics, a '''volcano plot''' is a type of [[scatter-plot]] that is used to quickly identify changes in large datasets composed of replicate data.<ref name="pmid12702200">{{Cite journal | doi = 10.1186/gb-2003-4-4-210| pmid = 12702200| year = 2003| last1 = Cui | first1 = X. | title = Statistical tests for differential expression in cDNA microarray experiments| journal = Genome Biology| volume = 4| issue = 4| pages = 210| last2 = Churchill | first2 = G. A. | pmc = 154570}}</ref>  It plots significance versus [[Fold change|fold-change]] on the y and x axes, respectively.  These plots are increasingly common in [[List of omics topics in biology|omic]] experiments such as [[genomics]], [[proteomics]], and [[metabolomics]] where one often has a list of many thousands of replicate datapoints between two conditions and one wishes to quickly identify the most-meaningful changes.  A volcano plot combines a measure of statistical significance from a statistical test (e.g., a [[p-value]] from an [[ANOVA]] model) with the magnitude of the change  enabling quick visual identification of those data-points (genes, etc.) that display large-magnitude changes that are also [[Statistical significance|statistically significant]].
-A volcano plot is constructed by plotting the negative log of the [[p-value]] on the [[y-axis]] (usually base 10).  This results in datapoints with low p-values (highly significant) appearing toward the top of the plot.  The [[x-axis]] is the log of the [[fold change]] between the two conditions.  The log of the fold-change is used so that changes in both directions appear equidistant from the center.  Plotting points in this way results in two regions of interest in the plot:  those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high [[statistical significance]] (hence being toward the top).
+A volcano plot is constructed by plotting the negative log of the [[p-value]] on the [[y-axis]] (usually base 10).  This results in datapoints with low p-values (highly significant) appearing toward the top of the plot.  The [[x axis]] is the log of the [[fold change]] between the two conditions.  The log of the fold-change is used so that changes in both directions appear equidistant from the center.  Plotting points in this way results in two regions of interest in the plot:  those points that are found toward the top of the plot that are far to either the left- or the right-hand side. These represent values that display large magnitude fold changes (hence being left- or right- of center) as well as high [[statistical significance]] (hence being toward the top).
 Additional information can be added by coloring the points according to a third dimension of data (such as signal-intensity) but this is not uniformly employed. Volcano plot is also used to graphically display a [[significance analysis of microarrays]] (SAM) gene selection criterion, an example of [[Regularization (mathematics)|regularization]].<ref name="pmid23075208">{{Cite journal | doi = 10.1142/S0219720012310038| pmid = 23075208| title = Volcano plots in analyzing differential expressions with mRNA microarrays| journal = [[Journal of Bioinformatics and Computational Biology]]| volume = 10| issue = 6| pages = 1231003| year = 2012| last1 = Li | first1 = W. | authorlink1 = Wentian Li}}</ref>
-The concept of volcano plot can be generalized to other applications, where the [[x-axis]] is related to a measure of
+The concept of volcano plot can be generalized to other applications, where the [[x axis]] is related to a measure of
 the strength of a statistical signal, and [[y-axis]] is related to a measure of the [[statistical significance]] of the signal.
 For example, in a [[genetic association]] [[case-control]] study, such as [[Genome-wide association study]],