Scatter plot
Revision as of 18:22, 25 June 2013
Category: One of the Seven Basic Tools of Quality
Described by: Francis Galton
Purpose: To identify the type of relationship (if any) between two variables
Figure: Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. This chart suggests there are generally two "types" of eruptions: short-wait-short-duration and long-wait-long-duration.
Figure: A 3D scatter plot allows the visualization of multivariate data. This scatter plot takes multiple scalar variables and uses them for different axes in phase space. The variables are combined to form coordinates in the phase space, and the points are displayed using glyphs colored by another scalar variable. (Source: "Visualizations that have been created with VisIt", wci.llnl.gov, last updated November 8, 2007.)
A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data.
The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis (Utts, Jessica M. Seeing Through Statistics, 3rd ed., Thomson Brooks/Cole, 2005, pp. 166-167, ISBN 0-534-39402-7). This kind of plot is also called a scatter chart, scattergram, scatter diagram (Jarrell, Stephen B. Basic Statistics, special pre-publication ed., Wm. C. Brown Pub., Dubuque, Iowa, 1994, p. 492, ISBN 0-697-21595-4), or scatter graph.
Overview
A scatter plot can be used when one variable is under the control of the experimenter. If a parameter is systematically incremented and/or decremented by the experimenter, it is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either variable can be plotted on either axis, and a scatter plot will illustrate only the degree of correlation (not causation) between the two variables.
A scatter plot can suggest various kinds of correlation between variables. Quantifying how strong a correlation is, for example with a confidence interval, requires a statistical measure such as a correlation coefficient. For example, to compare weight and height, weight could be plotted on the x axis and height on the y axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it suggests a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it suggests a negative correlation.
A line of best fit (alternatively called a trendline) can be drawn in order to study the correlation between the variables. An equation describing the relationship between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression. In its ordinary least-squares form it has a closed-form solution and so is guaranteed to produce a correct solution in finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships.
A scatter plot is also very useful when we wish to see how two comparable data sets agree with each other. In this case, an identity line (a y = x line, or 1:1 line) is often drawn as a reference. The more closely the two data sets agree, the more the points tend to concentrate in the vicinity of the identity line. If the two data sets are numerically identical, the points fall exactly on the identity line.
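The quantities described above can be computed directly from the data pairs. The following is a minimal Python sketch using only the standard library. It computes the Pearson correlation coefficient, whose sign matches a rising or falling point cloud. It also computes the ordinary least-squares line and the root-mean-square distance of the points from the identity line. The function names are illustrative and not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs, ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Intercept a and slope b of the ordinary least-squares line y = a + b*x.

    This uses the closed-form solution, so it finishes in a fixed number of steps.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    return my - slope * mx, slope


def rms_distance_from_identity(xs, ys):
    """Root-mean-square perpendicular distance of the points from the line y = x."""
    n = len(xs)
    # The perpendicular distance from (x, y) to y = x is |y - x| / sqrt(2).
    return math.sqrt(sum(((y - x) / math.sqrt(2)) ** 2 for x, y in zip(xs, ys)) / n)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))           # prints 1.0
print(least_squares_line([1, 2, 3, 4], [2, 4, 6, 8]))  # prints (0.0, 2.0)
```

Python 3.10 and later also ship `statistics.correlation` and `statistics.linear_regression`. The hand-written versions above simply make the arithmetic behind the trendline visible.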
One of the most powerful aspects of a scatter plot, however, is its ability to show nonlinear relationships between variables. Furthermore, if the data is represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.
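A small constructed illustration of why the visual pattern matters: for points lying exactly on the parabola y = x², with x placed symmetrically around zero, the Pearson correlation coefficient is exactly zero. Yet y is completely determined by x. The data here are made up purely to show the effect. A scatter plot of them would make the curve obvious at a glance.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Constructed data: y depends exactly (but nonlinearly) on x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

print(pearson_r(xs, ys))  # prints 0.0
```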
The scatter diagram is one of the seven basic tools of quality control.[1]
Example
For example, to display the relationship between a person's lung capacity and how long that person can hold his breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold his breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis and "time holding breath" to the vertical axis.
A person with a lung capacity of 400 cl (4 litres) who held his breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point (400, 21.7) in Cartesian coordinates. The scatter plot of all the people in the study would give the researcher a visual comparison of the two variables in the data set and would help determine what kind of relationship there might be between them.
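The mapping from a data pair to a position on the plot can be sketched in a few lines of Python. The hypothetical `render_scatter` function below draws a text-mode scatter plot. The first value of each pair selects a column on the horizontal axis, and the second selects a row on the vertical axis, each scaled linearly into the chosen axis range. The axis ranges and grid size used here are arbitrary assumptions. Only the point (400, 21.7) comes from the example above.

```python
def render_scatter(points, x_range, y_range, width=40, height=12, mark="*"):
    """Return a text scatter plot: x maps to columns, y maps to rows (top row = largest y)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("each axis range must have max > min")
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the chosen axis ranges
        # Scale each coordinate linearly to the nearest grid cell.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = mark
    rows = ["|" + "".join(r) for r in grid]  # "|" draws the vertical axis
    rows.append("+" + "-" * width)           # bottom line draws the horizontal axis
    return "\n".join(rows)


# Lung capacity on the horizontal axis, breath-holding time on the vertical axis.
print(render_scatter([(400, 21.7)], x_range=(0, 800), y_range=(0, 40)))
```

Adding the measurements for every person in the study to `points` would produce the full scatter plot.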
See also
References
- ^ Nancy R. Tague (2004). "Seven Basic Quality Tools". The Quality Toolbox. Milwaukee, Wisconsin: American Society for Quality. p. 15. Retrieved 2010-02-05.
External links
- What is a scatterplot?
- Correlation scatter-plot matrix - for ordered-categorical data - Explanation and R code
- Tool for visualizing scatter plots
- Density scatterplot for large datasets (hundreds of millions of points)