Descriptive statistics: Difference between revisions
Revision as of 16:12, 20 June 2012
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data.[1] Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.[2] Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, a paper reporting on a study involving human subjects typically includes a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
Use in statistical analysis
Descriptive statistics provides simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. It is the number of shots made divided by the number of shots taken; a player who shoots 33% is making approximately one shot in every three. The percentage summarizes or describes multiple discrete events. Consider also the grade point average: this single number describes the general performance of a student across the range of their course experiences.[3]
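The two examples above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the credit-weighted form of the grade point average is an assumption, since the text does not say how the average is computed.

```python
def shooting_percentage(made, taken):
    """Shots made as a percentage of shots taken."""
    if taken <= 0:
        raise ValueError("taken must be positive")
    return 100.0 * made / taken


def grade_point_average(courses):
    """Credit-weighted mean of (grade_points, credits) pairs.

    Weighting by credits is an assumption made for this sketch.
    """
    total_credits = sum(credits for _, credits in courses)
    return sum(points * credits for points, credits in courses) / total_credits


# Hypothetical data: 33 shots made out of 100 taken, and three courses.
pct = shooting_percentage(made=33, taken=100)
gpa = grade_point_average([(4.0, 3), (3.0, 3), (2.0, 2)])
print(pct, gpa)
```

In both cases many separate events (shots, course grades) are reduced to one number, which is what makes the result a descriptive statistic.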
The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way in which the topic of statistics appeared. More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.
Univariate analysis
Univariate analysis involves the examination of a single variable across cases.
Distribution
The distribution is a summary of the frequency of individual values, or ranges of values, for a variable. The simplest distribution would list every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values. Typically, specific values are not particularly meaningful (an income of 50,000 is typically not meaningfully different from one of 51,000). Grouping the raw scores into ranges of values reduces the number of categories to something more meaningful. For instance, the incomes of individuals in a sample might be grouped into ranges of 0–10,000, 10,001–30,000, etc., with the counts of individuals in each group being the statistics that are published.
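As a rough sketch of both cases, the following Python tabulates a two-valued variable directly and groups incomes into ranges before counting. The sample data, the function names, and the open-ended top range "30,001+" are assumptions for illustration; only the first two ranges come from the text.

```python
from bisect import bisect_left
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (count, percentage of cases)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100.0 * count / n) for value, count in counts.items()}


# Inclusive upper bounds of the first two ranges; the last label is hypothetical.
UPPER_BOUNDS = [10_000, 30_000]
LABELS = ["0-10,000", "10,001-30,000", "30,001+"]


def income_range(income):
    """Return the label of the range that contains the given income."""
    return LABELS[bisect_left(UPPER_BOUNDS, income)]


# Hypothetical sample data.
gender = ["male", "female", "female", "male", "female"]
incomes = [5_000, 10_000, 10_001, 25_000, 50_000, 51_000]

gender_distribution = frequency_table(gender)
income_distribution = Counter(income_range(x) for x in incomes)
print(gender_distribution)
print(income_distribution)
```

Listing every distinct income would give six categories with one case each; grouping leaves three ranges whose counts are easier to read.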
Frequency distributions are depicted as a table or as a graph, one form of which is a histogram. Quantities that summarize a distribution (summary statistics) are:
- measures of central tendency, representing a "typical value" for a member of a population. The three major types are the mean, the median, and the mode.
- measures of dispersion, which describe how much the members of a sample differ from one another. Common measures of dispersion include the median absolute deviation, the standard deviation, and the interquartile range.
- measures of shape of the distribution, such as skewness.
- measures aimed at describing the most unusual members of a population, such as the minimum and maximum values observed, or sample quantiles.
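The summary statistics listed above can be computed with Python's standard `statistics` module, as in the sketch below. The data are hypothetical. The helper functions are illustrative, and the skewness shown is the simple moment-based (population) form, one of several definitions in use. Quartiles also depend on the interpolation method; `statistics.quantiles` is used here with its defaults.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default method

summary = {
    # central tendency
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    # dispersion
    "stdev": statistics.stdev(data),  # sample standard deviation
    "mad": median_absolute_deviation(data),
    "iqr": q3 - q1,
    # shape
    "skewness": skewness(data),
    # extremes
    "min": min(data),
    "max": max(data),
}
print(summary)
```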
Comparison of populations
When several different populations are to be compared, the above graphs and summary statistics can be used. One of the main aims of descriptive statistics is to facilitate the comparison of different populations.
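A minimal sketch of such a comparison is to compute the same summary statistics for each group and place them side by side. The group names and measurements below are hypothetical, as is the `summarize` helper.

```python
import statistics


def summarize(values):
    """Return a small set of summary statistics for one group."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical measurements for two groups.
groups = {
    "treatment": [5.1, 4.8, 5.6, 5.0],
    "control": [4.2, 4.0, 4.5, 4.1],
}

comparison = {name: summarize(values) for name, values in groups.items()}
for name, stats in comparison.items():
    print(name, stats)
```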
Multivariate analysis
Multivariate analysis arises when more than one variable is measured for each member of a population. In such cases, the above univariate analyses applied to each variable separately are supplemented and extended. The main extra consideration here is that of association: the way in which the values of one subset of variables within a population are related to other subsets. Descriptive statistics makes use of:
- cross-tabulations and contingency tables
- scatterplots
- quantitative measures of correlation and dependence such as Pearson's correlation or Spearman's rank correlation coefficient
- descriptions of conditional distributions
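The tools in this list can be sketched in plain Python as follows. All function names and data are hypothetical. Spearman's coefficient is computed here as Pearson's correlation applied to average ranks, which is the standard definition; the rest is one possible arrangement rather than a reference implementation.

```python
from collections import Counter
from math import sqrt


def crosstab(xs, ys):
    """Contingency table as a Counter keyed by (x, y) pairs."""
    return Counter(zip(xs, ys))


def conditional_distribution(table, x):
    """Proportions of each y value among the cases with the given x."""
    row = {y: count for (xv, y), count in table.items() if xv == x}
    total = sum(row.values())
    return {y: count / total for y, count in row.items()}


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data.
sex = ["m", "f", "f", "m", "f"]
treated = ["yes", "yes", "no", "no", "yes"]
table = crosstab(sex, treated)
given_f = conditional_distribution(table, "f")

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not linearly
r = pearson(x, y)
rho = spearman(x, y)
print(table, given_f, r, rho)
```

In the last example the relationship is monotonic but not linear, so the rank correlation is 1 while Pearson's coefficient is slightly below 1.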
Notes
- (1995) Introductory Statistics, 2nd Edition, Wiley. ISBN 0-471-31009-3
- Dodge, Y (2003) The Oxford Dictionary of Statistical Terms OUP. ISBN 0-19-850994-4
- Trochim, William M. K. (2006). "Descriptive statistics". Research Methods Knowledge Base. Retrieved 14 March 2011.
External links
- Descriptive Statistics Lecture: University of Pittsburgh Supercourse: http://www.pitt.edu/~super1/lecture/lec0421/index.htm