Statistical distance

In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions, or two samples; it can also quantify the distance between an individual sample point and a population or a wider sample of points.

A distance between two populations can be interpreted as measuring the distance between the corresponding probability distributions, and hence such distances are essentially measures of distance between probability measures. Where statistical distance measures relate to differences between random variables, the variables may have statistical dependence,[1] and hence these distances are not directly related to measures of distance between probability measures. Indeed, a measure of distance between random variables may relate to the extent of their dependence rather than to their individual values.

Statistical distance measures are mostly not metrics and they need not be symmetric. Some types of distance measures are referred to as (statistical) divergences.

Distances as metrics

Metrics

A metric on a set X is a function (called the distance function or simply distance)

d : X × X → R+ (where R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:

  1. d(x, y) ≥ 0     (non-negativity)
  2. d(x, y) = 0   if and only if   x = y     (identity of indiscernibles; conditions 1 and 2 together produce positive definiteness)
  3. d(x, y) = d(y, x)     (symmetry)
  4. d(x, z) ≤ d(x, y) + d(y, z)     (subadditivity / triangle inequality).
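
As a concrete illustration of these conditions, the following sketch (hypothetical Python, not part of the article) computes the total variation distance between discrete distributions on a common finite support and spot-checks the four conditions on three small examples:

    # Hypothetical sketch: total variation distance between discrete
    # distributions on the same finite support, with a spot-check of the
    # four metric conditions listed above.
    def total_variation(p, q):
        # TV(p, q) = (1/2) * sum_i |p_i - q_i|
        return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    r = [0.2, 0.2, 0.6]

    assert total_variation(p, q) >= 0                        # 1. non-negativity
    assert total_variation(p, p) == 0                        # 2. d(x, x) = 0
    assert total_variation(p, q) == total_variation(q, p)    # 3. symmetry
    assert total_variation(p, r) <= total_variation(p, q) + total_variation(q, r)  # 4. triangle inequality

(Total variation distance, listed among the examples below, satisfies all four conditions and is therefore a genuine metric on probability distributions.)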

Generalized metrics

Many statistical distances are not metrics because they lack one or more properties of proper metrics. For example, pseudometrics can violate the "positive definiteness" (alternatively, "identity of indiscernibles") property; quasimetrics can violate the symmetry property; and semimetrics can violate the triangle inequality. Some statistical distances are referred to as divergences.
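
For example, the Kullback–Leibler divergence (listed among the examples below) is not symmetric and so is not a metric. A minimal sketch, in hypothetical Python assuming strictly positive discrete distributions on a common support and natural logarithms, makes the asymmetry explicit:

    import math

    # Hypothetical sketch: Kullback–Leibler divergence between two strictly
    # positive discrete distributions on the same finite support.
    def kl_divergence(p, q):
        # D_KL(p || q) = sum_i p_i * log(p_i / q_i)
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

    p = [0.5, 0.3, 0.2]
    q = [0.1, 0.6, 0.3]

    print(kl_divergence(p, q))  # ≈ 0.516
    print(kl_divergence(q, p))  # ≈ 0.377 -- a different value, so KL is not symmetric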

Examples

Some important statistical distances include the following:

  • Kullback–Leibler divergence
  • Hellinger distance
  • Total variation distance (sometimes just called "the" statistical distance)
  • Rényi's divergence
  • Jensen–Shannon divergence
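
To make two of these concrete, the sketch below (hypothetical Python, assuming discrete distributions on a common finite support) evaluates the Hellinger distance and the total variation distance for the same pair of distributions; both take values between 0 and 1:

    import math

    # Hypothetical sketch: two of the listed statistical distances evaluated
    # on the same pair of discrete distributions over a common finite support.
    def hellinger(p, q):
        # H(p, q) = (1/sqrt(2)) * sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 )
        total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
        return math.sqrt(total / 2)

    def total_variation(p, q):
        # TV(p, q) = (1/2) * sum_i |p_i - q_i|
        return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

    p = [0.5, 0.3, 0.2]
    q = [0.1, 0.6, 0.3]

    print(hellinger(p, q))        # ≈ 0.327
    print(total_variation(p, q))  # 0.4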

Other approaches

See also

Notes

  1. Dodge, Y. (2003), entry for "distance"

References

  • Dodge, Y. (2003) Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9