Joint probability distribution: Difference between revisions


Revision as of 06:47, 15 November 2012

In the study of probability, given two random variables X and Y that are defined on the same probability space, the joint distribution for X and Y defines the probability of events defined in terms of both X and Y. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The equation for the joint probability differs depending on whether the events are dependent or independent.

The joint probability function of a set of variables can be used to find a variety of other probability distributions. The joint probability density function can be found by taking a partial derivative of the joint cumulative distribution function with respect to each of the variables. A marginal density ("marginal distribution" in the discrete case) is found by integrating (or summing in the discrete case) the joint distribution over the domain of the other variables. A conditional probability distribution can be calculated by taking the joint density and dividing it by the marginal density of one (or more) of the variables.

Example

Consider the roll of a fair die and let $A=1$ if the number is even (i.e., 2, 4, or 6) and $A=0$ otherwise. Furthermore, let $B=1$ if the number is prime (i.e., 2, 3, or 5) and $B=0$ otherwise. Then, the joint distribution of $A$ and $B$ is

\mathrm {P} (A=0,B=0)=P\{1\}={\frac {1}{6}},\;\mathrm {P} (A=1,B=0)=P\{4,6\}={\frac {2}{6}}

\mathrm {P} (A=0,B=1)=P\{3,5\}={\frac {2}{6}},\;\mathrm {P} (A=1,B=1)=P\{2\}={\frac {1}{6}}
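The table above can be reproduced by direct enumeration. The following is a minimal sketch, assuming a fair die; the names `indicators`, `joint`, `marginal_a` and `marginal_b` are illustrative and not part of the article. It also shows how the marginals arise by summing the joint distribution over the other variable.

```python
from collections import defaultdict
from fractions import Fraction

FACES = range(1, 7)   # faces of the die, each assumed to have probability 1/6
PRIMES = {2, 3, 5}


def indicators(face):
    """Return (A, B): A = 1 if the face is even, B = 1 if the face is prime."""
    return (int(face % 2 == 0), int(face in PRIMES))


# Joint distribution: accumulate 1/6 for every face that maps to (a, b).
joint = defaultdict(Fraction)
for face in FACES:
    joint[indicators(face)] += Fraction(1, 6)

# Marginals: sum the joint distribution over the other variable.
marginal_a = defaultdict(Fraction)
marginal_b = defaultdict(Fraction)
for (a, b), p in joint.items():
    marginal_a[a] += p
    marginal_b[b] += p

for a, b in sorted(joint):
    print(f"P(A={a}, B={b}) = {joint[(a, b)]}")
```

Exact fractions are used so that the values can be compared with the table without rounding error.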

Cumulative distribution

The cumulative distribution function for a pair of random variables is defined in terms of their joint probability distribution:

F(x,y)=P(X\leq x,Y\leq y).

where our terms are defined such that...

Discrete case

The joint probability mass function of two discrete random variables is equal to

{\begin{aligned}\mathrm {P} (X=x\ \mathrm {and} \ Y=y)&{}=\mathrm {P} (Y=y\mid X=x)\cdot \mathrm {P} (X=x)\\&{}=\mathrm {P} (X=x\mid Y=y)\cdot \mathrm {P} (Y=y).\end{aligned}}

In general, the joint probability distribution of $n\,$ discrete random variables $X_{1},X_{2},\dots ,X_{n}$ is equal to

{\begin{aligned}\mathrm {P} (X_{1}=x_{1},\dots ,X_{n}=x_{n})&=\mathrm {P} (X_{1}=x_{1})\times \\&\qquad \times \mathrm {P} (X_{2}=x_{2}|X_{1}=x_{1})\times \\&\quad \qquad \times \mathrm {P} (X_{3}=x_{3}|X_{1}=x_{1},X_{2}=x_{2})\times \dots \times P(X_{n}=x_{n}|X_{1}=x_{1},X_{2}=x_{2},\dots ,X_{n-1}=x_{n-1})\end{aligned}}

This identity is known as the chain rule of probability.
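As an illustration, the chain rule can be checked numerically for three indicator variables on a single fair die. This is only a sketch: the third indicator `c` (face greater than 3) and the helper names `outcome`, `prob` and `chain_rule_product` are assumptions introduced here for the example.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # fair six-sided die (assumed)


def outcome(face):
    """Map a face to three indicator variables (a, b, c)."""
    a = int(face % 2 == 0)        # even
    b = int(face in {2, 3, 5})    # prime
    c = int(face > 3)             # hypothetical third indicator
    return (a, b, c)


def prob(event):
    """Probability that event(outcome) holds for a fair die."""
    return Fraction(sum(1 for f in FACES if event(outcome(f))), 6)


def chain_rule_product(a, b, c):
    """P(A=a) * P(B=b | A=a) * P(C=c | A=a, B=b), conditionals taken as ratios."""
    p_a = prob(lambda o: o[0] == a)
    p_ab = prob(lambda o: o[:2] == (a, b))
    p_abc = prob(lambda o: o == (a, b, c))
    if p_a == 0 or p_ab == 0:
        return Fraction(0)  # conditioning event is impossible, so the joint is 0
    return p_a * (p_ab / p_a) * (p_abc / p_ab)


# Pair each direct joint probability with its chain-rule factorization.
checks = {
    abc: (prob(lambda o, abc=abc: o == abc), chain_rule_product(*abc))
    for abc in product((0, 1), repeat=3)
}
```

Each entry of `checks` holds the directly computed joint probability next to the product of conditionals, and the two agree for every outcome.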

Since these are probabilities, we have

\sum _{x}\sum _{y}\mathrm {P} (X=x\ \mathrm {and} \ Y=y)=1.\;

Generalizing to $n\,$ discrete random variables $X_{1},X_{2},\dots ,X_{n}$ gives

\sum _{x_{1}}\sum _{x_{2}}\dots \sum _{x_{n}}\mathrm {P} (X_{1}=x_{1},X_{2}=x_{2},\dots ,X_{n}=x_{n})=1.\;

Continuous case

Similarly, for continuous random variables the joint probability density function can be written as $f_{X,Y}(x,y)$, and it satisfies

f_{X,Y}(x,y)=f_{Y|X}(y|x)f_{X}(x)=f_{X|Y}(x|y)f_{Y}(y)\;

where $f_{Y|X}(y|x)$ and $f_{X|Y}(x|y)$ give the conditional distributions of Y given X = x and of X given Y = y respectively, and $f_{X}(x)$ and $f_{Y}(y)$ give the marginal distributions for X and Y respectively.

Again, since these are probability distributions, one has

\int _{x}\int _{y}f_{X,Y}(x,y)\;dy\;dx=1.
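These relations can be checked numerically on a simple density. The sketch below assumes the illustrative joint density f(x, y) = x + y on the unit square, which is not taken from the article, and uses a plain midpoint rule; all function names are hypothetical.

```python
def joint_density(x, y):
    """Illustrative joint density f(x, y) = x + y on the unit square (assumed)."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def midpoint_integral(g, lo, hi, n=400):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def marginal_x(x):
    """Marginal density of X: integrate the joint density over y."""
    return midpoint_integral(lambda y: joint_density(x, y), 0.0, 1.0)


def conditional_y_given_x(y, x):
    """Conditional density of Y given X = x: joint divided by the marginal."""
    return joint_density(x, y) / marginal_x(x)


# Total probability: integrate the marginal of X over x.
total = midpoint_integral(marginal_x, 0.0, 1.0)
```

For this density the marginal of X is x + 1/2, so `marginal_x(0.3)` should be close to 0.8 and `total` close to 1.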

Mixed case

In some situations X is continuous but Y is discrete. For example, in a logistic regression, one may wish to predict the probability of a binary outcome Y conditional on the value of a continuously distributed X. In this case, (X, Y) has neither a probability density function nor a probability mass function in the sense of the terms given above. On the other hand, a "mixed joint density" can be defined in either of two ways:

{\begin{aligned}f_{X,Y}(x,y)&=f_{X|Y}(x|y)\mathrm {P} (Y=y)\\&=\mathrm {P} (Y=y\mid X=x)f_{X}(x)\end{aligned}}

Formally, $f_{X,Y}(x,y)$ is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Either of these two decompositions can then be used to recover the joint cumulative distribution function:

{\begin{aligned}F_{X,Y}(x,y)&=\sum \limits _{t\leq y}\int _{s=-\infty }^{x}f_{X,Y}(s,t)\;ds\end{aligned}}

The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
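A small numerical sketch of the mixed case follows. It assumes, purely for illustration, that X is standard normal and that Y is binary with P(Y = 1 | X = x) given by the logistic function; the integration range and step count are arbitrary choices, and the function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Density of X, assumed standard normal for this example."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def prob_y_given_x(y, x):
    """P(Y = y | X = x) for binary Y under an assumed logistic model."""
    p1 = 1.0 / (1.0 + math.exp(-x))
    return p1 if y == 1 else 1.0 - p1


def mixed_density(x, y):
    """Mixed joint density f(x, y) = P(Y = y | X = x) * f_X(x)."""
    return prob_y_given_x(y, x) * normal_pdf(x)


def joint_cdf(x, y, lower=-10.0, n=4000):
    """F(x, y): sum over t <= y of the integral of f(s, t) for s <= x.

    The integral is truncated at `lower` and approximated by a midpoint rule.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.0
    for t in (0, 1):          # support of the discrete variable Y
        if t <= y:
            total += h * sum(mixed_density(lower + (k + 0.5) * h, t) for k in range(n))
    return total
```

By the symmetry of this particular model, `joint_cdf(10.0, 0)` approximates P(Y = 0) = 1/2, while `joint_cdf(10.0, 1)` approximates the total probability 1.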

General multidimensional distributions

Recall that the cumulative distribution function for a vector of random variables is defined in terms of their joint probability distribution:

F(x_{1},\dots ,x_{n})=P(X_{1}\leq x_{1},\dots ,X_{n}\leq x_{n}).

The joint distribution for two random variables can be extended to many random variables $X_{1},\dots ,X_{n}$ by adding them sequentially with the identity

{\begin{aligned}f_{X_{1},\ldots X_{n}}(x_{1},\ldots x_{n})=&f_{X_{n}|X_{1},\ldots X_{n-1}}(x_{n}|x_{1},\ldots x_{n-1})f_{X_{1},\ldots X_{n-1}}(x_{1},\ldots x_{n-1})\\=&f_{X_{1}}(x_{1})\\&\cdot f_{X_{2}|X_{1}}(x_{2}|x_{1})\\&\cdot \dots \\&\cdot f_{X_{n-1}|X_{1}\ldots X_{n-2}}(x_{n-1}|x_{1},\ldots x_{n-2})\\&\cdot f_{X_{n}|X_{1},\ldots X_{n-1}}(x_{n}|x_{1},\ldots x_{n-1}),\end{aligned}}

where

{\begin{aligned}f_{X_{i}|X_{1},\ldots X_{i-1}}(x_{i}|x_{1},\ldots x_{i-1})=&{\frac {f_{X_{1},\dots X_{i}}(x_{1},\dots x_{i})}{\int f_{X_{1},\dots X_{i}}(x_{1},\dots x_{i-1},u_{i})\mathrm {d} u_{i}}}\\=&{\frac {\int \dots \int f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i},u_{i+1},\dots u_{n})\mathrm {d} u_{i+1}\dots \mathrm {d} u_{n}}{\int \dots \int \int f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i-1},u_{i},\dots u_{n})\mathrm {d} u_{i}\,\mathrm {d} u_{i+1}\dots \mathrm {d} u_{n}}}\end{aligned}}

and

f_{X_{1},\dots X_{i}}(x_{1},\dots x_{i})=\int \dots \int f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i},x_{i+1},\dots x_{n})\mathrm {d} x_{i+1}\dots \mathrm {d} x_{n}

(note that these latter identities can be used to generate a random vector $(X_{1},\dots X_{n})$ with a given density $f(x_{1},\dots x_{n})$ ); the density of the marginal distribution is

f_{X_{i}}(x_{i})=\int \dots \int \int \dots \int f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i-1},x_{i},x_{i+1},\dots x_{n})\mathrm {d} x_{1}\dots \mathrm {d} x_{i-1}\,\mathrm {d} x_{i+1}\dots \mathrm {d} x_{n}.
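The remark about generating a random vector can be illustrated by sampling the coordinates in sequence: first X from its marginal, then Y from its conditional distribution given X. The sketch below assumes the illustrative density f(x, y) = x + y on the unit square, for which both cumulative distribution functions can be inverted in closed form; the density and the function names are assumptions made for this example.

```python
import math
import random


def sample_x(u):
    """Invert F_X(x) = (x**2 + x) / 2, the CDF of the marginal f_X(x) = x + 1/2."""
    return (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0


def sample_y_given_x(v, x):
    """Invert F_{Y|X}(y | x) = (x*y + y**2 / 2) / (x + 1/2) for y in [0, 1]."""
    return -x + math.sqrt(x * x + v * (2.0 * x + 1.0))


def sample_pairs(count, seed=0):
    """Draw (x, y) pairs by sampling X first and then Y conditional on X."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        x = sample_x(rng.random())
        y = sample_y_given_x(rng.random(), x)
        pairs.append((x, y))
    return pairs
```

For this density E[X] = 7/12 and E[XY] = 1/3, so the sample averages over a large number of pairs should be close to those values.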

The joint cumulative distribution function is

F_{X_{1},\dots X_{n}}\left(x_{1},\dots x_{n}\right)=\int _{-\infty }^{x_{1}}\dots \int _{-\infty }^{x_{n}}f_{X_{1},\dots X_{n}}\left(u_{1},\dots u_{n}\right)\mathrm {d} u_{1}\dots \mathrm {d} u_{n},

and the conditional distribution function is accordingly

{\begin{aligned}F_{X_{i}|X_{1},\ldots X_{i-1}}(x_{i}|x_{1},\ldots x_{i-1})=&{\frac {\int _{-\infty }^{x_{i}}f_{X_{1},\dots X_{i}}(x_{1},\dots x_{i-1},u_{i})\mathrm {d} u_{i}}{\int _{-\infty }^{\infty }f_{X_{1},\dots X_{i}}(x_{1},\dots x_{i-1},u_{i})\mathrm {d} u_{i}}}\\=&{\frac {\int _{-\infty }^{\infty }\dots \int _{-\infty }^{\infty }\int _{-\infty }^{x_{i}}f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i-1},u_{i},\dots u_{n})\mathrm {d} u_{i}\dots \mathrm {d} u_{n}}{\int _{-\infty }^{\infty }\dots \int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f_{X_{1},\dots X_{n}}(x_{1},\dots x_{i-1},u_{i},\dots u_{n})\mathrm {d} u_{i}\dots \mathrm {d} u_{n}}}.\end{aligned}}

Expectation reads

\mathbb {E} \left[h(X_{1},\dots X_{n})\right]=\int _{-\infty }^{\infty }\dots \int _{-\infty }^{\infty }h(x_{1},\dots x_{n})f_{X_{1},\dots X_{n}}(x_{1},\dots x_{n})\mathrm {d} x_{1}\dots \mathrm {d} x_{n};

suppose that h is smooth enough and $h(u_{1},\dots u_{n})=h(x_{1},\dots x_{n})$ for $u_{1}\geq x_{1},\dots u_{n}\geq x_{n}$ , then, by iterated integration by parts,

{\begin{aligned}\mathbb {E} \left[h(X_{1},\dots X_{n})\right]=&h(x_{1},\dots x_{n})+\\&(-1)^{n}\int _{-\infty }^{x_{1}}\dots \int _{-\infty }^{x_{n}}F_{X_{1},\dots X_{n}}(u_{1},\dots u_{n}){\frac {\partial ^{n}}{\partial u_{1}\dots \partial u_{n}}}h(u_{1},\dots u_{n})\mathrm {d} u_{1}\dots \mathrm {d} u_{n}.\end{aligned}}
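For a single variable (n = 1) the identity can be verified directly. The following worked step is a sketch under the additional assumptions that X has density $f_{X}$ and that $h(u)F_{X}(u)\to 0$ as $u\to -\infty$; since $h(u)=h(x)$ for $u\geq x$, the derivative $h'(u)$ vanishes for $u>x$.

```latex
\begin{aligned}
\mathbb{E}[h(X)] &= \int_{-\infty}^{\infty} h(u)\, f_X(u)\, \mathrm{d}u \\
&= \Big[ h(u)\, F_X(u) \Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} F_X(u)\, h'(u)\, \mathrm{d}u \\
&= h(x) - \int_{-\infty}^{x} F_X(u)\, h'(u)\, \mathrm{d}u .
\end{aligned}
```

The minus sign matches the factor $(-1)^{n}$ with n = 1.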

Joint distribution for independent variables

If for discrete random variables $\ P(X=x\ {\mbox{and}}\ Y=y)=P(X=x)\cdot P(Y=y)$ for all x and y, or for absolutely continuous random variables $\ f_{X,Y}(x,y)=f_{X}(x)\cdot f_{Y}(y)$ for all x and y, then X and Y are said to be independent.
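The discrete condition can be tested by brute force on a small sample space. The sketch below assumes a fair six-sided die and checks the product rule for every pair of values; the indicator `small` (face at most 2) and the helper `is_independent` are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # fair six-sided die (assumed)


def is_independent(f, g):
    """Check P(f=u, g=v) == P(f=u) * P(g=v) for all value pairs (u, v)."""
    def prob(event):
        return Fraction(sum(1 for face in FACES if event(face)), 6)

    values_f = {f(face) for face in FACES}
    values_g = {g(face) for face in FACES}
    return all(
        prob(lambda s: f(s) == u and g(s) == v)
        == prob(lambda s: f(s) == u) * prob(lambda s: g(s) == v)
        for u, v in product(values_f, values_g)
    )


def even(face):
    return int(face % 2 == 0)


def prime(face):
    return int(face in {2, 3, 5})


def small(face):
    return int(face <= 2)  # hypothetical extra indicator
```

Here `is_independent(even, prime)` is false, because P(even and prime) = 1/6 while the product of the marginals is 1/4, whereas `is_independent(even, small)` is true.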

Joint distribution for conditionally independent variables

If a subset $A$ of the variables $X_{1},\cdots ,X_{n}$ is conditionally independent given another subset $B$ of these variables, then the joint distribution $\mathrm {P} (X_{1},...,X_{n})$ is equal to $P(B)\cdot P(A|B)$ . Therefore, it can be efficiently represented by the lower-dimensional probability distributions $P(B)$ and $P(A|B)$ . Such conditional independence relations can be represented with a Bayesian network.
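The factorization $P(B)\cdot P(A|B)$ can be sketched for three binary variables with $B=X_{3}$ and $A=(X_{1},X_{2})$, under the assumption that $X_{1}$ and $X_{2}$ are conditionally independent given $X_{3}$, so that $P(A|B)$ itself splits into two small tables. All numbers and names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tables for three binary variables: B = X3, A = (X1, X2).
p_x3 = {0: Fraction(2, 5), 1: Fraction(3, 5)}
p_x1_given_x3 = {0: {0: Fraction(1, 4), 1: Fraction(3, 4)},
                 1: {0: Fraction(2, 3), 1: Fraction(1, 3)}}   # indexed [x3][x1]
p_x2_given_x3 = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
                 1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}   # indexed [x3][x2]

# Joint distribution assembled as P(B) * P(A | B), with P(A | B) factorized.
joint = {
    (x1, x2, x3): p_x3[x3] * p_x1_given_x3[x3][x1] * p_x2_given_x3[x3][x2]
    for x1, x2, x3 in product((0, 1), repeat=3)
}


def conditional_a_given_b(x1, x2, x3):
    """P(X1=x1, X2=x2 | X3=x3) recovered from the joint table."""
    p_b = sum(p for (_, _, b), p in joint.items() if b == x3)
    return joint[(x1, x2, x3)] / p_b
```

The full joint table of three binary variables has 7 free parameters, while the three factor tables need only 1 + 2 + 2 = 5, which is the kind of saving the paragraph above refers to.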


@@ Line 122: / Line 122: @@
 <!-- If for discrete random variables<sub>''Y''</sub>(''y'') for all ''x'' and ''y'', then ''X'' and ''Y'' are said to be [[statistical independence|independent]]. -->
-==Joint distribution for conditionally independent variables==
+==Joint distribution for conditionally dependent variables==
-If a subset <math>A</math> of the variables <math>X_1,\cdots,X_n</math> is [[conditional independence|conditionally independent]] given another subset <math>B</math> of these variables, then the joint distribution <math>\mathrm{P}(X_1,...,X_n)</math> is equal to <math>P(B)\cdot P(A|B)</math>. Therefore, it can be efficiently represented by the lower-dimensional probability distributions <math>P(B)</math> and <math>P(A|B)</math>. Such conditional independence relations can be represented with a [[Bayesian network]].
+If a subset <math>A</math> of the variables <math>X_1,\cdots,X_n</math> is [[conditional dependence|conditionally dependent]] given another subset <math>B</math> of these variables, then the joint distribution <math>\mathrm{P}(X_1,...,X_n)</math> is equal to <math>P(B)\cdot P(A|B)</math>. Therefore, it can be efficiently represented by the lower-dimensional probability distributions <math>P(B)</math> and <math>P(A|B)</math>. Such conditional independence relations can be represented with a [[Bayesian network]].
 ==See also==