Chain rule (probability): Difference between revisions


Revision as of 01:14, 4 November 2022

In probability theory, the chain rule (also called the general product rule^[1]^[2]) expresses the probability of an intersection of events, or any value of the joint distribution of a set of random variables, as a product of conditional probabilities. The rule is useful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.

Chain rule for events

Two events

The chain rule for two random events $A$ and $B$ says $P(A\cap B)=P(B\mid A)\cdot P(A).$

Example

This rule is illustrated in the following example. Urn 1 has 1 black ball and 2 white balls, and Urn 2 has 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event $A$ be choosing the first urn: $P(A)=P({\overline {A}})=1/2.$ Let event $B$ be choosing a white ball. The probability of choosing a white ball, given that we have chosen the first urn, is $P(B\mid A)=2/3.$ The event $A\cap B$ is their intersection: choosing the first urn and a white ball from it. Its probability follows from the chain rule: $P(A\cap B)=P(B\mid A)\cdot P(A)=2/3\times 1/2=1/3.$
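The arithmetic in this example can be checked with a short Python sketch. The urn contents and the 1/2 probability of each urn come from the example above; the names `urns`, `p_urn` and `p_colour_given_urn` are illustrative choices, not part of the article.

```python
from fractions import Fraction

# Urn contents as stated in the example above.
urns = {
    1: {"black": 1, "white": 2},
    2: {"black": 1, "white": 3},
}

# Each urn is picked with probability 1/2.
p_urn = {1: Fraction(1, 2), 2: Fraction(1, 2)}


def p_colour_given_urn(colour, urn):
    """P(colour | urn): the fraction of balls of that colour in the urn."""
    contents = urns[urn]
    return Fraction(contents[colour], sum(contents.values()))


# Chain rule: P(A ∩ B) = P(B | A) * P(A)
p_a = p_urn[1]
p_b_given_a = p_colour_given_urn("white", 1)
p_a_and_b = p_b_given_a * p_a

print(p_b_given_a, p_a, p_a_and_b)  # 2/3 1/2 1/3
```

Exact fractions are used instead of floating-point numbers so that the result can be compared directly with the value 1/3 obtained by hand.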

More than two events

For more than two events $A_{1},\ldots ,A_{n}$ the chain rule extends to the formula $\mathrm {P} \left(A_{n}\cap \ldots \cap A_{1}\right)=\mathrm {P} \left(A_{n}\mid A_{n-1}\cap \ldots \cap A_{1}\right)\cdot \mathrm {P} \left(A_{n-1}\cap \ldots \cap A_{1}\right),$ which by induction may be turned into $\mathrm {P} \left(A_{n}\cap \ldots \cap A_{1}\right)=\prod _{k=1}^{n}\mathrm {P} \left(A_{k}\,{\Bigg |}\,\bigcap _{j=1}^{k-1}A_{j}\right).$ For $k=1$ the intersection is empty and is taken to be the whole sample space, so the first factor is simply $\mathrm {P} (A_{1}).$

Example

With four events ( $n=4$ ), the chain rule is ${\begin{aligned}\mathrm {P} (A_{1}\cap A_{2}\cap A_{3}\cap A_{4})&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\cap A_{2}\cap A_{1})\\&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\mid A_{2}\cap A_{1})\cdot \mathrm {P} (A_{2}\cap A_{1})\\&=\mathrm {P} (A_{4}\mid A_{3}\cap A_{2}\cap A_{1})\cdot \mathrm {P} (A_{3}\mid A_{2}\cap A_{1})\cdot \mathrm {P} (A_{2}\mid A_{1})\cdot \mathrm {P} (A_{1})\end{aligned}}$
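As a rough numerical illustration of the chain rule for several events, the sketch below uses a hypothetical urn that does not appear in the text above: 3 red and 2 blue balls, from which 3 balls are drawn without replacement. Taking $A_{k}$ to be the event that the $k$-th draw is red, it multiplies the conditional probabilities given by the chain rule and compares the result with a brute-force count over all equally likely ordered draws.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn (an assumption for illustration): 3 red and 2 blue balls.
balls = ["R1", "R2", "R3", "B1", "B2"]
n_draws = 3


def is_red(ball):
    return ball.startswith("R")


# Chain rule: multiply P(A_k | A_1 ∩ ... ∩ A_{k-1}) for k = 1..n_draws.
red_left = sum(is_red(b) for b in balls)
total_left = len(balls)
chain = Fraction(1)
for _ in range(n_draws):
    # Probability that the next draw is red, given all earlier draws were red.
    chain *= Fraction(red_left, total_left)
    red_left -= 1
    total_left -= 1

# Brute force: enumerate every equally likely ordered draw of n_draws balls.
outcomes = list(permutations(balls, n_draws))
favourable = [o for o in outcomes if all(is_red(b) for b in o)]
brute = Fraction(len(favourable), len(outcomes))

print(chain, brute)  # 1/10 1/10
```

Here the chain rule gives $3/5\cdot 2/4\cdot 1/3=1/10$, matching the 6 favourable outcomes out of 60 found by enumeration.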

Chain rule for random variables

Two random variables

For two random variables $X,Y$ , to find the joint distribution, we can apply the definition of conditional probability to obtain: $\mathrm {P} (X=x,Y=y)=\mathrm {P} (X=x\mid Y=y)\cdot \mathrm {P} (Y=y)$ for any possible values $x$ of $X$ and $y$ of $Y$ in the discrete case or, in general, $\mathrm {P} (X\in A,Y\in B)=\mathrm {P} (X\in A\mid Y\in B)\cdot \mathrm {P} (Y\in B)$ for any possible measurable sets $A$ and $B$ .

If one desires a notation for the probability distribution of $X$ , one can use $P_{X}$ , so that $P_{X}(x):=P(X=x)$ in the discrete case or, in general, $P_{X}(A):=P(X\in A)$ for a measurable set $A$ .
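In the discrete case the two-variable rule can be used to build a joint distribution from a marginal and a conditional distribution. The following sketch assumes made-up numbers for $P(Y=y)$ and $P(X=x\mid Y=y)$; the dictionaries `p_y`, `p_x_given_y` and `joint` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Hypothetical inputs (illustrative numbers only):
# marginal P(Y = y) and conditional P(X = x | Y = y).
p_y = {0: Fraction(3, 8), 1: Fraction(5, 8)}
p_x_given_y = {
    0: {0: Fraction(1, 3), 1: Fraction(2, 3)},  # distribution of X when Y = 0
    1: {0: Fraction(3, 5), 1: Fraction(2, 5)},  # distribution of X when Y = 1
}

# Chain rule: P(X = x, Y = y) = P(X = x | Y = y) * P(Y = y)
joint = {
    (x, y): p_x_given_y[y][x] * p_y[y]
    for y in p_y
    for x in p_x_given_y[y]
}

# The result is a valid joint distribution, and summing out X recovers P(Y = y).
assert sum(joint.values()) == 1
for y in p_y:
    assert sum(p for (_, y2), p in joint.items() if y2 == y) == p_y[y]

print(joint[(0, 1)])  # 3/8
```

Every probability is written with explicit values, as in $P(X=x,Y=y)$, so the code never needs an expression such as $P(X)$.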

Note: in the examples below, it is meaningless to write $P(X)$ for a single random variable $X$ or multiple random variables. We have left them as an earlier editor wrote them to provide an example to warn against this incomplete notation. It is particularly egregious to write intersections of random variables.

More than two random variables

Consider an indexed collection of random variables $X_{1},\ldots ,X_{n}$ . To find a value of their joint distribution, we can apply the definition of conditional probability to obtain: $\mathrm {P} \left(X_{n},\ldots ,X_{1}\right)=\mathrm {P} \left(X_{n}\mid X_{n-1},\ldots ,X_{1}\right)\cdot \mathrm {P} \left(X_{n-1},\ldots ,X_{1}\right).$ Repeating this process with each final term creates the product: $\mathrm {P} \left(\bigcap _{k=1}^{n}X_{k}\right)=\prod _{k=1}^{n}\mathrm {P} \left(X_{k}\,{\Bigg |}\,\bigcap _{j=1}^{k-1}X_{j}\right).$

Example

With four variables ( $n=4$ ), the chain rule produces this product of conditional probabilities: ${\begin{aligned}\mathrm {P} (X_{4},X_{3},X_{2},X_{1})&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3},X_{2},X_{1})\\&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3}\mid X_{2},X_{1})\cdot \mathrm {P} (X_{2},X_{1})\\&=\mathrm {P} (X_{4}\mid X_{3},X_{2},X_{1})\cdot \mathrm {P} (X_{3}\mid X_{2},X_{1})\cdot \mathrm {P} (X_{2}\mid X_{1})\cdot \mathrm {P} (X_{1})\end{aligned}}$
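The four-variable factorisation can be checked numerically on a small discrete example. The sketch below assumes a hypothetical joint distribution over four binary variables with arbitrary positive weights; each conditional probability is computed as a ratio of marginals over the first $k$ and the first $k-1$ variables, and the product is compared with the joint probability for every outcome. The helper names `prefix_prob` and `chain_rule_product` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf over four binary variables (x1, x2, x3, x4).
# The weights are arbitrary positive integers, normalised to sum to 1.
outcomes = list(product([0, 1], repeat=4))
weights = {o: 1 + i for i, o in enumerate(outcomes)}
total = sum(weights.values())
joint = {o: Fraction(w, total) for o, w in weights.items()}


def prefix_prob(prefix):
    """P(X_1 = prefix[0], ..., X_k = prefix[k-1]), summing out the other variables."""
    k = len(prefix)
    return sum(p for o, p in joint.items() if o[:k] == prefix)


def chain_rule_product(outcome):
    """Product over k of P(X_k = x_k | X_1 = x_1, ..., X_{k-1} = x_{k-1})."""
    result = Fraction(1)
    for k in range(1, len(outcome) + 1):
        # Conditional = P(first k values) / P(first k-1 values);
        # the empty prefix has probability 1.
        result *= prefix_prob(outcome[:k]) / prefix_prob(outcome[:k - 1])
    return result


# The product of conditionals reproduces the joint probability of every outcome.
assert all(chain_rule_product(o) == joint[o] for o in outcomes)
```

Because all weights are positive, every conditioning event has non-zero probability and no division by zero occurs; a distribution with zero-probability prefixes would need separate handling.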

References

[1] Schum, David A. (1994). The Evidential Foundations of Probabilistic Reasoning. Northwestern University Press. p. 49. ISBN 978-0-8101-1821-8.
[2] Klugh, Henry E. (2013). Statistics: The Essentials for Research (3rd ed.). Psychology Press. p. 149. ISBN 978-1-134-92862-0.

Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2, p. 496.
"The Chain Rule of Probability", developerWorks, Nov 3, 2012.


@@ Line 35: / Line 35: @@
 For two random variables <math>X,Y</math>, to find the joint distribution, we can apply the definition of conditional probability to obtain:
-<math display=block>\mathrm P(X,Y) = \mathrm P(X \mid Y) \cdot \mathrm P(Y).</math>
+<math display=block>\mathrm P(X = x,Y = y) = \mathrm P(X = x\mid Y = y) \cdot \mathrm P(Y = y)</math>
+for any possible values <math>x</math> of <math>X</math> and <math>y</math> of <math>Y</math> in the discrete case or, in general,
+<math display=block>\mathrm P(X \in A,Y \in B) = \mathrm P(X \in A\mid Y \in B) \cdot \mathrm P(Y \in B)</math>
+for any possible measurable sets <math>A</math> and <math>B</math>.
+If one desires a notation for the probability distribution of <math>X</math>, one can use <math>P_X</math>, so that <math>P_X(x) := P(X = x)</math> in the discrete case or, in general, <math>P_X(A) := P(X \in A)</math> for a measurable set <math>A</math>.
+'''Note: in the examples below, it is meaningless to write <math>P(X)</math> for a single random variable <math>X</math> or multiple random variables. We have left them as an earlier editor wrote them to provide an example to warn against this incomplete notation. It is particularly egregious to write intersections of random variables.'''
 ===More than two random variables===
