{{Short description|Proposition in probability theory}} |
The proposition in [[probability theory]] known as the '''law of total expectation''',<ref>{{cite book |first=Neil A. |last=Weiss |title=A Course in Probability |location=Boston |publisher=Addison–Wesley |year=2005 |isbn=0-321-18954-X |url={{Google books |plainurl=yes |id=p-rwJAAACAAJ |page=380 }} |pages=380–383 }}</ref> the '''law of iterated expectations'''<ref>{{Cite web|url=https://brilliant.org/wiki/law-of-iterated-expectation/|title=Law of Iterated Expectation {{!}} Brilliant Math & Science Wiki|website=brilliant.org|language=en-us|access-date=2018-03-28}}</ref> ('''LIE'''), '''Adam's law''',<ref>{{cite web |date=2024-09-15 |title=Adam's and Eve's Laws |url=https://rsconnect3.amherst.edu/content/00d145e1-889f-4777-94b4-b0141887be12 |access-date=2022-09-15 |website=Adam and Eve's laws (Shiny app)}}</ref> the '''tower rule''',<ref>{{Cite web|url=https://web.stanford.edu/class/cme001/handouts/changhan/Refresher2.pdf|title=Probability and Statistics|last=Rhee|first=Chang-han|date=Sep 20, 2011}}</ref> and the '''smoothing theorem''',<ref>{{Cite web|url=https://www2.stat.duke.edu/courses/Fall10/sta205/lec/topics/rn.pdf|title=Conditional Expectation|last=Wolpert|first=Robert|date=November 18, 2010}}</ref> among other names, states that if <math>X</math> is a [[random variable]] whose expected value <math>\operatorname{E}(X)</math> is defined, and <math>Y</math> is any random variable on the same [[probability space]], then |
:<math>\operatorname{E} (X) = \operatorname{E} ( \operatorname{E} ( X \mid Y)),</math> |
i.e., the [[expected value]] of the [[conditional expected value]] of <math>X</math> given <math>Y</math> is the same as the expected value of <math>X</math>. |
The [[conditional expected value]] <math>\operatorname{E}( X \mid Y )</math>, with <math>Y</math> a random variable, is not a simple number; it is a random variable whose value depends on the value of <math>Y</math>. That is, the conditional expected value of <math>X</math> given the ''event'' <math>Y = y</math> is a number and it is a function of <math>y</math>. If we write <math>g(y)</math> for the value of <math>\operatorname{E} ( X \mid Y = y) </math> then the random variable <math>\operatorname{E}( X \mid Y )</math> is <math> g( Y ) </math>. |
The nomenclature used here parallels the phrase ''[[law of total probability]]''. See also [[law of total variance]]. |
One special case states that if <math>{\left\{A_i\right\}}</math> is a finite or [[countable set|countable]] [[partition of a set|partition]] of the [[sample space]], then

:<math>\operatorname{E} (X) = \sum_i{\operatorname{E}(X \mid A_i) \operatorname{P}(A_i)}.</math>
==Example== |
Suppose that only two factories supply [[light bulb]]s to the market. Factory <math>X</math><nowiki>'</nowiki>s bulbs work for an average of 5000 hours, whereas factory <math>Y</math><nowiki>'</nowiki>s bulbs work for an average of 4000 hours. It is known that factory <math>X</math> supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for? |
Applying the law of total expectation, we have: |
: <math>\begin{align}
\operatorname{E} (L) &= \operatorname{E}(L \mid X) \operatorname{P}(X)+\operatorname{E}(L \mid Y)\operatorname{P}(Y) \\[2pt]
&= 5000(0.6)+4000(0.4)\\[2pt]
&=4600
\end{align}</math>
where |
* <math>\operatorname{E} (L)</math> is the expected life of the bulb;
* <math>\operatorname{P}(X)={6 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>X</math>;
* <math>\operatorname{P}(Y)={4 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>Y</math>;
* <math>\operatorname{E}(L \mid X)=5000</math> is the expected lifetime of a bulb manufactured by <math>X</math>;
* <math>\operatorname{E}(L \mid Y)=4000</math> is the expected lifetime of a bulb manufactured by <math>Y</math>.
Thus each purchased light bulb has an expected lifetime of 4600 hours. |
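The computation can also be sketched numerically. In the following Python snippet, the weighted average uses only the figures given above; the Monte Carlo check additionally assumes, purely for illustration, that each factory's bulb lifetimes are exponentially distributed with the stated conditional means (the law itself requires no such distributional assumption).

<syntaxhighlight lang="python">
import random

# Figures from the example above
p_factory = {"X": 0.6, "Y": 0.4}          # P(factory supplied the bulb)
mean_life = {"X": 5000.0, "Y": 4000.0}    # E(L | factory), in hours

# Law of total expectation over the partition {X, Y}:
# E(L) = E(L | X) P(X) + E(L | Y) P(Y)
expected_life = sum(mean_life[f] * p_factory[f] for f in p_factory)
print(expected_life)  # 4600.0

# Monte Carlo check; the exponential lifetime model is an illustrative
# assumption only, chosen to match the stated conditional means.
random.seed(0)
n = 1_000_000
total = 0.0
for _ in range(n):
    factory = "X" if random.random() < p_factory["X"] else "Y"
    total += random.expovariate(1.0 / mean_life[factory])
print(total / n)  # approximately 4600
</syntaxhighlight>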
==Informal proof== |
When a joint [[probability density function]] is [[well defined]] and the expectations are [[integrable function|integrable]], we write for the general case |
<math display="block">\begin{align} \operatorname E(X) &= \int x \Pr[X=x] ~dx \\
\operatorname E(X\mid Y=y) &= \int x \Pr[X=x\mid Y=y] ~dx \\
\operatorname E( \operatorname E(X\mid Y)) &= \int \left(\int x \Pr[X=x\mid Y=y] ~dx \right) \Pr[Y=y] ~dy \\
&= \int \int x \Pr[X = x, Y= y] ~dx ~dy \\
&= \int x \left( \int \Pr[X = x, Y = y] ~dy \right) ~dx \\
&= \int x \Pr[X = x] ~dx \\
&= \operatorname E(X).
\end{align}</math>
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable ''Y'' be the function of the sample space that assigns a cell's label to each point in that cell. |
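The discrete version can also be sketched in Python; the joint probabilities below are an arbitrary illustration, not drawn from the article, and the code simply carries out the summations described above.

<syntaxhighlight lang="python">
# Arbitrary illustrative joint distribution: (x, y) -> P(X = x, Y = y)
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

# Marginal of Y: P(Y = y) = sum_x P(X = x, Y = y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# g(y) = E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)
g = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y] for y in p_y}

# Law of total expectation: E(E(X | Y)) = sum_y g(y) P(Y = y) equals E(X)
lhs = sum(g[y] * p_y[y] for y in p_y)
rhs = sum(x * p for (x, _), p in joint.items())
print(lhs, rhs)  # both 0.95
</syntaxhighlight>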
==Proof in the general case==
Let <math> (\Omega,\mathcal{F},\operatorname{P}) </math> be a probability space on which two sub [[Sigma-algebra|σ-algebras]] <math> \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} </math> are defined. For a random variable <math> X </math> on such a space, the smoothing law states that if <math>\operatorname{E}[X]</math> is defined, i.e. |
<math>\min(\operatorname{E}[X_+], \operatorname{E}[X_-])<\infty</math>, then |
:<math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1]\quad\text{(a.s.)}.</math> |
'''Proof'''. Since a conditional expectation is a [[Radon–Nikodym theorem|Radon–Nikodym derivative]], verifying the following two properties establishes the smoothing law: |
* <math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \mbox{ is } \mathcal{G}_1</math>-[[measurable]]
* <math> \int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P},</math> for all <math>G_1 \in \mathcal{G}_1.</math>
The first of these properties holds by definition of the conditional expectation. To prove the second one, |
:<math>
\begin{align}
\min\left(\int_{G_1}X_+\, d\operatorname{P}, \int_{G_1}X_-\, d\operatorname{P} \right) &\leq \min\left(\int_\Omega X_+\, d\operatorname{P}, \int_\Omega X_-\, d\operatorname{P}\right)\\[4pt]
&=\min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty,
\end{align}
</math>
so the integral <math>\textstyle \int_{G_1}X\, d\operatorname{P}</math> is defined (not equal <math>\infty - \infty</math>). |
The second property thus holds since <math>G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 </math> implies
:<math>
\int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P}
= \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P}
= \int_{G_1} X \, d\operatorname{P}.
</math>
'''Corollary.''' In the special case when <math>\mathcal{G}_1 = \{\empty,\Omega \}</math> and <math>\mathcal{G}_2 = \sigma(Y)</math>, the smoothing law reduces to |
:<math>
\operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].
</math>
'''Alternative proof for <math> \operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].</math>''' |
This is a simple consequence of the measure-theoretic definition of [[conditional expectation]]. By definition, <math> \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] </math> is a <math>\sigma(Y)</math>-measurable random variable that satisfies |
:<math>
\int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P}
</math>
for every measurable set <math> A \in \sigma(Y) </math>. Taking <math> A = \Omega </math> proves the claim. |
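A simulation can also illustrate the smoothing law with genuinely nested σ-algebras. In the sketch below, <math>\mathcal{G}_1 = \sigma(Y_1)</math> and <math>\mathcal{G}_2 = \sigma(Y_1, Y_2)</math>; the particular distributions of <math>Y_1</math>, <math>Y_2</math> and <math>X</math> are arbitrary choices made only for illustration. Averaging the finer conditional mean <math>\operatorname{E}[X \mid Y_1, Y_2]</math> over <math>Y_2</math>, given <math>Y_1</math>, reproduces <math>\operatorname{E}[X \mid Y_1]</math>.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

# Illustration of E[ E[X | G2] | G1 ] = E[X | G1] with G1 = sigma(Y1) and
# G2 = sigma(Y1, Y2).  All distributions are arbitrary illustrative choices.
random.seed(1)
n = 200_000

sum_x, count = defaultdict(float), defaultdict(int)      # for E[X | Y1]
sum_xy, county = defaultdict(float), defaultdict(int)    # for E[X | Y1, Y2]

for _ in range(n):
    y1 = random.choice([0, 1])
    y2 = random.choice([0, 1, 2])
    x = y1 + 2 * y2 + random.gauss(0, 1)     # X depends on both Y1 and Y2
    sum_x[y1] += x; count[y1] += 1
    sum_xy[(y1, y2)] += x; county[(y1, y2)] += 1

for y1 in (0, 1):
    direct = sum_x[y1] / count[y1]           # empirical E[X | Y1 = y1]
    # Tower estimate: weight each empirical E[X | Y1 = y1, Y2 = y2] by the
    # empirical conditional frequency of Y2 = y2 given Y1 = y1.
    tower = sum((county[(y1, y2)] / count[y1]) * (sum_xy[(y1, y2)] / county[(y1, y2)])
                for y2 in (0, 1, 2))
    print(y1, round(direct, 4), round(tower, 4))  # the two estimates coincide
</syntaxhighlight>

The two printed estimates coincide exactly because the sample averages satisfy the same algebraic identity as the expectations they approximate.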
==See also== |
* The [[fundamental theorem of poker]] for one practical application.
* [[Law of total probability]]
* [[Law of total variance]]
* [[Law of total covariance]]
* [[Law of total cumulance]]
* [[Product distribution#expectation]] (application of the law for proving that the product expectation is the product of expectations)
==References==
{{Reflist}}
*{{cite book | last=Billingsley | first=Patrick | title=Probability and measure | publisher=John Wiley & Sons | location=New York | year=1995 | isbn=0-471-00710-2}} (Theorem 34.4)
*[[Christopher Sims]], [http://sims.princeton.edu/yftp/Bubbles2007/ProbNotes.pdf "Notes on Random Variables, Expectations, Probability Densities, and Martingales"], especially equations (16) through (18)
{{DEFAULTSORT:Law Of Total Expectation}} |
[[Category:Theory of probability distributions]] |
[[Category:Statistical laws]] |