{{Short description|Proposition in probability theory}} |
The proposition in [[probability theory]] known as the '''law of total expectation''',<ref>{{cite book |first=Neil A. |last=Weiss |title=A Course in Probability |location=Boston |publisher=Addison–Wesley |year=2005 |isbn=0-321-18954-X |url={{Google books |plainurl=yes |id=p-rwJAAACAAJ |page=380 }} |pages=380–383 }}</ref> the '''law of iterated expectations'''<ref>{{Cite web|url=https://brilliant.org/wiki/law-of-iterated-expectation/|title=Law of Iterated Expectation {{!}} Brilliant Math & Science Wiki|website=brilliant.org|language=en-us|access-date=2018-03-28}}</ref> ('''LIE'''), '''Adam's law''',<ref>{{cite web |date=2024-09-15 |title=Adam's and Eve's Laws |url=https://rsconnect3.amherst.edu/content/00d145e1-889f-4777-94b4-b0141887be12 |access-date=2022-09-15 |website=Adam and Eve's laws (Shiny app)}}</ref> the '''tower rule''',<ref>{{Cite web|url=https://web.stanford.edu/class/cme001/handouts/changhan/Refresher2.pdf|title=Probability and Statistics|last=Rhee|first=Chang-han|date=Sep 20, 2011}}</ref> and the '''smoothing theorem''',<ref>{{Cite web|url=https://www2.stat.duke.edu/courses/Fall10/sta205/lec/topics/rn.pdf|title=Conditional Expectation|last=Wolpert|first=Robert|date=November 18, 2010}}</ref> among other names, states that if <math>X</math> is a [[random variable]] whose expected value <math>\operatorname{E}(X)</math> is defined, and <math>Y</math> is any random variable on the same [[probability space]], then |
:<math>\operatorname{E} (X) = \operatorname{E} ( \operatorname{E} ( X \mid Y)),</math> |
i.e., the [[expected value]] of the [[conditional expected value]] of <math>X</math> given <math>Y</math> is the same as the expected value of <math>X</math>. |
The [[conditional expected value]] <math>\operatorname{E}( X \mid Y )</math>, with <math>Y</math> a random variable, is not a simple number; it is a random variable whose value depends on the value of <math>Y</math>. That is, the conditional expected value of <math>X</math> given the ''event'' <math>Y = y</math> is a number and it is a function of <math>y</math>. If we write <math>g(y)</math> for the value of <math>\operatorname{E} ( X \mid Y = y) </math> then the random variable <math>\operatorname{E}( X \mid Y )</math> is <math> g( Y ) </math>. |
The nomenclature used here parallels the phrase ''[[law of total probability]]''. See also [[law of total variance]]. |
One special case states that if <math>{\left\{A_i\right\}}</math> is a finite or [[countable set|countable]] [[partition of a set|partition]] of the [[sample space]], then

:<math>\operatorname{E} (X) = \sum_i{\operatorname{E}(X \mid A_i) \operatorname{P}(A_i)}.</math>
==Example== |
Suppose that only two factories supply [[light bulb]]s to the market. Factory <math>X</math><nowiki>'</nowiki>s bulbs work for an average of 5000 hours, whereas factory <math>Y</math><nowiki>'</nowiki>s bulbs work for an average of 4000 hours. It is known that factory <math>X</math> supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for? |
Applying the law of total expectation, we have: |
: <math>\begin{align}
\operatorname{E} (L) &= \operatorname{E}(L \mid X) \operatorname{P}(X)+\operatorname{E}(L \mid Y)\operatorname{P}(Y) \\[2pt]
&= 5000(0.6)+4000(0.4)\\[2pt]
&=4600
\end{align}</math>
where |
* <math>\operatorname{E} (L)</math> is the expected life of the bulb;
* <math>\operatorname{P}(X)={6 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>X</math>;
* <math>\operatorname{P}(Y)={4 \over 10}</math> is the probability that the purchased bulb was manufactured by factory <math>Y</math>;
* <math>\operatorname{E}(L \mid X)=5000</math> is the expected lifetime of a bulb manufactured by <math>X</math>;
* <math>\operatorname{E}(L \mid Y)=4000</math> is the expected lifetime of a bulb manufactured by <math>Y</math>.
Thus each purchased light bulb has an expected lifetime of 4600 hours. |
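The computation can also be sketched numerically. In the following Python snippet, the weighted average uses only the figures given above; the Monte Carlo check additionally assumes, purely for illustration, that each factory's bulb lifetimes are exponentially distributed with the stated conditional means (the law itself requires no such distributional assumption).

<syntaxhighlight lang="python">
import random

# Figures from the example above
p_factory = {"X": 0.6, "Y": 0.4}          # P(factory supplied the bulb)
mean_life = {"X": 5000.0, "Y": 4000.0}    # E(L | factory), in hours

# Law of total expectation over the partition {X, Y}:
# E(L) = E(L | X) P(X) + E(L | Y) P(Y)
expected_life = sum(mean_life[f] * p_factory[f] for f in p_factory)
print(expected_life)  # 4600.0

# Monte Carlo check; the exponential lifetime model is an illustrative
# assumption only, chosen to match the stated conditional means.
random.seed(0)
n = 1_000_000
total = 0.0
for _ in range(n):
    factory = "X" if random.random() < p_factory["X"] else "Y"
    total += random.expovariate(1.0 / mean_life[factory])
print(total / n)  # approximately 4600
</syntaxhighlight>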
==Informal proof== |
When a joint [[probability density function]] is [[well defined]] and the expectations are [[integrable function|integrable]], we write for the general case |
<math display="block">\begin{align} \operatorname E(X) &= \int x \Pr[X=x] ~dx \\
\operatorname E(X\mid Y=y) &= \int x \Pr[X=x\mid Y=y] ~dx \\
\operatorname E( \operatorname E(X\mid Y)) &= \int \left(\int x \Pr[X=x\mid Y=y] ~dx \right) \Pr[Y=y] ~dy \\
&= \int \int x \Pr[X = x, Y= y] ~dx ~dy \\
&= \int x \left( \int \Pr[X = x, Y = y] ~dy \right) ~dx \\
&= \int x \Pr[X = x] ~dx \\
&= \operatorname E(X).
\end{align}</math>
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable ''Y'' be the function of the sample space that assigns a cell's label to each point in that cell. |
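The discrete version can also be sketched in Python; the joint probabilities below are an arbitrary illustration, not drawn from the article, and the code simply carries out the summations described above.

<syntaxhighlight lang="python">
# Arbitrary illustrative joint distribution: (x, y) -> P(X = x, Y = y)
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

# Marginal of Y: P(Y = y) = sum_x P(X = x, Y = y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# g(y) = E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)
g = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y] for y in p_y}

# Law of total expectation: E(E(X | Y)) = sum_y g(y) P(Y = y) equals E(X)
lhs = sum(g[y] * p_y[y] for y in p_y)
rhs = sum(x * p for (x, _), p in joint.items())
print(lhs, rhs)  # both 0.95
</syntaxhighlight>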
==Proof in the general case==
Let <math> (\Omega,\mathcal{F},\operatorname{P}) </math> be a probability space on which two sub [[Sigma-algebra|σ-algebras]] <math> \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} </math> are defined. For a random variable <math> X </math> on such a space, the smoothing law states that if <math>\operatorname{E}[X]</math> is defined, i.e. |
<math>\min(\operatorname{E}[X_+], \operatorname{E}[X_-])<\infty</math>, then |
:<math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1]\quad\text{(a.s.)}.</math> |
'''Proof'''. Since a conditional expectation is a [[Radon–Nikodym theorem|Radon–Nikodym derivative]], verifying the following two properties establishes the smoothing law: |
* <math> \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \mbox{ is } \mathcal{G}_1</math>-[[measurable]]
* <math> \int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P},</math> for all <math>G_1 \in \mathcal{G}_1.</math>
The first of these properties holds by definition of the conditional expectation. To prove the second one, |
:<math>
\begin{align}
\min\left(\int_{G_1}X_+\, d\operatorname{P}, \int_{G_1}X_-\, d\operatorname{P} \right) &\leq \min\left(\int_\Omega X_+\, d\operatorname{P}, \int_\Omega X_-\, d\operatorname{P}\right)\\[4pt]
&=\min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty,
\end{align}
</math>
so the integral <math>\textstyle \int_{G_1}X\, d\operatorname{P}</math> is defined (not equal <math>\infty - \infty</math>). |
The second property thus holds since <math>G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 </math> implies
:<math>
\int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P}
= \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P}
= \int_{G_1} X \, d\operatorname{P}.
</math>
'''Corollary.''' In the special case when <math>\mathcal{G}_1 = \{\empty,\Omega \}</math> and <math>\mathcal{G}_2 = \sigma(Y)</math>, the smoothing law reduces to |
:<math>
\operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].
</math>
'''Alternative proof for <math> \operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X].</math>''' |
This is a simple consequence of the measure-theoretic definition of [[conditional expectation]]. By definition, <math> \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] </math> is a <math>\sigma(Y)</math>-measurable random variable that satisfies |
:<math>
\int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P}
</math>
for every measurable set <math> A \in \sigma(Y) </math>. Taking <math> A = \Omega </math> proves the claim. |
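A simulation can also illustrate the smoothing law with genuinely nested σ-algebras. In the sketch below, <math>\mathcal{G}_1 = \sigma(Y_1)</math> and <math>\mathcal{G}_2 = \sigma(Y_1, Y_2)</math>; the particular distributions of <math>Y_1</math>, <math>Y_2</math> and <math>X</math> are arbitrary choices made only for illustration. Averaging the finer conditional mean <math>\operatorname{E}[X \mid Y_1, Y_2]</math> over <math>Y_2</math>, given <math>Y_1</math>, reproduces <math>\operatorname{E}[X \mid Y_1]</math>.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

# Illustration of E[ E[X | G2] | G1 ] = E[X | G1] with G1 = sigma(Y1) and
# G2 = sigma(Y1, Y2).  All distributions are arbitrary illustrative choices.
random.seed(1)
n = 200_000

sum_x, count = defaultdict(float), defaultdict(int)      # for E[X | Y1]
sum_xy, county = defaultdict(float), defaultdict(int)    # for E[X | Y1, Y2]

for _ in range(n):
    y1 = random.choice([0, 1])
    y2 = random.choice([0, 1, 2])
    x = y1 + 2 * y2 + random.gauss(0, 1)     # X depends on both Y1 and Y2
    sum_x[y1] += x; count[y1] += 1
    sum_xy[(y1, y2)] += x; county[(y1, y2)] += 1

for y1 in (0, 1):
    direct = sum_x[y1] / count[y1]           # empirical E[X | Y1 = y1]
    # Tower estimate: weight each empirical E[X | Y1 = y1, Y2 = y2] by the
    # empirical conditional frequency of Y2 = y2 given Y1 = y1.
    tower = sum((county[(y1, y2)] / count[y1]) * (sum_xy[(y1, y2)] / county[(y1, y2)])
                for y2 in (0, 1, 2))
    print(y1, round(direct, 4), round(tower, 4))  # the two estimates coincide
</syntaxhighlight>

The two printed estimates coincide exactly because the sample averages satisfy the same algebraic identity as the expectations they approximate.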
==See also== |
* The [[fundamental theorem of poker]] for one practical application.
* [[Law of total probability]]
* [[Law of total variance]]
* [[Law of total covariance]]
* [[Law of total cumulance]]
* [[Product distribution#expectation]] (application of the law for proving that the product expectation is the product of expectations)
==References==
{{Reflist}}
*{{cite book | last=Billingsley | first=Patrick | title=Probability and measure | publisher=John Wiley & Sons | location=New York | year=1995 | isbn=0-471-00710-2}} (Theorem 34.4)
*[[Christopher Sims]], [http://sims.princeton.edu/yftp/Bubbles2007/ProbNotes.pdf "Notes on Random Variables, Expectations, Probability Densities, and Martingales"], especially equations (16) through (18)
{{DEFAULTSORT:Law Of Total Expectation}} |
[[Category:Theory of probability distributions]] |
[[Category:Statistical laws]] |