In statistical [[decision theory]], where we are faced with the problem of estimating a deterministic parameter (vector) <math>\theta \in \Theta</math> from observations <math>x \in \mathcal{X}</math>, an [[estimator]] (estimation rule) <math>\delta^M \,\!</math> is called '''minimax''' if its maximal [[risk]] is minimal among all estimators of <math>\theta \,\!</math>. In a sense this means that <math>\delta^M \,\!</math> is an estimator which performs best in the worst possible case allowed in the problem.

==Problem Setup==
Consider the problem of estimating a deterministic (not [[Bayes estimator|Bayesian]]) parameter <math>\theta \in \Theta</math> from noisy or corrupt data <math>x \in \mathcal{X}</math> related through the conditional [[probability distribution]] <math>P(x|\theta)\,\!</math>. Our goal is to find a "good" estimator <math>\delta(x) \,\!</math> for estimating the parameter <math>\theta \,\!</math>, which minimizes some given [[risk function]] <math>R(\theta,\delta) \,\!</math>. Here the risk function is the [[expected value|expectation]] of some [[loss function]] <math>L(\theta,\delta) \,\!</math> with respect to <math>P(x|\theta)\,\!</math>. A popular example of a loss function is the squared error loss <math>L(\theta,\delta)= \|\theta-\delta\|^2 \,\!</math>, and the risk function for this loss is the [[mean squared error]] (MSE).
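
As an illustration of the risk as an expected loss, the following minimal Python sketch approximates the MSE risk of a simple estimator by Monte Carlo; the Gaussian model, the estimator <code>delta(x) = x</code>, and the numerical values are illustrative assumptions, not part of the setup above.

<syntaxhighlight lang="python">
import numpy as np

# Toy illustration (assumed setup): x ~ N(theta, sigma^2), estimator delta(x) = x,
# squared error loss.  The risk R(theta, delta) = E[(delta(x) - theta)^2] is
# approximated by averaging the loss over many simulated observations.
rng = np.random.default_rng(0)
theta, sigma, trials = 2.0, 1.0, 100_000

x = rng.normal(theta, sigma, size=trials)   # draws from P(x | theta)
delta = x                                   # the estimator delta(x) = x
empirical_risk = np.mean((delta - theta) ** 2)

print(empirical_risk)   # close to sigma^2 = 1, the MSE of this estimator
</syntaxhighlight>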


Unfortunately, in general the risk cannot be minimized, since it depends on the unknown parameter <math>\theta \,\!</math> itself (if we knew the actual value of <math>\theta \,\!</math>, we would not need to estimate it). Therefore an additional criterion for finding an optimal estimator in some sense is required. One such criterion is the minimax criterion.


==Definition==
'''Definition''' : An estimator <math>\delta^M:\mathcal{X} \rightarrow \Theta \,\!</math> is called '''minimax''' with respect to a risk function <math>R(\theta,\delta) \,\!</math> if it achieves the smallest maximum risk among all estimators, meaning it satisfies
: <Math>\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta)\,\!</math>.
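
As a toy illustration of this definition (not an example from the text), the following Python sketch compares the worst-case risk of a few candidate linear estimators <math>\delta_c(x)=cx\,\!</math> for a scalar Gaussian observation, under the assumption that the parameter lies in a bounded interval; the candidate with the smallest maximal risk is the minimax choice within this restricted family.

<syntaxhighlight lang="python">
import numpy as np

# Toy illustration (assumed setup): x ~ N(theta, sigma^2) with theta restricted to
# [-m, m], and candidate linear estimators delta_c(x) = c * x.  Under squared error
# loss the risk is R(theta, delta_c) = c^2 * sigma^2 + (1 - c)^2 * theta^2.
sigma, m = 1.0, 2.0
thetas = np.linspace(-m, m, 401)            # grid over the parameter set
candidates = np.linspace(0.0, 1.0, 11)      # a few values of the shrinkage factor c

def risk(theta, c):
    return c**2 * sigma**2 + (1 - c)**2 * theta**2

# Worst-case (maximal) risk of each candidate, then the candidate minimizing it.
max_risk = np.array([risk(thetas, c).max() for c in candidates])
best = candidates[np.argmin(max_risk)]
print(best, max_risk.min())   # best c in this family is m^2 / (m^2 + sigma^2) = 0.8
</syntaxhighlight>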




==Least Favorable Distribution==
Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a [[Bayes estimator]] with respect to a least favorable prior distribution of <math>\theta \,\!</math>. To demonstrate this notion, denote the average risk of the Bayes estimator <math>\delta_{\pi} \,\!</math> with respect to a prior distribution <math>\pi \,\!</math> as
: <Math>r_{\pi}=\int R(\theta,\delta_{\pi})d\pi(\theta)\,\!</math>
'''Definition''' : A prior distribution <math>\pi \,\!</math> is called least favorable if for any other distribution <math>\pi ' \,\!</math> the average risk satisfies <math>r_{\pi} \geq r_{\pi '} \,\!</math>.


'''Theorem''' : If <Math>r_{\pi}=\sup_{\theta} R(\theta,\delta_{\pi})\,\!</math>, then:

1)<Math>\delta_{\pi}\,\!</math> is minimax.

2)If <Math>\delta_{\pi}\,\!</math> is a unique Bayes estimator, it is also the unique minimax estimator.

3)<Math>\pi\,\!</math> is least favorable.


'''Conclusion''': If an estimator has constant risk, it is minimax. Note that this is not a necessary condition.


'''Example''': Consider the problem of estimating the mean of an <math>n\,\!</math>-dimensional [[Normal distribution| Gaussian]] random vector, <Math>x \sim N(\theta,I_n \sigma^2)\,\!</math>. The [[Maximum likelihood]] (ML) estimator for <Math>\theta\,\!</math> in this case is simply <Math>\delta_{ML}=x\,\!</math>, and its risk is
: <Math>R(\theta,\delta_{ML})=E\left[\|\delta_{ML}-\theta\|^2\right]=\sum_{i=1}^n E\left[(x_i-\theta_i)^2\right]=n \sigma^2\,\!</math>.
[[Image:MSE of ML vs JS.png|thumb|right|350px|MSE of maximum likelihood estimator vs. James-Stein estimator]]
So the risk is constant, and therefore the ML estimator is minimax. Nonetheless, minimaxity does not always imply [[Admissible decision rule|admissibility]]. In fact, in this example the ML estimator is known to be inadmissible (not admissible) whenever <math>n >2\,\!</math>: the famous [[James-Stein estimator]] dominates the ML estimator whenever <math>n >2\,\!</math>. Though both estimators have the same risk <math>n \sigma^2\,\!</math> when <math>\|\theta\| \rightarrow \infty\,\!</math>, and they are both minimax, the James-Stein estimator has smaller risk for any finite <math>\|\theta\|\,\!</math>. This fact is illustrated in the figure on the right.
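
The comparison in the figure can be reproduced by simulation. The following Python sketch (with arbitrary choices of <math>n\,\!</math>, <math>\sigma\,\!</math>, and the test values of <math>\|\theta\|\,\!</math>) estimates the risk of the ML estimator and of the classical James-Stein estimator <math>\delta_{JS}(x)=\left(1-\tfrac{(n-2)\sigma^2}{\|x\|^2}\right)x\,\!</math> by Monte Carlo.

<syntaxhighlight lang="python">
import numpy as np

# Monte Carlo comparison (assumed setup matching the example): a single observation
# x ~ N(theta, sigma^2 * I_n), the ML estimator delta_ML(x) = x, and the plain
# (not positive-part) James-Stein estimator (1 - (n - 2) * sigma^2 / ||x||^2) * x.
rng = np.random.default_rng(0)
n, sigma, trials = 10, 1.0, 20_000

for norm_theta in (0.0, 2.0, 5.0, 20.0):
    theta = np.zeros(n)
    theta[0] = norm_theta                    # parameter vector with ||theta|| = norm_theta
    x = theta + sigma * rng.standard_normal((trials, n))

    mse_ml = np.mean(np.sum((x - theta) ** 2, axis=1))
    shrink = 1.0 - (n - 2) * sigma**2 / np.sum(x**2, axis=1)
    js = shrink[:, None] * x
    mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))

    # The ML risk stays near n * sigma^2 = 10; the James-Stein risk is smaller,
    # approaching n * sigma^2 only as ||theta|| grows.
    print(norm_theta, round(mse_ml, 2), round(mse_js, 2))
</syntaxhighlight>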

The reason for this is that the ML estimator is not an actual Bayes estimator, but rather the limit of such estimators.

'''Definition''' : A sequence of prior distributions <Math>{\pi}_n\,\!</math> is called least favorable if for any other distribution <Math>\pi '\,\!</math>,
:<Math>\lim_{n \rightarrow \infty} r_{\pi_n} \leq r_{\pi '}\,\!</math>.

'''Theorem 2''' : If <Math>\delta=\lim_{n \rightarrow \infty} \delta_{\pi_n}\,\!</math> and <Math>\sup_{\theta} R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n} \,\!</math>, then:

1)<Math>\delta\,\!</math> is minimax.

2)The sequence <Math>{\pi}_n\,\!</math> is least favorable.

Notice that no uniqueness is guaranteed here. For example, the ML estimator from the previous example may be attained as the limit of Bayes estimators with respect to a [[Uniform distribution (continuous)| uniform]] prior <Math>\pi_n \sim U[-n,n]\,\!</math> with increasing support, and also with respect to a zero-mean normal prior <Math>\pi_n \sim N(0,n\sigma^2)\,\!</math> with increasing variance. So neither the resulting minimax estimator nor the least favorable prior is unique.
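
For the normal prior the limit is explicit: under squared error loss the Bayes estimator is the posterior mean, which (componentwise) equals
: <Math>\delta_{\pi_n}(x)=\frac{n\sigma^2}{n\sigma^2+\sigma^2}\,x=\frac{n}{n+1}\,x\,\!</math>,
and this tends to <math>x=\delta_{ML}\,\!</math> as <math>n \rightarrow \infty\,\!</math>.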

==Some Examples==
In general it is difficult, often even impossible, to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

'''Example 1, Bounded Normal Mean''': Consider the problem of estimating the mean of a normal vector <Math>x \sim N(\theta,I_n \sigma^2)\,\!</math>, where it is known that <Math>\|\theta\|^2 \leq M\,\!</math>. The Bayes estimator with respect to a prior which is uniformly distributed on the edge of the bounding [[sphere]] is known to be minimax whenever <Math>M \leq n\,\!</math>. The analytical expression for this estimator is
:<Math>\delta^M=\frac{nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}\,\!</math>,
where <Math>J_{n}(t)\,\!</math> is the modified [[Bessel function]] of the first kind of order <Math>n\,\!</math>.
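
The expression above can be evaluated with standard special-function routines. The following Python sketch implements it exactly as written (the dimension and the observation vector are arbitrary illustrative values), using <code>scipy.special.iv</code> for the modified Bessel function of the first kind.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import iv   # modified Bessel function of the first kind, I_v(z)

def bounded_mean_expression(x, n):
    """Evaluate n * I_{n+1}(n*||x||) / (||x|| * I_n(n*||x||)),
    i.e. the expression for the bounded-normal-mean estimator as written above."""
    norm_x = np.linalg.norm(x)
    return n * iv(n + 1, n * norm_x) / (norm_x * iv(n, n * norm_x))

# Illustrative values only: dimension n = 3 and an arbitrary observation x.
x = np.array([0.3, -0.1, 0.4])
print(bounded_mean_expression(x, 3))
</syntaxhighlight>
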
'''Example 2, Unfair Coin''': Consider the problem of estimating the "success" rate of a [[Binomial distribution|Binomial]] variable, <Math>x \sim B(n,\theta)\,\!</math>. This may be viewed as estimating the rate at which an [[Fair coin| unfair coin]] falls on "heads" or "tails". In this case the minimax estimator is the Bayes estimator with respect to a [[Beta distribution|Beta]] distributed prior, <Math>\theta \sim Beta(\sqrt{n},\sqrt{n})\,\!</math>, and the analytical expression for it is
:<Math>\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}}\,\!</math>.
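
The constant-risk property used in the conclusion above can be checked numerically for this estimator. The following Python sketch (with an arbitrary choice of <math>n\,\!</math> and a few test values of <math>\theta\,\!</math>) estimates its MSE by Monte Carlo; the empirical risk is essentially the same for every <math>\theta\,\!</math>.

<syntaxhighlight lang="python">
import numpy as np

# Monte Carlo check (assumed setup): x ~ Binomial(n, theta) and the minimax
# estimator delta(x) = (x + 0.5*sqrt(n)) / (n + sqrt(n)) for the success rate theta.
rng = np.random.default_rng(0)
n, trials = 25, 200_000

for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    x = rng.binomial(n, theta, size=trials)
    delta = (x + 0.5 * np.sqrt(n)) / (n + np.sqrt(n))
    mse = np.mean((delta - theta) ** 2)
    # The empirical MSE is roughly constant over theta, close to the analytic
    # constant risk n / (4 * (n + sqrt(n))^2), about 0.0069 for n = 25.
    print(theta, round(mse, 5))
</syntaxhighlight>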

==References==
* E. L. Lehmann and G. Casella, ''Theory of Point Estimation''. Second Edition. New York: Springer-Verlag, 1998.
* F. Perron and E. Marchand, "On the minimax estimator of a bounded normal mean," ''Statistics and Probability Letters'' 58, pp. 327–333, 2002.
* James O. Berger, ''Statistical Decision Theory and Bayesian Analysis''. Second Edition. Springer-Verlag, 1980, 1985. ISBN 0-387-96098-8.
* C. Stein, "Estimation of the mean of a multivariate normal distribution," ''Annals of Statistics'', vol. 9, no. 6, pp. 1135–1151, Nov. 1981.

<!-- [[Category:Decision theory]] -->
