Stein's unbiased risk estimate

In [[statistics]], '''Stein's unbiased risk estimate (SURE)''' is an [[bias of an estimator|unbiased]] [[estimator]] of the [[mean-squared error]] of "a nearly arbitrary, nonlinear biased estimator."<ref name="donoho95"/> In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly.

The technique is named after its discoverer, [[Charles Stein (statistician)|Charles Stein]].<ref name='stein81'> {{cite journal|title=Estimation of the Mean of a Multivariate Normal Distribution|journal=The Annals of Statistics| first=Charles M.|last=Stein|volume=9|issue=6|pages=1135–1151|date=November 1981|doi=10.1214/aos/1176345632|jstor=2240405|doi-access=free}}</ref>

== Formal statement ==
Let <math>\mu \in {\mathbb R}^d</math> be an unknown parameter and let <math>x \in {\mathbb R}^d</math> be a measurement vector whose components are independent and normally distributed with means <math>\mu_i,\ i=1,\ldots,d,</math> and common variance <math>\sigma^2</math>. Suppose <math>h(x)</math> is an estimator of <math>\mu</math> from <math>x</math> that can be written as <math>h(x) = x + g(x)</math>, where <math>g</math> is [[Weak derivative|weakly differentiable]]. Then Stein's unbiased risk estimate is given by<ref name='wasserman05'>{{cite book|title=All of Nonparametric Statistics| first=Larry|last=Wasserman|year=2005}}</ref>
:<math>\operatorname{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2 \sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x) = -d\sigma^2 + \|g(x)\|^2 + 2 \sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} h_i(x), </math>
where <math>g_i(x)</math> is the <math>i</math>th component of the function <math>g(x)</math>, and <math>\|\cdot\|</math> is the [[Euclidean norm]].
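For a concrete illustration of how the formula is evaluated, consider the linear shrinkage estimator <math>h(x) = a x</math> for a fixed factor <math>a</math>, so that <math>g(x) = (a-1)x</math> and <math>\tfrac{\partial}{\partial x_i} g_i(x) = a-1</math> for every component. The sketch below computes SURE for this case; it is illustrative only (the function name and the use of NumPy are not taken from the cited sources) and assumes, as the formula does, that the noise level <math>\sigma</math> is known. For estimators whose divergence <math>\sum_i \partial g_i/\partial x_i</math> has no convenient closed form, that term can instead be approximated numerically, for example by finite differences.

<syntaxhighlight lang="python">
import numpy as np

def sure_linear_shrinkage(x, a, sigma):
    """SURE for the linear shrinkage estimator h(x) = a * x.

    Here g(x) = h(x) - x = (a - 1) * x, so each partial derivative
    of g_i with respect to x_i equals (a - 1), and the divergence
    term in the SURE formula sums to d * (a - 1).
    """
    d = x.size
    g = (a - 1.0) * x
    divergence = d * (a - 1.0)
    return d * sigma**2 + np.sum(g**2) + 2.0 * sigma**2 * divergence
</syntaxhighlight>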
The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of <math>h(x)</math>, i.e.
:<math>\operatorname E_\mu \{ \operatorname{SURE}(h) \} = \operatorname{MSE}(h),\,\! </math>
with
:<math>\operatorname{MSE}(h) = \operatorname E_\mu \|h(x)-\mu\|^2.</math>
Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that the expression for SURE above does not depend on the unknown parameter <math>\mu</math>, so it can be manipulated (e.g., to determine optimal estimation settings) without knowledge of <math>\mu</math>.
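As an example of such a manipulation, restrict attention to the linear shrinkage family <math>h(x) = a x</math> from the sketch above. Then <math>\operatorname{SURE}(a) = d\sigma^2 + (a-1)^2 \|x\|^2 + 2 d \sigma^2 (a-1)</math> is quadratic in <math>a</math>, and setting its derivative to zero gives the minimizer <math>a = 1 - d\sigma^2/\|x\|^2</math>, a Stein-type shrinkage factor closely related to the [[James–Stein estimator]] (which uses <math>d-2</math> in place of <math>d</math>). The sketch below, again with illustrative naming rather than anything from the cited sources, carries out this minimization using only <math>x</math> and <math>\sigma</math>, never <math>\mu</math>.

<syntaxhighlight lang="python">
import numpy as np

def best_shrinkage_by_sure(x, sigma):
    """Shrinkage factor a for h(x) = a * x chosen by minimizing SURE.

    SURE(a) = d*sigma^2 + (a - 1)^2 * ||x||^2 + 2*d*sigma^2*(a - 1)
    is quadratic in a, with minimizer a = 1 - d*sigma^2 / ||x||^2.
    The unknown mean mu never appears in the computation.
    """
    d = x.size
    return 1.0 - d * sigma**2 / np.sum(x**2)
</syntaxhighlight>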
== Proof ==
We wish to show that
: <math>\operatorname E_\mu \|h(x)-\mu\|^2 = \operatorname E_\mu \{ \operatorname{SURE}(h) \}. </math>
We start by expanding the MSE as
: <math>\begin{align} \operatorname E_\mu \| h(x) - \mu\|^2 & = \operatorname E_\mu \|g(x) + x - \mu\|^2 \\
& = \operatorname E_\mu \|g(x)\|^2 + \operatorname E_\mu \|x - \mu\|^2 + 2 \operatorname E_\mu g(x)^T (x - \mu) \\
& = \operatorname E_\mu \|g(x)\|^2 + d \sigma^2 + 2 \operatorname E_\mu g(x)^T(x - \mu).
\end{align}
</math>
Now we use [[integration by parts]] to rewrite the last term, noting that <math>(x_i - \mu_i) \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2}\right) = -\sigma^2 \frac{\partial}{\partial x_i} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2}\right)</math>:
:<math>
\begin{align}
\operatorname E_\mu g(x)^T(x - \mu) & = \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \sum_{i=1}^d g_i(x) (x_i - \mu_i) \, d^d x \\
& = \sigma^2 \sum_{i=1}^d \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \frac{\partial g_i}{\partial x_i} \, d^d x \\
& = \sigma^2 \sum_{i=1}^d \operatorname E_\mu \frac{\partial g_i}{\partial x_i}.
\end{align}
</math>
Substituting this into the expression for the MSE, we arrive at
: <math>\operatorname E_\mu \|h(x) - \mu\|^2 = \operatorname E_\mu \left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i}\right),</math>

and the quantity inside the expectation on the right-hand side is exactly <math>\operatorname{SURE}(h)</math>, which completes the proof.
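The identity proved above can also be checked empirically. The following Monte Carlo sketch (an illustration only, not part of the cited derivation) draws many measurement vectors around a fixed <math>\mu</math>, averages <math>\operatorname{SURE}(h)</math> over the draws, and compares the result with the empirical squared-error risk of a linear shrinkage estimator; the two averages should agree up to simulation noise.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
d, sigma, a = 50, 1.0, 0.7               # dimension, known noise level, shrinkage factor
mu = rng.normal(size=d)                  # a fixed "unknown" mean for the experiment

sure_values, squared_errors = [], []
for _ in range(20000):
    x = mu + sigma * rng.normal(size=d)  # components are independent N(mu_i, sigma^2)
    h = a * x                            # estimator h(x) = a*x, so g(x) = (a - 1)*x
    g = h - x
    sure = d * sigma**2 + np.sum(g**2) + 2 * sigma**2 * d * (a - 1)
    sure_values.append(sure)
    squared_errors.append(np.sum((h - mu)**2))

print(np.mean(sure_values), np.mean(squared_errors))  # nearly equal: E[SURE(h)] = MSE(h)
</syntaxhighlight>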
== Applications ==
A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the [[James–Stein estimator]] can be derived by finding the optimal [[shrinkage estimator]].<ref name="stein81"/> The technique has also been used by [[David Donoho|Donoho]] and Johnstone to determine the optimal shrinkage factor in a [[wavelet]] [[denoising]] setting.<ref name='donoho95'> {{cite journal|title=Adapting to Unknown Smoothness via Wavelet Shrinkage|journal=Journal of the American Statistical Association| first=David L.|last=Donoho|author-link=David Donoho|author2=Iain M. Johnstone |volume=90|issue=432|pages=1200–1244|date=December 1995|doi=10.2307/2291512|jstor=2291512|citeseerx=10.1.1.161.8697}}</ref>
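To indicate how SURE enters the wavelet setting, consider soft thresholding, <math>h_i(x) = \operatorname{sgn}(x_i)\,(|x_i| - t)_+</math>, for which the SURE formula reduces to the closed form <math>-d\sigma^2 + \sum_{i=1}^d \min(x_i^2, t^2) + 2\sigma^2 \,\#\{i : |x_i| > t\}</math>, and the threshold <math>t</math> can be chosen by minimizing this expression. The sketch below shows only this core minimization step, with illustrative function names; the full SureShrink procedure of Donoho and Johnstone applies the idea per resolution level and adds further refinements not reproduced here.

<syntaxhighlight lang="python">
import numpy as np

def sure_soft_threshold(x, t, sigma):
    """Closed-form SURE for soft thresholding h_i(x) = sign(x_i) * max(|x_i| - t, 0)."""
    d = x.size
    return (-d * sigma**2
            + np.sum(np.minimum(x**2, t**2))
            + 2 * sigma**2 * np.sum(np.abs(x) > t))

def sure_threshold(x, sigma):
    """Soft threshold chosen by minimizing SURE.

    The minimum of the piecewise function SURE(t) is attained at t = 0
    or at one of the observed magnitudes |x_i|, so searching over those
    candidates suffices.
    """
    candidates = np.concatenate(([0.0], np.abs(x)))
    values = [sure_soft_threshold(x, t, sigma) for t in candidates]
    return candidates[int(np.argmin(values))]
</syntaxhighlight>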
== References ==
{{reflist}}
[[Category:Point estimation performance]]
[[Category:Estimation theory]]
[[Category:Risk]]