In [[statistics]], '''Stein's unbiased risk estimate (SURE)''' is an [[bias of an estimator|unbiased]] [[estimator]] of the [[mean-squared error]] of "a nearly arbitrary, nonlinear biased estimator."<ref name="donoho95"/> In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly.


The technique is named after its discoverer, [[Charles Stein (statistician)|Charles Stein]].<ref name='stein81'>{{cite journal |title=Estimation of the Mean of a Multivariate Normal Distribution |journal=The Annals of Statistics |first=Charles M. |last=Stein |volume=9 |issue=6 |pages=1135–1151 |date=November 1981 |doi=10.1214/aos/1176345632 |jstor=2240405}}</ref>


== Formal statement ==
Let <math>\mu \in {\mathbb R}^d</math> be an unknown parameter and let <math>x \in {\mathbb R}^d</math> be a measurement vector whose components are independent and distributed normally with mean <math>\mu</math> and variance <math>\sigma^2</math>. Suppose <math>h(x)</math> is an estimator of <math>\mu</math> from <math>x</math>, and can be written <math>h(x) = x + g(x)</math>, where <math>g</math> is [[Weak derivative|weakly differentiable]]. Then, Stein's unbiased risk estimate is given by<ref name='wasserman05'>{{cite book|title=All of Nonparametric Statistics| first=Larry|last=Wasserman|year=2005}}</ref>
:<math>\mathrm{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2 \sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x), </math>
where <math>g_i(x)</math> is the <math>i</math>th component of the function <math>g(x)</math>, and <math>\|\cdot\|</math> is the [[Euclidean norm]].
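
When <math>g</math> and its divergence are available in closed form, the estimate can be evaluated directly from the data. The following is a minimal Python sketch, assuming a linear shrinkage estimator <math>h(x) = ax</math> (so <math>g(x) = (a-1)x</math> and <math>\textstyle\sum_i \partial g_i/\partial x_i = d(a-1)</math>); the helper names here are illustrative only, not from the sources cited:

<syntaxhighlight lang="python">
import numpy as np

def sure(x, g, div_g, sigma2):
    """SURE for h(x) = x + g(x): d*sigma^2 + ||g(x)||^2 + 2*sigma^2*div g(x).

    g     : callable returning the vector g(x)
    div_g : callable returning sum_i dg_i/dx_i evaluated at x
    """
    d = x.size
    return d * sigma2 + np.sum(g(x) ** 2) + 2 * sigma2 * div_g(x)

# Linear shrinkage h(x) = a*x: g(x) = (a - 1)*x, divergence d*(a - 1).
a, sigma2 = 0.8, 1.0
rng = np.random.default_rng(0)
mu = np.full(50, 2.0)                # unknown in practice; used here only to simulate data
x = rng.normal(mu, np.sqrt(sigma2))  # x_i ~ N(mu_i, sigma^2)

risk_estimate = sure(x, lambda v: (a - 1) * v, lambda v: v.size * (a - 1), sigma2)
print(risk_estimate)                 # unbiased estimate of E ||a*x - mu||^2
</syntaxhighlight>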


The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of <math>h(x)</math>, i.e.
:<math>E_\mu \{ \mathrm{SURE}(h) \} = \mathrm{MSE}(h),</math>
with
:<math>\mathrm{MSE}(h) = E_\mu \|h(x)-\mu\|^2.</math>


Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that there is no dependence on the unknown parameter <math>\mu</math> in the expression for SURE above. Thus, it can be manipulated (e.g., to determine optimal estimation settings) without knowledge of <math>\mu</math>.
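
As a concrete (hypothetical) example, for the shrinkage family <math>h(x) = ax</math> the expression above reduces to <math>\mathrm{SURE}(a) = d\sigma^2 + (a-1)^2\|x\|^2 + 2\sigma^2 d(a-1)</math>, which depends only on the data and the noise level. The sketch below minimizes it over a grid of <math>a</math> without ever touching <math>\mu</math>:

<syntaxhighlight lang="python">
import numpy as np

# Tune the shrinkage level a in h(x) = a*x by minimizing
# SURE(a) = d*sigma^2 + (a-1)^2*||x||^2 + 2*sigma^2*d*(a-1);
# note that mu appears nowhere in the criterion.
rng = np.random.default_rng(1)
d, sigma2 = 100, 1.0
mu = rng.normal(0.0, 1.5, size=d)    # used only to simulate the data
x = rng.normal(mu, np.sqrt(sigma2))

grid = np.linspace(0.0, 1.0, 201)
risk = [d * sigma2 + (a - 1) ** 2 * np.sum(x ** 2) + 2 * sigma2 * d * (a - 1)
        for a in grid]
a_hat = grid[np.argmin(risk)]

# Setting dSURE/da = 0 gives a = 1 - d*sigma2/||x||^2, so the grid
# minimizer should be close to this closed form.
print(a_hat, 1 - d * sigma2 / np.sum(x ** 2))
</syntaxhighlight>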


== Proof ==
We wish to show that
: <math>E_\mu \|h(x)-\mu\|^2 = E_\mu \{ \mathrm{SURE}(h) \} </math>.
We start by expanding the MSE as
: <math>\begin{align} E_\mu \| h(x) - \mu\|^2 & = E_\mu \|g(x) + x - \mu\|^2 \\
& = E_\mu \|g(x)\|^2 + E_\mu \|x - \mu\|^2 + 2 E_\mu g(x)^T (x - \mu) \\
& = E_\mu \|g(x)\|^2 + d \sigma^2 + 2 E_\mu g(x)^T(x - \mu).
\end{align}
</math>
Now we use [[integration by parts#Higher_dimensions|integration by parts]] to rewrite the last term: since <math>(x_i - \mu_i)</math> times the Gaussian density of <math>x</math> equals <math>-\sigma^2</math> times the partial derivative of that density with respect to <math>x_i</math>, the derivative can be moved from the density onto <math>g_i</math>:
:<math>
\begin{align}
E_\mu g(x)^T(x - \mu) & = \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \sum_{i=1}^d g_i(x) (x_i - \mu_i) \, d^d x \\
& = \sigma^2 \sum_{i=1}^d \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \frac{\partial g_i}{\partial x_i} \, d^d x \\
& = \sigma^2 \sum_{i=1}^d E_\mu \frac{\partial g_i}{\partial x_i}.
\end{align}
</math>
Substituting this into the expression for the MSE, we arrive at
: <math>E_\mu \|h(x) - \mu\|^2 = E_\mu \left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i}\right) = E_\mu \{ \mathrm{SURE}(h) \}.</math>
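
The identity can also be checked numerically: averaging SURE over many draws of <math>x</math> should reproduce the true risk. Below is a small Monte Carlo sketch, again using the illustrative shrinkage estimator <math>h(x) = ax</math> (an assumption made for this example):

<syntaxhighlight lang="python">
import numpy as np

# Monte Carlo check of E[SURE(h)] = E ||h(x) - mu||^2 for h(x) = a*x.
rng = np.random.default_rng(2)
d, sigma2, a = 20, 1.0, 0.7
mu = np.linspace(-2.0, 2.0, d)

sure_vals, sq_errors = [], []
for _ in range(50_000):
    x = rng.normal(mu, np.sqrt(sigma2))
    g = (a - 1) * x                      # g(x) = h(x) - x
    div_g = d * (a - 1)                  # sum_i dg_i/dx_i, constant here
    sure_vals.append(d * sigma2 + np.sum(g ** 2) + 2 * sigma2 * div_g)
    sq_errors.append(np.sum((a * x - mu) ** 2))

# The two averages agree up to Monte Carlo error.
print(np.mean(sure_vals), np.mean(sq_errors))
</syntaxhighlight>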


== Applications ==
A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the [[James–Stein estimator]] can be derived by finding the optimal [[shrinkage estimator]].<ref name="stein81"/> The technique has also been used by [[David Donoho|Donoho]] and Johnstone to determine the optimal shrinkage factor in a [[wavelet]] [[denoising]] setting.<ref name='donoho95'>{{cite journal |title=Adapting to Unknown Smoothness via Wavelet Shrinkage |journal=Journal of the American Statistical Association |volume=90 |issue=432 |pages=1200–1244 |date=December 1995 |doi=10.2307/2291512 |jstor=2291512 |last1=Donoho |first1=David L. |author-link1=David Donoho |last2=Johnstone |first2=Iain M.}}</ref>
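
For soft thresholding, <math>h(x)_i = \sgn(x_i)\max(|x_i| - t, 0)</math>, SURE takes the closed form <math>d\sigma^2 + \textstyle\sum_i \min(x_i^2, t^2) - 2\sigma^2\,\#\{i : |x_i| \le t\}</math>, and the threshold <math>t</math> can be chosen to minimize it. The following sketch is an editorial illustration in the spirit of the SureShrink threshold rule, not the authors' own procedure:

<syntaxhighlight lang="python">
import numpy as np

def sure_soft_threshold(x, t, sigma2=1.0):
    """SURE for soft thresholding h(x)_i = sign(x_i)*max(|x_i| - t, 0).

    Here g_i(x) = -sign(x_i)*min(|x_i|, t), so ||g(x)||^2 = sum_i min(x_i^2, t^2)
    and dg_i/dx_i = -1 exactly when |x_i| <= t.
    """
    d = x.size
    return (d * sigma2 + np.sum(np.minimum(x ** 2, t ** 2))
            - 2 * sigma2 * np.count_nonzero(np.abs(x) <= t))

# Sparse signal plus unit-variance noise; pick the threshold among the
# observed magnitudes that minimizes the risk estimate.
rng = np.random.default_rng(3)
mu = np.concatenate([np.zeros(90), rng.normal(0.0, 3.0, 10)])
x = rng.normal(mu, 1.0)
candidates = np.abs(x)
t_hat = candidates[np.argmin([sure_soft_threshold(x, t) for t in candidates])]
print(t_hat)
</syntaxhighlight>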


== References ==
{{reflist}}


[[:Category:Error]]
[[:Category:Estimation theory]]
[[:Category:Risk]]
