MINQUE: Difference between revisions


Latest revision as of 14:37, 4 October 2024

In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE)^[1]^[2]^[3] was developed by C. R. Rao. It is an estimation approach that stands alongside others in estimation theory, such as the method of moments and maximum likelihood estimation. Like the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models.^[1] The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression.^[1] MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models.^[3] MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.

Principles

We are concerned with a mixed effects model for the random vector $\mathbf {Y} \in \mathbb {R} ^{n}$ with the following linear structure.

$\mathbf {Y} =\mathbf {X} {\boldsymbol {\beta }}+\mathbf {U} _{1}{\boldsymbol {\xi }}_{1}+\cdots +\mathbf {U} _{k}{\boldsymbol {\xi }}_{k}$

Here, $\mathbf {X} \in \mathbb {R} ^{n\times m}$ is a design matrix for the fixed effects, ${\boldsymbol {\beta }}\in \mathbb {R} ^{m}$ represents the unknown fixed-effect parameters, $\mathbf {U} _{i}\in \mathbb {R} ^{n\times c_{i}}$ is a design matrix for the $i$ -th random-effect component, and ${\boldsymbol {\xi }}_{i}\in \mathbb {R} ^{c_{i}}$ is a random vector for the $i$ -th random-effect component. The entries of each random-effect vector are assumed to have zero mean ( $\mathbb {E} [{\boldsymbol {\xi }}_{i}]=\mathbf {0}$ ) and to be uncorrelated with a common variance ( $\mathbb {V} [{\boldsymbol {\xi }}_{i}]=\sigma _{i}^{2}\mathbf {I} _{c_{i}}$ ). Furthermore, any two distinct random-effect vectors are uncorrelated with each other ( $\mathbb {V} [{\boldsymbol {\xi }}_{i},{\boldsymbol {\xi }}_{j}]=\mathbf {0} \,\forall i\neq j$ ). The unknown variances $\sigma _{1}^{2},\cdots ,\sigma _{k}^{2}$ represent the variance components of the model.

This is a general model that captures commonly used linear regression models.

Gauss-Markov Model^[3]: If we consider a one-component model where $\mathbf {U} _{1}=\mathbf {I} _{n}$ , then the model is equivalent to the Gauss-Markov model $\mathbf {Y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\epsilon }}$ with $\mathbb {E} [{\boldsymbol {\epsilon }}]=\mathbf {0}$ and $\mathbb {V} [{\boldsymbol {\epsilon }}]=\sigma _{1}^{2}\mathbf {I} _{n}$ .
Heteroscedastic Model^[1]: Each set of random variables in $\mathbf {Y}$ that shares a common variance can be modeled as an individual variance component with an appropriate $\mathbf {U} _{i}$ .

A compact representation for the model is the following, where $\mathbf {U} =\left[{\begin{array}{c|c|c}\mathbf {U} _{1}&\cdots &\mathbf {U} _{k}\end{array}}\right]$ and ${\boldsymbol {\xi }}^{\top }=\left[{\begin{array}{c|c|c}{\boldsymbol {\xi }}_{1}^{\top }&\cdots &{\boldsymbol {\xi }}_{k}^{\top }\end{array}}\right]$ .

$\mathbf {Y} =\mathbf {X} {\boldsymbol {\beta }}+\mathbf {U} {\boldsymbol {\xi }}$

Note that this model makes no distributional assumptions about $\mathbf {Y}$ other than the first and second moments.^[3]

$\mathbb {E} [\mathbf {Y} ]=\mathbf {X} {\boldsymbol {\beta }}$

$\mathbb {V} [\mathbf {Y} ]=\sigma _{1}^{2}\mathbf {U} _{1}\mathbf {U} _{1}^{\top }+\cdots +\sigma _{k}^{2}\mathbf {U} _{k}\mathbf {U} _{k}^{\top }\equiv \sigma _{1}^{2}\mathbf {V} _{1}+\cdots +\sigma _{k}^{2}\mathbf {V} _{k}$
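The second-moment structure above can be sketched numerically. The following is a minimal illustration, assuming NumPy; the helper names `component_matrices` and `covariance_of_y`, and the small design with paired observations, are hypothetical and not taken from Rao's papers.

```python
import numpy as np

def component_matrices(U_list):
    """Return V_i = U_i U_i^T for each random-effect design matrix U_i."""
    return [U @ U.T for U in U_list]

def covariance_of_y(U_list, sigma2):
    """Return V[Y] = sum_i sigma_i^2 V_i for given variance components."""
    return sum(s2 * V for s2, V in zip(sigma2, component_matrices(U_list)))

# Hypothetical design: n = 4 observations in two pairs.
# Component 1 is a random effect shared within each pair (c_1 = 2);
# component 2 is independent noise on every observation (c_2 = 4).
U1 = np.kron(np.eye(2), np.ones((2, 1)))  # shape (4, 2)
U2 = np.eye(4)                            # shape (4, 4)
cov_y = covariance_of_y([U1, U2], sigma2=[2.0, 0.5])
```

In this sketch, observations in the same pair share covariance $\sigma _{1}^{2}$ , while observations in different pairs are uncorrelated.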

The goal in MINQUE is to estimate $\theta =\sum _{i=1}^{k}p_{i}\sigma _{i}^{2}$ using a quadratic form ${\hat {\theta }}=\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y}$ . MINQUE estimators are derived by identifying a matrix $\mathbf {A}$ such that the estimator has some desirable properties,^[2]^[3] described below.

Optimal Estimator Properties to Constrain MINQUE

Invariance to translation of the fixed effects

Consider a new fixed-effect parameter ${\boldsymbol {\gamma }}={\boldsymbol {\beta }}-{\boldsymbol {\beta }}_{0}$ , which represents a translation of the original fixed effect. The new, equivalent model is now the following.

$\mathbf {Y} -\mathbf {X} {\boldsymbol {\beta }}_{0}=\mathbf {X} {\boldsymbol {\gamma }}+\mathbf {U} {\boldsymbol {\xi }}$

Under this equivalent model, the MINQUE estimator is now $(\mathbf {Y} -\mathbf {X} {\boldsymbol {\beta }}_{0})^{\top }\mathbf {A} (\mathbf {Y} -\mathbf {X} {\boldsymbol {\beta }}_{0})$ . Rao argued that since the underlying models are equivalent, this estimator should be equal to $\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y}$ .^[2]^[3] This can be achieved by constraining $\mathbf {A}$ such that $\mathbf {A} \mathbf {X} =\mathbf {0}$ , which ensures that all terms other than $\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y}$ in the expansion of the quadratic form are zero.
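For a symmetric $\mathbf {A}$ , the expansion is $(\mathbf {Y} -\mathbf {X} {\boldsymbol {\beta }}_{0})^{\top }\mathbf {A} (\mathbf {Y} -\mathbf {X} {\boldsymbol {\beta }}_{0})=\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y} -2{\boldsymbol {\beta }}_{0}^{\top }\mathbf {X} ^{\top }\mathbf {A} \mathbf {Y} +{\boldsymbol {\beta }}_{0}^{\top }\mathbf {X} ^{\top }\mathbf {A} \mathbf {X} {\boldsymbol {\beta }}_{0}$ , and both extra terms vanish when $\mathbf {A} \mathbf {X} =\mathbf {0}$ . A small numerical check, assuming NumPy and an arbitrary (hypothetical) choice of $\mathbf {A}$ built from the residual-maker matrix, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
X = rng.normal(size=(n, m))

# Residual-maker M = I - X (X^T X)^{-1} X^T satisfies M X = 0.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Any matrix of the form M C M with symmetric C is symmetric and annihilates X.
B = rng.normal(size=(n, n))
A = M @ (B + B.T) @ M

Y = rng.normal(size=n)
beta0 = rng.normal(size=m)          # arbitrary translation of the fixed effects
shifted = Y - X @ beta0

q_original = Y @ A @ Y              # Y^T A Y
q_shifted = shifted @ A @ shifted   # (Y - X beta0)^T A (Y - X beta0)
```

The two quadratic forms agree up to floating-point error, whatever value of `beta0` is drawn.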

Unbiased estimation

Suppose that we constrain $\mathbf {A} \mathbf {X} =\mathbf {0}$ , as argued in the section above. Then, the MINQUE estimator has the following form

${\begin{aligned}{\hat {\theta }}&=\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y} \\&=(\mathbf {X} {\boldsymbol {\beta }}+\mathbf {U} {\boldsymbol {\xi }})^{\top }\mathbf {A} (\mathbf {X} {\boldsymbol {\beta }}+\mathbf {U} {\boldsymbol {\xi }})\\&={\boldsymbol {\xi }}^{\top }\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} {\boldsymbol {\xi }}\end{aligned}}$

To ensure that this estimator is unbiased, the expectation of the estimator $\mathbb {E} [{\hat {\theta }}]$ must equal the parameter of interest, $\theta$ . Below, the expectation of the estimator can be decomposed for each component since the components are uncorrelated with each other. Furthermore, the cyclic property of the trace is used to evaluate the expectation with respect to ${\boldsymbol {\xi }}_{i}$ .

${\begin{aligned}\mathbb {E} [{\hat {\theta }}]&=\mathbb {E} [{\boldsymbol {\xi }}^{\top }\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} {\boldsymbol {\xi }}]\\&=\sum _{i=1}^{k}\mathbb {E} [{\boldsymbol {\xi }}_{i}^{\top }\mathbf {U} _{i}^{\top }\mathbf {A} \mathbf {U} _{i}{\boldsymbol {\xi }}_{i}]\\&=\sum _{i=1}^{k}\sigma _{i}^{2}\mathrm {Tr} [\mathbf {U} _{i}^{\top }\mathbf {A} \mathbf {U} _{i}]\end{aligned}}$

To ensure that this estimator is unbiased, Rao suggested setting $\sum _{i=1}^{k}\sigma _{i}^{2}\mathrm {Tr} [\mathbf {U} _{i}^{\top }\mathbf {A} \mathbf {U} _{i}]=\sum _{i=1}^{k}p_{i}\sigma _{i}^{2}$ , which can be accomplished by constraining $\mathbf {A}$ such that $\mathrm {Tr} [\mathbf {U} _{i}^{\top }\mathbf {A} \mathbf {U} _{i}]=\mathrm {Tr} [\mathbf {A} \mathbf {V} _{i}]=p_{i}$ for all components.^[3]
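The expectation above relies on the standard identity $\mathbb {E} [{\boldsymbol {\xi }}^{\top }\mathbf {M} {\boldsymbol {\xi }}]=\mathrm {Tr} [\mathbf {M} \,\mathbb {V} [{\boldsymbol {\xi }}]]$ for a zero-mean random vector. The sketch below, assuming NumPy and randomly generated (hypothetical) design matrices, checks that the full trace agrees with the component-wise sum $\sum _{i}\sigma _{i}^{2}\mathrm {Tr} [\mathbf {A} \mathbf {V} _{i}]$ .

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
U_list = [rng.normal(size=(n, 2)), rng.normal(size=(n, 3))]  # c_1 = 2, c_2 = 3
sigma2 = [1.5, 0.7]

B = rng.normal(size=(n, n))
A = B + B.T                      # arbitrary symmetric matrix

U = np.hstack(U_list)
# V[xi] is diagonal, with sigma_i^2 repeated c_i times.
cov_xi = np.diag(np.concatenate(
    [s2 * np.ones(Ui.shape[1]) for s2, Ui in zip(sigma2, U_list)]
))

# E[xi^T U^T A U xi] = Tr[U^T A U V[xi]]
expectation_full = np.trace(U.T @ A @ U @ cov_xi)

# sum_i sigma_i^2 Tr[A V_i], with V_i = U_i U_i^T
expectation_by_component = sum(
    s2 * np.trace(A @ Ui @ Ui.T) for s2, Ui in zip(sigma2, U_list)
)
```

The cross terms between different components drop out because $\mathbb {V} [{\boldsymbol {\xi }}]$ is block diagonal.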

Minimum Norm

Rao argued that if ${\boldsymbol {\xi }}$ were observed, a "natural" estimator for $\theta$ would be the following^[2]^[3] since $\mathbb {E} [{\boldsymbol {\xi }}_{i}^{\top }{\boldsymbol {\xi }}_{i}]=c_{i}\sigma _{i}^{2}$ . Here, ${\boldsymbol {\Delta }}$ is defined as a diagonal matrix whose $i$ -th diagonal block is ${\frac {p_{i}}{c_{i}}}\mathbf {I} _{c_{i}}$ .

${\frac {p_{1}}{c_{1}}}{\boldsymbol {\xi }}_{1}^{\top }{\boldsymbol {\xi }}_{1}+\cdots +{\frac {p_{k}}{c_{k}}}{\boldsymbol {\xi }}_{k}^{\top }{\boldsymbol {\xi }}_{k}={\boldsymbol {\xi }}^{\top }\left[\mathrm {diag} \left({\frac {p_{1}}{c_{1}}}\mathbf {I} _{c_{1}},\cdots ,{\frac {p_{k}}{c_{k}}}\mathbf {I} _{c_{k}}\right)\right]{\boldsymbol {\xi }}\equiv {\boldsymbol {\xi }}^{\top }{\boldsymbol {\Delta }}{\boldsymbol {\xi }}$

The difference between the proposed estimator and the natural estimator is ${\boldsymbol {\xi }}^{\top }(\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}){\boldsymbol {\xi }}$ . This difference can be minimized by minimizing the norm of the matrix $\lVert \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}\rVert$ .

Procedure

Given the constraints and optimization strategy derived from the optimal properties above, the MINQUE estimator ${\hat {\theta }}$ for $\theta =\sum _{i=1}^{k}p_{i}\sigma _{i}^{2}$ is derived by choosing a matrix $\mathbf {A}$ that minimizes $\lVert \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}\rVert$ , subject to the constraints

$\mathbf {A} \mathbf {X} =\mathbf {0}$ , and
$\mathrm {Tr} [\mathbf {A} \mathbf {V} _{i}]=p_{i}$ .

Examples of Estimators

Standard Estimator for Homoscedastic Error

In the Gauss-Markov model, the error variance $\sigma ^{2}$ is estimated using the following.

$s^{2}={\frac {1}{n-m}}(\mathbf {Y} -\mathbf {X} {\hat {\boldsymbol {\beta }}})^{\top }(\mathbf {Y} -\mathbf {X} {\hat {\boldsymbol {\beta }}})$

This estimator is unbiased and can be shown to minimize the Euclidean norm of the form $\lVert \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}\rVert$ .^[1] Thus, the standard estimator for error variance in the Gauss-Markov model is a MINQUE estimator.
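To see that $s^{2}$ fits the MINQUE framework, note that it can be written as $\mathbf {Y} ^{\top }\mathbf {A} \mathbf {Y}$ with $\mathbf {A} =(\mathbf {I} -\mathbf {H} )/(n-m)$ , where $\mathbf {H} =\mathbf {X} (\mathbf {X} ^{\top }\mathbf {X} )^{-1}\mathbf {X} ^{\top }$ . The following sketch, assuming NumPy, simulated data, and a full-column-rank $\mathbf {X}$ , checks the two MINQUE constraints for the one-component case ( $\mathbf {V} _{1}=\mathbf {I} _{n}$ , $p_{1}=1$ ); it does not attempt to verify the minimum-norm property.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3
X = rng.normal(size=(n, m))      # assumed to have full column rank
Y = rng.normal(size=n)

# Hat matrix H and the quadratic-form matrix A = (I - H) / (n - m).
H = X @ np.linalg.solve(X.T @ X, X.T)
A = (np.eye(n) - H) / (n - m)

# Usual residual-based estimator s^2 for comparison.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
s2 = resid @ resid / (n - m)

quadratic_form = Y @ A @ Y       # Y^T A Y
constraint_invariance = A @ X    # should be the zero matrix
constraint_unbiased = np.trace(A)  # Tr[A V_1] with V_1 = I; should equal p_1 = 1
```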

Random Variables with Common Mean and Heteroscedastic Error

For random variables $Y_{1},\cdots ,Y_{n}$ with a common mean and different variances $\sigma _{1}^{2},\cdots ,\sigma _{n}^{2}$ , the MINQUE estimator for $\sigma _{i}^{2}$ is ${\frac {n}{n-2}}(Y_{i}-{\overline {Y}})^{2}-{\frac {s^{2}}{n-2}}$ , where ${\overline {Y}}={\frac {1}{n}}\sum _{i=1}^{n}Y_{i}$ and $s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(Y_{i}-{\overline {Y}})^{2}$ .^[1]
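The closed form above is simple to compute. The sketch below, assuming NumPy, implements it and also builds the matrix $\mathbf {A} _{i}$ of the equivalent quadratic form so that unbiasedness can be checked through $\mathrm {Tr} [\mathbf {A} _{i}\,\mathrm {diag} (\sigma _{1}^{2},\cdots ,\sigma _{n}^{2})]=\sigma _{i}^{2}$ . The function names `minque_common_mean` and `quadratic_form_matrix` are hypothetical.

```python
import numpy as np

def minque_common_mean(y):
    """Per-observation variance estimates for a common-mean model (needs n >= 3)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("the formula divides by n - 2, so n >= 3 is required")
    dev2 = (y - y.mean()) ** 2
    s2 = dev2.sum() / (n - 1)
    return n / (n - 2) * dev2 - s2 / (n - 2)

def quadratic_form_matrix(i, n):
    """Matrix A_i such that y^T A_i y equals the i-th estimate above."""
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix, C @ 1 = 0
    c_i = C[:, [i]]                       # i-th column, so c_i^T y = y_i - mean(y)
    return n / (n - 2) * (c_i @ c_i.T) - C / ((n - 1) * (n - 2))

y = np.array([1.0, 2.0, 4.0, 7.0])
estimates = minque_common_mean(y)
```

Individual estimates can be negative, for example when $Y_{i}$ is close to ${\overline {Y}}$ , since the estimator is not constrained to be non-negative.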

Estimator for Variance Components

Rao proposed a MINQUE estimator for the variance components model based on minimizing the Euclidean norm.^[2] The Euclidean norm $\lVert \cdot \rVert _{2}$ of a matrix (also known as the Frobenius norm) is the square root of the sum of squares of all elements in the matrix. When evaluating this norm below, $\mathbf {V} =\mathbf {V} _{1}+\cdots +\mathbf {V} _{k}=\mathbf {U} \mathbf {U} ^{\top }$ and $\mathbf {A}$ is taken to be symmetric. Furthermore, using the cyclic property of traces and the unbiasedness constraint $\mathrm {Tr} [\mathbf {A} \mathbf {V} _{i}]=p_{i}$ , $\mathrm {Tr} [\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} {\boldsymbol {\Delta }}]=\mathrm {Tr} [\mathbf {A} \mathbf {U} {\boldsymbol {\Delta }}\mathbf {U} ^{\top }]=\mathrm {Tr} \left[\sum _{i=1}^{k}{\frac {p_{i}}{c_{i}}}\mathbf {A} \mathbf {V} _{i}\right]=\sum _{i=1}^{k}{\frac {p_{i}^{2}}{c_{i}}}=\mathrm {Tr} [{\boldsymbol {\Delta }}{\boldsymbol {\Delta }}]$ .

${\begin{aligned}\lVert \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}\rVert _{2}^{2}&=\mathrm {Tr} [(\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }})^{\top }(\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }})]\\&=\mathrm {Tr} [\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} ]-2\,\mathrm {Tr} [\mathbf {U} ^{\top }\mathbf {A} \mathbf {U} {\boldsymbol {\Delta }}]+\mathrm {Tr} [{\boldsymbol {\Delta }}{\boldsymbol {\Delta }}]\\&=\mathrm {Tr} [\mathbf {A} \mathbf {V} \mathbf {A} \mathbf {V} ]-\mathrm {Tr} [{\boldsymbol {\Delta }}{\boldsymbol {\Delta }}]\end{aligned}}$

Note that since $\mathrm {Tr} [{\boldsymbol {\Delta }}{\boldsymbol {\Delta }}]$ does not depend on $\mathbf {A}$ , the MINQUE with the Euclidean norm is obtained by identifying the matrix $\mathbf {A}$ that minimizes $\mathrm {Tr} [\mathbf {A} \mathbf {V} \mathbf {A} \mathbf {V} ]$ , subject to the MINQUE constraints discussed above.
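The identity $\lVert \mathbf {U} ^{\top }\mathbf {A} \mathbf {U} -{\boldsymbol {\Delta }}\rVert _{2}^{2}=\mathrm {Tr} [\mathbf {A} \mathbf {V} \mathbf {A} \mathbf {V} ]-\mathrm {Tr} [{\boldsymbol {\Delta }}{\boldsymbol {\Delta }}]$ can be checked numerically. In the sketch below, which assumes NumPy and random (hypothetical) design matrices, a symmetric $\mathbf {A}$ is drawn first and the $p_{i}$ are then defined as $\mathrm {Tr} [\mathbf {A} \mathbf {V} _{i}]$ , so the unbiasedness constraint holds by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
U_list = [rng.normal(size=(n, 2)), rng.normal(size=(n, 4))]  # c_1 = 2, c_2 = 4
U = np.hstack(U_list)
V = U @ U.T                       # V = V_1 + ... + V_k = U U^T

B = rng.normal(size=(n, n))
A = B + B.T                       # arbitrary symmetric matrix

# Choose p_i so that Tr[A V_i] = p_i holds exactly for this A.
p = [np.trace(A @ Ui @ Ui.T) for Ui in U_list]

# Delta is diagonal with p_i / c_i repeated c_i times.
delta = np.diag(np.concatenate(
    [np.full(Ui.shape[1], p_i / Ui.shape[1]) for p_i, Ui in zip(p, U_list)]
))

norm_squared = np.linalg.norm(U.T @ A @ U - delta, "fro") ** 2
trace_form = np.trace(A @ V @ A @ V) - np.trace(delta @ delta)
```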

Rao showed that the matrix $\mathbf {A}$ that satisfies this optimization problem is

$\mathbf {A} _{\star }=\sum _{i=1}^{k}\lambda _{i}\mathbf {R} \mathbf {V} _{i}\mathbf {R}$ ,

where $\mathbf {R} =\mathbf {V} ^{-1}(\mathbf {I} -\mathbf {P} )$ , $\mathbf {P} =\mathbf {X} (\mathbf {X} ^{\top }\mathbf {V} ^{-1}\mathbf {X} )^{-}\mathbf {X} ^{\top }\mathbf {V} ^{-1}$ is the projection matrix onto the column space of $\mathbf {X}$ , and $(\cdot )^{-}$ represents the generalized inverse of a matrix.

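Therefore, the MINQUE estimator is the following, where $Q_{i}=\mathbf {Y} ^{\top }\mathbf {R} \mathbf {V} _{i}\mathbf {R} \mathbf {Y}$ and the vectors ${\boldsymbol {\lambda }}=[\lambda _{1}\,\cdots \,\lambda _{k}]^{\top }$ and $\mathbf {Q} =[Q_{1}\,\cdots \,Q_{k}]^{\top }$ collect the terms of the sum.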

${\begin{aligned}{\hat {\theta }}&=\mathbf {Y} ^{\top }\mathbf {A} _{\star }\mathbf {Y} \\&=\sum _{i=1}^{k}\lambda _{i}\mathbf {Y} ^{\top }\mathbf {R} \mathbf {V} _{i}\mathbf {R} \mathbf {Y} \\&\equiv \sum _{i=1}^{k}\lambda _{i}Q_{i}\\&\equiv {\boldsymbol {\lambda }}^{\top }\mathbf {Q} \end{aligned}}$

The vector ${\boldsymbol {\lambda }}$ is obtained by using the constraint $\mathrm {Tr} [\mathbf {A} _{\star }\mathbf {V} _{i}]=p_{i}$ . That is, the vector represents the solution to the following system of equations $\forall j\in \{1,\cdots ,k\}$ .

${\begin{aligned}\mathrm {Tr} [\mathbf {A} _{\star }\mathbf {V} _{j}]&=p_{j}\\\mathrm {Tr} \left[\sum _{i=1}^{k}\lambda _{i}\mathbf {R} \mathbf {V} _{i}\mathbf {R} \mathbf {V} _{j}\right]&=p_{j}\\\sum _{i=1}^{k}\lambda _{i}\mathrm {Tr} [\mathbf {R} \mathbf {V} _{i}\mathbf {R} \mathbf {V} _{j}]&=p_{j}\end{aligned}}$

This can be written as a matrix product $\mathbf {S} {\boldsymbol {\lambda }}=\mathbf {p}$ , where $\mathbf {p} =[p_{1}\,\cdots \,p_{k}]^{\top }$ and $\mathbf {S}$ is the following.

$\mathbf {S} ={\begin{bmatrix}\mathrm {Tr} [\mathbf {R} \mathbf {V} _{1}\mathbf {R} \mathbf {V} _{1}]&\cdots &\mathrm {Tr} [\mathbf {R} \mathbf {V} _{k}\mathbf {R} \mathbf {V} _{1}]\\\vdots &\ddots &\vdots \\\mathrm {Tr} [\mathbf {R} \mathbf {V} _{1}\mathbf {R} \mathbf {V} _{k}]&\cdots &\mathrm {Tr} [\mathbf {R} \mathbf {V} _{k}\mathbf {R} \mathbf {V} _{k}]\end{bmatrix}}$

Then, ${\boldsymbol {\lambda }}=\mathbf {S} ^{-}\mathbf {p}$ . This implies that the MINQUE is ${\hat {\theta }}={\boldsymbol {\lambda }}^{\top }\mathbf {Q} =\mathbf {p} ^{\top }(\mathbf {S} ^{-})^{\top }\mathbf {Q} =\mathbf {p} ^{\top }\mathbf {S} ^{-}\mathbf {Q}$ . Note that $\theta =\sum _{i=1}^{k}p_{i}\sigma _{i}^{2}=\mathbf {p} ^{\top }{\boldsymbol {\sigma }}$ , where ${\boldsymbol {\sigma }}=[\sigma _{1}^{2}\,\cdots \,\sigma _{k}^{2}]^{\top }$ . Therefore, the estimator for the variance components is ${\hat {\boldsymbol {\sigma }}}=\mathbf {S} ^{-}\mathbf {Q}$ .
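The closed form ${\hat {\boldsymbol {\sigma }}}=\mathbf {S} ^{-}\mathbf {Q}$ can be sketched in a few lines. The following is a minimal illustration rather than a reference implementation: it assumes NumPy, assumes $\mathbf {V} =\mathbf {U} \mathbf {U} ^{\top }$ is nonsingular, and uses the Moore-Penrose pseudoinverse as one particular choice of generalized inverse. The function name `minque` and the simulated data are hypothetical.

```python
import numpy as np

def minque(Y, X, U_list):
    """Return (sigma2_hat, S, Q) for the variance components model.

    Y: response vector of length n; X: n x m fixed-effect design;
    U_list: list of n x c_i random-effect design matrices.
    """
    n = len(Y)
    V_list = [U @ U.T for U in U_list]          # V_i = U_i U_i^T
    V_inv = np.linalg.inv(sum(V_list))          # assumes V is nonsingular
    XtVi = X.T @ V_inv
    P = X @ np.linalg.pinv(XtVi @ X) @ XtVi     # projection onto col(X)
    R = V_inv @ (np.eye(n) - P)
    RV = [R @ Vi for Vi in V_list]
    k = len(V_list)
    # S_ij = Tr[R V_i R V_j]
    S = np.array([[np.trace(RV[i] @ RV[j]) for j in range(k)] for i in range(k)])
    # Q_i = Y^T R V_i R Y
    Q = np.array([Y @ RV[i] @ R @ Y for i in range(k)])
    return np.linalg.pinv(S) @ Q, S, Q

# One-component (Gauss-Markov) example with simulated data.
rng = np.random.default_rng(4)
n, m = 10, 2
X = rng.normal(size=(n, m))
Y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
sigma2_hat, S, Q = minque(Y, X, [np.eye(n)])
```

In the one-component case with $\mathbf {U} _{1}=\mathbf {I} _{n}$ , this reduces to $\mathbf {S} =n-m$ and $\mathbf {Q}$ equal to the residual sum of squares, so the result coincides with the standard estimator $s^{2}$ . For large $n$ , forming the $n\times n$ matrices explicitly is costly, and a practical implementation would likely avoid it.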

Extensions

MINQUE estimators can be obtained without the invariance criteria, in which case the estimator is only unbiased and minimizes the norm.^[2] Such estimators have slightly different constraints on the minimization problem.

The model can be extended to estimate covariance components.^[3] In such a model, the random effects of a component are assumed to have a common covariance structure $\mathbb {V} [{\boldsymbol {\xi }}_{i}]={\boldsymbol {\Sigma }}$ . A MINQUE estimator for a mixture of variance and covariance components was also proposed.^[3] In this model, $\mathbb {V} [{\boldsymbol {\xi }}_{i}]={\boldsymbol {\Sigma }}$ for $i\in \{1,\cdots ,s\}$ and $\mathbb {V} [{\boldsymbol {\xi }}_{i}]=\sigma _{i}^{2}\mathbf {I} _{c_{i}}$ for $i\in \{s+1,\cdots ,k\}$ .


References

[1] Rao, C.R. (1970). "Estimation of heteroscedastic variances in linear models". Journal of the American Statistical Association. 65 (329): 161–172. doi:10.1080/01621459.1970.10481070. JSTOR 2283583.
[2] Rao, C.R. (1971). "Estimation of variance and covariance components MINQUE theory". J Multivar Anal. 1: 257–275. doi:10.1016/0047-259x(71)90001-7. hdl:10338.dmlcz/104230.
[3] Rao, C.R. (1972). "Estimation of variance and covariance components in linear models". Journal of the American Statistical Association. 67 (337): 112–115. doi:10.1080/01621459.1972.10481212. JSTOR 2284708.


@@ Line 1: / Line 1: @@
 {{Short description|Theory in the field of statistics}}
-In [[statistics]], the theory of '''minimum norm quadratic unbiased estimation (MINQUE)'''<ref>{{cite journal | last=Rao | first=C.R. | year=1970 | title=Estimation of heteroscedastic variances in linear models | journal=Journal of the American Statistical Association | volume=65 | number=329 | pages=161&ndash;172 | jstor=2283583|doi=10.1080/01621459.1970.10481070 }}</ref><ref>{{cite journal | last=Rao | first=C.R. | year=1971 | title=Estimation of variance and covariance components MINQUE theory | journal=J Multivar Anal | volume=1 | pages=257&ndash;275 | doi=10.1016/0047-259x(71)90001-7| hdl=10338.dmlcz/104230 | hdl-access=free }}</ref><ref>{{cite journal | last=Rao | first=C.R. | year=1972 | title=Estimation of variance and covariance components in linear models | journal=Journal of the American Statistical Association | volume=67 | issue=337 | pages=112&ndash;115 | jstor=2284708|doi=10.1080/01621459.1972.10481212 }}</ref> was developed by [[C.R. Rao]]. Its application was originally to the problem of [[heteroscedasticity]] and the estimation of variance components in [[random effects model]]s.
+In [[statistics]], the theory of '''minimum norm quadratic unbiased estimation (MINQUE)'''<ref name=":0">{{cite journal | last=Rao | first=C.R. | year=1970 | title=Estimation of heteroscedastic variances in linear models | journal=Journal of the American Statistical Association | volume=65 | number=329 | pages=161&ndash;172 | jstor=2283583|doi=10.1080/01621459.1970.10481070 }}</ref><ref name=":1">{{cite journal | last=Rao | first=C.R. | year=1971 | title=Estimation of variance and covariance components MINQUE theory | journal=J Multivar Anal | volume=1 | pages=257&ndash;275 | doi=10.1016/0047-259x(71)90001-7| hdl=10338.dmlcz/104230 | hdl-access=free }}</ref><ref name=":2">{{cite journal | last=Rao | first=C.R. | year=1972 | title=Estimation of variance and covariance components in linear models | journal=Journal of the American Statistical Association | volume=67 | issue=337 | pages=112&ndash;115 | jstor=2284708|doi=10.1080/01621459.1972.10481212 }}</ref> was developed by [[C. R. Rao]]. MINQUE is a theory alongside other estimation methods in [[estimation theory]], such as the [[Method of moments (statistics)|method of moments]] or [[maximum likelihood estimation]]. Similar to the theory of [[Gauss–Markov theorem|best linear unbiased estimation]], MINQUE is specifically concerned with [[linear regression]] models.<ref name=":0" /> The method was originally conceived to estimate [[Homoscedasticity and heteroscedasticity|heteroscedastic]] error variance in multiple linear regression.<ref name=":0" /> MINQUE estimators also provide an alternative to maximum likelihood estimators or [[restricted maximum likelihood]] estimators for variance components in [[Mixed model|mixed effects models]].<ref name=":2" /> MINQUE estimators are [[Quadratic form|quadratic forms]] of the response variable and are used to estimate a linear function of the variances.
+== Principles ==
-The theory involves three stages:
+We are concerned with a [[Mixed model|mixed effects model]] for the random vector <math>\mathbf{Y} \in \mathbb{R}^n</math> with the following linear structure.
-:*defining a general class of potential estimators as quadratic functions of the observed data, where the estimators relate to a vector of model parameters;
-:*specifying certain constraints on the desired properties of the estimators, such as unbiasedness;
+<math>\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \mathbf{U}_1 \boldsymbol\xi_1
-:*choosing the optimal estimator by minimising a "norm" which measures the size of the covariance matrix of the estimators.
++ \cdots + \mathbf{U}_k \boldsymbol\xi_k</math>
+Here, <math>\mathbf{X} \in \mathbb{R}^{n\times m}</math> is a [[design matrix]] for the fixed effects, <math>\boldsymbol\beta \in \mathbb{R}^m</math> represents the unknown fixed-effect parameters, <math>\mathbf{U}_i \in \mathbb{R}^{n\times c_i}</math> is a design matrix for the <math>i</math>-th random-effect component, and <math>\boldsymbol\xi_i\in\mathbb{R}^{c_i}</math> is a [[Multivariate random variable|random vector]] for the <math>i</math>-th random-effect component. The random effects are assumed to have zero mean (<math>\mathbb{E}[\boldsymbol\xi_i]=\mathbf{0}</math>) and be uncorrelated (<math>\mathbb{V}[\boldsymbol\xi_i]=\sigma^2_i\mathbf{I}_{c_i}</math>). Furthermore, any two random effect vectors are also uncorrelated (<math>\mathbb{V}[\boldsymbol\xi_i,
+\boldsymbol\xi_j]=\mathbf{0}\,
+\forall i\neq j</math>). The unknown variances <math>\sigma^2_1,\cdots,\sigma^2_k</math> represent the variance components of the model.
+This is a general model that captures commonly used linear regression models.
+# '''Gauss-Markov Model<ref name=":2" />:''' If we consider a one-component model where <math>\mathbf{U}_1=\mathbf{I}_n</math>, then the model is equivalent to the [[Gauss–Markov theorem|Gauss-Markov model]] <math>\mathbf{Y}=\mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon</math> with <math>\mathbb{E}[\boldsymbol\epsilon]=\mathbf{0}</math> and <math>\mathbb{V}[\boldsymbol\epsilon]=\sigma^2_1 \mathbf{I}_n</math>.
+# '''Heteroscedastic Model'''<ref name=":0" />''':''' Each set of random variables in <math>\mathbf{Y}</math> that shares a common variance can be modeled as an individual variance component with an appropriate <math>\mathbf{U}_i</math>.
+A compact representation for the model is the following, where <math>\mathbf{U} = \left[\begin{array}{c|c|c}\mathbf{U}_1&\cdots&\mathbf{U}_k\end{array}\right]</math> and <math>\boldsymbol\xi^\top = \left[\begin{array}{c|c|c}
+\boldsymbol\xi_1^\top&\cdots&\boldsymbol\xi_k^\top\end{array}\right]</math>.
+<math>\mathbf{Y}=\mathbf{X}\boldsymbol\beta+\mathbf{U}\boldsymbol\xi</math>
+Note that this model makes no distributional assumptions about <math>\mathbf{Y}</math> other than the first and second moments.<ref name=":2" />
+<math>\mathbb{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol\beta</math>
+<math>\mathbb{V}[\mathbf{Y}]=\sigma^2_1\mathbf{U}_1\mathbf{U}_1^\top + \cdots +
+\sigma^2_k \mathbf{U}_k \mathbf{U}_k^\top
+\equiv \sigma^2_1\mathbf{V}_1 + \cdots + \sigma^2_k \mathbf{V}_k</math>
+The goal in MINQUE is to estimate <math>\theta = \sum_{i=1}^k p_i \sigma^2_i</math> using a quadratic form <math>\hat{\theta}=\mathbf{Y}^\top \mathbf{A} \mathbf{Y}</math>. MINQUE estimators are derived by identifying a matrix <math>\mathbf{A}</math> such that the estimator has some desirable properties,<ref name=":1" /><ref name=":2" /> described below.
+=== Optimal Estimator Properties to Constrain MINQUE ===
+==== Invariance to translation of the fixed effects ====
+Consider a new fixed-effect parameter <math>\boldsymbol\gamma=\boldsymbol\beta - \boldsymbol\beta_0</math>, which represents a translation of the original fixed effect. The new, equivalent model is now the following.
+<math>\mathbf{Y} - \mathbf{X}\boldsymbol\beta_0 =
+\mathbf{X}\boldsymbol\gamma + \mathbf{U}\boldsymbol\xi</math>
+Under this equivalent model, the MINQUE estimator is now <math>(\mathbf{Y} - \mathbf{X}\boldsymbol\beta_0)^\top \mathbf{A}
+(\mathbf{Y} - \mathbf{X}\boldsymbol\beta_0)</math>. [[C. R. Rao|Rao]] argued that since the underlying models are equivalent, this estimator should be equal to <math>\mathbf{Y}^\top \mathbf{A} \mathbf{Y}</math>.<ref name=":1" /><ref name=":2" /> This can be achieved by constraining <math>\mathbf{A}</math> such that <math>\mathbf{A}\mathbf{X} = \mathbf{0}</math>, which ensures that all terms other than <math>\mathbf{Y}^\top \mathbf{A} \mathbf{Y}</math> in the expansion of the quadratic form are zero.
+==== Unbiased estimation ====
+Suppose that we constrain <math>\mathbf{A}\mathbf{X} = \mathbf{0}</math>, as argued in the section above. Then, the MINQUE estimator has the following form
+<math>\begin{align}
+\hat{\theta} &= \mathbf{Y}^\top \mathbf{A} \mathbf{Y}\\
+&= (\mathbf{X}\boldsymbol\beta + \mathbf{U}\boldsymbol\xi)^\top \mathbf{A} (\mathbf{X}\boldsymbol\beta + \mathbf{U}\boldsymbol\xi)\\
+&= \boldsymbol\xi^\top\mathbf{U}^\top\mathbf{A}\mathbf{U}\boldsymbol\xi
+\end{align}</math>
+To ensure that this estimator is [[Bias of an estimator|unbiased]], the expectation of the estimator <math>\mathbb{E}[\hat{\theta}]</math> must equal the parameter of interest, <math>\theta</math>. Below, the expectation of the estimator can be decomposed for each component since the components are uncorrelated with each other. Furthermore, the cyclic property of the [[Trace (linear algebra)|trace]] is used to evaluate the expectation with respect to <math>\boldsymbol\xi_i</math>.
+<math>\begin{align}
+\mathbb{E}[\hat{\theta}] &= \mathbb{E}[\boldsymbol\xi^\top \mathbf{U}^\top \mathbf{A} \mathbf{U} \boldsymbol\xi]\\
+&= \sum_{i=1}^k \mathbb{E}[\boldsymbol\xi_i^\top\mathbf{U}_i^\top\mathbf{A}\mathbf{U}_i\boldsymbol\xi_i]\\
+&= \sum_{i=1}^k \sigma_i^2 \mathrm{Tr}[\mathbf{U}_i^\top \mathbf{A} \mathbf{U}_i]
+\end{align}</math>
+To ensure that this estimator is unbiased, [[C. R. Rao|Rao]] suggested setting <math>\sum_{i=1}^k \sigma_i^2 \mathrm{Tr}[\mathbf{U}_i^\top \mathbf{A} \mathbf{U}_i] = \sum_{i=1}^k p_i \sigma_i^2</math>, which can be accomplished by constraining <math>\mathbf{A}</math> such that <math>\mathrm{Tr}[\mathbf{U}_i^\top \mathbf{A} \mathbf{U}_i] = \mathrm{Tr}[\mathbf{A}\mathbf{V}_i] = p_i</math> for all components.<ref name=":2" />
+==== Minimum Norm ====
+[[C. R. Rao|Rao]] argues that if <math>\boldsymbol\xi</math> were observed, a "natural" estimator for <math>\theta</math> would be the following<ref name=":1" /><ref name=":2" /> since <math>\mathbb{E}[\boldsymbol\xi_i^\top\boldsymbol\xi_i]=c_i \sigma_i^2</math>. Here, <math>\boldsymbol\Delta</math> is defined as a [[diagonal matrix]].
+<math>\frac{p_1}{c_1}\boldsymbol\xi_1^\top\boldsymbol\xi_1 + \cdots + \frac{p_k}{c_k}\boldsymbol\xi_k^\top\boldsymbol\xi_k
+= \boldsymbol\xi^\top\left[\mathrm{diag}\left(\frac{p_1}{c_i},\cdots,\frac{p_k}{c_k}\right)\right]\boldsymbol\xi
+\equiv \boldsymbol\xi^\top\boldsymbol\Delta\boldsymbol\xi</math>
+The difference between the proposed estimator and the natural estimator is <math>\boldsymbol\xi^\top (\mathbf{U}^\top \mathbf{A} \mathbf{U} - \boldsymbol\Delta)\boldsymbol\xi</math>. This difference can be minimized by minimizing the [[Matrix norm|norm]] of the matrix <math>\lVert \mathbf{U}^\top\mathbf{A}\mathbf{U}-\boldsymbol\Delta \rVert</math>.
+=== Procedure ===
+Given the constraints and optimization strategy derived from the optimal properties above, the MINQUE estimator <math>\hat{\theta}</math> for <math>\theta=\sum_{i=1}^k p_i\sigma_i^2</math> is derived by choosing a matrix <math>\mathbf{A}</math> that minimizes <math>\lVert \mathbf{U}^\top\mathbf{A}\mathbf{U}-\boldsymbol\Delta \rVert</math>, subject to the constraints
+# <math>\mathbf{A}\mathbf{X}=\mathbf{0}</math>, and
+# <math>\mathrm{Tr}[\mathbf{A}\mathbf{V}_i]=p_i</math>.
+== Examples of Estimators ==
+=== Standard Estimator for Homoscedastic Error ===
+In the [[Gauss–Markov theorem|Gauss-Markov model]], the error variance <math>\sigma^2</math> is estimated using the following.
+<math>s^2 = \frac{1}{n-m}(\mathbf{Y}-\mathbf{X}\hat{\boldsymbol\beta})^\top(\mathbf{Y}-\mathbf{X}\hat{\boldsymbol\beta})</math>
+This estimator is unbiased and can be shown to minimize the [[Matrix norm|Euclidean norm]] of the form <math>\lVert \mathbf{U}^\top\mathbf{A}\mathbf{U}-\boldsymbol\Delta \rVert</math>.<ref name=":0" /> Thus, the standard estimator for error variance in the Gauss-Markov model is a MINQUE estimator.
+=== Random Variables with Common Mean and Heteroscedastic Error ===
+For random variables <math>Y_1,\cdots,Y_n</math> with a common mean and different variances <math>\sigma^2_1,\cdots,\sigma^2_n</math>, the MINQUE estimator for <math>\sigma^2_i</math> is <math>\frac{n}{n-2}(Y_i - \overline{Y})^2 - \frac{s^2}{n - 2}</math>, where <math>\overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i</math> and <math>s^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline{Y})^2</math>.<ref name=":0" />
+=== Estimator for Variance Components ===
+[[C. R. Rao|Rao]] proposed a MINQUE estimator for the variance components model based on minimizing the [[Matrix norm|Euclidean norm]].<ref name=":1" /> The Euclidean norm <math>\lVert \cdot \rVert_2</math> is the square root of the sum of squares of all elements in the matrix. When evaluating this norm below, <math>\mathbf{V}=\mathbf{V}_1+\cdots+\mathbf{V}_k = \mathbf{U} \mathbf{U}^\top</math>. Furthermore, using the cyclic property of [[Trace (linear algebra)|traces]], <math>\mathrm{Tr}[\mathbf{U}^\top\mathbf{A}\mathbf{U}\boldsymbol\Delta] =
+\mathrm{Tr}[\mathbf{A}\mathbf{U}\boldsymbol\Delta\mathbf{U}^\top] =
+\mathrm{Tr}\left[\sum_{i=1}^k \frac{p_i}{c_i} \mathbf{A}\mathbf{V}_i \right] =
+\mathrm{Tr}[\boldsymbol\Delta\boldsymbol\Delta] </math>.
+<math>\begin{align}
+\lVert \mathbf{U}^\top\mathbf{A}\mathbf{U} - \boldsymbol\Delta \rVert^2_2 &= (\mathbf{U}^\top\mathbf{A}\mathbf{U} - \boldsymbol\Delta)^\top (\mathbf{U}^\top\mathbf{A}\mathbf{U} - \boldsymbol\Delta)\\
+&= \mathrm{Tr}[\mathbf{U}^\top\mathbf{A}\mathbf{U}\mathbf{U}\mathbf{A}\mathbf{U}^\top] - \mathrm{Tr}[2\mathbf{U}^\top\mathbf{A}\mathbf{U}\boldsymbol\Delta] + \mathrm{Tr}[\boldsymbol\Delta\boldsymbol\Delta]\\
+&= \mathrm{Tr}[\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}] - \mathrm{Tr}[\boldsymbol\Delta\boldsymbol\Delta]
+\end{align}</math>
+Note that since <math>\mathrm{Tr}[\boldsymbol\Delta\boldsymbol\Delta] </math> does not depend on <math>\mathbf{A} </math>, the MINQUE with the Euclidean norm is obtained by identifying the matrix <math>\mathbf{A} </math> that minimizes <math>\mathrm{Tr}[\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}] </math>, subject to the MINQUE constraints discussed above.
+Rao showed that the matrix <math>\mathbf{A} </math> that satisfies this optimization problem is
+<math>\mathbf{A}_\star=\sum_{i=1}^k \lambda_i \mathbf{R}\mathbf{V}_i\mathbf{R} </math>,
+where <math>\mathbf{R} = \mathbf{V}^{-1}(\mathbf{I}-\mathbf{P}) </math>, <math>\mathbf{P}=\mathbf{X}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}^\top\mathbf{V}^{-1} </math> is the [[projection matrix]] into the column space of <math>\mathbf{X} </math>, and <math>(\cdot)^{-} </math> represents the [[generalized inverse]] of a matrix.
+Therefore, the MINQUE estimator is the following, where the vectors <math>\boldsymbol\lambda </math> and <math>\mathbf{Q} </math> are defined based on the sum.
+<math>\begin{align}
+\hat{\theta} &= \mathbf{Y}^\top \mathbf{A}_\star\mathbf{Y}\\
+&= \sum_{i=1}^k \lambda_i \mathbf{Y}^\top\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{Y}\\
+&\equiv\sum_{i=1}^k \lambda_i Q_i\\
+&\equiv \boldsymbol\lambda^\top \mathbf{Q}
+\end{align} </math>
+The vector <math>\boldsymbol\lambda </math> is obtained by using the constraint <math>\mathrm{Tr}[\mathbf{A}_\star\mathbf{V}_i]=p_i</math>. That is, the vector represents the solution to the following system of equations <math>\forall j\in\{1,\cdots,k\} </math>.
+<math>\begin{align}
+\mathrm{Tr}[\mathbf{A}_\star\mathbf{V}_j] &= p_j\\
+\mathrm{Tr}\left[ \sum_{i=1}^k \lambda_i \mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j \right] &= p_j\\
+\sum_{i=1}^k \lambda_i \mathrm{Tr}[\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j] &= p_j
+\end{align} </math>
+This can be written as a matrix product <math>\mathbf{S}\boldsymbol\lambda=\mathbf{p} </math>, where <math>\mathbf{p}=[p_1\,\cdots\,p_k]^\top </math> and <math>\mathbf{S} </math> is the following.
+<math>\mathbf{S}=\begin{bmatrix}
+\mathrm{Tr}[\mathbf{R}\mathbf{V}_1\mathbf{R}\mathbf{V}_1] & \cdots & \mathrm{Tr}[\mathbf{R}\mathbf{V}_k\mathbf{R}\mathbf{V}_1]\\
+\vdots & \ddots & \vdots\\
+\mathrm{Tr}[\mathbf{R}\mathbf{V}_1\mathbf{R}\mathbf{V}_k] & \cdots & \mathrm{Tr}[\mathbf{R}\mathbf{V}_k\mathbf{R}\mathbf{V}_k]
+\end{bmatrix} </math>
+Then, <math>\boldsymbol\lambda=\mathbf{S}^{-}\mathbf{p} </math>. This implies that the MINQUE is <math>\hat{\theta}=\boldsymbol\lambda^\top\mathbf{Q}=\mathbf{p}^\top(\mathbf{S}^{-})^\top\mathbf{Q}=\mathbf{p}^\top\mathbf{S}^{-}\mathbf{Q} </math>. Note that <math>\theta=\sum_{i=1}^k p_i \sigma_i^2 = \mathbf{p}^\top\boldsymbol\sigma </math>, where <math>\boldsymbol\sigma = [\sigma^2_1\,\cdots\,\sigma^2_k]^\top </math>. Therefore, the estimator for the variance components is <math>\hat{\boldsymbol\sigma}=\mathbf{S}^{-}\mathbf{Q} </math>.
+== Extensions ==
+MINQUE estimators can be obtained without the invariance criteria, in which case the estimator is only unbiased and minimizes the norm.<ref name=":1" /> Such estimators have slightly different constraints on the minimization problem.
+The model can be extended to estimate covariance components.<ref name=":2" /> In such a model, the random effects of a component are assumed to have a common covariance structure <math>\mathbb{V}[\boldsymbol\xi_i]=\boldsymbol\Sigma</math>. A MINQUE estimator for a mixture of variance and covariance components was also proposed.<ref name=":2" /> In this model, <math>\mathbb{V}[\boldsymbol\xi_i]=\boldsymbol\Sigma</math> for <math>i\in
+\{1,\cdots,s\}</math> and <math>\mathbb{V}[\boldsymbol\xi_i]=
+\sigma_i^2\mathbf{I}_{c_i}</math> for <math>i\in\{s+1,\cdots,k\}</math>.
 {{stats-stub|date=August 2016}}