Schur product theorem

In [[mathematics]], particularly in [[linear algebra]], the '''Schur product theorem''' states that the [[Hadamard product (matrices)|Hadamard product]] of two [[positive definite matrices]] is also a positive definite matrix.
The result is named after [[Issai Schur]]<ref name="Sch1911">{{Cite journal | doi = 10.1515/crll.1911.140.1 | title = Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen | journal = Journal für die reine und angewandte Mathematik | volume = 1911 | issue = 140 | pages = 1–28| year = 1911 | last1 = Schur | first1 = J. | s2cid = 120411177 }}</ref> (Schur 1911, p.&nbsp;14, Theorem VII) (note that Schur signed as J. Schur in ''Journal für die reine und angewandte Mathematik''.<ref>{{Cite book | editor1-last = Zhang | editor1-first = Fuzhen | title = The Schur Complement and Its Applications | doi = 10.1007/b105056 | series = Numerical Methods and Algorithms | volume = 4 | year = 2005 | isbn = 0-387-24271-6 }}, page 9, Ch. 0.6 ''Publication under J. Schur''</ref><ref>{{Cite journal | last1 = Ledermann | first1 = W. | title = Issai Schur and His School in Berlin | doi = 10.1112/blms/15.2.97 | journal = Bulletin of the London Mathematical Society | volume = 15 | issue = 2 | pages = 97–106 | year = 1983 }}</ref>)

The converse of the theorem holds in the following sense: if <math>M</math> is a symmetric matrix and the Hadamard product <math>M \circ N</math> is positive definite for all positive definite matrices <math>N</math>, then <math>M</math> itself is positive definite.
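As an informal numerical illustration of the theorem (not part of the sourced material; the matrices below are arbitrary examples), one can verify with NumPy that the Hadamard product of two positive definite matrices again has only positive eigenvalues:

<syntaxhighlight lang="python">
import numpy as np

# Two arbitrary symmetric positive definite matrices
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
N = np.array([[4.0, -1.0],
              [-1.0, 2.0]])

H = M * N  # Hadamard (entrywise) product

print(np.linalg.eigvalsh(M))  # all eigenvalues positive
print(np.linalg.eigvalsh(N))  # all eigenvalues positive
print(np.linalg.eigvalsh(H))  # all eigenvalues positive, as the theorem asserts
</syntaxhighlight>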


== Proof ==

=== Proof using the trace formula ===


For any matrices <math>M</math> and <math>N</math>, the Hadamard product <math>M \circ N</math> considered as a bilinear form acts on vectors <math>a, b</math> as
: <math>a^* (M \circ N) b = \operatorname{tr}\left(M^\textsf{T} \operatorname{diag}\left(a^*\right) N \operatorname{diag}(b)\right)</math>


where <math>\operatorname{tr}</math> is the matrix [[trace (linear algebra)|trace]] and <math>\operatorname{diag}(a)</math> is the [[diagonal matrix]] having as diagonal entries the elements of <math>a</math>.

Suppose <math>M</math> and <math>N</math> are positive definite, and so [[Hermitian matrix|Hermitian]]. We can consider their square-roots <math>M^\frac{1}{2}</math> and <math>N^\frac{1}{2}</math>, which are also Hermitian, and write
: <math>
\operatorname{tr}\left(M^\textsf{T} \operatorname{diag}\left(a^*\right) N \operatorname{diag}(b)\right) =
\operatorname{tr}\left(\overline{M}^\frac{1}{2} \overline{M}^\frac{1}{2} \operatorname{diag}\left(a^*\right) N^\frac{1}{2} N^\frac{1}{2} \operatorname{diag}(b)\right) =
\operatorname{tr}\left(\overline{M}^\frac{1}{2} \operatorname{diag}\left(a^*\right) N^\frac{1}{2} N^\frac{1}{2} \operatorname{diag}(b) \overline{M}^\frac{1}{2}\right)
</math>

Then, for <math>a = b</math>, this is written as <math>\operatorname{tr}\left(A^* A\right)</math> for <math>A = N^\frac{1}{2} \operatorname{diag}(a) \overline{M}^\frac{1}{2}</math> and thus is strictly positive for <math>A \neq 0</math>, which occurs if and only if <math>a \neq 0</math>. This shows that <math>(M \circ N)</math> is a positive definite matrix.
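The two identities used in this proof can be checked numerically. The following sketch (not part of the article's argument; the random test data and the <code>psd_sqrt</code> helper are illustrative choices) verifies both the trace formula and the reduction to <math>\operatorname{tr}\left(A^* A\right)</math> for randomly generated Hermitian positive definite matrices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_pd(n):
    # B B^* is positive semidefinite; adding the identity makes it positive definite
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return B @ B.conj().T + np.eye(n)

def psd_sqrt(P):
    # Unique positive semidefinite square root via the eigendecomposition
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.conj().T

M, N = random_pd(n), random_pd(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# a^*(M∘N)b = tr(M^T diag(a^*) N diag(b))
lhs = a.conj() @ (M * N) @ b
rhs = np.trace(M.T @ np.diag(a.conj()) @ N @ np.diag(b))
print(np.isclose(lhs, rhs))

# For a = b this equals tr(A^* A) with A = N^{1/2} diag(a) conj(M)^{1/2}
A = psd_sqrt(N) @ np.diag(a) @ psd_sqrt(M.conj())
print(np.isclose(a.conj() @ (M * N) @ a, np.trace(A.conj().T @ A)))
</syntaxhighlight>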


=== Proof using Gaussian integration ===
==== Case of ''M'' = ''N'' ====


Let <math>X</math> be an <math>n</math>-dimensional centered [[Gaussian random variable]] with [[covariance]] <math>\langle X_i X_j \rangle = M_{ij}</math>. Then the covariance matrix of <math>X_i^2</math> and <math>X_j^2</math> is


: <math>\operatorname{Cov}\left(X_i^2, X_j^2\right) = \left\langle X_i^2 X_j^2 \right\rangle - \left\langle X_i^2 \right\rangle \left\langle X_j^2 \right\rangle</math>


Using [[Isserlis' theorem|Wick's theorem]] to develop <math>\left\langle X_i^2 X_j^2 \right\rangle = 2 \left\langle X_i X_j \right\rangle^2 + \left\langle X_i^2 \right\rangle \left\langle X_j^2 \right\rangle</math> we have


: <math>\operatorname{Cov}\left(X_i^2, X_j^2\right) = 2 \left\langle X_i X_j \right\rangle^2 = 2 M_{ij}^2</math>


Since a covariance matrix is positive definite, this proves that the matrix with elements <math>M_{ij}^2</math> is a positive definite matrix.
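This identity can be checked empirically by simulation. The sketch below (an illustration, not part of the proof; the covariance matrix and sample size are arbitrary) compares the sample covariance of the squared coordinates with <math>2 M_{ij}^2</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive definite covariance matrix
M = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])

X = rng.multivariate_normal(np.zeros(3), M, size=1_000_000)
emp = np.cov(X**2, rowvar=False)   # sample covariance of X_i^2 and X_j^2

print(np.allclose(emp, 2 * M**2, atol=0.2))  # close to 2 M_ij^2, as Wick's theorem predicts
</syntaxhighlight>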
==== General case ====


Let <math>X</math> and <math>Y</math> be <math>n</math>-dimensional centered [[Gaussian random variable]]s with [[covariance]]s <math>\left\langle X_i X_j \right\rangle = M_{ij}</math>, <math>\left\langle Y_i Y_j \right\rangle = N_{ij}</math> and independent from each other so that we have
: <math>\left\langle X_i Y_j \right\rangle = 0</math> for any <math>i, j</math>

Then the covariance matrix of <math>X_i Y_i</math> and <math>X_j Y_j</math> is
: <math>\operatorname{Cov}\left(X_i Y_i, X_j Y_j\right) = \left\langle X_i Y_i X_j Y_j \right\rangle - \left\langle X_i Y_i \right\rangle \left\langle X_j Y_j \right\rangle</math>

Using [[Isserlis' theorem|Wick's theorem]] to develop
: <math>\left\langle X_i Y_i X_j Y_j \right\rangle = \left\langle X_i X_j \right\rangle \left\langle Y_i Y_j \right\rangle + \left\langle X_i Y_i \right\rangle \left\langle X_j Y_j \right\rangle + \left\langle X_i Y_j \right\rangle \left\langle X_j Y_i \right\rangle</math>

and also using the independence of <math>X</math> and <math>Y</math>, we have
: <math>\operatorname{Cov}\left(X_i Y_i, X_j Y_j\right) = \left\langle X_i X_j \right\rangle \left\langle Y_i Y_j \right\rangle = M_{ij} N_{ij}</math>

Since a covariance matrix is positive definite, this proves that the matrix with elements <math>M_{ij} N_{ij}</math> is a positive definite matrix.
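The same kind of simulation (again an illustrative sketch with arbitrary inputs) confirms that the sample covariance of the products <math>X_i Y_i</math> is close to the Hadamard product <math>M \circ N</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

M = np.array([[1.0, 0.4],
              [0.4, 2.0]])
N = np.array([[3.0, -0.5],
              [-0.5, 1.0]])  # both arbitrary positive definite

X = rng.multivariate_normal(np.zeros(2), M, size=1_000_000)
Y = rng.multivariate_normal(np.zeros(2), N, size=1_000_000)  # independent of X

Z = X * Y                      # Z_i = X_i Y_i
emp = np.cov(Z, rowvar=False)  # sample covariance of the products

print(np.allclose(emp, M * N, atol=0.05))  # close to M ∘ N
</syntaxhighlight>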


=== Proof using eigendecomposition ===
==== Proof of positive semidefiniteness ====


Let <math>M = \sum \mu_i m_i m_i^\textsf{T}</math> and <math>N = \sum \nu_i n_i n_i^\textsf{T}</math> be the eigendecompositions of <math>M</math> and <math>N</math>; since both matrices are positive definite, all eigenvalues <math>\mu_i</math> and <math>\nu_i</math> are positive. Then
: <math>M \circ N = \sum_{ij} \mu_i \nu_j \left(m_i m_i^\textsf{T}\right) \circ \left(n_j n_j^\textsf{T}\right) = \sum_{ij} \mu_i \nu_j \left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T}</math>

Each <math>\left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T}</math> is positive semidefinite (but, except in the 1-dimensional case, not positive definite, since they are [[rank (linear algebra)|rank]] 1 matrices). Since each <math>\mu_i \nu_j > 0</math>, the sum <math>M \circ N</math> is also positive semidefinite.
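The rank-1 expansion above is an exact algebraic identity, which can be confirmed numerically (a sketch; the random symmetric positive definite inputs are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 4

def random_spd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)   # symmetric positive definite

M, N = random_spd(n), random_spd(n)

mu, mvecs = np.linalg.eigh(M)    # columns of mvecs are the eigenvectors m_i
nu, nvecs = np.linalg.eigh(N)

# Rebuild M ∘ N from the rank-1 terms mu_i nu_j (m_i ∘ n_j)(m_i ∘ n_j)^T
S = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        v = mvecs[:, i] * nvecs[:, j]       # m_i ∘ n_j
        S += mu[i] * nu[j] * np.outer(v, v)

print(np.allclose(S, M * N))  # True: the two expressions agree
</syntaxhighlight>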


==== Proof of definiteness ====


To show that the result is positive definite requires a further argument. We shall show that for any vector <math>a \neq 0</math>, we have <math>a^\textsf{T} (M \circ N) a > 0</math>. Continuing as above, each <math>a^\textsf{T} \left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T} a \ge 0</math>, so it remains to show that there exist <math>i</math> and <math>j</math> for which the corresponding term is strictly positive. For this we observe that
: <math>a^\textsf{T} (m_i \circ n_j) (m_i \circ n_j)^\textsf{T} a = \left(\sum_k m_{i,k} n_{j,k} a_k\right)^2</math>


Since <math>N</math> is positive definite, there is a <math>j</math> for which <math>n_j \circ a \neq 0</math> (since otherwise <math>n_j^\textsf{T} a = \sum_k (n_j \circ a)_k = 0</math> for all <math>j</math>, which would force <math>a = 0</math> because the eigenvectors <math>n_j</math> span the space), and likewise since <math>M</math> is positive definite there exists an <math>i</math> for which <math>\sum_k m_{i,k} (n_j \circ a)_k = m_i^\textsf{T} (n_j \circ a) \neq 0</math>. However, this last sum is just <math>\sum_k m_{i,k} n_{j,k} a_k</math>. Thus its square is positive. This completes the proof.


== References ==

{{reflist}}


[[Category:Linear algebra]]
[[Category:Matrix theory]]
[[Category:Issai Schur]]
