Schur product theorem

In [[mathematics]], particularly in [[linear algebra]], the '''Schur product theorem''' states that the [[Hadamard product (matrices)|Hadamard product]] of two [[positive definite matrices]] is also a positive definite matrix.
The result is named after [[Issai Schur]]<ref name="Sch1911">{{Cite journal | doi = 10.1515/crll.1911.140.1 | title = Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen | journal = Journal für die reine und angewandte Mathematik | volume = 1911 | issue = 140 | pages = 1–28| year = 1911 | last1 = Schur | first1 = J. | s2cid = 120411177 }}</ref> (Schur 1911, p.&nbsp;14, Theorem VII) (note that Schur signed as J. Schur in ''Journal für die reine und angewandte Mathematik''.<ref>{{Cite book | editor1-last = Zhang | editor1-first = Fuzhen | title = The Schur Complement and Its Applications | doi = 10.1007/b105056 | series = Numerical Methods and Algorithms | volume = 4 | year = 2005 | isbn = 0-387-24271-6 }}, page 9, Ch. 0.6 ''Publication under J. Schur''</ref><ref>{{Cite journal | last1 = Ledermann | first1 = W. | title = Issai Schur and His School in Berlin | doi = 10.1112/blms/15.2.97 | journal = Bulletin of the London Mathematical Society | volume = 15 | issue = 2 | pages = 97–106 | year = 1983 }}</ref>)

The converse of the theorem holds in the following sense: if <math>M</math> is a symmetric matrix and the Hadamard product <math>M \circ N</math> is positive definite for all positive definite matrices <math>N</math>, then <math>M</math> itself is positive definite.
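As an informal numerical illustration of the theorem (not part of the sourced material; the matrices below are arbitrary examples), one can verify with NumPy that the Hadamard product of two positive definite matrices again has only positive eigenvalues:

<syntaxhighlight lang="python">
import numpy as np

# Two arbitrary symmetric positive definite matrices
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
N = np.array([[4.0, -1.0],
              [-1.0, 2.0]])

H = M * N  # Hadamard (entrywise) product

print(np.linalg.eigvalsh(M))  # all eigenvalues positive
print(np.linalg.eigvalsh(N))  # all eigenvalues positive
print(np.linalg.eigvalsh(H))  # all eigenvalues positive, as the theorem asserts
</syntaxhighlight>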


== Proof ==

=== Proof using the trace formula ===


For any matrices <math>M</math> and <math>N</math>, the Hadamard product <math>M \circ N</math> considered as a bilinear form acts on vectors <math>a, b</math> as
: <math>a^* (M \circ N) b = \operatorname{tr}\left(M^\textsf{T} \operatorname{diag}\left(a^*\right) N \operatorname{diag}(b)\right)</math>


where <math>\operatorname{tr}</math> is the matrix [[trace (linear algebra)|trace]] and <math>\operatorname{diag}(a)</math> is the [[diagonal matrix]] having as diagonal entries the elements of <math>a</math>.

Suppose <math>M</math> and <math>N</math> are positive definite, and so [[Hermitian matrix|Hermitian]]. We can consider their square-roots <math>M^\frac{1}{2}</math> and <math>N^\frac{1}{2}</math>, which are also Hermitian, and write
: <math>
\operatorname{tr}\left(M^\textsf{T} \operatorname{diag}\left(a^*\right) N \operatorname{diag}(b)\right) =
\operatorname{tr}\left(\overline{M}^\frac{1}{2} \overline{M}^\frac{1}{2} \operatorname{diag}\left(a^*\right) N^\frac{1}{2} N^\frac{1}{2} \operatorname{diag}(b)\right) =
\operatorname{tr}\left(\overline{M}^\frac{1}{2} \operatorname{diag}\left(a^*\right) N^\frac{1}{2} N^\frac{1}{2} \operatorname{diag}(b) \overline{M}^\frac{1}{2}\right)
</math>

Then, for <math>a = b</math>, this is written as <math>\operatorname{tr}\left(A^* A\right)</math> for <math>A = N^\frac{1}{2} \operatorname{diag}(a) \overline{M}^\frac{1}{2}</math> and thus is strictly positive for <math>A \neq 0</math>, which occurs if and only if <math>a \neq 0</math>. This shows that <math>(M \circ N)</math> is a positive definite matrix.
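The two identities used in this proof can be checked numerically. The following sketch (not part of the article's argument; the random test data and the <code>psd_sqrt</code> helper are illustrative choices) verifies both the trace formula and the reduction to <math>\operatorname{tr}\left(A^* A\right)</math> for randomly generated Hermitian positive definite matrices:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_pd(n):
    # B B^* is positive semidefinite; adding the identity makes it positive definite
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return B @ B.conj().T + np.eye(n)

def psd_sqrt(P):
    # Unique positive semidefinite square root via the eigendecomposition
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.conj().T

M, N = random_pd(n), random_pd(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# a^*(M∘N)b = tr(M^T diag(a^*) N diag(b))
lhs = a.conj() @ (M * N) @ b
rhs = np.trace(M.T @ np.diag(a.conj()) @ N @ np.diag(b))
print(np.isclose(lhs, rhs))

# For a = b this equals tr(A^* A) with A = N^{1/2} diag(a) conj(M)^{1/2}
A = psd_sqrt(N) @ np.diag(a) @ psd_sqrt(M.conj())
print(np.isclose(a.conj() @ (M * N) @ a, np.trace(A.conj().T @ A)))
</syntaxhighlight>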


=== Proof using Gaussian integration ===
==== Case of ''M'' = ''N'' ====


Let <math>X</math> be an <math>n</math>-dimensional centered [[Gaussian random variable]] with [[covariance]] <math>\langle X_i X_j \rangle = M_{ij}</math>. Then the covariance matrix of <math>X_i^2</math> and <math>X_j^2</math> is


: <math>\operatorname{Cov}\left(X_i^2, X_j^2\right) = \left\langle X_i^2 X_j^2 \right\rangle - \left\langle X_i^2 \right\rangle \left\langle X_j^2 \right\rangle</math>


Using [[Isserlis' theorem|Wick's theorem]] to develop <math>\left\langle X_i^2 X_j^2 \right\rangle = 2 \left\langle X_i X_j \right\rangle^2 + \left\langle X_i^2 \right\rangle \left\langle X_j^2 \right\rangle</math> we have


: <math>\operatorname{Cov}\left(X_i^2, X_j^2\right) = 2 \left\langle X_i X_j \right\rangle^2 = 2 M_{ij}^2</math>


Since a covariance matrix is positive definite, this proves that the matrix with elements <math>M_{ij}^2</math> is a positive definite matrix.
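This identity can be checked empirically by simulation. The sketch below (an illustration, not part of the proof; the covariance matrix and sample size are arbitrary) compares the sample covariance of the squared coordinates with <math>2 M_{ij}^2</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive definite covariance matrix
M = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])

X = rng.multivariate_normal(np.zeros(3), M, size=1_000_000)
emp = np.cov(X**2, rowvar=False)   # sample covariance of X_i^2 and X_j^2

print(np.allclose(emp, 2 * M**2, atol=0.2))  # close to 2 M_ij^2, as Wick's theorem predicts
</syntaxhighlight>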
==== General case ====


Let <math>X</math> and <math>Y</math> be <math>n</math>-dimensional centered [[Gaussian random variable]]s with [[covariance]]s <math>\left\langle X_i X_j \right\rangle = M_{ij}</math>, <math>\left\langle Y_i Y_j \right\rangle = N_{ij}</math> and independent from each other so that we have
: <math>\left\langle X_i Y_j \right\rangle = 0</math> for any <math>i, j</math>

Then the covariance matrix of <math>X_i Y_i</math> and <math>X_j Y_j</math> is
: <math>\operatorname{Cov}\left(X_i Y_i, X_j Y_j\right) = \left\langle X_i Y_i X_j Y_j \right\rangle - \left\langle X_i Y_i \right\rangle \left\langle X_j Y_j \right\rangle</math>

Using [[Isserlis' theorem|Wick's theorem]] to develop
: <math>\left\langle X_i Y_i X_j Y_j \right\rangle = \left\langle X_i X_j \right\rangle \left\langle Y_i Y_j \right\rangle + \left\langle X_i Y_i \right\rangle \left\langle X_j Y_j \right\rangle + \left\langle X_i Y_j \right\rangle \left\langle X_j Y_i \right\rangle</math>

and also using the independence of <math>X</math> and <math>Y</math>, we have
: <math>\operatorname{Cov}\left(X_i Y_i, X_j Y_j\right) = \left\langle X_i X_j \right\rangle \left\langle Y_i Y_j \right\rangle = M_{ij} N_{ij}</math>

Since a covariance matrix is positive definite, this proves that the matrix with elements <math>M_{ij} N_{ij}</math> is a positive definite matrix.
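The same kind of simulation (again an illustrative sketch with arbitrary inputs) confirms that the sample covariance of the products <math>X_i Y_i</math> is close to the Hadamard product <math>M \circ N</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

M = np.array([[1.0, 0.4],
              [0.4, 2.0]])
N = np.array([[3.0, -0.5],
              [-0.5, 1.0]])  # both arbitrary positive definite

X = rng.multivariate_normal(np.zeros(2), M, size=1_000_000)
Y = rng.multivariate_normal(np.zeros(2), N, size=1_000_000)  # independent of X

Z = X * Y                      # Z_i = X_i Y_i
emp = np.cov(Z, rowvar=False)  # sample covariance of the products

print(np.allclose(emp, M * N, atol=0.05))  # close to M ∘ N
</syntaxhighlight>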


=== Proof using eigendecomposition ===
==== Proof of positive semidefiniteness ====


Let <math>M = \sum \mu_i m_i m_i^\textsf{T}</math> and <math>N = \sum \nu_i n_i n_i^\textsf{T}</math> be the eigendecompositions of <math>M</math> and <math>N</math>; since both matrices are positive definite, all eigenvalues <math>\mu_i</math> and <math>\nu_i</math> are positive. Then
: <math>M \circ N = \sum_{ij} \mu_i \nu_j \left(m_i m_i^\textsf{T}\right) \circ \left(n_j n_j^\textsf{T}\right) = \sum_{ij} \mu_i \nu_j \left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T}</math>

Each <math>\left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T}</math> is positive semidefinite (but, except in the 1-dimensional case, not positive definite, since they are [[rank (linear algebra)|rank]] 1 matrices). Since each <math>\mu_i \nu_j > 0</math>, the sum <math>M \circ N</math> is also positive semidefinite.
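The rank-1 expansion above is an exact algebraic identity, which can be confirmed numerically (a sketch; the random symmetric positive definite inputs are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
n = 4

def random_spd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)   # symmetric positive definite

M, N = random_spd(n), random_spd(n)

mu, mvecs = np.linalg.eigh(M)    # columns of mvecs are the eigenvectors m_i
nu, nvecs = np.linalg.eigh(N)

# Rebuild M ∘ N from the rank-1 terms mu_i nu_j (m_i ∘ n_j)(m_i ∘ n_j)^T
S = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        v = mvecs[:, i] * nvecs[:, j]       # m_i ∘ n_j
        S += mu[i] * nu[j] * np.outer(v, v)

print(np.allclose(S, M * N))  # True: the two expressions agree
</syntaxhighlight>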


==== Proof of definiteness ====


To show that the result is positive definite requires a further argument. We shall show that for any vector <math>a \neq 0</math>, we have <math>a^\textsf{T} (M \circ N) a > 0</math>. Continuing as above, each <math>a^\textsf{T} \left(m_i \circ n_j\right) \left(m_i \circ n_j\right)^\textsf{T} a \ge 0</math>, so it remains to show that there exist <math>i</math> and <math>j</math> for which the corresponding term is strictly positive. For this we observe that
: <math>a^\textsf{T} (m_i \circ n_j) (m_i \circ n_j)^\textsf{T} a = \left(\sum_k m_{i,k} n_{j,k} a_k\right)^2</math>


Since <math>N</math> is positive definite, there is a <math>j</math> for which <math>n_j \circ a \neq 0</math> (since otherwise <math>n_j^\textsf{T} a = \sum_k (n_j \circ a)_k = 0</math> for all <math>j</math>, which would force <math>a = 0</math> because the eigenvectors <math>n_j</math> span the space), and likewise since <math>M</math> is positive definite there exists an <math>i</math> for which <math>\sum_k m_{i,k} (n_j \circ a)_k = m_i^\textsf{T} (n_j \circ a) \neq 0</math>. However, this last sum is just <math>\sum_k m_{i,k} n_{j,k} a_k</math>. Thus its square is positive. This completes the proof.


== References ==

{{reflist}}


[[Category:Linear algebra]]
[[Category:Matrix theory]]
[[Category:Issai Schur]]
