{{calculus|cTopic=Multivariable calculus}}
In [[mathematics]], '''matrix calculus''' is a specialized notation for doing [[multivariable calculus]], especially over spaces of [[matrix (mathematics)|matrices]], where it defines the '''matrix derivative'''. This notation is well-suited to describing systems of [[differential equation]]s, and taking [[derivative]]s of matrix-valued functions with respect to matrix variables. This notation is commonly used in [[statistics]] and [[engineering]], while the [[tensor index notation]] is preferred in [[physics]].

== Notice ==
This article uses a different definition of vector and matrix calculus from the form often encountered within the fields of [[estimation theory]] and [[pattern recognition]]. The resulting equations will therefore appear transposed when compared with the equations used in textbooks in these fields.

== Notation ==
Let ''M''(''n'',''m'') denote the space of [[real number|real]] ''n''×''m'' matrices with ''n'' rows and ''m'' columns; such matrices will be denoted using bold capital letters: '''A''', '''X''', '''Y''', etc. An element of ''M''(''n'',1), that is, a [[column vector]], is denoted with a boldface lowercase letter: '''a''', '''x''', '''y''', etc. An element of ''M''(1,1) is a scalar, denoted with lowercase italic typeface: ''a'', ''t'', ''x'', etc. '''X'''<sup>T</sup> denotes the matrix [[transpose]], tr('''X''') is the [[Trace (linear algebra)|trace]], and det('''X''') is the [[determinant]]. All functions are assumed to be of [[differentiability class]] ''C''<sup>1</sup> unless otherwise noted. Generally, letters from the first half of the alphabet (''a'', ''b'', ''c'', …) will be used to denote constants, and letters from the second half (''t'', ''x'', ''y'', …) to denote variables.

== Vector calculus ==
{{Main|Vector calculus}}
Because the space ''M''(''n'',1) is identified with the [[Euclidean space]] '''R'''<sup>''n''</sup> and ''M''(1,1) is identified with '''R''', the notations developed here can accommodate the usual operations of vector calculus.

<ul>
<li>The [[tangent vector]] to a curve '''x''' : '''R''' → '''R'''<sup>''n''</sup> is
:<math>\frac{\partial \mathbf{x}} {\partial t} =
\begin{bmatrix}
\frac{\partial x_1}{\partial t} \\
\vdots \\
\frac{\partial x_n}{\partial t} \\
\end{bmatrix}.
</math>
</li>

<li>The [[gradient]] of a scalar function ''f'' : '''R'''<sup>''n''</sup> → '''R''' is
:<math>\frac{\partial f}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \\
\end{bmatrix}.
</math>
The [[directional derivative]] of ''f'' in the direction of '''v''' is then (a worked example follows this list)
:<math>\nabla_\mathbf{v} f = \frac{\partial f}{\partial \mathbf{x}}\mathbf{v}.</math>
</li>

<li>The [[pushforward (differential)|pushforward or differential]] of a function '''f''' : '''R'''<sup>''m''</sup> → '''R'''<sup>''n''</sup> is described by the [[Jacobian matrix]]
:<math>
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m}\\
\vdots & \ddots & \vdots\\
\frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m}\\
\end{bmatrix}.
</math>
The pushforward along '''f''' of a vector '''v''' in '''R'''<sup>''m''</sup> is
:<math>d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}.</math>
</li>
</ul>
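For instance, taking ''f''('''x''') = ''x''<sub>1</sub><sup>2</sup> + ''x''<sub>2</sub><sup>2</sup> on '''R'''<sup>2</sup> (an illustrative choice), the gradient and the directional derivative above become
:<math>\frac{\partial f}{\partial \mathbf{x}} =
\begin{bmatrix}
2x_1 & 2x_2 \\
\end{bmatrix}, \qquad
\nabla_\mathbf{v} f =
\begin{bmatrix}
2x_1 & 2x_2 \\
\end{bmatrix}
\begin{bmatrix}
v_1 \\
v_2 \\
\end{bmatrix} = 2 x_1 v_1 + 2 x_2 v_2.</math>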

== Matrix calculus ==

{{Disputed-section|Disputed information: Matrix derivative|date=July 2009}}
For the purposes of defining derivatives of simple functions, not much changes with matrix spaces; the space of ''n''×''m'' matrices is [[isomorphic]] to the [[vector space]] '''R'''<sup>''nm''</sup>.{{Dubious|Isomorphic spaces|date=December 2009}} The three derivatives familiar from vector calculus have close analogues here, though beware the complications that arise in the identities below.

<ul>
<li>The tangent vector of a curve '''F''' : '''R''' → ''M''(''n'',''m'') is
:<math>
\frac{\partial \mathbf{F}}{\partial t} =
\begin{bmatrix}
\frac{\partial F_{1,1}}{\partial t} & \cdots & \frac{\partial F_{1,m}}{\partial t}\\
\vdots & \ddots & \vdots\\
\frac{\partial F_{n,1}}{\partial t} & \cdots & \frac{\partial F_{n,m}}{\partial t}\\
\end{bmatrix}.
</math>
</li>

<li>The gradient of a scalar function ''f'' : ''M''(''n'',''m'') → '''R''' is
:<math>
\frac{\partial f}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial f}{\partial X_{1,1}} & \cdots & \frac{\partial f}{\partial X_{n,1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial f}{\partial X_{1,m}} & \cdots & \frac{\partial f}{\partial X_{n,m}}\\
\end{bmatrix}.
</math>
Notice that the indexing of the gradient with respect to '''X''' is transposed as compared with the indexing of '''X'''. The directional derivative of ''f'' in the direction of matrix '''Y''' is given by (a worked example follows this list)
:<math>\nabla_\mathbf{Y} f = \operatorname{tr} \left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right).</math>

</li>

<li>The differential or the matrix derivative of a function '''F''' : ''M''(''n'',''m'') → ''M''(''p'',''q'') is an element of ''M''(''p'',''q'') {{dirprod}} ''M''(''m'',''n''), a [[Tensor#Tensor rank|fourth-rank]] [[tensor]] (the reversal of ''m'' and ''n'' here indicates the [[dual space]] of ''M''(''n'',''m'')). In short, it is an ''m''×''n'' matrix each of whose entries is a ''p''×''q'' matrix.{{Citation needed|date=July 2009}}
:<math>\frac{\partial\mathbf{F}} {\partial\mathbf{X}}=
\begin{bmatrix}
\frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\
\end{bmatrix},
</math>
and note that each ∂'''F'''/∂''X''<sub>''i'',''j''</sub> is a ''p''×''q'' matrix defined as above. Note also that this matrix has its indexing transposed: ''m'' rows and ''n'' columns. The pushforward along '''F''' of an ''n''×''m'' matrix '''Y''' in ''M''(''n'',''m'') is then
:<math>d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left(\frac{\partial\mathbf{F}} {\partial\mathbf{X}}\mathbf{Y}\right),</math>
where the product and trace are computed formally, treating the ''p''×''q'' blocks as matrix entries.
Note that this definition encompasses all of the preceding definitions as special cases.
</li>
</ul>
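As a simple worked example (chosen here for illustration), take ''f''('''X''') = tr('''X''') on ''M''(''n'',''n''). Since ∂ tr('''X''')/∂''X''<sub>''i'',''j''</sub> equals 1 when ''i'' = ''j'' and 0 otherwise, the gradient is the identity matrix, and the directional derivative in the direction '''Y''' is
:<math>\frac{\partial f}{\partial \mathbf{X}} = \mathbf{I}_n, \qquad \nabla_\mathbf{Y} f = \operatorname{tr}\left(\mathbf{I}_n \mathbf{Y}\right) = \operatorname{tr}(\mathbf{Y}),</math>
which agrees with differentiating tr('''X''' + ''t'''''Y''') with respect to ''t'' directly.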

According to Jan R. Magnus and Heinz Neudecker, the following notations are both unsuitable, as the determinants of the resulting matrices would have "no interpretation" and "a useful chain rule does not exist" if these notations are used:<ref name="Magnus and Neudecker 1999">{{cite book|last1=Magnus|first1=Jan R.|last2=Neudecker|first2=Heinz|title=Matrix Differential Calculus|publisher=Wiley|series=Wiley Series in Probability and Statistics|edition=revised|year=1999|orig-year=1988|pages=171–173}}</ref>
#<math>\frac{\partial\mathbf{F}} {\partial\mathbf{X}}=
\begin{bmatrix}
\frac{\partial\mathbf F_{1,1}}{\partial \mathbf X} & \cdots & \frac{\partial \mathbf F_{n,1}}{\partial \mathbf X}\\
\vdots & \ddots & \vdots\\
\frac{\partial\mathbf F_{1,m}}{\partial \mathbf X} & \cdots & \frac{\partial \mathbf F_{n,m}}{\partial \mathbf X}\\
\end{bmatrix}
</math>
#<math>\frac{\partial\mathbf{F}} {\partial\mathbf{X}}=
\begin{bmatrix}
\frac{\partial\mathbf F}{\partial \mathbf X_{1,1}} & \cdots & \frac{\partial \mathbf F}{\partial \mathbf X_{n,1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial\mathbf F}{\partial \mathbf X_{1,m}} & \cdots & \frac{\partial \mathbf F}{\partial \mathbf X_{n,m}}\\
\end{bmatrix}
</math>
The [[Jacobian matrix]], according to Magnus and Neudecker,<ref name="Magnus and Neudecker 1999"/> is
:<math>\mathrm D\, \mathbf F\left(\mathbf X\right) = \frac{\partial\, \mathrm{vec}\ \mathbf F\left(\mathbf X\right)}{\partial\left(\mathrm{vec}\ \mathbf X\right)^{\mathrm T}}.
</math>{{Contradiction-inline|date=December 2009}}
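For instance (an illustrative case), for the linear function '''F'''('''X''') = '''AX''' with '''X''' in ''M''(''n'',''m''), the identity vec('''AXB''') = ('''B'''<sup>T</sup> {{dirprod}} '''A''') vec('''X''') with '''B''' = '''I'''<sub>''m''</sub> gives vec('''AX''') = ('''I'''<sub>''m''</sub> {{dirprod}} '''A''') vec('''X'''), so
:<math>\mathrm D\, \mathbf F\left(\mathbf X\right) = \mathbf I_m \otimes \mathbf A.</math>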

== Identities ==

{{Disputed-section|Proposed "Identities" section|date=July 2009}}

Note that matrix multiplication is not [[commutative]], so in these identities the order of the factors must not be changed.

<ul>
<li> '''[[Chain rule]]:''' If '''Z''' is a function of '''Y''', which in turn is a function of '''X''', and these are all column vectors, then (worked checks of this rule and the next follow this list){{Citation needed|date=July 2009}}
:<math>
\frac{\partial \mathbf{Z}} {\partial \mathbf{X}} = \frac{\partial \mathbf{Z}} {\partial \mathbf{Y}} \frac{\partial \mathbf{Y}} {\partial \mathbf{X}}</math>
</li>

<li>'''[[Product rule]]:''' In all cases where the derivatives do not involve tensor products (tensor products arise when, for example, '''Y''' has more than one row and '''X''' has more than one column),{{Citation needed|date=July 2009}}
:<math>
\frac{\partial (\mathbf{Y}\mathbf{Z})}{\partial \mathbf{X}} = \frac{\partial\mathbf{Y}}{\partial\mathbf{X}}{\mathbf{Z}} + \mathbf{Y}\frac{\partial\mathbf{Z}}{\partial \mathbf{X}}
</math>
</li>
</ul>
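Both rules can be verified on simple cases. For the chain rule, take '''z''' = '''Ay''' and '''y''' = '''Bx''' with constant matrices '''A''' and '''B'''; then ∂'''z'''/∂'''y''' = '''A''' and ∂'''y'''/∂'''x''' = '''B''', so the rule gives
:<math>\frac{\partial \mathbf{z}} {\partial \mathbf{x}} = \mathbf{A}\mathbf{B},</math>
which matches differentiating '''z''' = '''ABx''' directly. For the product rule, take the scalar-variable case where '''X''' is a scalar ''t'', with <math>\mathbf{Y} = t\mathbf{A}</math> and <math>\mathbf{Z} = t\mathbf{B}</math> (an illustrative choice): the rule gives <math>\mathbf{A}(t\mathbf{B}) + (t\mathbf{A})\mathbf{B} = 2t\mathbf{A}\mathbf{B}</math>, which agrees with differentiating <math>t^2\mathbf{A}\mathbf{B}</math> with respect to ''t''.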

== Examples ==

{{Disputed-section|Proposed "Examples" section|date=July 2009}}

=== Derivative of linear functions ===
This section lists some commonly used derivative formulas for expressions that are linear in the vector '''x'''.

:<math>
\frac{\partial \; \textbf{a}^T\textbf{x}}{\partial \; \textbf{x}} = \frac{\partial \; \textbf{x}^T\textbf{a}}{\partial \; \textbf{x}} = \textbf{a}^T
</math>

:<math>
\frac{\partial \; \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \frac{\partial \; \textbf{x}^T\textbf{A}}{\partial \; \textbf{x}^T} = \textbf{A}
</math>
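As a quick check of the first formula (with illustrative numbers), take '''a''' = (1, 2)<sup>T</sup> in '''R'''<sup>2</sup>, so that <math>\textbf{a}^T\textbf{x} = x_1 + 2x_2</math> and
:<math>
\frac{\partial \; \textbf{a}^T\textbf{x}}{\partial \; \textbf{x}} = \begin{bmatrix} 1 & 2 \end{bmatrix} = \textbf{a}^T.
</math>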

=== Derivative of quadratic functions ===
This section lists some commonly used vector derivative formulas for quadratic expressions evaluating to a scalar.

:<math>
\frac{\partial \; \textbf{x}^T \textbf{A}\textbf{x}}{\partial \; \textbf{x}} =
\textbf{x}^T(\textbf{A}^T + \textbf{A}) = \textbf{x}^T \textbf{A}^T + \textbf{x}^T \textbf{A} </math>
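This follows by differentiating componentwise: writing <math>\textbf{x}^T \textbf{A}\textbf{x} = \sum_{i,j} x_i A_{i,j} x_j</math> gives
:<math>
\frac{\partial}{\partial x_k} \sum_{i,j} x_i A_{i,j} x_j = \sum_j A_{k,j} x_j + \sum_i x_i A_{i,k} = (\textbf{A}\textbf{x})_k + (\textbf{A}^T\textbf{x})_k,
</math>
so the gradient row vector is <math>\textbf{x}^T(\textbf{A}^T + \textbf{A})</math>. When '''A''' is symmetric this reduces to <math>2\textbf{x}^T \textbf{A}</math>.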

:<math>
\frac{\partial \; (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} (\textbf{D}\textbf{x} + \textbf{e}) }{\partial \; \textbf{x}} =
(\textbf{D}\textbf{x} + \textbf{e})^T \textbf{C}^T \textbf{A} + (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} \textbf{D}</math>

Related to this is the derivative of the [[Euclidean norm]]:

:<math> \frac{\partial \; \|\mathbf{x}-\mathbf{a}\|}{\partial \; \textbf{x}} =
\frac{(\mathbf{x}-\mathbf{a})^T}{\|\mathbf{x}-\mathbf{a}\|}. </math>
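This follows from the quadratic formula above together with the chain rule, and is valid for '''x''' ≠ '''a''': since <math>\|\mathbf{x}-\mathbf{a}\| = \left((\mathbf{x}-\mathbf{a})^T(\mathbf{x}-\mathbf{a})\right)^{1/2}</math> and <math>\frac{\partial \; (\mathbf{x}-\mathbf{a})^T(\mathbf{x}-\mathbf{a})}{\partial \; \textbf{x}} = 2(\mathbf{x}-\mathbf{a})^T</math>,
:<math>
\frac{\partial \; \|\mathbf{x}-\mathbf{a}\|}{\partial \; \textbf{x}} = \frac{1}{2\|\mathbf{x}-\mathbf{a}\|} \cdot 2(\mathbf{x}-\mathbf{a})^T = \frac{(\mathbf{x}-\mathbf{a})^T}{\|\mathbf{x}-\mathbf{a}\|}.
</math>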

=== Derivative of matrix traces ===
This section shows examples of matrix differentiation of common [[Trace (matrix)|trace]] equations.
:<math> \frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B})}{\partial \; \textbf{X}} = \frac{\partial \; \operatorname{tr}( \textbf{B}^T \textbf{X}^T \textbf{A}^T)}{\partial \; \textbf{X}} = \textbf{B} \textbf{A}
</math>

:<math> \frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B} \textbf{X}^T \textbf{C}) }{\partial \; \textbf{X}} = \textbf{B} \textbf{X}^T \textbf{C} \textbf{A} + \textbf{B}^T \textbf{X}^T \textbf{A}^T \textbf{C}^T </math>
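Two special cases (stated here as illustrative checks) follow by setting matrices to the identity: taking '''B''' = '''I''' in the first formula gives <math>\partial \operatorname{tr}(\textbf{A}\textbf{X}) / \partial \textbf{X} = \textbf{A}</math>, and taking '''A''' = '''B''' = '''C''' = '''I''' in the second gives
:<math> \frac{\partial \; \operatorname{tr}(\textbf{X}\textbf{X}^T)}{\partial \; \textbf{X}} = 2\textbf{X}^T, </math>
consistent with <math>\operatorname{tr}(\textbf{X}\textbf{X}^T) = \sum_{i,j} X_{i,j}^2</math>.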

=== Derivative of matrix determinant ===
: <math> \frac{\partial \det\mathbf{X}}{\partial \mathbf{X}}= \operatorname{adj}\,\mathbf{X}= \det\mathbf{X}\cdot \mathbf{X}^{-1}.</math>
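Here adj '''X''' denotes the [[adjugate matrix|adjugate]], and the last expression assumes '''X''' is invertible. For a 2×2 matrix this can be checked entrywise (an illustrative computation): <math>\det\mathbf{X} = X_{1,1}X_{2,2} - X_{1,2}X_{2,1}</math>, and differentiating with respect to each entry, arranged with the transposed indexing used above, gives
:<math> \frac{\partial \det\mathbf{X}}{\partial \mathbf{X}} = \begin{bmatrix} X_{2,2} & -X_{1,2} \\ -X_{2,1} & X_{1,1} \end{bmatrix} = \operatorname{adj}\,\mathbf{X}.</math>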

== Relation to other derivatives ==

The matrix derivative is a convenient notation for keeping track of partial derivatives during calculations. The [[Fréchet derivative]] is the standard way in [[functional analysis]] to take derivatives with respect to vectors. In the case that a matrix function of a matrix is Fréchet differentiable, the two derivatives agree up to a translation of notation. As is the case in general for [[partial derivative]]s, some formulae may extend under weaker analytic conditions than the existence of the derivative as an approximating linear mapping.
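Concretely (stated here as a translation of notation, assuming differentiability), for a scalar function ''f'' of a matrix variable '''X''' the Fréchet derivative at '''X''' is the linear map <math>\mathbf{Y} \mapsto \mathrm{D}f(\mathbf{X})[\mathbf{Y}]</math> satisfying
:<math>f(\mathbf{X} + \mathbf{Y}) = f(\mathbf{X}) + \mathrm{D}f(\mathbf{X})[\mathbf{Y}] + o\left(\|\mathbf{Y}\|\right),</math>
and this linear map coincides with the directional derivative of the present notation: <math>\mathrm{D}f(\mathbf{X})[\mathbf{Y}] = \operatorname{tr}\left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right)</math>.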

== Usages ==

Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of [[Lagrange multipliers]]. This includes the derivation of:
* [[Kalman filter]]
* [[Wiener filter]]
* [[Expectation-maximization algorithm#Example: Gaussian mixture|Expectation-maximization algorithm for Gaussian mixture]]

== Alternatives ==

The [[tensor index notation]] with its [[Einstein summation]] convention is very similar to matrix calculus, except that one writes only a single component at a time. It has the advantage that one can easily manipulate tensors of arbitrarily high rank, whereas tensors of rank higher than two are quite unwieldy in matrix notation. Note that a matrix can be considered simply a tensor of rank two.
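For example, the matrix product '''C''' = '''AB''' is written in index notation as <math>C_{i,k} = A_{i,j} B_{j,k}</math> (with summation over the repeated index ''j''), and the fourth-rank matrix derivative above is simply the four-index object <math>\partial F_{i,j} / \partial X_{k,l}</math>, with no special bookkeeping needed for the extra indices.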

== See also ==
{{portal|Mathematics|Nuvola_apps_edu_mathematics_blue-p.svg}}
* [[Derivative (generalizations)]]

== Notes ==

<references/>

== External links ==
* [http://www.colorado.edu/engineering/cas/courses.d/IFEM.d/IFEM.AppD.d/IFEM.AppD.pdf Matrix Calculus], appendix from the ''Introduction to Finite Element Methods'' book at the University of Colorado at Boulder. Uses the Hessian (transpose of the Jacobian) definition of vector and matrix derivatives.
* [http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html Matrix calculus], Matrix Reference Manual, Imperial College London.
* [http://matrixcookbook.com The Matrix Cookbook], with a ''derivatives'' chapter. Uses the Hessian definition.

{{DEFAULTSORT:Matrix Calculus}}
[[Category:Matrix theory]]
[[Category:Linear algebra]]
[[Category:Vector calculus]]
[[Category:Multivariable calculus]]

[[ar:حسبان المصفوفات]]
[[fa:حساب ماتریس‌ها]]
[[id:Kalkulus matriks]]
[[pt:Cálculo matricial]]
