Matrix Calculus





Notation

Derivatives

In the main part of this page we express results in terms of differentials rather than derivatives for two reasons: differentials avoid notational disagreements and they cope easily with the complex case. In most cases, however, the differentials have been written in the form dY: = dY/dX dX: so that the corresponding derivative may be easily extracted.

Derivatives with respect to a real matrix

If X is p#q and Y is m#n, then dY: = dY/dX dX: where the derivative dY/dX is a large mn#pq matrix. If X and/or Y are column vectors or scalars, then the vectorization operator : has no effect and may be omitted. dY/dX is also called the Jacobian Matrix of Y: with respect to X: and det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration: Integral(f(Y)dY:)=Integral(f(Y(X)) det(dY/dX) dX:).
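As a small illustration of the mn#pq Jacobian matrix, the sketch below takes the linear map Y = A X B (our choice of example, not one singled out by the text) and checks that dY/dX equals the Kronecker product of B^T and A when : stacks columns. The helper names `vec`, `unvec`, `kron` and `jacobian` are hypothetical, and integer matrices are used so that the comparison is exact.

```python
# Sketch: Jacobian matrix dY/dX of Y = A X B under column-stacking vectorization.
# All helper names are illustrative; nothing here comes from a library.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def vec(P):
    """Stack the columns of P into one flat list (the ':' operator)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]

def unvec(v, rows, cols):
    """Inverse of vec for a rows#cols matrix."""
    return [[v[j * rows + i] for j in range(cols)] for i in range(rows)]

def kron(P, Q):
    """Kronecker product of P and Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

def jacobian(F, X):
    """Column k holds the change in vec(F(X)) per unit change in vec(X)[k].
    A unit step is exact here only because F is linear in X."""
    rows, cols = len(X), len(X[0])
    x0, y0 = vec(X), vec(F(X))
    columns = []
    for k in range(len(x0)):
        x1 = list(x0)
        x1[k] += 1
        y1 = vec(F(unvec(x1, rows, cols)))
        columns.append([a - b for a, b in zip(y1, y0)])
    return transpose(columns)

A = [[1, 2], [3, 4], [5, 6]]                     # 3#2
B = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]   # 3#4
X = [[1, -1, 2], [0, 3, 1]]                      # p#q = 2#3, so Y is m#n = 3#4

J_numeric = jacobian(lambda Z: matmul(matmul(A, Z), B), X)
J_formula = kron(transpose(B), A)                # 12#6, i.e. mn#pq

print(len(J_numeric), len(J_numeric[0]))
print(J_numeric == J_formula)
```

For a nonlinear Y(X) the same column-by-column construction would need a small step and a division by that step, and the result would only approximate dY/dX.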

Although they do not generalise so well, other authors use alternative notations for the cases when X and Y are both vectors or when one is a scalar. In particular:

Derivatives with respect to a complex matrix

If X is complex then dY: = dY/dX dX: holds in general if and only if Y(X) is an analytic function. This normally implies that Y(X) does not depend explicitly on X^C or X^H.

Even for non-analytic functions we can treat X and X^C (with X^H = (X^C)^T) as distinct variables and write uniquely dY: = ∂Y/∂X dX: + ∂Y/∂X^C dX^C: provided that Y is analytic with respect to X and X^C individually (or equivalently with respect to X_R and X_I individually). ∂Y/∂X is the Generalized Complex Derivative and ∂Y/∂X^C is the Complex Conjugate Derivative [R.4, R.9]; their properties are studied in Wirtinger Calculus.

We define the generalized derivatives in terms of partial derivatives with respect to X_R and X_I:
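A scalar sketch may help fix ideas. It assumes the standard Wirtinger definitions ∂f/∂z = ½(∂f/∂z_R - j ∂f/∂z_I) and ∂f/∂z^C = ½(∂f/∂z_R + j ∂f/∂z_I), and estimates the two partial derivatives by central differences. The function name `wirtinger` and the two test functions are our own choices.

```python
# Sketch: generalized and conjugate derivatives of a scalar function of a
# complex scalar, built from partials along the real and imaginary axes.
# Assumes the standard Wirtinger definitions stated in the lead-in.

def wirtinger(f, z, h=1e-6):
    """Return (df/dz, df/dz^C) estimated by central differences."""
    d_re = (f(z + h) - f(z - h)) / (2 * h)            # df/dz_R
    d_im = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # df/dz_I
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)

z0 = 1.5 - 0.5j

# Analytic example: f(z) = z^2 has df/dz = 2z and no dependence on z^C.
dz_sq, dzc_sq = wirtinger(lambda z: z * z, z0)

# Non-analytic example: f(z) = z z^C has df/dz = z^C and df/dz^C = z.
dz_abs, dzc_abs = wirtinger(lambda z: z * z.conjugate(), z0)

print(dz_sq, dzc_sq)
print(dz_abs, dzc_abs)
```

The conjugate derivative vanishing for z^2 but not for z z^C is the numerical counterpart of the statement that an analytic function does not depend explicitly on the conjugate variable.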

We have the following relationships for both analytic and non-analytic functions Y(X):

Complex Constrained Minimization

Suppose f(X) is a scalar real function of a complex matrix (or vector), X, and G(X) is a complex-valued matrix (or vector or scalar) function of X. To minimize f(X) subject to G(X) = 0, we use complex Lagrange multipliers and minimize f(X) + tr(K^H G(X)) + tr(K^T G(X)^C) subject to G(X) = 0. Hence we solve ∂f/∂X + ∂tr(K^H G)/∂X + ∂tr(K^T G^C)/∂X = 0^T subject to G(X) = 0. If g(X) is a vector, this becomes ∂f/∂X + ∂(k^H g)/∂X + ∂(k^T g^C)/∂X = 0^T. If g(X) is a scalar, this becomes ∂f/∂X + k^C ∂g/∂X + k ∂g^C/∂X = 0^T.
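A worked instance of the scalar-constraint case, under assumptions of our own choosing: take f(x) = x^H x and g(x) = a^H x - b for a complex vector x. Then ∂f/∂x = x^H, ∂g/∂x = a^H, and ∂g^C/∂x = 0^T because g^C depends only on x^C. The stationarity condition reduces to x^H + k^C a^H = 0^T, giving x = -k a, and the constraint fixes k = -b/(a^H a). The sketch checks the constraint and confirms that feasible perturbations do not lower f.

```python
# Sketch: minimize x^H x subject to a^H x = b with a complex Lagrange multiplier.
# f, g, a, b and the perturbation d are illustrative choices, not from the text.

def inner(u, v):
    """u^H v for two lists of complex numbers."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

a = [1 + 2j, -1j, 2 + 0j]
b = 3 - 1j

# Stationarity x^H + k^C a^H = 0^T gives x = -k a; the constraint then fixes k.
k = -b / inner(a, a)
x = [-k * ai for ai in a]

residual = inner(a, x) - b      # should vanish: the constraint is met
f_min = inner(x, x).real

# d satisfies a^H d = 0, so x + t d stays feasible for any complex t.
d = [2 + 0j, 0j, -1 + 2j]
f_perturbed = []
for t in (0.1, -0.3, 0.2j):
    y = [xi + t * di for xi, di in zip(x, d)]
    f_perturbed.append(inner(y, y).real)

print(abs(residual))
print(f_min, f_perturbed)
```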

Complex Gradient Vector

If f(X) is a real function of a complex matrix (or vector), X, then ∂f/∂X^C = (∂f/∂X)^C and we can define the complex-valued column vector grad(f(X)) = 2 (∂f/∂X)^H = (∂f/∂X_R + j ∂f/∂X_I)^T as the Complex Gradient Vector [R.9] with the properties listed below. If we use <-> to represent the vector mapping associated with the Complex-to-Real isomorphism, and X[m#n]: <-> y[2mn] where y is real, then grad(f(X)) <-> grad(f(y)) where the latter is the conventional grad function from vector calculus.
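To make the definition concrete, the sketch below uses the real-valued function f(x) = |a^H x - b|^2 of a complex vector x (an example of our own). For this f, ∂f/∂x = (a^H x - b)^C a^H, so the formula above gives grad(f) = 2 a (a^H x - b). The code compares that expression with ∂f/∂x_R + j ∂f/∂x_I estimated by central differences, then takes one steepest-descent step. The names `numeric_grad` and `mu` are hypothetical.

```python
# Sketch: complex gradient vector of a real function of a complex vector.
# f, a, b, x and the step size mu are illustrative choices.

def f(x, a, b):
    """Real-valued f(x) = |a^H x - b|^2."""
    r = sum(ai.conjugate() * xi for ai, xi in zip(a, x)) - b
    return abs(r) ** 2

def numeric_grad(fun, x, h=1e-6):
    """Element-wise df/dx_R + j df/dx_I by central differences."""
    g = []
    for i in range(len(x)):
        def shifted(delta):
            y = list(x)
            y[i] += delta
            return fun(y)
        d_re = (shifted(h) - shifted(-h)) / (2 * h)
        d_im = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
        g.append(d_re + 1j * d_im)
    return g

a = [1 + 1j, 2 - 1j]
b = 0.5 + 2j
x = [0.3 - 0.7j, -1.2 + 0.4j]

residual = sum(ai.conjugate() * xi for ai, xi in zip(a, x)) - b
grad_formula = [2 * ai * residual for ai in a]       # 2 (df/dx)^H
grad_numeric = numeric_grad(lambda y: f(y, a, b), x)

# One steepest-descent step along -grad(f) should reduce f for a small step.
mu = 0.05
x_next = [xi - mu * gi for xi, gi in zip(x, grad_formula)]

print(grad_formula)
print(grad_numeric)
print(f(x, a, b), f(x_next, a, b))
```

The real and imaginary parts of each element of `grad_numeric` are the components of the conventional gradient with respect to the stacked real vector, which is the correspondence the isomorphism describes.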

Basic Properties

Differentials of Linear Functions

Differentials of Quadratic Products

Differentials of Cubic Products

Differentials of Inverses

Differentials of Trace

Note: matrix dimensions must result in an n#n argument for tr().

Trace Minimization

In the following expressions M# denotes the inverse of M or, if M is singular, any generalized inverse (including the pseudoinverse).

Differentials of Determinant

Note: matrix dimensions must result in an n#n argument for det(). Some of the expressions below involve inverses: these forms apply only if the quantity being inverted is square and non-singular; alternative forms involving the adjoint, ADJ(), do not have the non-singular requirement.
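The remark about adjoint forms can be illustrated with Jacobi's formula, d(det(X)) = tr(ADJ(X) dX), which we assume here as a representative identity. It remains valid when X is singular, whereas det(X) tr(X^-1 dX) is then undefined. The 2#2 helpers below are hypothetical and written out by hand.

```python
# Sketch: differential of the determinant via the adjoint, at a singular X.
# Jacobi's formula d(det X) = tr(ADJ(X) dX) is the assumed identity.

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def adj2(M):
    """Adjoint (adjugate) of a 2#2 matrix."""
    return [[M[1][1], -M[0][1]], [-M[1][0], M[0][0]]]

def trace_of_product(P, Q):
    """tr(P Q) for 2#2 matrices."""
    return sum(P[i][k] * Q[k][i] for i in range(2) for k in range(2))

def shifted(M, D, t):
    """M + t D, element by element."""
    return [[M[i][j] + t * D[i][j] for j in range(2)] for i in range(2)]

X = [[1.0, 2.0], [2.0, 4.0]]      # singular: det(X) = 0, so X has no inverse
dX = [[0.3, -0.1], [0.2, 0.5]]    # arbitrary perturbation direction
h = 1e-6

numeric = (det2(shifted(X, dX, h)) - det2(shifted(X, dX, -h))) / (2 * h)
formula = trace_of_product(adj2(X), dX)

print(det2(X), numeric, formula)
```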

Jacobian

dY/dX is called the Jacobian Matrix of Y: with respect to X: and J_X(Y) = det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration: Integral(f(Y)dY:)=Integral(f(Y(X)) det(dY/dX) dX:).
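A numerical sketch of the change of variables, using the polar map Y = (r cos θ, r sin θ) with X = (r, θ) as an example of our own. Its Jacobian is r, which is non-negative on the region used, so no sign question arises. The integrand exp(-(y1^2 + y2^2)) over the unit disc has the closed form π(1 - 1/e), which gives something to compare against. Function names and grid sizes are hypothetical.

```python
# Sketch: integrating over the unit disc by changing to polar variables and
# weighting the integrand by the Jacobian det(dY/dX).
import math

def polar_map(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """det(dY/dX) for Y = polar_map(X), X = (r, theta), by central differences."""
    y_rp, y_rm = polar_map(r + h, theta), polar_map(r - h, theta)
    y_tp, y_tm = polar_map(r, theta + h), polar_map(r, theta - h)
    d00 = (y_rp[0] - y_rm[0]) / (2 * h)   # dy1/dr
    d10 = (y_rp[1] - y_rm[1]) / (2 * h)   # dy2/dr
    d01 = (y_tp[0] - y_tm[0]) / (2 * h)   # dy1/dtheta
    d11 = (y_tp[1] - y_tm[1]) / (2 * h)   # dy2/dtheta
    return d00 * d11 - d01 * d10

def integrate_disc(f, n_r=100, n_t=100):
    """Midpoint rule over (r, theta) in [0, 1] x [0, 2 pi]."""
    dr, dt = 1.0 / n_r, 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            y1, y2 = polar_map(r, t)
            total += f(y1, y2) * jacobian_det(r, t) * dr * dt
    return total

estimate = integrate_disc(lambda y1, y2: math.exp(-(y1 * y1 + y2 * y2)))
exact = math.pi * (1 - math.exp(-1))
print(estimate, exact)
```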

Hessian matrix

If f is a real function of x then the Hermitian matrix H_x f = (d/dx (df/dx)^H)^T is the Hessian matrix of f(x). A value of x for which grad f(x) = 0 corresponds to a minimum, maximum or saddle point according to whether H_x f is positive definite, negative definite or indefinite.
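The classification can be sketched for the simplest case of a real 2-vector x, where the Hessian is a real symmetric 2#2 matrix whose eigenvalues have a closed form. The restriction to real x, the finite-difference Hessian, the tolerance, and the extra "inconclusive" outcome for a semidefinite Hessian (a case the statement above does not cover) are all assumptions of this example.

```python
# Sketch: classify a stationary point of a real function of a real 2-vector
# from the definiteness of its Hessian. Names and tolerances are illustrative.
import math

def hessian(f, x, h=1e-4):
    """Finite-difference Hessian of f at x (list of floats)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
    return H

def classify(H, tol=1e-6):
    """Use the two eigenvalues of a symmetric 2#2 matrix to decide definiteness."""
    a, b, d = H[0][0], 0.5 * (H[0][1] + H[1][0]), H[1][1]
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    lo, hi = mean - radius, mean + radius
    if lo > tol:
        return "minimum"            # positive definite
    if hi < -tol:
        return "maximum"            # negative definite
    if lo < -tol and hi > tol:
        return "saddle point"       # indefinite
    return "inconclusive"           # semidefinite to within tol

origin = [0.0, 0.0]                 # grad f = 0 here for every example below
results = [
    classify(hessian(lambda v: v[0] ** 2 + v[0] * v[1] + 3 * v[1] ** 2, origin)),
    classify(hessian(lambda v: -v[0] ** 2 - v[1] ** 2, origin)),
    classify(hessian(lambda v: v[0] ** 2 - v[1] ** 2, origin)),
    classify(hessian(lambda v: v[0] ** 4 + v[1] ** 4, origin)),
]
print(results)
```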


This page is part of The Matrix Reference Manual. Copyright © 1998-2022 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: calculus.html 11291 2021-01-05 18:26:10Z dmb $