Matrix Manual: Matrix Calculus

Matrix Reference Manual
Proofs Section 1: Basic Theorems

1.1	If A_[m#n]=F_[m#r]G_[r#n] has linearly independent columns then r ≥ n and we can find B_[r#r] and a subset of F's columns, D_[m#r-n], such that F = [D A]B. To prove this, we replace the columns of F one at a time with columns taken from A and show that we can do this for all of A's columns. Define F^[k] = [D^[k] a_1:k] consisting of r-k columns selected from F followed by the first k columns of A. We prove by induction that, for each k in the range 0:n, we can choose D^[k] and B^[k] so that F=F^[k]B^[k]. For k=0, the proposition is true with D^[0]=F and B^[0]=I. Suppose F=F^[k]B^[k] is true for some k<n. Then A=F^[k]B^[k]G and, if we define x = (B^[k]G)_k+1, the (k+1)^th column of B^[k]G, we have a_k+1 = F^[k]x. Suppose x_i is the first non-zero element of x. We must have i ≤ r-k, for otherwise a_k+1 = [0 a_1:k]x or equivalently A[x_r-k+1:r; -1; 0_[n-k-1#1]] = 0, which would contradict the linear independence of A's columns. If we define D^[k+1] to be D^[k] with its i^th column removed then F^[k] = F^[k+1]S where S=[I_[i-1#i-1] 0 0; 0 -x_i^-1x_i+1:r I_[r-i#r-i]; 0_[1#i-1] x_i^-1 0_[1#r-i]]. It follows that if B^[k+1]=SB^[k] we have F=F^[k+1]B^[k+1] as required. Finally, we have F=F^[n]B^[n] = [D^[n] A]B^[n] where D^[n] consists of r-n columns selected from F. In order for D^[n] to exist, we must therefore have r≥n.
1.2	For any non-zero A_[m#n] we can find a linearly independent subset of A's columns, F, such that A=F_[m#r]G_[r#n] with G upper triangular and r≤n. We prove this by induction on n. If n=1, set F=A and G=1. Now suppose A, F and G satisfy the theorem and consider the m#n+1 matrix A⁺=[A x]. If x=Fy for some y then set F⁺=F and G⁺=[G y], otherwise set F⁺=[F x] and G⁺=[G 0; 0^T 1]. It is straightforward to show in both cases that A⁺=F⁺G⁺, that G⁺ is upper triangular and that r⁺≤n+1. We must also show that, in the second case (i.e. if x≠Fy for any y), the columns of [F x] are linearly independent. If [F x][a; b] = 0, then Fa=-bx. We must have b=0 since otherwise x=F(-a/b), which contradicts the assumption that x≠Fy. Hence Fa=0, which implies that a=0 since the columns of F are linearly independent. Thus the columns of [F x] are also linearly independent.
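The inductive construction in 1.2 can be sketched as a column-by-column procedure. The following is a minimal illustration in exact rational arithmetic, not a recommended numerical method; the function names `solve_in_span`, `column_factor` and `matmul` are hypothetical helpers introduced here, and an all-zero leading column is simply given an empty column of G.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve_in_span(F_cols, x):
    """Return y with sum_j y[j] * F_cols[j] == x, or None if no such y exists.

    Assumes the columns in F_cols are linearly independent."""
    m, r = len(x), len(F_cols)
    # Augmented system [F | x], one row per equation, in exact arithmetic.
    M = [[Fraction(F_cols[j][i]) for j in range(r)] + [Fraction(x[i])]
         for i in range(m)]
    for col in range(r):
        # Independence of F's columns guarantees a pivot at or below row `col`.
        piv = next(i for i in range(col, m) if M[i][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for i in range(m):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    if any(M[i][r] != 0 for i in range(r, m)):
        return None  # inconsistent system: x is not of the form F y
    return [M[i][r] for i in range(r)]

def column_factor(A):
    """Factor A (list of rows) as F G with F a subset of A's columns."""
    m, n = len(A), len(A[0])
    F_cols, G_cols = [], []
    for j in range(n):
        x = [A[i][j] for i in range(m)]
        y = solve_in_span(F_cols, x)
        if y is None:
            # x is not F y for any y: append x to F and a unit column to G.
            F_cols.append(x)
            y = [Fraction(0)] * (len(F_cols) - 1) + [Fraction(1)]
        G_cols.append(y)
    r = len(F_cols)
    F = [[F_cols[j][i] for j in range(r)] for i in range(m)]
    # Pad each column of G with zeros below its last entry.
    G = [[col[i] if i < len(col) else Fraction(0) for col in G_cols]
         for i in range(r)]
    return F, G

A = [[1, 2, 0],
     [0, 0, 1],
     [1, 2, 1]]
F, G = column_factor(A)
```

Here the second column of A is twice the first, so F keeps only the first and third columns and G records the multiplier.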
1.3	For any matrix A_[m#n] we have rank(A) ≤ m and rank(A) ≤ n. We have A = A_[m#n]I_[n#n] = I_[m#m]A_[m#n]. Since rank(A) is the minimum inner dimension of such a decomposition, we must have rank(A) ≤ m and rank(A) ≤ n.
1.4	rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B). If A=FG is a full-rank decomposition, then we can decompose AB as F(GB). Hence rank(AB) is no greater than the inner dimension of this decomposition, which is rank(A). A similar argument shows that rank(AB) ≤ rank(B).
1.5	rank(A)=n iff A_[m#n] has linearly independent columns. If the columns of A are linearly independent and A=F_[m#r]G_[r#n] is a full-rank decomposition, then from 1.1 we have rank(A) = r ≥ n, but from 1.3, rank(A)≤n. Hence rank(A)=n. If A's columns are not linearly independent then from 1.2 we can find a linearly independent subset F with A=F_[m#r]G_[r#n]. Since F is linearly independent it cannot contain all n of A's columns, so r<n. This implies that rank(A) ≤ r < n since the rank is the minimum inner dimension of such a decomposition.
1.6	rank(I_[n#n])=n. The columns of I are linearly independent since Ix=0 implies x=0. Hence from 1.5 its rank is n.
1.7	If A has linearly independent columns then we can extend it to a square matrix [D_[m#m-n] A_[m#n]] with linearly independent columns. We have A=IA so from 1.1 we can find D and B such that I_[m#m]=[D A]_[m#m]B. From 1.6, 1.4 and 1.3 we have m = rank(I) ≤ rank([D A]) ≤ m. Hence rank([D A])=m and by 1.5 its columns are linearly independent.
1.8	If A_[m#n] has linearly independent columns, then there exists a Q_[m#n] and upper triangular R_[n#n] and S_[n#n] such that A=QR and Q^HQ = RS = I. If A is real then Q, R and S may be chosen to be real. We prove this by induction on n. If n=1, we set Q=a/||a||, R=||a|| and S=||a||^-1 where a=A. Note that ||a|| cannot equal 0 since this would violate the assumption that the columns of A are linearly independent. Now assume the theorem to be true for A_[m#n] with Q, R and S defined as in the theorem. Given an m#(n+1) matrix A⁺ = [A x] with linearly independent columns, we define y = x - QQ^Hx and note that y cannot equal 0 since we would then have x = QQ^Hx = ASQ^Hx, contradicting the assumption that the columns of A⁺ are linearly independent. We set Q⁺ = [Q y/||y||], R⁺ = [R Q^Hx; 0^T ||y||] and S⁺ = [S -SQ^Hx/||y||; 0^T ||y||^-1]. It is straightforward to show that A⁺, Q⁺, R⁺ and S⁺ satisfy the theorem by performing the appropriate multiplications. Note that if A is real then all the above calculations give real results. This recursive procedure is called Gram-Schmidt orthogonalization but is not recommended because of its poor numerical properties.
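The recursion in 1.8 can be written out directly for a real matrix. The sketch below follows the update formulas for Q⁺, R⁺ and S⁺ one column at a time; the function name `gram_schmidt_qrs` and the storage layout (Q as a list of columns, R and S as lists of rows) are choices made for this illustration only. As the text notes, this classical form is shown for clarity rather than as a method to use in practice, and the exact test for a zero norm is only meaningful for well-conditioned input.

```python
import math

def gram_schmidt_qrs(A):
    """Classical Gram-Schmidt for a real matrix A (list of rows) whose
    columns are linearly independent.

    Returns Q as a list of columns, and R, S as lists of rows, with
    A = QR and Q^T Q = RS = I."""
    m, n = len(A), len(A[0])
    Q, R, S = [], [], []
    for k in range(n):
        x = [A[i][k] for i in range(m)]
        # c = Q^T x, the components of x along the existing columns of Q.
        c = [sum(q[i] * x[i] for i in range(m)) for q in Q]
        # y = x - Q Q^T x, the part of x orthogonal to the columns of Q.
        y = [x[i] - sum(c[j] * Q[j][i] for j in range(k)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0:
            raise ValueError("columns are linearly dependent")
        Sc = [sum(S[i][j] * c[j] for j in range(k)) for i in range(k)]
        Q.append([v / norm for v in y])                                # Q+ = [Q y/||y||]
        R = [R[i] + [c[i]] for i in range(k)] + [[0.0] * k + [norm]]   # R+ = [R c; 0 ||y||]
        S = ([S[i] + [-Sc[i] / norm] for i in range(k)]
             + [[0.0] * k + [1.0 / norm]])                             # S+ = [S -Sc/||y||; 0 1/||y||]
    return Q, R, S

A = [[1.0, 1.0],
     [1.0, 0.0],
     [0.0, 1.0]]
Q, R, S = gram_schmidt_qrs(A)
```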
1.9	For any A_[m#n] there exists a Q_[m#m] and upper triangular R_[m#n] such that A=QR and Q^HQ = I. If A is real, then Q and R may be taken as real. From 1.2 we decompose A=FG with G upper triangular and F_[m#r] having linearly independent columns. From 1.7 we find D such that [F D]_[m#m] has linearly independent columns. From 1.8 we decompose [F D] = QU where Q^HQ = I. We now have A = FG = [F D] [I; 0] G = Q U [I; 0] G = QR where R = U [I; 0] G is the product of three upper triangular matrices and is therefore upper triangular itself. If A is real then all of these operations preserve realness. If m≥n, the lower m-n rows of R are all zero and so the product A=QR is unaffected if we delete the last m-n columns of Q and the last m-n rows of R.
1.10	The diagonal elements of a skew-symmetric matrix are all 0 [provided that the underlying field does not have characteristic 2]. If A is skew-symmetric then by definition a(i,j)=-a(j,i). By setting j=i, we get a(i,i)=-a(i,i) from which a(i,i)=0.
1.11	The rank of a skew-symmetric matrix is even [provided that the underlying field does not have characteristic 2]. We prove this by induction starting with a 1#1 skew-symmetric matrix, which must equal 0 and hence has even rank. Given a skew-symmetric A_[n+1#n+1], we can, by [1.10], partition it as A = [C b; -b^T 0] where C_[n#n] is skew-symmetric and rank(C) is even, by the induction hypothesis. If b = Cx for some x, then A = [I x]^T C [I x] and also C = [I 0] A [I 0]^T and so, by [1.4], rank(A) = rank(C), which is even. If, on the other hand, b is linearly independent of the columns of C then -b^T is also linearly independent of the rows of C = -C^T. Hence rank(A) = rank([C b; -b^T 0]) = rank([C b]) + 1 = rank(C) + 2, which is even.
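A quick empirical check of 1.11 is possible with exact rational elimination. This is only a sketch: the helpers `rank` and `random_skew` are hypothetical names introduced here, the entries are small random integers, and the check is over the rationals (so the characteristic-2 caveat does not arise).

```python
import random
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    rk = 0
    for col in range(len(M[0])):
        # Look for a pivot in this column among the rows not yet used.
        piv = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][col] / M[rk][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def random_skew(n, rng):
    """Random integer skew-symmetric n#n matrix: zero diagonal, a[j][i] = -a[i][j]."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = rng.randint(-2, 2)
            A[j][i] = -A[i][j]
    return A

rng = random.Random(0)
ranks = [rank(random_skew(n, rng)) for n in range(1, 7) for _ in range(20)]
all_even = all(r % 2 == 0 for r in ranks)
```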
1.12	If D is diagonal then AD = DA iff a_i,j=0 whenever d_i,i ≠ d_j,j. We have (AD)_i,j = a_i,jd_j,j and (DA)_i,j = d_i,ia_i,j. Hence AD = DA iff a_i,j(d_i,i - d_j,j) = 0 for all i and j, that is, iff for each i and j either a_i,j = 0 or d_i,i = d_j,j.
1.13	If D = DIAG(c₁I₁, c₂I₂, ..., c_MI_M) where the c_k are distinct scalars and the I_k are identity matrices, then AD = DA iff A = DIAG(A₁, A₂, ..., A_M) where each A_k is the same size as the corresponding I_k. From [1.12], AD = DA iff a_i,j=0 except when d_i,i = d_j,j. However, since the c_k are distinct, d_i,i = d_j,j only within one of the c_kI_k blocks. It follows that AD = DA iff a_i,j=0 except when i and j lie in the same block. The result follows.
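The condition in 1.12 and the block structure in 1.13 are easy to check on a small case. In the sketch below, D = DIAG(2, 2, 5), so a commuting A must be block diagonal with a 2#2 block and a 1#1 block; the helper names are hypothetical and the matrices are arbitrary integer examples.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def diag(d):
    """Diagonal matrix with the entries of d on its diagonal."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]

def commutes_with_diag(A, d):
    """True iff AD = DA for D = diag(d)."""
    D = diag(d)
    return matmul(A, D) == matmul(D, A)

def zero_pattern_ok(A, d):
    """True iff a[i][j] == 0 whenever d[i] != d[j] (the condition in 1.12)."""
    n = len(d)
    return all(A[i][j] == 0
               for i in range(n) for j in range(n) if d[i] != d[j])

d = [2, 2, 5]
A_block = [[1, 2, 0],
           [3, 4, 0],
           [0, 0, 7]]   # block diagonal, matching the blocks of D
A_full = [[1, 2, 9],
          [3, 4, 0],
          [0, 0, 7]]    # the entry 9 links two different diagonal values
```

For these inputs the two predicates agree: both hold for `A_block` and both fail for `A_full`.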
1.14	A=UDV^H=LDM^H are alternative singular value decompositions of A iff U^HL = DIAG(Q₁, Q₂, ..., Q_k, R) and V^HM = DIAG(Q₁, Q₂, ..., Q_k, S) where Q₁, Q₂, ..., Q_k are unitary matrices whose sizes are given by the multiplicities of the corresponding distinct non-zero singular values and R and S are unitary matrices whose size equals the number of zero singular values. [R.11] Define the unitary matrices P=U^HL and Q=V^HM. Then UD²U^H = AA^H = LD²L^H = UPD²P^HU^H ==> D²P = PD². From [1.13], it follows that P (and also Q, by a similar argument based on A^HA) is block diagonal with unitary blocks whose sizes match the multiplicities of the singular values in D. It follows that PD = DP. We also have U^HAV = D = U^HLDM^HV = PDQ^H ==> DQ = PD = DP. Hence D(P - Q) = 0 and P must equal Q except for rows in which d_ii = 0. The result follows.
1.15	If D is diagonal then XDX^T = sum_i(d_i × x_ix_i^T) and XDX^H = sum_i(d_i × x_ix_i^H), where x_i is the i^th column of X. (XDX^T)_pq = sum_i(x_pid_ix_qi) = sum_i(d_i × x_pix_qi) = sum_i(d_i × (x_ix_i^T)_pq) = (sum_i(d_i × x_ix_i^T))_pq. The proof for XDX^H is identical.
1.16	If D is diagonal then tr(XDX^T) = sum_i(d_i × x_i^Tx_i) and tr(XDX^H) = sum_i(d_i × x_i^Hx_i) = sum_i(d_i × |x_i|²). tr(XDX^T) = tr(sum_i(d_i × x_ix_i^T)) [1.15] = sum_i(d_i × tr(x_ix_i^T)) = sum_i(d_i × tr(x_i^Tx_i)) [1.17] = sum_i(d_i × x_i^Tx_i). The proof for tr(XDX^H) is identical.
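As a small worked instance of 1.15 and 1.16, the sketch below compares XDX^T computed by ordinary matrix multiplication with the weighted sum of outer products of the columns of X. The helper names and the integer example are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def xdxt(X, d):
    """X D X^T computed by explicit matrix products, D = diag(d)."""
    n = len(d)
    D = [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(X, D), transpose(X))

def outer_sum(X, d):
    """sum_i d_i * x_i x_i^T, where x_i is the i-th column of X."""
    m, n = len(X), len(X[0])
    cols = [[X[p][i] for p in range(m)] for i in range(n)]
    return [[sum(d[i] * cols[i][p] * cols[i][q] for i in range(n))
             for q in range(m)] for p in range(m)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

X = [[1, 2, 0],
     [3, 4, 1]]
d = [5, 7, 2]

lhs = xdxt(X, d)
rhs = outer_sum(X, d)
# Right-hand side of 1.16: sum_i d_i * x_i^T x_i.
weighted_norms = sum(d[i] * sum(X[p][i] ** 2 for p in range(len(X)))
                     for i in range(len(d)))
```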
1.17	tr(AB) = tr(BA). (AB)_i,j = sum_k(a_ikb_kj), so tr(AB) = sum_i(AB)_ii = sum_i,k(a_ikb_ki) = sum_k,i(b_kia_ik) = sum_k(BA)_kk = tr(BA).
1.18	tr(AB) = A:^TB^T: = A^T:^TB: = A^H:^HB: = (A:^HB^H:)^C. (AB)_i,j = sum_k(a_ikb_kj), so tr(AB) = sum_i(AB)_ii = sum_i,k(a_ikb_ki) = sum_i,k(A_ik(B^T)_ik) = A:^TB^T:. Applying this first identity to BA gives tr(AB) = tr(BA) [1.17] = B:^TA^T: = A^T:^TB:. Next, tr(AB) = A^T:^TB: = A^H:^HB: since A^T:^T = A^H:^H. Finally, tr(AB) = A:^TB^T: = (A:^HB^H:)^C.
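The identities in 1.17 and 1.18 can be checked numerically on small matrices. In this sketch, `vec` stacks the columns of a matrix (the manual's A: notation) and `conj_transpose` forms A^H; both names, and the example matrices, are assumptions introduced for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def transpose(A):
    return [list(col) for col in zip(*A)]

def conj_transpose(A):
    return [[v.conjugate() for v in col] for col in zip(*A)]

def vec(A):
    """Stack the columns of A into a single list (the manual's A:)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

# Real case: tr(AB) = tr(BA) = A:^T B^T:
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
tr_ab = trace(matmul(A, B))
tr_ba = trace(matmul(B, A))
tr_vec = sum(a * b for a, b in zip(vec(A), vec(transpose(B))))

# Complex case: tr(AB) = (A:^H B^H:)^C
Ac = [[1 + 2j, 3]]
Bc = [[2 - 1j],
      [1j]]
tr_c = trace(matmul(Ac, Bc))
tr_c_vec = sum(a.conjugate() * b
               for a, b in zip(vec(Ac), vec(conj_transpose(Bc)))).conjugate()
```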
1.19	tr([A B]^T [C D]) = tr(A^TC) + tr(B^TD). Note that conformality implies that the matrix dimensions are A_[n#m], B_[n#k], C_[n#m] and D_[n#k]. Then tr([A B]^T [C D]) = tr([A^TC A^TD; B^TC B^TD]) = tr(A^TC) + tr(B^TD) since these diagonal blocks are square.
1.20	The pseudoinverse is unique. Assume that both X⁺ and Y satisfy the pseudoinverse equations:
[1] (a) XX⁺X=X and (b) XYX=X
[2] (a) X⁺XX⁺=X⁺ and (b) YXY=Y
[3] (a) (XX⁺)^H=XX⁺ and (b) (XY)^H=XY
[4] (a) (X⁺X)^H=X⁺X and (b) (YX)^H=YX
We will show that Y = X⁺: Y = YXY [2b] = YY^HX^H [3b] = YY^HX^H(X⁺)^HX^H [1a] = YXY(X⁺)^HX^H [3b] = YXYXX⁺ [3a] = YXX⁺ [2b] = X^HY^HX⁺ [4b] = X^H(X⁺)^HX^HY^HX⁺ [1a] = X⁺XX^HY^HX⁺ [4a] = X⁺XYXX⁺ [4b] = X⁺XX⁺ [1b] = X⁺ [2a].
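To see why all four equations are needed for uniqueness, it may help to test two candidates against them. The sketch below uses a real single-column X, for which x^T/(x^Tx) is assumed as the pseudoinverse (a standard formula, not taken from the text above), and a second left inverse Y that satisfies only three of the four equations. The function `penrose` is a hypothetical helper.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def penrose(X, Y):
    """Report which of the four pseudoinverse equations hold for real X, Y."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return (matmul(XY, X) == X,     # [1] XYX = X
            matmul(YX, Y) == Y,     # [2] YXY = Y
            transpose(XY) == XY,    # [3] (XY)^T = XY
            transpose(YX) == YX)    # [4] (YX)^T = YX

X = [[1], [2], [2]]
# Assumed pseudoinverse of a single non-zero real column x: x^T / (x^T x).
X_plus = [[Fraction(1, 9), Fraction(2, 9), Fraction(2, 9)]]
# Another left inverse of X (YX = 1) that is not the pseudoinverse.
Y = [[1, 0, 0]]

checks_plus = penrose(X, X_plus)
checks_y = penrose(X, Y)
```

In this example `Y` passes equations [1], [2] and [4] but fails [3], because XY is not symmetric, which is consistent with the uniqueness result.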
1.21	C=TOE(a)_[m#r]TOE(b)_[r#n] is toeplitz iff a_r+1:r+m-1b_1:n-1^T = a_1:m-1b_r+1:r+n-1^T. The product, C, is toeplitz iff c_i+1,j+1 - c_i,j = 0 for all 1≤i≤m-1 and 1≤j≤n-1. We have c_i+1,j+1 - c_i,j = sum_p=1:r(A_i+1,p × B_p,j+1) - sum_q=1:r(A_i,q × B_q,j) where A_u,v = a_u-v+r and B_u,v = b_u-v+n from the toeplitz property. Hence c_i+1,j+1 - c_i,j = sum_p=1:r(a_i+1-p+r × b_p-j-1+n) - sum_q=1:r(a_i-q+r × b_q-j+n). Substituting p=q+1 in the first summation gives c_i+1,j+1 - c_i,j = sum_q=0:r-1(a_i-q+r × b_q-j+n) - sum_q=1:r(a_i-q+r × b_q-j+n). The summation terms all cancel except for q=0 and q=r, which leaves c_i+1,j+1 - c_i,j = a_i+r × b_n-j - a_i × b_n-j+r = a_i+r × b_k - a_i × b_k+r where k=n-j. Note that 1≤j≤n-1 implies 1≤k≤n-1. Thus C is toeplitz iff a_i+r × b_k = a_i × b_k+r for all 1≤i≤m-1 and 1≤k≤n-1, which gives a_r+1:r+m-1b_1:n-1^T = a_1:m-1b_r+1:r+n-1^T as required.
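The equivalence in 1.21 can be checked exhaustively for small sizes. The sketch below builds TOE(a) with the indexing used in the proof, where element (u,v) of an m#r matrix is a_u-v+r in 1-based terms, and compares the toeplitz property of the product with the stated condition. The function names, the sizes m=3, r=2, n=2 and the entry values {0, 1, 2} are choices made for this illustration.

```python
from itertools import product

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def toe(a, m, n):
    """m#n toeplitz matrix whose 1-based element (u,v) is a_{u-v+n}.

    a must have m+n-1 entries; indices below are shifted to 0-based."""
    return [[a[u - v + n - 1] for v in range(n)] for u in range(m)]

def is_toeplitz(C):
    """True iff every element equals its upper-left diagonal neighbour."""
    return all(C[i + 1][j + 1] == C[i][j]
               for i in range(len(C) - 1) for j in range(len(C[0]) - 1))

def condition(a, b, m, r, n):
    """a_{i+r} b_k == a_i b_{k+r} for 1 <= i <= m-1 and 1 <= k <= n-1 (1-based)."""
    return all(a[i + r - 1] * b[k - 1] == a[i - 1] * b[k + r - 1]
               for i in range(1, m) for k in range(1, n))

m, r, n = 3, 2, 2
mismatches = 0
for a in product((0, 1, 2), repeat=m + r - 1):
    for b in product((0, 1, 2), repeat=r + n - 1):
        C = matmul(toe(a, m, r), toe(b, r, n))
        if is_toeplitz(C) != condition(a, b, m, r, n):
            mismatches += 1
```

If the indexing conventions above match those of the proof, `mismatches` stays at zero over all tested vectors.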

This page is part of The Matrix Reference Manual. Copyright © 1998-2022 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: proof001.html 11291 2021-01-05 18:26:10Z dmb $