5.1 |
N(x; a, A) N(x; b, B) = N(a; b, A+B) × N(x; c, C) where C = (A^-1+B^-1)^-1 = A(A+B)^-1B = B(A+B)^-1A and c = C(A^-1a+B^-1b) = A(A+B)^-1b + B(A+B)^-1a
- We assume without comment below that A, B, C and their sums and inverses are all symmetric matrices.
- First we define C = (A^-1+B^-1)^-1 = A(BA^-1A+BB^-1A)^-1B = A(B+A)^-1B = A(A+B)^-1B = (similarly) B(A+B)^-1A.
- We also define c = C(A^-1a+B^-1b) = B(A+B)^-1AA^-1a + A(A+B)^-1BB^-1b = A(A+B)^-1b + B(A+B)^-1a
- We see that c^TC^-1c = (a^TA^-1+b^TB^-1)CC^-1C(A^-1a+B^-1b) = (a^TA^-1+b^TB^-1)C(A^-1a+B^-1b)
= a^TA^-1CA^-1a + 2a^TA^-1CB^-1b + b^TB^-1CB^-1b
= a^T(A+B)^-1BA^-1a + 2a^TA^-1A(A+B)^-1BB^-1b + b^T(A+B)^-1AB^-1b [substituting for C]
= a^T(A+B)^-1((A+B)A^-1 - I)a + 2a^T(A+B)^-1b + b^T(A+B)^-1((A+B)B^-1 - I)b
= a^T(A^-1 - (A+B)^-1)a + 2a^T(A+B)^-1b + b^T(B^-1 - (A+B)^-1)b
= a^TA^-1a + b^TB^-1b - (a^T(A+B)^-1a - 2a^T(A+B)^-1b + b^T(A+B)^-1b)
= a^TA^-1a + b^TB^-1b - (a-b)^T(A+B)^-1(a-b)
- Hence (x-c)^TC^-1(x-c) = x^TC^-1x - 2x^TC^-1c + c^TC^-1c
= x^T(A^-1+B^-1)x - 2x^TC^-1C(A^-1a+B^-1b) + a^TA^-1a + b^TB^-1b - (a-b)^T(A+B)^-1(a-b)
= x^TA^-1x + x^TB^-1x - 2x^TA^-1a - 2x^TB^-1b + a^TA^-1a + b^TB^-1b - (a-b)^T(A+B)^-1(a-b)
= (x-a)^TA^-1(x-a) + (x-b)^TB^-1(x-b) - (a-b)^T(A+B)^-1(a-b)
- It follows that N(a; b, A+B) × N(x; c, C) = det(2π(A+B))^-½ exp(-½(a-b)^T(A+B)^-1(a-b)) × det(2πC)^-½ exp(-½(x-c)^TC^-1(x-c))
= det(4π²(A+B)C)^-½ exp(-½((x-c)^TC^-1(x-c) + (a-b)^T(A+B)^-1(a-b))) [multiplying determinants and adding exponents]
= det(4π²(A+B)A(A+B)^-1B)^-½ exp(-½((x-a)^TA^-1(x-a) + (x-b)^TB^-1(x-b)))
= det(2πA)^-½ exp(-½(x-a)^TA^-1(x-a)) × det(2πB)^-½ exp(-½(x-b)^TB^-1(x-b))
= N(x; a, A) N(x; b, B)
|
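The product identity above can be spot-checked numerically. This is a sketch (not part of the original proof); all variable values are arbitrary illustrative choices.

```python
import numpy as np

# Check [5.1]: N(x; a, A) N(x; b, B) = N(a; b, A+B) * N(x; c, C)
# with C = (A^-1 + B^-1)^-1 and c = C(A^-1 a + B^-1 b).

def gauss(x, m, S):
    """Multivariate normal density N(x; m, S)."""
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / \
        np.sqrt(np.linalg.det(2 * np.pi * S))

rng = np.random.default_rng(0)
n = 3
a, b, x = rng.standard_normal((3, n))
WA = rng.standard_normal((n, n))
WB = rng.standard_normal((n, n))
A = WA @ WA.T + n * np.eye(n)        # random symmetric positive definite
B = WB @ WB.T + n * np.eye(n)

C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
c = C @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))

lhs = gauss(x, a, A) * gauss(x, b, B)
rhs = gauss(a, b, A + B) * gauss(x, c, C)
```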
5.2 |
N(x; a, A)^m = N(0; 0, m(2π)^(m-2)A^(m-1)) × N(x; a, m^-1A)
We prove this by induction. For m=1, we have N(x; a, A)^1 = N(0; 0, (2π)^-1I) × N(x; a, A), which is true.
- N(x; a, A)^m = N(x; a, A) × N(x; a, A)^(m-1)
= N(x; a, A) × N(0; 0, (m-1)(2π)^(m-3)A^(m-2)) × N(x; a, (m-1)^-1A)
= N(0; 0, (m-1)(2π)^(m-3)A^(m-2)) × N(x; a, A) × N(x; a, (m-1)^-1A)
= N(0; 0, (m-1)(2π)^(m-3)A^(m-2)) × N(a; a, (1+(m-1)^-1)A) × N(x; c, C) where from [5.1] C = (A^-1+(m-1)A^-1)^-1 = (mA^-1)^-1 = m^-1A and c = C(A^-1a+(m-1)A^-1a) = mCA^-1a = a
- So N(x; a, A)^m = N(0; 0, (m-1)(2π)^(m-3)A^(m-2)) × N(a; a, (1+(m-1)^-1)A) × N(x; a, m^-1A)
= N(0; 0, 2π × (m-1)(2π)^(m-3)A^(m-2) × (1+(m-1)^-1)A) × N(x; a, m^-1A)
= N(0; 0, (m-1)(1+(m-1)^-1)(2π)^(m-2)A^(m-1)) × N(x; a, m^-1A)
= N(0; 0, m(2π)^(m-2)A^(m-1)) × N(x; a, m^-1A)
|
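As a quick numerical sanity check (a sketch with illustrative values, not part of the proof), both sides of [5.2] can be evaluated at a point; note that A^(m-1) is a matrix power.

```python
import numpy as np

# Check [5.2]: N(x; a, A)^m = N(0; 0, m (2 pi)^(m-2) A^(m-1)) * N(x; a, A/m)

def gauss(x, m, S):
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / \
        np.sqrt(np.linalg.det(2 * np.pi * S))

rng = np.random.default_rng(1)
n, m = 2, 4
a, x = rng.standard_normal((2, n))
W = rng.standard_normal((n, n))
A = W @ W.T + n * np.eye(n)          # symmetric positive definite

V = m * (2 * np.pi) ** (m - 2) * np.linalg.matrix_power(A, m - 1)
lhs = gauss(x, a, A) ** m
rhs = gauss(np.zeros(n), np.zeros(n), V) * gauss(x, a, A / m)
```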
5.3 |
If x[n] has a complex Gaussian distribution then Cov(x) = E(xx^H) - mm^H = S[n#n] <=> K[2n#2n] and E(xx^T) = mm^T
- From the definition of a complex Gaussian we have
- E(x_i^R x_k^R) = ½s_i,k^R + m_i^R m_k^R
- E(x_i^I x_k^I) = ½s_i,k^R + m_i^I m_k^I
- E(x_i^I x_k^R) = ½s_i,k^I + m_i^I m_k^R
- E(x_i^R x_k^I) = -½s_i,k^I + m_i^R m_k^I
- E(xx^H)_i,k = E(x_i x_k^C) = E((x_i^R x_k^R + x_i^I x_k^I) + j(x_i^I x_k^R - x_i^R x_k^I)) = s_i,k^R + j s_i,k^I + m_i m_k^C = (S + mm^H)_i,k
- E(xx^T)_i,k = E(x_i x_k) = E((x_i^R x_k^R - x_i^I x_k^I) + j(x_i^I x_k^R + x_i^R x_k^I)) = m_i m_k = (mm^T)_i,k
|
5.4 |
If x[n] has a complex Gaussian distribution then E(x • x^C) = diag(S) + m • m^C
- E(x • x^C) = E(diag(xx^H)) = diag(E(xx^H)) = diag(S + mm^H) = diag(S) + m • m^C [5.3]
|
5.5 |
If x[n] has a complex Gaussian distribution and p = x • x^C then Cov(p) = E(pp^T) - E(p)E(p)^T = S • S^C + 2(mm^H • S^T)^R
- If we define v = diag(S) and w = m • m^C, then E(p)E(p)^T = (v+w)(v+w)^T
- The vector y = x - m has zero mean and satisfies E(y • y^C) = diag(S) = v
- E((y • y^C)(y^T • y^H))_i,k = E(((y_i^R)² + (y_i^I)²)((y_k^R)² + (y_k^I)²)) = E((y_i^R y_k^R)² + (y_i^R y_k^I)² + (y_i^I y_k^R)² + (y_i^I y_k^I)²)
- Now from Wick's theorem and [5.3], E((y_i^R y_k^R)²) = E(y_i^R y_i^R)E(y_k^R y_k^R) + 2(E(y_i^R y_k^R))² = ¼v_i v_k + ½(s_i,k^R)² = (¼vv^T + ½S^R • S^R)_i,k
- Similarly E((y_i^I y_k^I)²) = (¼vv^T + ½S^R • S^R)_i,k and E((y_i^I y_k^R)²) = E((y_i^R y_k^I)²) = (¼vv^T + ½S^I • S^I)_i,k
- Hence E((y • y^C)(y^T • y^H))_i,k = 2(¼vv^T + ½S^R • S^R)_i,k + 2(¼vv^T + ½S^I • S^I)_i,k = (vv^T + S • S^C)_i,k
- Since the expected value of any odd power of y_i is zero and, from [5.3], E(yy^T) = 0, we have E((x • x^C)(x^T • x^H)) = E(((y+m) • (y+m)^C)((y+m)^T • (y+m)^H))
= E((y • y^C)(y^T • y^H) + (y • y^C)(m^T • m^H) + (y • m^C)(y^T • m^H) + (y • m^C)(m^T • y^H) + (m • y^C)(y^T • m^H) + (m • y^C)(m^T • y^H) + (m • m^C)(y^T • y^H) + (m • m^C)(m^T • m^H))
= vv^T + S • S^C + vw^T + 0 + S • (mm^H)^C + S^C • (mm^H) + 0 + wv^T + ww^T = S • S^C + 2(mm^H • S^T)^R + (v+w)(v+w)^T
- Subtracting E(p)E(p)^T = (v+w)(v+w)^T gives the result.
|
5.6 |
If x[n] has a real Gaussian distribution then E(x • x) = diag(S) + m • m = diag(S + mm^T)
- E(x • x)_k = E(x_k x_k) = s_k,k + m_k² = (diag(S) + m • m)_k
|
5.7 |
If x[n] has a real Gaussian distribution then Cov(x • x) = 2S • (S + 2mm^T)
- Assume that x = y + m with E(y) = 0 and Cov(y) = S
- Cov(x • x)_i,k = E(x_i²x_k²) - (s_i,i + m_i²)(s_k,k + m_k²) = E((y_i²+2y_i m_i+m_i²)(y_k²+2y_k m_k+m_k²)) - (s_i,i + m_i²)(s_k,k + m_k²)
= s_i,i s_k,k + 2s_i,k² + s_i,i m_k² + s_k,k m_i² + 4s_i,k m_i m_k + m_i²m_k² - (s_i,i + m_i²)(s_k,k + m_k²)
= 2s_i,k(s_i,k + 2m_i m_k) = (2S • (S + 2mm^T))_i,k
|
5.8 |
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R; R^T Q]) then x | y ~ N(x; p+RQ^-1(y-q), P - RQ^-1R^T). The functions f_i(y) below denote functions that are independent of x.
- p(x | y) = p(x, y) / p(y) = f_1(y) exp(-½[x-p; y-q]^T S^-1 [x-p; y-q]) where S = [P R; R^T Q]
- From [3.5] S^-1 = [C^-1, -C^-1RQ^-1; -Q^-1R^TC^-1, Q^-1(I+R^TC^-1RQ^-1)] where C = P - RQ^-1R^T
- Hence [x-p; y-q]^T S^-1 [x-p; y-q] = (x-p)^TC^-1(x-p) - 2(x-p)^TC^-1RQ^-1(y-q) + f_2(y) = (x-p-RQ^-1(y-q))^TC^-1(x-p-RQ^-1(y-q)) + f_3(y)
- Hence p(x | y) = f_4(y) exp(-½(x-p-RQ^-1(y-q))^TC^-1(x-p-RQ^-1(y-q))) = f_5(y) N(x; p+RQ^-1(y-q), C) = f_5(y) N(x; p+RQ^-1(y-q), P - RQ^-1R^T) where f_5(y) must equal 1 to give the correct integral.
|
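The conditional density in [5.8] can be verified numerically by comparing the ratio of joint to marginal densities with the stated conditional Gaussian (a sketch with arbitrary illustrative values):

```python
import numpy as np

# Check [5.8]: with S = [P R; R^T Q],
# N([x; y]; [p; q], S) / N(y; q, Q) = N(x; p + R Q^-1 (y-q), P - R Q^-1 R^T)

def gauss(x, m, S):
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / \
        np.sqrt(np.linalg.det(2 * np.pi * S))

rng = np.random.default_rng(2)
nx, ny = 2, 3
W = rng.standard_normal((nx + ny, nx + ny))
S = W @ W.T + (nx + ny) * np.eye(nx + ny)      # joint covariance
P, R, Q = S[:nx, :nx], S[:nx, nx:], S[nx:, nx:]
p, q = rng.standard_normal(nx), rng.standard_normal(ny)
x, y = rng.standard_normal(nx), rng.standard_normal(ny)

joint = gauss(np.concatenate([x, y]), np.concatenate([p, q]), S)
marg = gauss(y, q, Q)
cond = gauss(x, p + R @ np.linalg.solve(Q, y - q),
             P - R @ np.linalg.solve(Q, R.T))
```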
5.9 |
If x ~ N(x; m, S) then x | A^Tx=b ~ N(x; (I-HA^T)m+Hb, (I-HA^T)S) where H = SA(A^TSA)^-1
- Define z = [x; A^Tx] = [I; A^T]x. Then z has mean [m; A^Tm] and Cov(z) = E([I; A^T](x-m)(x-m)^T[I, A]) = [I; A^T]S[I, A] = [S, SA; A^TS, A^TSA]
- From [5.8] x | A^Tx=b ~ N(x; m+SA(A^TSA)^-1(b-A^Tm), S - SA(A^TSA)^-1A^TS) = N(x; (I-HA^T)m+Hb, (I-HA^T)S)
|
5.10 |
If x ~ N(x; m, S) then y = Ax+b ~ N(y; Am+b, ASA^T)
- E(y) = E(Ax+b) = A E(x) + b = Am+b
- Var(y) = E(yy^T) - E(y)E(y)^T = E(Axx^TA^T + Axb^T + bx^TA^T + bb^T) - (Amm^TA^T + Amb^T + bm^TA^T + bb^T)
= (A(S+mm^T)A^T + Amb^T + bm^TA^T + bb^T) - (Amm^TA^T + Amb^T + bm^TA^T + bb^T)
= ASA^T
|
5.11 |
If S is +ve semidefinite Hermitian, then the elements of R = DIAG(S)^-½ S DIAG(S)^-½ have magnitude <= 1.
- |r_i,j|² = |s_i,j|²(s_i,i s_j,j)^-1 <= 1 from [3.6]
|
5.12 |
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R^T; R Q]) then Var(x_i - a^Ty) is minimized when a = Q^-1R_i, when its value is diag(P)_i - R_i^TQ^-1R_i.
This proof is adapted from [R.1]. Since variances are unaffected by a shift of mean we can assume wlog that p=0 and q=0. Now set b = a - Q^-1R_i; then for any a we have
- Now note that for a zero-mean scalar z, Var(z) = E(z²). Hence Var(x_i - a^Ty) = Var(x_i - R_i^TQ^-1y - b^Ty)
= E((x_i - R_i^TQ^-1y - b^Ty)(x_i - R_i^TQ^-1y - b^Ty)^T)
= E(x_i x_i^T + R_i^TQ^-1yy^TQ^-1R_i + b^Tyy^Tb - 2x_i y^TQ^-1R_i - 2x_i y^Tb + 2R_i^TQ^-1yy^Tb)
= diag(P)_i + R_i^TQ^-1QQ^-1R_i + b^TQb - 2R_i^TQ^-1R_i - 2R_i^Tb + 2R_i^TQ^-1Qb
= diag(P)_i + R_i^TQ^-1R_i + b^TQb - 2R_i^TQ^-1R_i - 2R_i^Tb + 2R_i^Tb
= diag(P)_i - R_i^TQ^-1R_i + b^TQb
Only the last term depends on b and, since Q is positive semidefinite, it is minimized when b = 0.
|
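The minimization in [5.12] reduces to the quadratic Var(x_i - a^Ty) = P_ii - 2a^TR_i + a^TQa, which can be checked numerically (a sketch with illustrative values, not part of the proof):

```python
import numpy as np

# Check [5.12]: with Cov([x; y]) = [P R^T; R Q], the variance
# Var(x_i - a^T y) = P_ii - 2 a^T R_i + a^T Q a is minimised at
# a = Q^-1 R_i with minimum value P_ii - R_i^T Q^-1 R_i.

rng = np.random.default_rng(3)
nx, ny, i = 2, 3, 0
W = rng.standard_normal((nx + ny, nx + ny))
S = W @ W.T + (nx + ny) * np.eye(nx + ny)
P, R, Q = S[:nx, :nx], S[nx:, :nx], S[nx:, nx:]   # R is the (y, x) block
Ri = R[:, i]

def var(a):
    """Variance of x_i - a^T y under the joint Gaussian."""
    return P[i, i] - 2 * a @ Ri + a @ Q @ a

a_star = np.linalg.solve(Q, Ri)
v_min = P[i, i] - Ri @ a_star
```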
5.13 |
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R^T; R Q]) then the maximum (over a) of the correlation between x_i and a^Ty is obtained when a = Q^-1R_i
This proof is adapted from [R.1]. Since correlations are unaffected by a shift of mean we can assume wlog that p=0 and q=0. For any a we can define c = sqrt(R_i^TQ^-1R_i / a^TQa). From [5.12] we know that
- Var(x_i - ca^Ty) >= Var(x_i - R_i^TQ^-1y)
- E(x_i x_i^T + c²a^Tyy^Ta - 2cx_i y^Ta) >= E(x_i x_i^T + R_i^TQ^-1yy^TQ^-1R_i - 2x_i y^TQ^-1R_i)
- diag(P)_i + c²a^TQa - 2cE(x_i y^Ta) >= diag(P)_i + R_i^TQ^-1QQ^-1R_i - 2E(x_i y^TQ^-1R_i)
- c²a^TQa - 2cE(x_i y^Ta) >= R_i^TQ^-1R_i - 2E(x_i y^TQ^-1R_i)
- R_i^TQ^-1R_i - 2cE(x_i y^Ta) >= R_i^TQ^-1R_i - 2E(x_i y^TQ^-1R_i) [since c²a^TQa = R_i^TQ^-1R_i]
- cE(x_i y^Ta) <= E(x_i y^TQ^-1R_i)
- E(x_i y^Ta) / sqrt(a^TQa) <= E(x_i y^TQ^-1R_i) / sqrt(R_i^TQ^-1R_i)
- E(x_i y^Ta) / sqrt(Var(x_i) × a^TQa) <= E(x_i y^TQ^-1R_i) / sqrt(Var(x_i) × R_i^TQ^-1R_i)
- E(x_i y^Ta) / sqrt(Var(x_i) × Var(a^Ty)) <= E(x_i y^TQ^-1R_i) / sqrt(Var(x_i) × Var(R_i^TQ^-1y))
Thus the correlation between x_i and a^Ty is bounded above by the right-hand side and attains this bound when a = Q^-1R_i. The value of the correlation is given by
- E(x_i y^TQ^-1R_i) / sqrt(Var(x_i) × Var(R_i^TQ^-1y)) = R_i^TQ^-1R_i / sqrt(diag(P)_i × R_i^TQ^-1R_i) = sqrt(R_i^TQ^-1R_i / diag(P)_i)
|
5.14 |
(g_x|y)_i² = 1 - p_ii^-1 det([p_ii, r_i^T; r_i, Q]) det(Q)^-1
- 1 - p_ii^-1 det([p_ii, r_i^T; r_i, Q]) det(Q)^-1
- = 1 - p_ii^-1 ((p_ii - r_i^TQ^-1r_i) det(Q)) det(Q)^-1 [3.1]
- = 1 - (1 - p_ii^-1 r_i^TQ^-1r_i) = p_ii^-1 r_i^TQ^-1r_i = (FR)_ii ÷ p_ii = (g_x|y)_i²
|
5.15 |
d/dq (ln(p(X))) = 1[k#1]^T(X-M)^TS^-1 dm/dq - ½(k S^-1 - S^-1(X-M)(X-M)^TS^-1):^T dS/dq where M[n#k] = m×1[k#1]^T.
|
5.16 |
E(vv^T) = k((dm/dq)^T S^-1 dm/dq + ½(dS/dq)^T (S ⊗ S)^-1 dS/dq) where v^T = 1[k#1]^T(X-M)^TS^-1 dm/dq - ½(k S^-1:^T - ((X-M)(X-M)^T):^T (S^-1 ⊗ S^-1)) dS/dq
- Since v is the sum of k independent identically distributed zero-mean vectors, we can calculate E(vv^T) for k=1 and then multiply the result by k. For k=1, v simplifies to v^T = (x-m)^TS^-1 dm/dq + ½(S - (x-m)(x-m)^T):^T dP/dq where P = S^-1 (see [5.15])
- When we form the product vv^T, the cross terms are cubic in (x-m) and therefore have zero mean. We deal with the two squared terms in vv^T separately. For the first term
- k E(((x-m)^TS^-1 dm/dq)^T ((x-m)^TS^-1 dm/dq)) = k E((dm/dq)^T S^-1(x-m)(x-m)^TS^-1 dm/dq) = k E((dm/dq)^T S^-1 dm/dq)
- Since the second term has zero mean, we can write
- ¼k E{(dP/dq)^T (S - (x-m)(x-m)^T): (S - (x-m)(x-m)^T):^T dP/dq} = ¼k (dP/dq)^T E{((x-m)(x-m)^T): ((x-m)(x-m)^T):^T} dP/dq - ¼k (dP/dq)^T S: S:^T dP/dq
- We define T = TVEC(n,n) so that T S: = T^T S: = S^T: = S: since S and T are symmetrical. [see vectorized transpose]
- To calculate the quartic expectation, E{((x-m)(x-m)^T): ((x-m)(x-m)^T):^T}, we use Wick's rule to take the three possible pairings of the (x-m) terms and, treating x and y as distinct random vectors, we get:
- E{((x-m)(x-m)^T): ((x-m)(x-m)^T):^T}
= E{((x-m)(x-m)^T): ((y-m)(y-m)^T):^T} + E{((x-m)(y-m)^T): ((x-m)(y-m)^T):^T} + E{((x-m)(y-m)^T): ((y-m)(x-m)^T):^T}
= S:S:^T + E{((y-m) ⊗ (x-m))((y-m) ⊗ (x-m))^T} + E{((y-m) ⊗ (x-m))((y-m) ⊗ (x-m))^T T^T} [see kronecker product and vectorized transpose]
= S:S:^T + E{((y-m)(y-m)^T) ⊗ ((x-m)(x-m)^T)}(I + T^T)
= S:S:^T + (S ⊗ S)(I + T^T)
- Substituting this into the second term expression gives ¼k (dP/dq)^T(S:S:^T + (S ⊗ S)(I + T^T))dP/dq - ¼k (dP/dq)^T S: S:^T dP/dq
= ¼k (dP/dq)^T (S ⊗ S)(I + T^T) dP/dq [since the first and last terms cancel]
= ½k (dP/dq)^T(S ⊗ S) dP/dq [since the symmetry of dP means that T^T dP/dq = dP/dq]
- Putting this all together and using dP/dq = dP/dS dS/dq = -(P ⊗ P) dS/dq = -(S^-1 ⊗ S^-1) dS/dq [Derivative of Inverse] we obtain
J = E{vv^T} = k E((dm/dq)^T S^-1 dm/dq) + ½k (dP/dq)^T(S ⊗ S)dP/dq
= k E((dm/dq)^T S^-1 dm/dq) + ½k (dS/dq)^T(S^-1 ⊗ S^-1)(S ⊗ S)(S^-1 ⊗ S^-1)dS/dq
= k E((dm/dq)^T S^-1 dm/dq) + ½k (dS/dq)^T(S^-1 ⊗ S^-1)dS/dq
|
5.17 |
If f[r#1](X) is a function of X with mean value g(q), then Cov(f) >= dg/dq J^-1 (dg/dq)^T
- g = E(f) means that dg/dq = E(df/dq + f v^T) = E(f v^T) where v^T = d/dq (ln(p(X))).
- For any arbitrary vector a, we have the scalar equation a^T dg/dq J^-1 (dg/dq)^T a = E(a^T f v^T J^-1 (dg/dq)^T a) = Cov(a^T f, a^T dg/dq J^-1 v)
- Hence from the Cauchy-Schwarz inequality (a^T dg/dq J^-1 (dg/dq)^T a)² <= Var(a^T f) Var(a^T dg/dq J^-1 v) = (a^T Cov(f) a)(a^T dg/dq J^-1 Cov(v) J^-1 (dg/dq)^T a) = (a^T Cov(f) a)(a^T dg/dq J^-1 (dg/dq)^T a) [since Cov(v) = J]
- Hence (a^T dg/dq J^-1 (dg/dq)^T a) <= a^T Cov(f) a for any a
- So Cov(f) >= dg/dq J^-1 (dg/dq)^T where >= represents the Loewner partial order.
|
5.18 |
For even n, E(prod(x[n]-m)) = (½n)!^-1 2^-½n sum_v(s_v(1),v(2) s_v(3),v(4) ... s_v(n-1),v(n)) where the sum is over all n! permutations v of the numbers 1:n. An equivalent formula is to omit the normalizing factor, (½n)!^-1 2^-½n, and to restrict the summation to all distinct pairings of the numbers 1:n.
- Without loss of generality, we assume that m = 0. The characteristic function is f(t) = E(exp(j t^T x)) = exp(-½t^TSt) where j = sqrt(-1).
- Differentiating E(exp(j t^T x)) shows that E(prod(x)) = j^-n d^n f/dt_1 dt_2 ... dt_n evaluated at t=0.
- Expanding the power series f(t) = exp(-½t^TSt) = sum((-½t^TSt)^r/r!; r=0..inf). Term r of this series contains only products of 2r factors of the form t_x where x is some combination of the subscripts 1:n. When differentiated n times, terms with r < n/2 will become zero, whereas terms with r > n/2 will still contain powers of t_x and will therefore be zero when t is set to 0.
- It follows that when n is odd, all terms vanish and E(prod(x)) = 0.
- When n is even, E(prod(x)) = d^n g/dt_1 dt_2 ... dt_n evaluated at t=0 where g(t) = j^-n (-½t^TSt)^½n/(½n)! = 2^-½n (t^TSt)^½n/(½n)!
- We can expand (t^TSt)^½n = t^TSt × t^TSt × ... × t^TSt and we now differentiate in turn with respect to t_n, t_(n-1), ..., t_1. We note that t occurs n times in this expansion and that dt/dt_i = e_i, a column of the identity matrix.
- When we differentiate with respect to t_n, we obtain a sum of n terms, in each of which one of the occurrences of t has been replaced by e_n and the remaining n-1 occurrences remain as t.
- If we now differentiate with respect to t_(n-1), each of the n terms in the previous step changes into a sum of n-1 terms in which one of the occurrences of t is now replaced by e_(n-1) and the remaining n-2 occurrences remain as t. We thus have a total of n(n-1) terms.
- We repeat this process for t_(n-2), ..., t_1 and we end up with n! terms of the form (e_v(1))^T S e_v(2) (e_v(3))^T S e_v(4) ... (e_v(n-1))^T S e_v(n) = s_v(1),v(2) s_v(3),v(4) ... s_v(n-1),v(n) where v runs through all n! permutations of 1:n.
- Thus E(prod(x)) = (½n)!^-1 2^-½n sum_v(s_v(1),v(2) s_v(3),v(4) ... s_v(n-1),v(n))
- Note that each term in the summation arises (½n)! 2^½n times since the ½n factors s_ij can be rearranged in (½n)! orders and for each factor s_ij = s_ji since S is symmetric. Thus an equivalent formula is to omit the normalizing factor, (½n)!^-1 2^-½n, and restrict the summation to all distinct pairings of the numbers 1:n. This is known as Isserlis' theorem or Wick's theorem.
|
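The equivalence of the two forms in [5.18] can be checked directly for n = 4, where the 3 distinct pairings give s12 s34 + s13 s24 + s14 s23 (a sketch with an arbitrary symmetric covariance):

```python
import math
import numpy as np
from itertools import permutations

# Check [5.18] for n = 4: the normalised sum over all 4! permutations
# equals the sum over the 3 distinct pairings of {1, 2, 3, 4}.

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 4))
S = W @ W.T + 4 * np.eye(4)              # a symmetric covariance

perm_sum = sum(S[v[0], v[1]] * S[v[2], v[3]]
               for v in permutations(range(4)))
perm_form = perm_sum / (math.factorial(2) * 2 ** 2)   # (n/2)!^-1 2^(-n/2)
pair_form = S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]
```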
5.19 |
[x Real Gaussian, m=0] Y_n = <(xx^T)^n>, then Y_1 = S and Y_(n+1) = tr(Y_n)S + 2nSY_n
- By definition, Y_1 = S. So we assume n > 0.
- Y_(n+1) consists of 2n+2 x terms multiplied together. By Wick's theorem [5.18] we can obtain Y_(n+1) by considering all possible pairings of the x's and treating each pair as an independent Gaussian vector.
- We write Y_(n+1) = <x(x^Tx)^n x^T> and note that the central term, (x^Tx)^n, is a product of scalars which may therefore be re-ordered arbitrarily. If we denote the first x by x_1, its pair can be in any of 2n+1 positions; we consider three cases:
- The pair to x_1 can be the first x in one of the scalars that form (x^Tx)^n; this gives n terms of the form <x_1(x_1^Tx)(x^Tx)^(n-1)x^T> = <x_1x_1^T(xx^T)^n> = SY_n
- The pair to x_1 can be the second x in one of the scalars that form (x^Tx)^n; this gives n terms of the form <x_1(x^Tx_1)(x^Tx)^(n-1)x^T> = <x_1(x_1^Tx)(x^Tx)^(n-1)x^T> = <x_1x_1^T(xx^T)^n> = SY_n
- The pair to x_1 can be the final x; this gives a single term of the form <x_1(x^Tx)^n x_1^T> = <(x^Tx)^n x_1x_1^T> = tr(Y_n)S
- Adding these 2n+1 terms gives Y_(n+1) = nSY_n + nSY_n + tr(Y_n)S = tr(Y_n)S + 2nSY_n.
|
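For a scalar x ~ N(0, s) the recursion in [5.19] reduces to Y_(n+1) = (2n+1)sY_n, reproducing the double-factorial moments E(x^(2n)) = (2n-1)!! s^n, which gives a quick sanity check (a sketch, not part of the proof):

```python
import numpy as np

# Sanity check of the recursion in [5.19] for the scalar case:
# iterating Y_{n+1} = tr(Y_n) S + 2n S Y_n should reproduce
# E(x^(2n)) = (2n-1)!! s^n for x ~ N(0, s).

s = 1.7
S = np.array([[s]])
Y = S.copy()                          # Y_1 = S
moments = [Y[0, 0]]
for n in range(1, 6):                 # build Y_2 .. Y_6
    Y = np.trace(Y) * S + 2 * n * S @ Y
    moments.append(Y[0, 0])

def double_factorial(k):
    return 1 if k <= 0 else k * double_factorial(k - 2)

expected = [double_factorial(2 * n - 1) * s ** n for n in range(1, 7)]
```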
5.20 |
[x Complex Gaussian, m=0] Y_n = <(xx^H)^n>, then Y_1 = S and Y_(n+1) = tr(Y_n)S + nSY_n
- By definition, Y_1 = S. So we assume n > 0.
- Y_(n+1) consists of 2n+2 x terms multiplied together. By Wick's theorem [5.18] we can obtain Y_(n+1) by considering all possible pairings of the x's and treating each pair as an independent complex Gaussian vector.
- We write Y_(n+1) = <x(x^Hx)^n x^H> and note that the central term, (x^Hx)^n, is a product of scalars which may therefore be re-ordered arbitrarily. If we denote the first x by x_1, its pair can be in any of 2n+1 positions; we consider three cases:
- The pair to x_1 can be the first x in one of the scalars that form (x^Hx)^n; this gives n terms of the form <x_1(x_1^Hx)(x^Hx)^(n-1)x^H> = <x_1x_1^H(xx^H)^n> = SY_n
- The pair to x_1 can be the second x in one of the scalars that form (x^Hx)^n; this gives n terms of the form <x_1(x^Hx_1)(x^Hx)^(n-1)x^H> = <x_1(x_1^Tx^C)(x^Hx)^(n-1)x^H> = <x_1x_1^T x^C(x^Hx)^(n-1)x^H> = 0 since E(xx^T) = 0 [5.3]
- The pair to x_1 can be the final x; this gives a single term of the form <x_1(x^Hx)^n x_1^H> = <(x^Hx)^n x_1x_1^H> = tr(Y_n)S
- Adding these 2n+1 terms gives Y_(n+1) = nSY_n + 0 + tr(Y_n)S = tr(Y_n)S + nSY_n.
|
5.21 |
If y ~ kN(y; 0, 1) is restricted to the domain p < y < q, then E(y) = r = (f(p)-f(q))/(F(q)-F(p)) and Var(y) = v = 1 - (q f(q) - p f(p))/(F(q)-F(p)) - r² where k = 1/(F(q)-F(p)), with f(x) and F(x) being the pdf and cdf of the standard Gaussian.
We first note that f(-x) = f(x), F(-x) = 1-F(x), dF/dx = f(x) and df/dx = -x f(x).
So with all integrals going from p to q we can write ∫ f(x)dx = F(q)-F(p), ∫ xf(x)dx = f(p)-f(q), ∫ x²f(x)dx = F(q) - F(p) + p f(p) - q f(q).
From this, E(y) = (f(p)-f(q))/(F(q)-F(p)) = r and Var(y) = E(y²) - r² = 1 - (q f(q) - p f(p))/(F(q)-F(p)) - r².
For the special case p = -∞, f(p) = F(p) = 0 and E(y) = r = -f(q)/F(q) and Var(y) = v = 1 + r(q-r).
For the special case q = +∞, f(q) = 0, F(q) = 1 and E(y) = r = f(p)/(1-F(p)) = f(p)/F(-p) and Var(y) = v = 1 + p f(p)/F(-p) - r².
|
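The closed-form truncated moments in [5.21] can be compared against direct numerical integration of the truncated density (a sketch; the truncation limits below are arbitrary illustrative values):

```python
import numpy as np
from math import erf, sqrt, pi

# Check [5.21]: closed-form mean r and variance v of a standard
# Gaussian truncated to (p, q) versus numerical integration.

f = lambda x: np.exp(-x * x / 2) / sqrt(2 * pi)   # standard normal pdf
F = lambda x: 0.5 * (1 + erf(x / sqrt(2)))        # standard normal cdf

p, q = -0.5, 1.8
Z = F(q) - F(p)                                   # 1/k
r = (f(p) - f(q)) / Z
v = 1 - (q * f(q) - p * f(p)) / Z - r ** 2

def trap(y, x):                                   # trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

x = np.linspace(p, q, 20001)
w = f(x) / Z                                      # truncated density
r_num = trap(x * w, x)
v_num = trap((x - r_num) ** 2 * w, x)
```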
5.22 |
Suppose y ~ kN(y; m, S) is restricted to the domain satisfying b < a^Ty < c with k a normalizing constant. Then E(y) = m + grSa and Cov(y) = S - g²vSa(Sa)^T where g = 1/sqrt(a^TSa), p = g(b-a^Tm), q = g(c-a^Tm), r = (f(p)-f(q))/(F(q)-F(p)), v = (q f(q) - p f(p))/(F(q)-F(p)) + r² and f(q) and F(q) are the pdf and cdf respectively of a standard 1-dimensional Gaussian with f(q) = dF/dq.
We use the Cholesky decomposition to find T such that S = TT^T and define w = T^-1(y-m), which implies y = Tw + m.
So w ~ kN(w; 0, T^-1ST^-T) = kN(w; 0, I) within the domain satisfying b-a^Tm < a^TTw < c-a^Tm.
Now find an orthogonal Q such that gQT^Ta = e where e is the first column of the identity matrix and g = 1/sqrt(a^TTT^Ta) = 1/sqrt(a^TSa) > 0. Thus Q is any orthogonal matrix whose first row is ga^TT. It follows that gT^Ta = Q^Te which implies ga^TTQ^T = e^T.
Now we define z = Qw, which implies w = Q^Tz.
So z ~ kN(z; 0, I) within the domain defined by b-a^Tm < a^TTQ^Tz < c-a^Tm, which is equivalent to g(b-a^Tm) < e^Tz < g(c-a^Tm).
Since only the first element of z is constrained and all elements are independent, we can use [5.21] to write E(z) = re and Cov(z) = I - vee^T where p = g(b-a^Tm), q = g(c-a^Tm), r = (f(p)-f(q))/(F(q)-F(p)) and v = (q f(q) - p f(p))/(F(q)-F(p)) + r².
We have y = TQ^Tz + m and so E(y) = rTQ^Te + m and Cov(y) = TQ^T(I - vee^T)QT^T = TQ^TQT^T - vTQ^Tee^TQT^T. From above we have gT^Ta = Q^Te which implies TQ^Te = gTT^Ta = gSa.
Using this substitution (along with Q^TQ = I) gives E(y) = m + grSa and Cov(y) = S - g²vSa(Sa)^T as required. |
5.23 |
<(Ax + a)(Bx + b)^H> = ASB^H + (Am+a)(Bm+b)^H
- We define the zero-mean vector u = x-m so that <u> = 0 and <uu^H> = S.
- <(Ax + a)(Bx + b)^H> = <(Au + Am + a)(Bu + Bm + b)^H>
- = <Auu^HB^H + Au(Bm+b)^H + (Am+a)u^HB^H + (Am+a)(Bm+b)^H>
- = ASB^H + 0 + 0 + (Am+a)(Bm+b)^H = ASB^H + (Am+a)(Bm+b)^H
|
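Since [5.23] involves only the mean and covariance of x, it can be verified exactly on any discrete distribution with matching first two moments; the sketch below (with arbitrary illustrative values) uses the 2n equally weighted points m ± sqrt(n) L e_i where S = LL^H:

```python
import numpy as np

# Check [5.23]: <(Ax+a)(Bx+b)^H> = A S B^H + (Am+a)(Bm+b)^H,
# evaluated exactly on a discrete distribution with mean m, covariance S.

rng = np.random.default_rng(5)
n = 3
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = W @ W.conj().T + n * np.eye(n)                # Hermitian psd
L = np.linalg.cholesky(S)
m = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A, B = rng.standard_normal((2, n, n))
a, b = rng.standard_normal((2, n))

pts = [m + s * np.sqrt(n) * L[:, i] for i in range(n) for s in (1, -1)]
lhs = sum(np.outer(A @ x + a, (B @ x + b).conj()) for x in pts) / len(pts)
rhs = A @ S @ B.conj().T + np.outer(A @ m + a, (B @ m + b).conj())
```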
5.24 |
<(Ax+a)^H(Bx+b)> = <tr((Bx+b)(Ax+a)^H)> = tr(BSA^H) + (Am+a)^H(Bm+b)
- <(Ax + a)^H(Bx + b)> = tr(<(Bx + b)(Ax + a)^H>)
- = tr(BSA^H) + tr((Bm+b)(Am+a)^H) [5.23]
- = tr(BSA^H) + (Am+a)^H(Bm+b)
|
5.25 |
argmin_K <||(AKB+C)x||²> = -(A^HA)^-1A^HC(S+mm^H)B^H(B(S+mm^H)B^H)^-1
- argmin_K <||(AKB+C)x||²> = argmin_K <((AKB+C)x)^H(AKB+C)x>
- = argmin_K <tr{(AKB+C)xx^H(AKB+C)^H}>
- = argmin_K {tr((AKB+C)(S+mm^H)(AKB+C)^H)}
- = -(A^HA)^-1A^HC(S+mm^H)B^H(B(S+mm^H)B^H)^-1 [2.7]
|
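The minimizer in [5.25] can be checked numerically for real matrices: at K* the gradient 2A^T(AK*B+C)MB^T (with M = S+mm^T) vanishes, and perturbing K* does not decrease the cost. A sketch with arbitrary illustrative shapes and values:

```python
import numpy as np

# Check [5.25] (real case): K* = -(A^T A)^-1 A^T C M B^T (B M B^T)^-1
# minimises tr((AKB+C) M (AKB+C)^T) with M = S + m m^T.

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 2))       # tall, full column rank
B = rng.standard_normal((2, 3))       # wide, full row rank
C = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 3))
S = W @ W.T + 3 * np.eye(3)
m = rng.standard_normal(3)
M = S + np.outer(m, m)

def cost(K):
    G = A @ K @ B + C
    return np.trace(G @ M @ G.T)

K_star = -np.linalg.solve(A.T @ A, A.T @ C @ M @ B.T) @ \
    np.linalg.inv(B @ M @ B.T)
```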
5.26 |
argmin_K <||(AKB+C)x + (AKE+F)y||²> = -(A^HA)^-1A^H(CS_xB^H+FS_yE^H)(BS_xB^H+ES_yE^H)^-1
- We can define S = <[x; y][x; y]^H> = [S_x 0; 0 S_y] since x and y are independent and zero mean
- argmin_K <||(AKB+C)x + (AKE+F)y||²>
- = argmin_K <||(AK[B E] + [C F])[x; y]||²>
- = -(A^HA)^-1A^H[C F]S[B E]^H([B E]S[B E]^H)^-1 [5.25]
- = -(A^HA)^-1A^H(CS_xB^H+FS_yE^H)(BS_xB^H+ES_yE^H)^-1
|
5.27 |
<(Ax + a)(Bx + b)^T(Cx + c)(Dx + d)^T> = (ASB^T+(Am+a)(Bm+b)^T)(CSD^T+(Cm+c)(Dm+d)^T) + (ASC^T+(Am+a)(Cm+c)^T)(BSD^T+(Bm+b)(Dm+d)^T) + (Bm+b)^T(Cm+c)×(ASD^T - (Am+a)(Dm+d)^T) + tr(BSC^T)×(ASD^T + (Am+a)(Dm+d)^T)
- We define the zero-mean vector u = x-m so that <u> = 0 and <uu^T> = S.
- We now multiply out the quartic expression, omitting terms involving odd powers of u since their mean is zero [5.18]
- <(Au + a)(Bu + b)^T(Cu + c)(Du + d)^T>
- = <Auu^TB^TCuu^TD^T + Auu^TB^Tcd^T + Aub^TCud^T + Aub^Tcu^TD^T + au^TB^TCud^T + au^TB^Tcu^TD^T + ab^TCuu^TD^T + ab^Tcd^T>
- = <Auu^TB^TCuu^TD^T + Auu^TB^Tcd^T + Auu^TC^Tbd^T + b^Tc×Auu^TD^T + au^TB^TCud^T + ac^TBuu^TD^T + ab^TCuu^TD^T + ab^Tcd^T>
- where we have moved or transposed some scalar factors of the form p^Tq. Defining v to have the same statistics as u but independent of it, we use Isserlis' theorem [5.18] to decompose the first term as
- <Auu^TB^TCuu^TD^T> = <Auu^TB^TCvv^TD^T> + <Auv^TB^TCuv^TD^T> + <Auv^TB^TCvu^TD^T>
- By again moving or transposing scalar factors of the form p^Tq we can rearrange to get
- = <Auu^TB^TCvv^TD^T> + <Auu^TC^TBvv^TD^T> + <Auu^TD^T×v^TB^TCv>
- We now use the quadratic expectation theorems to give
- = ASB^TCSD^T + ASC^TBSD^T + ASD^T×tr(B^TCS)
- Therefore we can write
- <(Au + a)(Bu + b)^T(Cu + c)(Du + d)^T>
- = <Auu^TB^TCuu^TD^T + Auu^TB^Tcd^T + Auu^TC^Tbd^T + b^Tc×Auu^TD^T + au^TB^TCud^T + ac^TBuu^TD^T + ab^TCuu^TD^T + ab^Tcd^T>
- = ASB^TCSD^T + ASC^TBSD^T + ASD^T×tr(B^TCS) + ASB^Tcd^T + ASC^Tbd^T + b^Tc×ASD^T + ad^T×tr(B^TCS) + ac^TBSD^T + ab^TCSD^T + ab^Tcd^T
- = (ASB^T+ab^T)(CSD^T+cd^T) + (ASC^T+ac^T)(BSD^T+bd^T) + b^Tc×(ASD^T - ad^T) + tr(BSC^T)×(ASD^T + ad^T)
- where we use tr(B^TCS) = tr(BSC^T) and also add and subtract a copy of ab^Tcd^T = ac^Tbd^T = b^Tc×ad^T
- Now we note that Ax + a = Au + (Am + a), so if we replace a by Am+a, b by Bm+b etc. in the above expression, we obtain
- <(Ax + a)(Bx + b)^T(Cx + c)(Dx + d)^T> = (ASB^T+(Am+a)(Bm+b)^T)(CSD^T+(Cm+c)(Dm+d)^T) + (ASC^T+(Am+a)(Cm+c)^T)(BSD^T+(Bm+b)(Dm+d)^T) + (Bm+b)^T(Cm+c)×(ASD^T - (Am+a)(Dm+d)^T) + tr(BSC^T)×(ASD^T + (Am+a)(Dm+d)^T)
|
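The quartic expectation in [5.27] can be verified for a low-dimensional Gaussian by Gauss-Hermite quadrature, which is exact for polynomials of this degree (a sketch with arbitrary illustrative values):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Check [5.27] for n = 2: compare the closed form against the
# expectation <(Ax+a)(Bx+b)^T (Cx+c)(Dx+d)^T> computed by quadrature.

rng = np.random.default_rng(7)
n = 2
W = rng.standard_normal((n, n))
S = W @ W.T + n * np.eye(n)
L = np.linalg.cholesky(S)                 # x = m + L z, z standard normal
m = rng.standard_normal(n)
A, B, C, D = rng.standard_normal((4, n, n))
a, b, c, d = rng.standard_normal((4, n))

z, wt = hermegauss(6)                     # weight exp(-z^2/2)
wt = wt / np.sqrt(2 * np.pi)              # normalise to the Gaussian pdf

lhs = np.zeros((n, n))
for i in range(6):
    for j in range(6):
        x = m + L @ np.array([z[i], z[j]])
        # (Bx+b)^T (Cx+c) is a scalar, so the product is a rank-1 matrix
        lhs += wt[i] * wt[j] * ((B @ x + b) @ (C @ x + c)) * \
            np.outer(A @ x + a, D @ x + d)

Am, Bm, Cm, Dm = A @ m + a, B @ m + b, C @ m + c, D @ m + d
rhs = (A @ S @ B.T + np.outer(Am, Bm)) @ (C @ S @ D.T + np.outer(Cm, Dm)) \
    + (A @ S @ C.T + np.outer(Am, Cm)) @ (B @ S @ D.T + np.outer(Bm, Dm)) \
    + (Bm @ Cm) * (A @ S @ D.T - np.outer(Am, Dm)) \
    + np.trace(B @ S @ C.T) * (A @ S @ D.T + np.outer(Am, Dm))
```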
5.28 |
<(Ax + a)^T(Bx + b)(Cx + c)^T(Dx + d)> = tr(AS(C^TD+D^TC)SB^T) + ((Am+a)^TB + (Bm+b)^TA)S(C^T(Dm+d) + D^T(Cm+c)) + (tr(ASB^T)+(Am+a)^T(Bm+b))(tr(CSD^T)+(Cm+c)^T(Dm+d))
- <(Ax + a)^T(Bx + b)(Cx + c)^T(Dx + d)> = tr(<(Bx + b)(Cx + c)^T(Dx + d)(Ax + a)^T>)
- so, from [5.27], we can write
- = tr((BSC^T+(Bm+b)(Cm+c)^T)(DSA^T+(Dm+d)(Am+a)^T) + (BSD^T+(Bm+b)(Dm+d)^T)(CSA^T+(Cm+c)(Am+a)^T) + (Cm+c)^T(Dm+d)×(BSA^T - (Bm+b)(Am+a)^T) + tr(CSD^T)×(BSA^T + (Bm+b)(Am+a)^T))
- = tr(BSC^TDSA^T) + tr(BSC^T(Dm+d)(Am+a)^T) + tr((Bm+b)(Cm+c)^T(DSA^T+(Dm+d)(Am+a)^T)) + tr(BSD^TCSA^T) + tr(BSD^T(Cm+c)(Am+a)^T) + tr((Bm+b)(Dm+d)^T(CSA^T+(Cm+c)(Am+a)^T)) + (Cm+c)^T(Dm+d)×tr(BSA^T - (Bm+b)(Am+a)^T) + tr(CSD^T)×tr(BSA^T + (Bm+b)(Am+a)^T)
- Now, noting that tr(pq^T) = q^Tp and also collecting together all terms that equal (Am+a)^T(Bm+b)(Cm+c)^T(Dm+d), we can write
- = tr(BSC^TDSA^T) + (Am+a)^TBSC^T(Dm+d) + (Cm+c)^TDSA^T(Bm+b) + tr(BSD^TCSA^T) + (Am+a)^TBSD^T(Cm+c) + (Dm+d)^TCSA^T(Bm+b) + (Cm+c)^T(Dm+d)×tr(BSA^T) + tr(CSD^T)×(tr(BSA^T) + (Am+a)^T(Bm+b)) + (2-1)×(Am+a)^T(Bm+b)(Cm+c)^T(Dm+d)
- = tr(BSC^TDSA^T + BSD^TCSA^T) + (Am+a)^TBS(C^T(Dm+d) + D^T(Cm+c)) + (Bm+b)^TASD^T(Cm+c) + (Bm+b)^TASC^T(Dm+d) + (Cm+c)^T(Dm+d)×tr(ASB^T) + tr(CSD^T)×(tr(ASB^T) + (Am+a)^T(Bm+b)) + (Am+a)^T(Bm+b)(Cm+c)^T(Dm+d)
- = tr(ASD^TCSB^T + ASC^TDSB^T) + (Am+a)^TBS(C^T(Dm+d) + D^T(Cm+c)) + (Bm+b)^TAS(C^T(Dm+d) + D^T(Cm+c)) + (tr(CSD^T) + (Cm+c)^T(Dm+d)) × (tr(ASB^T) + (Am+a)^T(Bm+b))
- = tr(AS(D^TC+C^TD)SB^T) + ((Am+a)^TB + (Bm+b)^TA)S(C^T(Dm+d) + D^T(Cm+c)) + (tr(CSD^T) + (Cm+c)^T(Dm+d))(tr(ASB^T) + (Am+a)^T(Bm+b))