function [m,v,w,g,f,pp,gg]=v_gaussmix(x,c,l,m0,v0,w0,wx)
%V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=v_gaussmix(x,c,l,m0,v0,w0,wx)
%
% Usage:
%    (1) [m,v,w]=v_gaussmix(x,[],[],k);      % create GMM with k mixtures and diagonal covariances
%    (2) [m,v,w]=v_gaussmix(x,[],[],k,'v');  % create GMM with k mixtures and full covariances
%
% Inputs: n data values, k mixtures, p parameters, l loops
%
%     X(n,p)   Input data vectors, one per row.
%     C(1)     Minimum variance of normalized data (Use [] to take default value of 1/n^2)
%     L        The integer portion of l gives a maximum loop count. The fractional portion gives
%              an optional stopping threshold. Iteration will cease if the increase in
%              log likelihood density per data point is less than this value. Thus l=10.001 will
%              stop after 10 iterations or when the increase in log likelihood falls below 0.001.
%              As a special case, if L=0, then the first three outputs are omitted.
%              Use [] to take default value of 100.0001
%     M0       Number of mixtures required (or initial mixture means - see below)
%     V0       Initialization mode:
%                 'f'    Initialize with K randomly selected data points [default]
%                 'p'    Initialize with centroids and variances of random partitions
%                 'k'    k-means algorithm ('kf' and 'kp' determine initialization)
%                 'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
%                 's'    use unscaled data during initialization phase instead of scaling it first
%                 'm'    M0 contains the initial centres
%                 'v'    full covariance matrices
%              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
%     W0(n,1)  Data point weights
%
%     Alternatively, initial values for M0, V0 and W0 can be given explicitly:
%
%     M0(k,p)        Initial mixture means, one row per mixture.
%     V0(k,p)        Initial mixture variances, one row per mixture.
%      or V0(p,p,k)  one full-covariance matrix per mixture
%     W0(k,1)        Initial mixture weights, one per mixture. The weights should sum to unity.
%     WX(n,1)        Data point weights
%
% Outputs: (Note that M, V and W are omitted if L==0)
%
%     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
%     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
%      or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
%     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
%     G        Average log probability of the input data points.
%     F        Fisher's Discriminant measures how well the data divides into classes.
%              It is the ratio of the between-mixture variance to the average mixture variance: a
%              high value means the classes (mixtures) are well separated.
%     PP(n,1)  Log probability of each data point
%     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
%
% The fitting procedure uses one of several initialization methods to create an initial guess
% for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
% the guess. Although the EM algorithm is deterministic, the initialization procedures use
% random numbers and so the routine will not give identical answers if you call it multiple
% times with the same input data. See v_randvec() for generating GMM data vectors.
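
% Example (an illustrative sketch, not part of the original file): fit a
% 3-mixture diagonal-covariance GMM to synthetic 2-dimensional data; the
% data and mixture count below are invented purely for illustration.
%
%     x=[randn(500,2); randn(500,2)+4];   % 1000 points drawn from two clusters
%     [m,v,w,g]=v_gaussmix(x,[],[],3);    % m(3,2) means, v(3,2) variances, w(3,1) weights
%                                         % g = average log probability of the data points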

% Bugs/Suggestions
%     (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
%     (2) Other initialization options:
%              'l'    LBG algorithm
%              'm'    Move-means (dog-rabbit) algorithm
%     (3) Allow updating of weights-only, not means/variances
%     (4) Allow freezing of means and/or variances

%      Copyright (C) Mike Brookes 2000-2009
%      Version: $Id: v_gaussmix.m 10865 2018-09-21 17:22:45Z dmb $
%
%   VOICEBOX is a MATLAB toolbox for speech processing.
%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   This program is free software; you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation; either version 2 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%   GNU General Public License for more details.
%
%   You can obtain a copy of the GNU General Public License from
%   http://www.gnu.org/copyleft/gpl.html or by writing to
%   Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[n,p]=size(x);                      % n = number of training values, p = dimension of data vector
wn=ones(n,1);
memsize=v_voicebox('memsize');      % set memory size to use
if isempty(c)
    c=1/n^2;
else
    c=c(1);                         % just to prevent legacy code failing
end
fulliv=0;                           % initial variance is not full
if isempty(l)
    l=100+1e-4;                     % max loop count + stopping threshold
end
if nargin<5 || isempty(v0) || ischar(v0)    % no initial values specified for m0, v0, w0
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % No initial values given, so we must use k-means or equivalent
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if nargin<6
        if nargin<5 || isempty(v0)
            v0='hf';                % default initialization mode: hf
        end
        wx=wn;                      % no data point weights
    else
        wx=w0(:);                   % data point weights
    end
    wx=wx/sum(wx);
    if any(v0=='m')
        k=size(m0,1);
    else
        k=m0;
    end
    fv=any(v0=='v');                % full covariance matrices requested
    mx0=wx'*x;                      % calculate mean of input data in each dimension
    vx0=wx'*x.^2-mx0.^2;            % calculate variance of input data in each dimension
    sx0=sqrt(vx0);
    sx0(sx0==0)=1;                  % do not divide by zero when scaling
    if n<=k                         % each data point can have its own mixture
        xs=(x-mx0(wn,:))./sx0(wn,:);    % scale the data
        m=xs(mod((1:k)-1,n)+1,:);   % just include all points several times
        v=zeros(k,p);               % will be set to floor later
        w=zeros(k,1);
        w(1:n)=1/n;
        if l>0
            l=0.1;                  % no point in iterating
        end
    else                            % more points than mixtures
        if any(v0=='s')
            xs=x;                   % do not scale data during initialization
        else
            xs=(x-mx0(wn,:))./sx0(wn,:);    % else scale now
            if any(v0=='m')
                m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % scale specified means as well
            end
        end
        w=repmat(1/k,k,1);          % all mixtures equally likely
        if any(v0=='k')             % k-means initialization
            if any(v0=='m')
                [m,e,j]=v_kmeans(xs,k,m);
            elseif any(v0=='p')
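                % 'kp' mode: k-means started from the centroids of a random partition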
                [m,e,j]=v_kmeans(xs,k,'p');
            else
                [m,e,j]=v_kmeans(xs,k,'f');
            end
        elseif any(v0=='h')         % k-harmonic means initialization
            if any(v0=='m')
                [m,e,j]=v_kmeanhar(xs,k,[],4,m);
            else
                if any(v0=='p')
                    [m,e,j]=v_kmeanhar(xs,k,[],4,'p');
                else
                    [m,e,j]=v_kmeanhar(xs,k,[],4,'f');
                end
            end
        elseif any(v0=='p')         % Initialize using a random partition
            j=ceil(rand(n,1)*k);    % allocate to random clusters
            j(v_rnsubset(k,n))=1:k; % but force at least one point per cluster
            for i=1:k
                m(i,:)=mean(xs(j==i,:),1);
            end
        else
            if any(v0=='m')
                m=m0;               % use specified centres
            else
                m=xs(v_rnsubset(k,n),:);    % Forgy initialization: sample k centres without replacement [default]
            end
            [e,j]=v_kmeans(xs,k,m,0);       % find out the cluster allocation
        end
        if any(v0=='s')
            xs=(x-mx0(wn,:))./sx0(wn,:);    % scale data now if not done previously
        end
        v=zeros(k,p);               % diagonal covariances
        w=zeros(k,1);
        for i=1:k
            ni=sum(j==i);           % number assigned to this centre
            w(i)=(ni+1)/(n+k);      % weight of this mixture
            if ni
                v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
            else
                v(i,:)=zeros(1,p);
            end
        end
    end
else
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % use initial values given as input parameters
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if nargin<7
        wx=wn;                      % no data point weights
    end
    wx=wx(:)/sum(wx);               % normalize weights and force a column vector
    mx0=wx'*x;                      % calculate mean of input data in each dimension
    vx0=wx'*x.^2-mx0.^2;            % calculate variance of input data in each dimension
    sx0=sqrt(vx0);
    sx0(sx0==0)=1;                  % do not divide by zero when scaling
    [k,p]=size(m0);
    xs=(x-mx0(wn,:))./sx0(wn,:);    % scale the data
    m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % and the means
    v=v0;
    w=w0;
    fv=ndims(v)>2 || size(v,1)>k;   % full covariance matrices supplied
    if fv
        mk=eye(p)==0;               % mask for off-diagonal elements
        fulliv=any(v(repmat(mk,[1 1 k]))~=0);   % check if any are non-zero
        if ~fulliv
            v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1);    % just pick out and scale the diagonal elements for now
        else
            v=v./repmat(sx0'*sx0,[1 1 k]);      % scale the full covariance matrix
        end
    end
end
if length(wx)~=n
    error('%d datapoints but %d weights',n,length(wx));
end
lsx=sum(log(sx0));
xsw=xs.*repmat(wx,1,p);             % weighted data points
if ~fulliv                          % initializing with diagonal covariances
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Diagonal Covariance matrices  %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    v=max(v,c);                     % apply the lower bound
    xs2=xs.^2.*repmat(wx,1,p);      % square and weight the data for variance calculations

    % If data size is large then do calculations in chunks

    nb=min(n,max(1,floor(memsize/(8*p*k))));    % chunk size for testing data points
    nl=ceil(n/nb);                  % number of chunks
    jx0=n-(nl-1)*nb;                % size of first chunk

    im=repmat(1:k,1,nb); im=im(:);
    th=(l-floor(l))*n;
    sd=(nargout > 3*(l~=0));        % = 1 if we are outputting log likelihood values
    lp=floor(l)+sd;                 % extra loop needed to calculate final G value

    lpx=zeros(1,n);                 % log probability of each data point
    wk=ones(k,1);
    wp=ones(1,p);
    wnb=ones(1,nb);
    wnj=ones(1,jx0);

    % EM loop

    g=0;                            % dummy initial value for comparison
    gg=zeros(lp+1,1);
    ss=sd;                          % initialize stopping count (0 or 1)
    for j=1:lp
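        % Each pass of this loop is one EM iteration: the E-step computes the
        % posterior mixture probabilities px for every data point (in
        % memory-limited chunks) and the M-step re-estimates the weights w,
        % means m and variances v from the accumulated statistics pk, sx and sx2.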
        g1=g;                       % save previous log likelihood (2*pi factor omitted)
        m1=m;                       % save previous means, variances and weights
        v1=v;
        w1=w;
        vi=-0.5*v.^(-1);            % data-independent scale factor in exponent
        lvm=log(w)-0.5*sum(log(v),2);   % log of external scale factor (excluding -0.5*p*log(2pi) term)

        % first do partial chunk (of length jx0)

        jx=jx0;
        ii=1:jx;                    % indices of data points in this chunk
        kk=repmat(ii,k,1);          % kk(k,jx): data point index for each mixture/point pair
        km=repmat(1:k,1,jx);        % km(1,k*jx): mixture index for each mixture/point pair
        py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj);    % py(k,jx) unnormalized log pdf of each data point for each mixture
        mx=max(py,[],1);            % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
        px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
        ps=sum(px,1);               % total normalized likelihood of each data point
        px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
        lpx(ii)=log(ps)+mx;
        pk=px*wx(ii);               % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
        sx=px*xsw(ii,:);
        sx2=px*xs2(ii,:);
        for il=2:nl                 % process the remaining data points in chunks
            ix=jx+1;
            jx=jx+nb;               % increment upper limit
            ii=ix:jx;               % indices of data points in this chunk
            kk=repmat(ii,k,1);
            py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
            mx=max(py,[],1);        % find normalizing factor for each data point to prevent underflow when using exp()
            px=exp(py-mx(wk,:));    % find normalized probability of each mixture for each datapoint
            ps=sum(px,1);           % total normalized likelihood of each data point
            px=px./ps(wk,:);        % relative mixture probabilities for each data point (columns sum to 1)
            lpx(ii)=log(ps)+mx;
            pk=pk+px*wx(ii);        % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
            sx=sx+px*xsw(ii,:);
            sx2=sx2+px*xs2(ii,:);
        end
        g=lpx*wx;                   % total log probability summed over all data points
        gg(j)=g;                    % save log prob at each iteration
        w=pk;                       % new mixture weights (pk already sums to unity)
        if pk                       % true if all elements of pk are non-zero
            m=sx./pk(:,wp);         % calculate mixture means
            v=sx2./pk(:,wp);        % and variances
        else
            wm=pk==0;               % mask indicating mixtures with zero weights
            nz=sum(wm);             % number of zero-weight mixtures
            [vv,mk]=sort(lpx);      % find the lowest probability data points
            m=zeros(k,p);           % initialize means and variances to zero (variances are floored later)
            v=m;
            m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
            w(wm)=1/n;              % set these weights non-zero
            w=w*n/(n+nz);           % normalize so the weights sum to unity
            wm=~wm;                 % mask for non-zero weights
            m(wm,:)=sx(wm,:)./pk(wm,wp);    % recalculate means and variances for mixtures with a non-zero weight
            v(wm,:)=sx2(wm,:)./pk(wm,wp);
        end
        v=max(v-m.^2,c);            % subtract mean^2 and apply floor to variances
        if g-g1<=th && j>1
            if ~ss, break; end      % stop
            ss=ss-1;                % stop next time
        end
    end
    if sd && ~fv                    % we need to calculate the final probabilities
        pp=lpx'-0.5*p*log(2*pi)-lsx;    % log of total probability of each data point
        gg=gg(1:j)-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
        g=gg(end);
        m=m1;                       % back up to previous iteration
        v=v1;
        w=w1;
        mm=sum(m,1)/k;
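        % Fisher discriminant: summed squared distance of the mixture means from
        % their centroid, divided by the summed mixture variances (all evaluated
        % while the data is still in its normalized domain)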
        f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
    end
    if ~fv
        m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
        v=v.*repmat(sx0.^2,k,1);                   % and variances
    else
        v1=v;
        v=zeros(p,p,k);
        mk=eye(p)==1;               % mask for diagonal elements
        v(repmat(mk,[1 1 k]))=v1';  % copy diagonal variances from v1
    end
end
if fv                               % check if full covariance matrices were requested
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Full Covariance matrices  %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    pl=p*(p+1)/2;
    lix=1:p^2;
    cix=repmat(1:p,p,1);
    rix=cix';
    lix(cix>rix)=[];                % index of lower triangular elements
    cix=cix(lix);                   % index of lower triangular columns
    rix=rix(lix);                   % index of lower triangular rows
    dix=find(rix==cix);
    lixi=zeros(p,p);
    lixi(lix)=1:pl;
    lixi=lixi';
    lixi(lix)=1:pl;                 % reverse index to build full matrices
    v=reshape(v,p^2,k);
    v=v(lix,:)';                    % lower triangular in rows

    % If data size is large then do calculations in chunks

    nb=min(n,max(1,floor(memsize/(24*p*k))));   % chunk size for testing data points
    nl=ceil(n/nb);                  % number of chunks
    jx0=n-(nl-1)*nb;                % size of first chunk
    %
    th=(l-floor(l))*n;
    sd=(nargout > 3*(l~=0));        % = 1 if we are outputting log likelihood values
    lp=floor(l)+sd;                 % extra loop needed to calculate final G value
    %
    lpx=zeros(1,n);                 % log probability of each data point
    wk=ones(k,1);
    wp=ones(1,p);
    wpl=ones(1,pl);                 % 1 index for lower triangular matrix
    wnb=ones(1,nb);
    wnj=ones(1,jx0);

    % EM loop

    g=0;                            % dummy initial value for comparison
    gg=zeros(lp+1,1);
    ss=sd;                          % initialize stopping count (0 or 1)
    vi=zeros(p*k,p);                % stack of k inverse cov matrices each size p*p
    vim=zeros(p*k,1);               % stack of k vectors of the form inv(v)*m
    mtk=vim;                        % stack of k vectors of the form m
    lvm=zeros(k,1);
    wpk=repmat((1:p)',k,1);
    for j=1:lp
        g1=g;                       % save previous log likelihood (2*pi factor omitted)
        m1=m;                       % save previous means, variances and weights
        v1=v;
        w1=w;
        for ik=1:k

            % these lines added for debugging only
            %     vk=reshape(v(ik,lixi),p,p);
            %     condk(ik)=cond(vk);
            %%%%%%%%%%%%%%%%%%%%
            [uvk,dvk]=eig(reshape(v(ik,lixi),p,p));     % convert lower triangular to full and find eigenvalues
            dvk=max(diag(dvk),c);   % apply variance floor to eigenvalues
            vik=-0.5*uvk*diag(dvk.^(-1))*uvk';  % calculate inverse
            vi((ik-1)*p+(1:p),:)=vik;           % vi contains all mixture inverses stacked on top of each other
            vim((ik-1)*p+(1:p))=vik*m(ik,:)';   % vim contains vi*m for all mixtures stacked on top of each other
            mtk((ik-1)*p+(1:p))=m(ik,:)';       % mtk contains all mixture means stacked on top of each other
            lvm(ik)=log(w(ik))-0.5*sum(log(dvk));   % lvm contains the log of w(ik)*sqrt(det(inv(v))) for each mixture
        end
        %
        % first do partial chunk (of length jx0)
        %
        jx=jx0;
        ii=1:jx;
        xii=xs(ii,:).';
        py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
        mx=max(py,[],1);            % find normalizing factor for each data point to prevent underflow when using exp()
        px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
        ps=sum(px,1);               % total normalized likelihood of each data point
        px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
        lpx(ii)=log(ps)+mx;
        pk=px*wx(ii);               % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
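        % sx and sx2 accumulate the responsibility-weighted first and second
        % moments of the data, from which the M-step recomputes m and v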
        sx=px*xsw(ii,:);            % accumulator for mean calculation
        sx2=px*(xsw(ii,rix).*xs(ii,cix));   % accumulator for variance calculation (lower tri cov matrix as a row)
        for il=2:nl
            ix=jx+1;
            jx=jx+nb;               % increment upper limit
            ii=ix:jx;
            xii=xs(ii,:).';
            py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
            mx=max(py,[],1);        % find normalizing factor for each data point to prevent underflow when using exp()
            px=exp(py-mx(wk,:));    % find normalized probability of each mixture for each datapoint
            ps=sum(px,1);           % total normalized likelihood of each data point
            px=px./ps(wk,:);        % relative mixture probabilities for each data point (columns sum to 1)
            lpx(ii)=log(ps)+mx;
            pk=pk+px*wx(ii);        % effective fraction of data points for each mixture (could be zero due to underflow)
            sx=sx+px*xsw(ii,:);     % accumulator for mean calculation
            sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix));   % accumulator for variance calculation
        end
        g=lpx*wx;                   % total log probability summed over all data points
        gg(j)=g;                    % save convergence history
        w=pk;                       % w(k,1) new mixture weights (pk already sums to unity)
        if pk                       % true if all elements of pk are non-zero
            m=sx./pk(:,wp);         % find mean and mean square
            v=sx2./pk(:,wpl);
        else
            wm=pk==0;               % mask indicating mixtures with zero weights
            nz=sum(wm);             % number of zero-weight mixtures
            [vv,mk]=sort(lpx);      % find the lowest probability data points
            m=zeros(k,p);           % initialize means and variances to zero (variances are floored later)
            v=zeros(k,pl);
            m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
            w(wm)=1/n;              % set these weights non-zero
            w=w*n/(n+nz);           % normalize so the weights sum to unity
            wm=~wm;                 % mask for non-zero weights
            m(wm,:)=sx(wm,:)./pk(wm,wp);    % recalculate means and variances for mixtures with a non-zero weight
            v(wm,:)=sx2(wm,:)./pk(wm,wpl);
        end
        v=v-m(:,cix).*m(:,rix);     % subtract off mean squared
        if g-g1<=th && j>1
            if ~ss, break; end      % stop
            ss=ss-1;                % stop next time
        end
    end
    if sd                           % we need to calculate the final probabilities
        pp=lpx'-0.5*p*log(2*pi)-lsx;    % log of total probability of each data point
        gg=gg(1:j)-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
        g=gg(end);
        % gg'                       % *** DEBUG ONLY ***
        m=m1;                       % back up to previous iteration
        v=zeros(p,p,k);             % reserve space for k full covariance matrices
        trv=0;                      % sum of variance matrix traces
        for ik=1:k                  % loop for each mixture to apply variance floor
            [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
            dvk=max(diag(dvk),c);   % apply variance floor to eigenvalues
            v(:,:,ik)=uvk*diag(dvk)*uvk';   % reconstitute full matrix
            trv=trv+sum(dvk);       % add trace to the sum
        end
        w=w1;
        mm=sum(m,1)/k;
        f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;  % Fisher discriminant (as in the diagonal-covariance case above)
    else
        v1=v;                       % lower triangular form
        v=zeros(p,p,k);             % reserve space for k full covariance matrices
        for ik=1:k                  % loop for each mixture to apply variance floor
            [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
            dvk=max(diag(dvk),c);   % apply variance floor
            v(:,:,ik)=uvk*diag(dvk)*uvk';   % reconstitute full matrix
        end
    end
    m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
    v=v.*repmat(sx0'*sx0,[1 1 k]);              % and covariances
end
if l==0                             % suppress the first three output arguments if l==0
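    % shift the remaining outputs down so that the caller's first three output
    % arguments receive G, F and PP respectively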
    m=g;
    v=f;
    w=pp;
end
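
% Example (an illustrative sketch, not part of the original file): with L=0 the
% first three outputs are suppressed, so an already-fitted model m,v,w can be
% evaluated on new data xnew (a hypothetical n-by-p matrix) without iterating:
%
%     [g,f,pp]=v_gaussmix(xnew,[],0,m,v,w);   % g = average log prob, f = Fisher
%                                             % discriminant, pp(n,1) = per-point log probs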