V_DISTISAR calculates the Itakura-Saito distance between AR coefficients D=V_DISTISAR(AR1,AR2,MODE)

 Inputs: AR1,AR2   AR coefficient sets to be compared. Each row contains a set of coefficients.
                   AR1 and AR2 must have the same number of columns.

         MODE      Character string selecting the following options:
                       'x'  Calculate the full distance matrix from every row of AR1 to every row of AR2
                       'd'  Calculate only the distance between corresponding rows of AR1 and AR2
                   The default is 'd' if AR1 and AR2 have the same number of rows, otherwise 'x'.

 Output: D         If MODE='d' then D is a column vector with the same number of rows as the shorter of AR1 and AR2.
                   If MODE='x' then D is a matrix with one row for each row of AR1 and one column for each row of AR2.

 The Itakura-Saito spectral distance is the average over +ve and -ve frequency of

                   pf1/pf2 - log(pf1/pf2) - 1 = exp(v) - v - 1   where v=log(pf1/pf2)

 The Itakura-Saito distance is asymmetric: pf1>pf2 contributes more to the distance than pf2>pf1.
 A symmetrical version is the COSH distance: v_distchpf(x,y)=(v_distispf(x,y)+v_distispf(y,x))/2

 The I-S distance can be expressed as ar2*toeplitz(v_lpcar2rr(ar1))*ar2' + log((ar1(1)/ar2(1)).^2) - 1
 but this is not how we actually calculate it.
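The agreement between the spectral definition and the Toeplitz expression above can be checked numerically. The following is a minimal sketch in base MATLAB/Octave that deliberately avoids VOICEBOX calls; the coefficient values, the grid size `n` and the variable names are illustrative assumptions, and the autocorrelation is obtained from an inverse FFT of the sampled spectrum rather than from v_lpcar2rr.

```matlab
% Sketch only: base MATLAB/Octave, no VOICEBOX calls. The coefficient values
% and the grid size n are illustrative assumptions.
ar1 = [1 -0.9];                      % first all-pole filter, 1/A1(z)
ar2 = [1 -0.5];                      % second all-pole filter, 1/A2(z)
n = 4096;                            % number of frequency samples (assumed large enough)

% Power spectra 1/|A(w)|^2 sampled around the whole unit circle
pf1 = 1 ./ abs(fft(ar1, n)).^2;
pf2 = 1 ./ abs(fft(ar2, n)).^2;

% Definition: average of pf1/pf2 - log(pf1/pf2) - 1 over +ve and -ve frequency
ratio = pf1 ./ pf2;
d_spec = mean(ratio - log(ratio) - 1);

% Toeplitz form: autocorrelation of pf1 taken from the inverse FFT
rr1 = real(ifft(pf1));
p = numel(ar1) - 1;                  % filter order
d_toep = ar2 * toeplitz(rr1(1:p+1)) * ar2.' + log((ar1(1) / ar2(1))^2) - 1;

fprintf('spectral: %.6f   toeplitz: %.6f\n', d_spec, d_toep);
```

For these first-order filters both values should come out close to 0.8421, since the zero-lag and first-lag autocorrelations of ar1 are 1/0.19 and 0.9/0.19.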
function d=v_distisar(ar1,ar2,mode)
%V_DISTISAR calculates the Itakura-Saito distance between AR coefficients D=V_DISTISAR(AR1,AR2,MODE)
%
% Inputs: AR1,AR2   AR coefficient sets to be compared. Each row contains a set of coefficients.
%                   AR1 and AR2 must have the same number of columns.
%
%         MODE      Character string selecting the following options:
%                       'x'  Calculate the full distance matrix from every row of AR1 to every row of AR2
%                       'd'  Calculate only the distance between corresponding rows of AR1 and AR2
%                   The default is 'd' if AR1 and AR2 have the same number of rows, otherwise 'x'.
%
% Output: D         If MODE='d' then D is a column vector with the same number of rows as the shorter of AR1 and AR2.
%                   If MODE='x' then D is a matrix with one row for each row of AR1 and one column for each row of AR2.
%
% The Itakura-Saito spectral distance is the average over +ve and -ve frequency of
%
%                   pf1/pf2 - log(pf1/pf2) - 1 = exp(v) - v - 1   where v=log(pf1/pf2)
%
% The Itakura-Saito distance is asymmetric: pf1>pf2 contributes more to the distance than pf2>pf1.
% A symmetrical version is the COSH distance: v_distchpf(x,y)=(v_distispf(x,y)+v_distispf(y,x))/2
%
% The I-S distance can be expressed as ar2*toeplitz(v_lpcar2rr(ar1))*ar2' + log((ar1(1)/ar2(1)).^2) - 1
% but this is not how we actually calculate it.


% Since the power spectrum is the Fourier transform of the autocorrelation, we can calculate
% the average value of p1/p2 by taking the 0'th order term of the convolution of the autocorrelation
% functions associated with p1 and 1/p2. Since 1/p2 corresponds to an FIR filter, this convolution is
% a finite sum even though the autocorrelation function of p1 is infinite in extent.
% The average value of log(pf1) is equal to log(ar1(1)^-2) where ar1(1) is the 0'th order AR coefficient.

% The Itakura-Saito distance can also be calculated directly from the power spectra; provided np is large
% enough, the values of d0 and d1 in the following will be very similar:
%
%         np=255; d0=v_distisar(ar1,ar2); d1=v_distispf(v_lpcar2pf(ar1,np),v_lpcar2pf(ar2,np))
%
% Autocorrelation LPC analysis is equivalent to minimizing the Itakura-Saito distance between the
% signal spectrum and that of the all-pole LPC filter, i.e. v_distispf(pf,v_lpcar2pf(ar0,np)).
% Moreover, if ar0 is the LPC filter and ar is any other all-pole filter, the I-S distance has the
% following additive property:
%
%         v_distispf(pf,v_lpcar2pf(ar,np)) = v_distispf(pf,v_lpcar2pf(ar0,np)) + v_distisar(ar0,ar)

% Ref: A.H.Gray Jr and J.D.Markel, "Distance measures for speech processing", IEEE ASSP-24(5): 380-391, Oct 1976
%      L. Rabiner and B-H Juang, "Fundamentals of Speech Recognition", Section 4.5, Prentice-Hall 1993, ISBN 0-13-015157-2
%      F.Itakura & S.Saito, "A statistical method for estimation of speech spectral density and formant frequencies",
%      Electronics & Communications in Japan, 53A: 36-43, 1970.

%      Copyright (C) Mike Brookes 1997
%      Version: $Id: v_distisar.m 10865 2018-09-21 17:22:45Z dmb $
%
%   VOICEBOX is a MATLAB toolbox for speech processing.
%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   This program is free software; you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation; either version 2 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%   GNU General Public License for more details.
%
%   You can obtain a copy of the GNU General Public License from
%   http://www.gnu.org/copyleft/gpl.html or by writing to
%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[nf1,p1]=size(ar1);                 % nf1 = number of coefficient sets in ar1
nf2=size(ar2,1);                    % nf2 = number of coefficient sets in ar2
m2=v_lpcar2ra(ar2);                 % autocorrelation of the ar2 coefficients (the FIR filter giving 1/p2)
m2(:,1)=m2(:,1)*0.5;                % halve the zero-lag term so that 2*sum(...) counts it only once
if nargin<3 || isempty(mode), mode='0'; end    % '0' selects neither option explicitly
if any(mode=='d') || (~any(mode=='x') && nf1==nf2)
    % distances between corresponding rows only
    nx=min(nf1,nf2);
    d=2*sum(v_lpcar2rr(ar1(1:nx,:)).*m2(1:nx,:),2)-log((ar2(1:nx,1)./ar1(1:nx,1)).^2)-1;
else
    % full distance matrix: rows index ar1, columns index ar2
    d=2*v_lpcar2rr(ar1)*m2'-log((ar1(:,1).^(-1)*ar2(:,1)').^2)-1;
end
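The additive property quoted in the comments can be illustrated without VOICEBOX. The sketch below, in base MATLAB/Octave, is written under stated assumptions: the test spectrum `pf`, the order `p`, the alternative filter `ar` and the helper handles `isd` and `ar2pf` are all hypothetical, the LPC filter is obtained by solving the autocorrelation normal equations directly, and the gain is folded into `ar0` so that its power spectrum is 1/|A0(w)|^2.

```matlab
% Sketch only: all names and values here are illustrative assumptions.
n = 1024;                                  % number of frequency samples
p = 2;                                     % LPC order
w = 2*pi*(0:n-1)/n;
pf = 2 + cos(w) + 0.5*cos(2*w);            % arbitrary positive, symmetric power spectrum

% Autocorrelation LPC: solve the normal equations for the predictor
r = real(ifft(pf));                        % autocorrelation sequence of pf
a = [1; -(toeplitz(r(1:p)) \ r(2:p+1).')].';   % monic inverse filter (row vector)
g = r(1:p+1) * a.';                        % prediction error power
ar0 = a / sqrt(g);                         % fold the gain in: spectrum is 1/|A0|^2

ar = [1.2 -0.6 0.1];                       % any other minimum-phase all-pole filter

isd = @(x, y) mean(x./y - log(x./y) - 1);  % I-S distance between power spectra
ar2pf = @(c) 1 ./ abs(fft(c, n)).^2;       % power spectrum of an all-pole filter

% I-S distance between the two AR filters, via the Toeplitz expression
r0 = real(ifft(ar2pf(ar0)));
d_ar = ar * toeplitz(r0(1:p+1)) * ar.' + log((ar0(1) / ar(1))^2) - 1;

lhs = isd(pf, ar2pf(ar));                  % distance from pf to the other filter
rhs = isd(pf, ar2pf(ar0)) + d_ar;          % distance to the LPC filter plus filter-to-filter distance
fprintf('lhs: %.8f   rhs: %.8f\n', lhs, rhs);
```

The two printed values should agree to within numerical precision, because the LPC filter matches the first p+1 autocorrelation lags of `pf`.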