V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF) Usage: (1) seg=v_snrseg(s,r,fs); % s & r are noisy and clean signal (2) seg=v_snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default] (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03); % 30 ms frames Inputs: s test signal r reference signal fs sample frequency (Hz) m mode [default = 'Vq'] w = No VAD - use whole file v = use sohn VAD to discard silent portions V = use P.56-based VAD to discard silent portions [default] a = A-weight the signals b = weight signals by BS-468 q = use quadratic interpolation to remove delays +- 1 sample z = do not do any alignment p = plot results tf frame increment [0.01] Outputs: seg = Segmental SNR in dB glo = Global SNR in dB (typically 7 dB greater than SNR-seg) This function compares a noisy signal, S, with a clean reference, R, and computes the segemntal signal-to-noise ratio (SNR) in dB. The signals, which must be of the same length, are split into non-overlapping frames of length TF (default 10 ms) and the SNR of each frame in dB is calculated. The segmental SNR is the average of these values, i.e. SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2)) where the mean is over frames and the sum runs over one particular frame. Two optional modifications can be made to this basic formula: (a) Frames are excluded if there is no significant energy in the R signal. The idea is to limit the calculation to frames in which speech is active. By default, the v_voicebox function "v_activlev" is used to detect the inactive frames (the 'V' mode option). (b) In each frame independently, the reference signal is shifted by up to +- 1 sample to find the alignment than minimizes the noise component (S-R)^2. This shifting accounts for small misalignments and/or sample frequency differences between the two signals. For larger shifts, you can use the v_voicebox function "v_sigalign". Accurate alignemnt is especially important at high SNR values. If no M argument is specified, both these modifications will be applied; this is equivalent to specifying M='Vq'.
0001 function [seg,glo]=v_snrseg(s,r,fs,m,tf) 0002 %V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF) 0003 % 0004 %Usage: (1) seg=v_snrseg(s,r,fs); % s & r are noisy and clean signal 0005 % (2) seg=v_snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default] 0006 % (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03); % 30 ms frames 0007 % 0008 % Inputs: s test signal 0009 % r reference signal 0010 % fs sample frequency (Hz) 0011 % m mode [default = 'Vq'] 0012 % w = No VAD - use whole file 0013 % v = use sohn VAD to discard silent portions 0014 % V = use P.56-based VAD to discard silent portions [default] 0015 % a = A-weight the signals 0016 % b = weight signals by BS-468 0017 % q = use quadratic interpolation to remove delays +- 1 sample 0018 % z = do not do any alignment 0019 % p = plot results 0020 % tf frame increment [0.01] 0021 % 0022 % Outputs: seg = Segmental SNR in dB 0023 % glo = Global SNR in dB (typically 7 dB greater than SNR-seg) 0024 % 0025 % This function compares a noisy signal, S, with a clean reference, R, and 0026 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals, 0027 % which must be of the same length, are split into non-overlapping frames 0028 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated. 0029 % The segmental SNR is the average of these values, i.e. 0030 % SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2)) 0031 % where the mean is over frames and the sum runs over one particular frame. 0032 % Two optional modifications can be made to this basic formula: 0033 % 0034 % (a) Frames are excluded if there is no significant energy in the R 0035 % signal. The idea is to limit the calculation to frames in which 0036 % speech is active. By default, the v_voicebox function "v_activlev" is 0037 % used to detect the inactive frames (the 'V' mode option). 0038 % 0039 % (b) In each frame independently, the reference signal is shifted by up 0040 % to +- 1 sample to find the alignment than minimizes the noise 0041 % component (S-R)^2. This shifting accounts for small misalignments 0042 % and/or sample frequency differences between the two signals. For 0043 % larger shifts, you can use the v_voicebox function "v_sigalign". 0044 % Accurate alignemnt is especially important at high SNR values. 0045 % 0046 % If no M argument is specified, both these modifications will be applied; 0047 % this is equivalent to specifying M='Vq'. 0048 0049 % Bugs/suggestions 0050 % (1) Optionally restrict the bandwidth to the smaller of the two 0051 % bandwidths either with an extra parameter or automatically determined 0052 0053 % Copyright (C) Mike Brookes 2011 0054 % Version: $Id: v_snrseg.m 10865 2018-09-21 17:22:45Z dmb $ 0055 % 0056 % VOICEBOX is a MATLAB toolbox for speech processing. 0057 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0058 % 0059 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0060 % This program is free software; you can redistribute it and/or modify 0061 % it under the terms of the GNU General Public License as published by 0062 % the Free Software Foundation; either version 2 of the License, or 0063 % (at your option) any later version. 0064 % 0065 % This program is distributed in the hope that it will be useful, 0066 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0067 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0068 % GNU General Public License for more details. 0069 % 0070 % You can obtain a copy of the GNU General Public License from 0071 % http://www.gnu.org/copyleft/gpl.html or by writing to 0072 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0074 0075 if nargin<4 || ~ischar(m) 0076 m='Vq'; 0077 end 0078 if nargin<5 || ~numel(tf) 0079 tf=0.01; % default frame length is 10 ms 0080 end 0081 snmax=100; % clipping limit for SNR 0082 0083 % filter the input signals if required 0084 0085 if any(m=='a') % A-weighting 0086 [b,a]=v_stdspectrum(2,'z',fs); 0087 s=filter(b,a,s); 0088 r=filter(b,a,r); 0089 elseif any(m=='b') % BS-468 weighting 0090 [b,a]=v_stdspectrum(8,'z',fs); 0091 s=filter(b,a,s); 0092 r=filter(b,a,r); 0093 end 0094 0095 mq=~any(m=='z'); 0096 nr=min(length(r), length(s)); 0097 kf=round(tf*fs); % length of frame in samples 0098 ifr=kf+mq:kf:nr-mq; % ending sample of each frame 0099 ifl=ifr(end); 0100 nf=numel(ifr); % number of frames 0101 if mq % perform interpolation 0102 ssm=reshape(s(2:ifl)-s(3:ifl+1),kf,nf); 0103 ssp=reshape(s(2:ifl)-s(1:ifl-1),kf,nf); 0104 sr=reshape(s(2:ifl)-r(2:ifl),kf,nf); 0105 am=min(max(sum(sr.*ssm,1)./sum(ssm.^2,1),0),1); %optimum dist between s(2:ifl) and s(3:ifl+1) 0106 ap=min(max(sum(sr.*ssp,1)./sum(ssp.^2,1),0),1); %optimum dist between s(2:ifl) and s(1:ifl-1) 0107 ef=min(sum((sr-repmat(am,kf,1).*ssm).^2,1),sum((sr-repmat(ap,kf,1).*ssp).^2,1)); % select the best for each frame 0108 else % no interpolation 0109 ef=sum(reshape((s(1:ifl)-r(1:ifl)).^2,kf,nf),1); 0110 end 0111 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1); 0112 em=ef==0; % mask for zero noise frames 0113 rm=rf==0; % mask for zero reference frames 0114 snf=10*log10((rf+rm)./(ef+em)); 0115 snf(rm)=-snmax; 0116 snf(em)=snmax; 0117 0118 % select the frames to include 0119 0120 if any(m=='w') 0121 vf=true(1,nf); % include all frames 0122 elseif any(m=='v') 0123 vs=v_vadsohn(r,fs,'na'); 0124 nvs=length(vs); 0125 [vss,vix]=sort([ifr'; vs(:,2)]); 0126 vjx=zeros(nvs+nf,5); 0127 vjx(vix,1)=(1:nvs+nf)'; % sorted position 0128 vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none) 0129 vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none) 0130 dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)]; % number of samples from previous frame boundary 0131 vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary 0132 vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one 0133 vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one 0134 vjx(1:nf,5)=1:nf; % SNR frame to accumulate into 0135 vjx(vjx(nf+1:end,2)>=nf,3)=0; % zap any VAD frame beyond the last snr frame 0136 vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into 0137 vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold 0138 else % default is 'V' 0139 [lev,af,fso,vad]=v_activlev(r,fs); % do VAD on reference signal 0140 vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active 0141 end 0142 seg=mean(snf(vf)); 0143 glo=10*log10(sum(rf(vf))/sum(ef(vf))); 0144 0145 if ~nargout || any (m=='p') 0146 subplot(311); 0147 plot((1:length(s))/fs,s); 0148 ylabel('Signal'); 0149 title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg)); 0150 axh(1)=gca; 0151 subplot(312); 0152 plot((1:length(r))/fs,r); 0153 ylabel('Reference'); 0154 axh(2)=gca; 0155 subplot(313); 0156 snv=snf; 0157 snv(~vf)=NaN; 0158 snu=snf; 0159 snu(vf>0)=NaN; 0160 plot([1 nr]/fs,[glo seg; glo seg],':k',((1:nf)*kf+(1-kf)/2)/fs,snv,'-b',((1:nf)*kf+(1-kf)/2)/fs,snu,'-r'); 0161 ylabel('Frame SNR'); 0162 xlabel('Time (s)'); 0163 axh(3)=gca; 0164 linkaxes(axh,'x'); 0165 end