Description of v

V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

Usage: (1) seg=v_snrseg(s,r,fs);                  % s & r are noisy and clean signal
       (2) seg=v_snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
       (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames

 Inputs:    s  test signal
            r  reference signal
           fs  sample frequency (Hz)
            m  mode [default = 'Vq']
                 w = No VAD - use whole file
                 v = use sohn VAD to discard silent portions
                 V = use P.56-based VAD to discard silent portions [default]
                 a = A-weight the signals
                 b = weight signals by BS-468
                 q = use linear interpolation to remove delays +- 1 sample [default]
                 z = do not do any alignment
                 p = plot results
           tf  frame increment [0.01]

 Outputs: seg = Segmental SNR in dB
          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
          tc  = time at centre of each frame (seconds)
          snf = Segmental SNR in dB in each frame
          vf  = Boolean mask indicating frames that valid (from VAD)

 This function compares a noisy signal, S, with a clean reference, R, and
 computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
 which must be of the same length, are split into non-overlapping frames
 of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
 The segmental SNR is the average of these values, i.e.
         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
 where the mean is over frames and the sum runs over one particular frame.
 Two optional modifications can be made to this basic formula:

    (a) Frames are excluded if there is no significant energy in the R
        signal. The idea is to limit the calculation to frames in which
        speech is active. By default, the v_voicebox function "v_activlev" is
        used to detect the inactive frames (the 'V' mode option).

    (b) In each frame independently, the reference signal is shifted by up
        to +- 1 sample to find the alignment than minimizes the noise
        component (S-R)^2. This shifting accounts for small misalignments
        and/or sample frequency differences between the two signals. For
        larger shifts, you can use the v_voicebox function "v_sigalign".
        Accurate alignemnt is especially important at high SNR values.

 If no M argument is specified, both these modifications will be applied;
 this is equivalent to specifying M='Vq'.

0001 function [seg,glo,tc,snf,vf]=v_snrseg(s,r,fs,m,tf)
0002 %V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)
0003 %
0004 %Usage: (1) seg=v_snrseg(s,r,fs);                  % s & r are noisy and clean signal
0005 %       (2) seg=v_snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
0006 %       (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames
0007 %
0008 % Inputs:    s  test signal
0009 %            r  reference signal
0010 %           fs  sample frequency (Hz)
0011 %            m  mode [default = 'Vq']
0012 %                 w = No VAD - use whole file
0013 %                 v = use sohn VAD to discard silent portions
0014 %                 V = use P.56-based VAD to discard silent portions [default]
0015 %                 a = A-weight the signals
0016 %                 b = weight signals by BS-468
0017 %                 q = use linear interpolation to remove delays +- 1 sample [default]
0018 %                 z = do not do any alignment
0019 %                 p = plot results
0020 %           tf  frame increment [0.01]
0021 %
0022 % Outputs: seg = Segmental SNR in dB
0023 %          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
0024 %          tc  = time at centre of each frame (seconds)
0025 %          snf = Segmental SNR in dB in each frame
0026 %          vf  = Boolean mask indicating frames that valid (from VAD)
0027 %
0028 % This function compares a noisy signal, S, with a clean reference, R, and
0029 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
0030 % which must be of the same length, are split into non-overlapping frames
0031 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
0032 % The segmental SNR is the average of these values, i.e.
0033 %         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
0034 % where the mean is over frames and the sum runs over one particular frame.
0035 % Two optional modifications can be made to this basic formula:
0036 %
0037 %    (a) Frames are excluded if there is no significant energy in the R
0038 %        signal. The idea is to limit the calculation to frames in which
0039 %        speech is active. By default, the v_voicebox function "v_activlev" is
0040 %        used to detect the inactive frames (the 'V' mode option).
0041 %
0042 %    (b) In each frame independently, the reference signal is shifted by up
0043 %        to +- 1 sample to find the alignment than minimizes the noise
0044 %        component (S-R)^2. This shifting accounts for small misalignments
0045 %        and/or sample frequency differences between the two signals. For
0046 %        larger shifts, you can use the v_voicebox function "v_sigalign".
0047 %        Accurate alignemnt is especially important at high SNR values.
0048 %
0049 % If no M argument is specified, both these modifications will be applied;
0050 % this is equivalent to specifying M='Vq'.
0051 
0052 % Bugs/suggestions
0053 % (1) Optionally restrict the bandwidth to the smaller of the two
0054 %     bandwidths either with an extra parameter or automatically determined
0055 % (2) Optionally apply a gain to the s signal to maximize the SNR and
0056 %     output the gain as an additional parameter
0057 
0058 %      Copyright (C) Mike Brookes 2011
0059 %      Version: $Id: v_snrseg.m 10865 2018-09-21 17:22:45Z dmb $
0060 %
0061 %   VOICEBOX is a MATLAB toolbox for speech processing.
0062 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0063 %
0064 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0065 %   This program is free software; you can redistribute it and/or modify
0066 %   it under the terms of the GNU General Public License as published by
0067 %   the Free Software Foundation; either version 2 of the License, or
0068 %   (at your option) any later version.
0069 %
0070 %   This program is distributed in the hope that it will be useful,
0071 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0072 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0073 %   GNU General Public License for more details.
0074 %
0075 %   You can obtain a copy of the GNU General Public License from
0076 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0077 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0078 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0079 
0080 if nargin<4 || ~ischar(m)
0081     m='Vq';
0082 end
0083 if nargin<5 || ~numel(tf)
0084     tf=0.01;                                            % default frame length is 10 ms
0085 end
0086 snmax=100;                                              % clipping limit for SNR
0087 
0088 % filter the input signals if required
0089 
0090 if any(m=='a')                                          % A-weighting
0091     [b,a]=v_stdspectrum(2,'z',fs);                      % create bandpass filter
0092     s=filter(b,a,s);                                    % filter test signal
0093     r=filter(b,a,r);                                    % filter reference signal
0094 elseif any(m=='b')                                      %  BS-468 weighting
0095     [b,a]=v_stdspectrum(8,'z',fs);                      % create bandpass filter
0096     s=filter(b,a,s);                                    % filter test signal
0097     r=filter(b,a,r);                                    % filter reference signal
0098 end
0099 
0100 mq=~any(m=='z');                                        % flag indicating we need to do alignment
0101 nr=min(length(r), length(s));                           % signal length to align
0102 kf=round(tf*fs);                                        % length of frame in samples
0103 ifr=kf+mq:kf:nr-mq;                                     % ending sample of each frame (allowing +-mq at each end)
0104 ifl=ifr(end);                                           % end sample of last frame
0105 nf=numel(ifr);                                          % number of frames
0106 if mq                                                   % perform linear interpolation
0107     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0108     % Consider the linearly interpolated signal sm(i)=(1-am)*s(i)+am*s(i+1) where 0<=am<=1.     %
0109     % For each frame we calculate the value of am that minimizes the error sum((sm(i)-r(i))^2). %
0110     % The solution is am = sum((s(i)-r(i)).*(s(i)-s(i+1))) / sum((s(i)-s(i+1))^2).              %
0111     % Similarly, we calculate the optimum ap in sp(i)=(1-ap)*s(i)+ap*s(i-1) and then            %
0112     % finally, we select whichever of sm(i) and sp(i) with the lowest error.                    %
0113     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0114     ssm=reshape(s(2:ifl)-s(3:ifl+1),kf,nf);
0115     ssp=reshape(s(2:ifl)-s(1:ifl-1),kf,nf);
0116     sr=reshape(s(2:ifl)-r(2:ifl),kf,nf);                % enframed error: test-ref (one frame per column)
0117     am=min(max(sum(sr.*ssm,1)./sum(ssm.^2,1),0),1);     % optimum dist between s(2:ifl) and s(3:ifl+1)
0118     ap=min(max(sum(sr.*ssp,1)./sum(ssp.^2,1),0),1);     % optimum dist between s(2:ifl) and s(1:ifl-1)
0119     ef=min(sum((sr-repmat(am,kf,1).*ssm).^2,1),sum((sr-repmat(ap,kf,1).*ssp).^2,1)); % select the best for each frame
0120 else                                                    % no interpolation
0121     ef=sum(reshape((s(1:ifl)-r(1:ifl)).^2,kf,nf),1);    % sum of squared errors in each frame
0122 end
0123 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1);                % sum of sqared reference signal in each frame
0124 em=ef==0;                                               % mask for zero noise frames
0125 rm=rf==0;                                               % mask for zero reference frames
0126 snf=10*log10((rf+rm)./(ef+em));
0127 snf(rm)=-snmax;
0128 snf(em)=snmax;
0129 
0130 % select the frames to include
0131 
0132 if any(m=='w')                                          % 'w' option: do not discard silent frames
0133     vf=true(1,nf);                                      % include all frames
0134 elseif any(m=='v')                                      % 'v' option: use Sohn VAD to discard silent frames
0135     vs=v_vadsohn(r,fs,'na');
0136     nvs=length(vs);
0137     [vss,vix]=sort([ifr'; vs(:,2)]);
0138     vjx=zeros(nvs+nf,5);
0139     vjx(vix,1)=(1:nvs+nf)';                             % sorted position
0140     vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)';                    % prev VAD frame end (or 0 or nvs+1 if none)
0141     vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)';           % prev snr frame end (or 0 or nvs+1 if none)
0142     dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)];           % number of samples from previous frame boundary
0143     vjx(:,3)=dvs(vjx(:,1));                             % number of samples from previous frame boundary
0144     vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3);           % VAD result for samples between prev frame boundary and this one
0145     vjx(nf+1:end,4)=vs(:,3);                            % VAD result for samples between prev frame boundary and this one
0146     vjx(1:nf,5)=1:nf;                                   % SNR frame to accumulate into
0147     vjx(vjx(nf+1:end,2)>=nf,3)=0;                       % zap any VAD frame beyond the last snr frame
0148     vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf);          % SNR frame to accumulate into
0149     vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold
0150 else                                                    % default is 'V': use P.56 VAD to discard silent frames
0151     [lev,af,fso,vad]=v_activlev(r,fs);                  % do VAD on reference signal
0152     vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2;        % find frames that are mostly active
0153     tc=((1:nf)*kf+(1-kf)/2)/fs;                         % time at centre of frame
0154 end
0155 seg=mean(snf(vf));
0156 glo=10*log10(sum(rf(vf))/sum(ef(vf)));
0157 if ~nargout || any (m=='p')
0158     subplot(312);                                       % plot reference signal
0159     plot((1:length(r))/fs,r);
0160     ylabel('Reference');
0161     axh(2)=gca;
0162     subplot(313);                                       % plot SNR + global and segmental means
0163     snv=snf;
0164     snv(~vf)=NaN;                                       % plot valid frames in blue
0165     snu=snf;
0166     snu(vf>0)=NaN;                                      % plot invalid frames in red
0167     plot([1 nr]/fs,[glo seg; glo seg],':k',tc,snv,'-b',tc,snu,'-r');
0168     ylabel('Frame SNR');
0169     xlabel('Time (s)');
0170     set(gca,'ylim',min([snv(:);snu(:)])+[-3 60]);       % restrict displayed SNR range
0171     axh(3)=gca;
0172     subplot(311);                                       % plot test signal
0173     plot((1:length(s))/fs,s);
0174     ylabel('Signal');
0175     title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg));
0176     axh(1)=gca;
0177     linkaxes(axh,'x');                                  % link time axes
0178     linkaxes(axh(1:2),'y');                             % link signal amplitude axes
0179     set(gca,'xlim',[1 nr]/fs);
0180 end

v_snrseg

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE