v_earnoise

PURPOSE

V_EARNOISE Add noise to simulate the hearing threshold of a listener [Y,X,V]=(S,FS,M,SPL)

SYNOPSIS

function [y,x,v]=v_earnoise(s,fs,m,spl)

DESCRIPTION

V_EARNOISE Add noise to simulate the hearing threshold of a listener [Y,X,V]=(S,FS,M,SPL)

 Usage: (1) y=v_earnoise(s,fs);            % scale the speech to 62.35 dB SPL, filter it, then add "internal ear noise"
        (2) spl=62.35;                     % this code does the same but with explicit signal scaling
            x=10^(0.05*spl)*v_activlev(s,fs,'n');
            y=v_earnoise(x,fs,'u');
        (3) v_earnoise(s,fs);              % If outputs are omitted, a graph is plotted showing the SNR spectrum
        (4) y=v_earnoise(s,fs,[],50);      % scale the speech to 50 dB SPL instead of the default 62.35
        (5) y=v_earnoise(s,fs,'n',spl);    % Assume the input signal, s, has already been scaled to 0 dB (saves computation)
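 Usage example 2 converts a level in dB to a linear amplitude factor with 10^(0.05*spl), which is the same as 10^(spl/20). The Python sketch below illustrates only that arithmetic; it is not part of VOICEBOX, the helper names are hypothetical, and it assumes the input has already been normalized to an active level of 0 dB (the job done by v_activlev(s,fs,'n') in MATLAB), so no level measurement is performed.

```python
import math


def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude factor: 10^(0.05*dB)."""
    return 10.0 ** (0.05 * level_db)


def gain_to_db(gain):
    """Inverse conversion: 20*log10(gain)."""
    return 20.0 * math.log10(gain)


def scale_normalized(samples, spl_db=62.35):
    """Scale a signal whose active level is already 0 dB so that it sits at spl_db.

    Assumption: `samples` has been normalized beforehand; this function
    only multiplies by the dB-to-amplitude factor.
    """
    g = db_to_gain(spl_db)
    return [g * x for x in samples]


if __name__ == "__main__":
    print(db_to_gain(62.35))                      # amplitude factor for the default level
    print(scale_normalized([0.5, -0.5], 50.0))    # as in usage example 4 (50 dB SPL)
```

 Working in amplitude (factor 0.05 = 1/20) rather than power (factor 0.1 = 1/10) matches the MATLAB expression, which multiplies the waveform samples directly.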

  Inputs:  s(n,c)  speech signal: n samples with one channel per column
           fs      sample frequency in Hz
           m       mode string as shown below [default: neither option]
                     'n' Input s has been normalized to 0 dB (e.g. with the 'n' option of v_activlev.m)
                     'u' Input s is already scaled correctly in dB SPL (the spl input argument is ignored)
           spl     target active speech level in dB SPL [default: 62.35]

 Outputs:  y(n,c)  filtered speech signal with added noise which simulates the ear input signal
           x(n,c)  filtered input speech signal
           v(n,c)  noise added to filtered speech signal

 This function adds fictitious "internal ear noise" to an audio signal to simulate the effects of the
 frequency-dependent hearing threshold of a normal listener. To avoid having to add very high noise
 levels at low and high frequencies, it instead filters the input signal by the inverse of the desired
 noise spectrum and then adds white noise with 0 dB power spectral density. The noise spectrum is taken
 from Table 1 of [1] (which derived it from [2]) and, at a particular frequency, equals the pure-tone
 hearing threshold minus 10*log10(R) where R is the critical ratio. The critical ratio, R, is the power
 of a pure tone divided by the power spectral density of a white noise that just masks it; this ratio is
 approximately independent of level.
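 The two ideas in this paragraph can be checked numerically. The Python sketch below works with per-band levels in dB rather than with a real filter, and its threshold and critical-ratio values are made up for illustration (they are not the Table 1 values of [1]). It computes the internal noise spectrum level as threshold minus 10*log10(R), and shows why filtering the signal by the inverse of that spectrum and then adding 0 dB white noise gives the same SNR in every band as adding the shaped noise directly. Function names are hypothetical.

```python
import math


def noise_spectrum_level(threshold_db, critical_ratio):
    """Internal noise spectrum level in dB: threshold - 10*log10(R).

    critical_ratio is the linear ratio R = (tone power) / (PSD of a white
    noise that just masks the tone).
    """
    return threshold_db - 10.0 * math.log10(critical_ratio)


def snr_shaped_noise(signal_db, noise_db):
    """Per-band SNR when the shaped noise is added to the unfiltered signal."""
    return [s - n for s, n in zip(signal_db, noise_db)]


def snr_inverse_filter(signal_db, noise_db):
    """Per-band SNR when the signal is first filtered by the inverse noise
    spectrum (a gain of -noise_db in each band) and 0 dB white noise is added."""
    filtered = [s - n for s, n in zip(signal_db, noise_db)]
    white_noise_db = 0.0
    return [f - white_noise_db for f in filtered]


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not taken from [1]).
    thresholds = [40.0, 10.0, 25.0]     # pure-tone thresholds, dB SPL
    ratios = [10.0, 100.0, 1000.0]      # critical ratios R (linear)
    noise = [noise_spectrum_level(t, r) for t, r in zip(thresholds, ratios)]
    speech = [50.0, 45.0, 30.0]         # speech spectrum levels, dB per Hz
    print(noise)
    print(snr_shaped_noise(speech, noise))
    print(snr_inverse_filter(speech, noise))
```

 Because both routes subtract the same noise level in each band, the SNRs are identical; the inverse-filter route simply avoids generating very loud noise at frequencies where the threshold is high.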

 By default the input is scaled so that the active speech level of the strongest channel corresponds to a
 normal speaking level at 1 metre from the lips (62.35 dB SPL from [1]). The speech level at the centre of
 the listener's head can alternatively be specified explicitly in dB SPL using the spl input parameter. For
 distant sources, this level should be reduced by 20*log10(dist), where dist is the distance in metres
 between the speaker's lips and the centre of the listener's head. The same gain is applied to all channels.
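 A minimal Python sketch of the level arithmetic in this paragraph follows. It assumes the active speech level of each channel has already been measured in dB (in the MATLAB code this is done by v_activlev, which is not reproduced here), and it assumes free-field inverse-square spreading for the distance correction. The function names are hypothetical.

```python
import math


def level_at_distance(level_at_1m_db, dist_m):
    """Speech level at the listener for a talker dist_m metres away:
    subtract 20*log10(dist) from the level at 1 metre."""
    if dist_m <= 0:
        raise ValueError("distance must be positive")
    return level_at_1m_db - 20.0 * math.log10(dist_m)


def common_gain_db(channel_levels_db, target_spl_db=62.35):
    """One gain (dB) applied to every channel so that the strongest channel
    reaches target_spl_db; this mirrors dboff=spl-max(sal) in the source."""
    return target_spl_db - max(channel_levels_db)


if __name__ == "__main__":
    spl = level_at_distance(62.35, 2.0)          # talker 2 m away
    print(round(spl, 2))                         # 56.33
    print(common_gain_db([-30.0, -26.0], spl))   # gain for a two-channel input
```

 Using the maximum over channels keeps the inter-channel level differences intact, which matters for binaural or multi-microphone recordings.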

 This function assumes normal hearing; to account for hearing loss, use the 'u' option (as in usage
 example 2 above) and apply a filter to x that reduces the signal level by the hearing loss at each
 frequency. For example, if the hearing loss is 20 dB at all frequencies, then x should be multiplied by 0.1.
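 The hearing-loss scaling is the same dB-to-amplitude conversion with a negative sign. The Python sketch below assumes the loss is given in dB for a few frequency bands and that the band signals are available separately; a real implementation would instead design a filter whose gain follows the loss curve, which is not shown. All names are hypothetical.

```python
def hearing_loss_gain(loss_db):
    """Amplitude factor that attenuates a signal by loss_db: 10^(-loss/20)."""
    return 10.0 ** (-loss_db / 20.0)


def apply_band_losses(band_signals, losses_db):
    """Scale each band signal by the attenuation for its own hearing loss."""
    return [[hearing_loss_gain(loss) * x for x in band]
            for band, loss in zip(band_signals, losses_db)]


if __name__ == "__main__":
    print(hearing_loss_gain(20.0))   # 0.1 for a flat 20 dB loss
    print(apply_band_losses([[1.0, -1.0], [2.0]], [0.0, 40.0]))
```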

 Refs: [1]    ANSI. Methods for the calculation of the speech intelligibility index.
               ANSI Standard S3.5-1997 (R2007), American National Standards Institute, 1997.
       [2]    C. V. Pavlovic. Derivation of primary parameters and procedures for use in speech
               intelligibility predictions. J. Acoust. Soc. Amer., 82: 413–422, 1987.

SOURCE CODE

function [y,x,v]=v_earnoise(s,fs,m,spl)
%V_EARNOISE Add noise to simulate the hearing threshold of a listener [Y,X,V]=(S,FS,M,SPL)
%
% Usage: (1) y=v_earnoise(s,fs);            % scale the speech to 62.35 dB SPL, filter it, then add "internal ear noise"
%        (2) spl=62.35;                     % this code does the same but with explicit signal scaling
%            x=10^(0.05*spl)*v_activlev(s,fs,'n');
%            y=v_earnoise(x,fs,'u');
%        (3) v_earnoise(s,fs);              % If outputs are omitted, a graph is plotted showing the SNR spectrum
%        (4) y=v_earnoise(s,fs,[],50);      % scale the speech to 50 dB SPL instead of the default 62.35
%        (5) y=v_earnoise(s,fs,'n',spl);    % Assume the input signal, s, has already been scaled to 0 dB (saves computation)
%
%  Inputs:  s(n,c)  speech signal: n samples with one channel per column
%           fs      sample frequency in Hz
%           m       mode string as shown below [default: neither option]
%                     'n' Input s has been normalized to 0 dB (e.g. with the 'n' option of v_activlev.m)
%                     'u' Input s is already scaled correctly in dB SPL (the spl input argument is ignored)
%           spl     target active speech level in dB SPL [default: 62.35]
%
% Outputs:  y(n,c)  filtered speech signal with added noise which simulates the ear input signal
%           x(n,c)  filtered input speech signal
%           v(n,c)  noise added to filtered speech signal
%
% This function adds fictitious "internal ear noise" to an audio signal to simulate the effects of the
% frequency-dependent hearing threshold of a normal listener. To avoid having to add very high noise
% levels at low and high frequencies, it instead filters the input signal by the inverse of the desired
% noise spectrum and then adds white noise with 0 dB power spectral density. The noise spectrum is taken
% from Table 1 of [1] (which derived it from [2]) and, at a particular frequency, equals the pure-tone
% hearing threshold minus 10*log10(R) where R is the critical ratio. The critical ratio, R, is the power
% of a pure tone divided by the power spectral density of a white noise that just masks it; this ratio is
% approximately independent of level.
%
% By default the input is scaled so that the active speech level of the strongest channel corresponds to a
% normal speaking level at 1 metre from the lips (62.35 dB SPL from [1]). The speech level at the centre of
% the listener's head can alternatively be specified explicitly in dB SPL using the spl input parameter. For
% distant sources, this level should be reduced by 20*log10(dist), where dist is the distance in metres
% between the speaker's lips and the centre of the listener's head. The same gain is applied to all channels.
%
% This function assumes normal hearing; to account for hearing loss, use the 'u' option (as in usage
% example 2 above) and apply a filter to x that reduces the signal level by the hearing loss at each
% frequency. For example, if the hearing loss is 20 dB at all frequencies, then x should be multiplied by 0.1.
%
% Refs: [1]    ANSI. Methods for the calculation of the speech intelligibility index.
%               ANSI Standard S3.5-1997 (R2007), American National Standards Institute, 1997.
%       [2]    C. V. Pavlovic. Derivation of primary parameters and procedures for use in speech
%               intelligibility predictions. J. Acoust. Soc. Amer., 82: 413–422, 1987.
%
persistent fs0 a b
if isempty(fs0) || fs~=fs0
    [b,a]=v_stdspectrum(7,'z',fs);              % inverse internal noise spectrum filter
    fs0=fs;
end
[ns,nc]=size(s);
if nc>ns
    error('s input has more columns (channels) than rows (samples)');
end
if nargin<3 || isempty(m)
    m=' ';
end
if nargin<4 || isempty(spl)
    spl=62.35;
end
if any(m=='n') || any(m=='u')
    if any(m=='n')
        dboff=spl;
    else
        dboff=0;
    end
else
    if nc>1
        sal=zeros(1,nc);
        for i=1:nc
            sal(i)=v_activlev(s(:,i),fs,'d');
        end
        dboff=spl-max(sal);               % gain to apply to speech signal in dB
    else
        dboff=spl-v_activlev(s,fs,'d');               % gain to apply to speech signal in dB
    end
end
x=10^(0.05*dboff)*filter(b,a,s);
v=sqrt(0.5*fs)*randn(size(s)); % Add noise at 0 dB power spectral density
y=x+v;
if ~nargout
    nfft=2*round(10e-3*fs/2);                       % FFT length is an even number approximately 10 ms long
    fax=(0:nfft/2)*fs/nfft; % frequency axis for plot
    win=hamming(nfft);
    sal=zeros(1,nc);
    for i=1:nc
        sal(i)=v_activlev(s(:,i),fs,'d')+dboff;
    end
    [salmax,imax]=max(sal);
    [salmax,af,fso,vad]=v_activlev(s(:,imax),fs,'d');          % Get VAD from highest power input signal
    fvad=sum(v_enframe(vad,nfft,nfft/2),2)>nfft/2;  % frames with mostly speech in them
    minmax=[Inf -Inf];
    leg=cell(nc,1);
    gsnr=zeros(nc,1);
    cols='brgcmyk';
    for i=1:nc
        col=cols(1+mod(i-1,length(cols)));
        px=v_enframe(x(:,i),win,nfft/2,'sdp',fs);        % compute one-sided power spectral density
        psxm=mean(px(fvad,:),1);
        psxmdb=db(psxm,'p');
        minmax=[min(minmax(1),min(psxmdb)) max(minmax(2),max(psxmdb))];
        gsnr(i)=db(mean(psxm),'p');
        semilogx(fax,psxmdb,[col '-']);
        hold on
        leg{i}=sprintf('Chan %d: %+.1f dB SPL',i,sal(i));
    end
    for i=1:nc
        col=cols(1+mod(i-1,length(cols)));
        semilogx(fax([2 end]),gsnr([i i]),[col '--']);
    end
    snrrange=60;
    ylim=[max(minmax(1),minmax(2)-snrrange) minmax(2)]*[1.05 -0.05; -0.05 1.05];
    set(gca,'ylim',ylim,'xlim',[100 fs/2]);
    if ylim(1)<0 && ylim(2)>0
        semilogx(fax([2 end]),[0 0],'k:');
    end
    hold off
    grid on;
    legend(leg,'location','best')
    xlabel(['Frequency (' v_xticksi 'Hz)']);
    ylabel('SNR (dB)')
    title('Hearing threshold equivalent SNR');
end
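For readers porting the function, the core of the listing (choosing the gain dboff from the mode string, generating noise with 0 dB power spectral density, and picking the plot frame length) can be sketched in Python as below. This is an illustrative translation, not VOICEBOX code: the inverse-spectrum filter from v_stdspectrum and the active-level measurement from v_activlev are replaced by caller-supplied values, and all names are hypothetical. The noise standard deviation sqrt(fs/2) gives a variance of fs/2; spread over the one-sided band from 0 to fs/2 Hz, that is a power spectral density of 1 per Hz, i.e. 0 dB.

```python
import math
import random


def gain_db(mode, spl=62.35, active_levels_db=None):
    """Gain in dB applied to the filtered input, following the MATLAB logic:
    'n' in mode -> input already at 0 dB, so gain = spl
    'u' in mode -> input already in dB SPL, so gain = 0 (spl ignored)
    otherwise   -> spl minus the active level of the strongest channel."""
    if "n" in mode:
        return spl
    if "u" in mode:
        return 0.0
    if not active_levels_db:
        raise ValueError("active levels are needed when mode has no 'n' or 'u'")
    return spl - max(active_levels_db)


def white_noise_0db_psd(n, fs, rng=random):
    """n samples of Gaussian noise with a one-sided PSD of 1 per Hz (0 dB)."""
    sigma = math.sqrt(0.5 * fs)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def fft_length(fs, frame_s=10e-3):
    """Even frame length about frame_s seconds long, as in the plotting code.

    floor(x + 0.5) reproduces MATLAB's round() for positive x; Python's
    built-in round() would round halves to even instead.
    """
    return 2 * math.floor(frame_s * fs / 2 + 0.5)


if __name__ == "__main__":
    fs = 16000
    rng = random.Random(0)
    v = white_noise_0db_psd(20000, fs, rng)
    var = sum(x * x for x in v) / len(v)
    print(var / (fs / 2))    # estimated PSD per Hz; should be close to 1
    print(gain_db(" ", active_levels_db=[-30.0, -26.0]), fft_length(fs))
```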

Generated by m2html © 2003