Matlab routines for Linear Predictive Coding (LPC)




Data Format

All the LPC routines described in this section can process several frames together. Each frame corresponds to a single row of the data matrix; if there is only one frame then the matrix must be a row vector rather than a column vector.
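As a minimal sketch of this convention (the coefficient values and variable names below are hypothetical, chosen only for illustration), a set of frames is stored one per row, and a single frame is forced into row orientation before use:

```matlab
% Two hypothetical frames of AR coefficients for an order p = 2 analysis.
% Each ROW is one frame, so the matrix has size [nframes, p+1].
ar = [1 -0.9 0.5;     % frame 1
      1 -1.2 0.8];    % frame 2
[nframes, ncoef] = size(ar);

% A single frame must be a row vector. A column vector would instead be
% interpreted as several frames of one coefficient each.
one_frame = [1; -0.9; 0.5];     % column vector: wrong orientation
one_frame = one_frame(:).';     % force row orientation, size is now [1 3]
```

The `(:).'` idiom gives a row vector whatever the orientation of the input, so it is safe to apply unconditionally to single-frame data.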

LPC Analysis

Two routines are provided: lpcauto for autocorrelation analysis and lpccovar for covariance analysis.
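To illustrate what autocorrelation analysis computes, the following is a minimal sketch of the textbook autocorrelation method for a single frame, solved with the Levinson-Durbin recursion in base Matlab. It is not the lpcauto implementation: lpcauto may window the frame and differs in other details, and the test signal and all variable names here are assumptions made for the example.

```matlab
% Deterministic test signal: impulse response of a known AR(2) filter.
a_true = [1 -0.9 0.5];
s = filter(1, a_true, [1 zeros(1, 399)]);   % one frame, stored as a row
p = 2;                                      % analysis order

% Autocorrelation of the frame at lags 0..p (no window is applied here).
n = numel(s);
r = zeros(1, p + 1);
for lag = 0:p
    r(lag + 1) = s(1:n - lag) * s(1 + lag:n).';
end

% Levinson-Durbin recursion: build the order-m predictor from order m-1.
ar = 1;         % AR coefficients, normalised so that ar(1) = 1
e = r(1);       % prediction error energy at order 0
for m = 1:p
    refl = -(r(m + 1:-1:2) * ar.') / e;     % order-m reflection coefficient
    ar = [ar 0] + refl * [0 fliplr(ar)];    % update the coefficient vector
    e = e * (1 - refl^2);                   % update the residual energy
end
```

Because the frame contains essentially the whole impulse response of the AR(2) filter, the recovered ar matches a_true to within rounding error. The sign convention used for refl is that of this sketch and is not guaranteed to match the rf representation described later.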

The analysis order, p, denotes the number of poles in the resultant autoregressive filter. The appropriate value for p is typically 2 + fs/(1 kHz), where fs is the sample frequency; for example, p = 10 when fs = 8 kHz. This expression assumes that sound takes about ½ ms to travel the length of the vocal tract.
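The rule of thumb for the analysis order can be evaluated directly. In this short sketch the sample frequencies are arbitrary examples, and rounding fs/1000 to the nearest integer is an assumption needed because p must be a whole number:

```matlab
fs = [8000 16000 44100];      % example sample frequencies in Hz
p = 2 + round(fs / 1000);     % typical analysis order for each frequency
% p is [10 18 46]
```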

Fancy versions of LPC

Although the analysis order must be the same for all frames, the individual frame lengths can vary; this allows pitch-synchronous analysis. It is also possible to restrict the analysis interval to particular segments of each frame to allow closed-phase analysis. For high-pitched voices, the closed phases may be very short: in this case it is possible to combine the data from two or more consecutive cycles to give multi-cycle closed-phase analysis. To obtain reliable estimates, the AR coefficients must be based on at least 2 ms of data.
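The 2 ms requirement determines how many consecutive cycles need to be pooled. The sketch below only does this bookkeeping and does not call any analysis routine; the closed-phase intervals and the sample frequency are made-up values for illustration.

```matlab
fs = 16000;                        % assumed sample frequency in Hz
min_len = ceil(fs * 2 / 1000);     % samples in 2 ms: 32 at 16 kHz

% Hypothetical closed-phase intervals [first last] in samples, one row
% per pitch cycle of a high-pitched voice.
cp = [101 112;
      181 193;
      262 273;
      342 355];
len = cp(:, 2) - cp(:, 1) + 1;     % samples per closed phase: 12 13 12 14

% Smallest number of consecutive cycles, starting from the first, whose
% pooled closed-phase data reaches the 2 ms minimum.
ncyc = find(cumsum(len) >= min_len, 1);
```

Here no single closed phase is long enough, and the first three cycles together supply 37 samples, so ncyc is 3.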

LPC Coefficient Representations

The coefficients generated by LPC analysis can be represented in many equivalent forms. Voicebox recognizes the coefficient sets listed below and denotes each with a two-letter mnemonic. The number of coefficients varies: for an analysis of order p there can be p, p+1, or p+2 coefficients. This is indicated in the table. The meaning of the coefficient sets is explained below with reference to the lossless tube model of speech production. Routines are provided to convert each representation to the other forms indicated: the routine that converts from representation xx to representation yy is called lpcxx2yy.

The routine lpcconv can be used to figure out the sequence of calls needed to convert between any pair of these representations.
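The idea of chaining conversions can be sketched as a shortest-path search over the "Convert to" entries of the table. This is an illustration of the principle only, not the lpcconv implementation; the map contains just a subset of the table, and the variable names are assumptions.

```matlab
% Subset of the "Convert to" column: code -> codes directly reachable.
to = containers.Map();
to('aa') = {'ao', 'dl', 'rf'};
to('rf') = {'aa', 'ao', 'ar', 'is', 'la', 'lo', 'rr'};
to('ar') = {'am', 'cc', 'db', 'ff', 'im', 'ls', 'pf', 'pp', 'ra', 'rf', 'rr', 'zz'};
to('zz') = {'ar', 'cc', 'ss'};
to('ss') = {'zz'};

src = 'aa';  dst = 'ss';

% Breadth-first search, recording the code each one was reached from.
queue = {src};
prev = containers.Map();
prev(src) = '';
while ~isempty(queue)
    cur = queue{1};
    queue(1) = [];
    if strcmp(cur, dst), break; end
    if ~isKey(to, cur), continue; end
    nxt = to(cur);
    for i = 1:numel(nxt)
        if ~isKey(prev, nxt{i})
            prev(nxt{i}) = cur;
            queue{end + 1} = nxt{i};
        end
    end
end
assert(isKey(prev, dst), 'no conversion route found');

% Walk back from dst to src, then name the routine for each step.
route = {dst};
while ~isempty(prev(route{1}))
    route = [{prev(route{1})} route];
end
calls = cell(1, numel(route) - 1);
for i = 1:numel(calls)
    calls{i} = ['lpc' route{i} '2' route{i + 1}];
end
disp(strjoin(calls, ' -> '));
```

For this subset the route from aa to ss is aa, rf, ar, zz, ss, so the routine names produced are lpcaa2rf, lpcrf2ar, lpcar2zz and lpczz2ss.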

| Code | Size | Convert from | Convert to | Description |
|---|---|---|---|---|
| aa | p+2 | dl, rf | ao, dl, rf | The area coefficients represent the cross-sectional areas of the vocal tract segments. The areas are normalised so that aa(p+2), the effective area of the free space beyond the lips, is equal to 1. aa(1) is the area at the glottis and is usually near 0. |
| am | (p+1)^2 | ar, rr | | An upper unit-triangular matrix containing the AR coefficients for all orders 0,...,p. This matrix is a diagonal multiple of the Hermitian square root of the symmetric Toeplitz matrix toeplitz(rr). |
| ao | p+1 | aa, rf | rf | The area ratios give the ratio of the area of one tube segment to that of the following segment. |
| ar | p+1 | cc, im, ls, ra, rf, rr, zz | am, cc, db, ff, im, ls, pf, pp, ra, rf, rr, zz | The autoregressive coefficients or AR coefficients represent the transfer function from the output flow of the vocal tract to the input flow. The coefficients are usually normalised so that ar(1)=1. |
| cc | p | ar, pf, zz | ar | The complex cepstrum coefficients are actually real despite their name. They equal the inverse Fourier transform of the log frequency response of the autoregressive filter. These coefficients do not include cc(0), which is the DC component of the log frequency response. |
| cw | p | pp | zz | The roots of the power spectrum polynomial pp. These are the (normally complex) values of cos(w) that make the power spectrum of the inverse filter equal to zero. |
| db | p+1 | ar | pf | The power spectrum of the AR filter expressed in decibels. The first and last elements of db() are respectively the DC and Nyquist terms. |
| dl | p | aa | aa | The discrete cosine transform of the log cross-sectional area function of the tube. |
| ff | p+1 | ar | pf | The complex frequency response of the AR filter. The first and last elements of ff() are respectively the DC and Nyquist terms. |
| im | p+1 | ar | ar | The impulse response of the autoregressive filter. |
| is | p+1 | rf | rf | The inverse sine coefficients equal the arcsine of the reflection coefficients multiplied by 2/pi, so that they lie in the range ±1 for a stable filter. |
| la | p+2 | rf | rf | The log area coefficients are the log cross-sectional areas of the vocal tract segments. la(p+2) is the log of the effective area of the free space beyond the lips and is normalised to 0. |
| lo | p+1 | rf | rf | The log area ratios give the log of the ratio of the area of one tube segment to that of the following segment. These values are limited by the conversion routines to about ±14.5. |
| ls | p | ar | ar | The line spectrum frequencies or line spectrum pairs are normalised frequencies in the range 0 to 0.5. A sharp peak in the AR filter response will give rise to a pair of line spectrum frequencies near the peak. |
| pf | p+1 | ar, db, ff, ra | cc, rr | The power spectrum of the AR filter. The first and last elements of pf() are respectively the DC and Nyquist terms. |
| pp | p+1 | ar, ra | cw | The power spectrum polynomial coefficients. This polynomial gives the power spectrum of the all-zero inverse filter as a function of cos(w). |
| ra | p+1 | ar | pf, pp, ar | The autocorrelation coefficients of the inverse filter's impulse response (the inverse filter is an FIR filter). |
| rf | p+1 | aa, ar, is, la, lo | aa, ao, ar, is, la, lo, rr | The reflection coefficients give the relative amplitudes of the incident and reflected pressure waves at the junction between two tube segments. The direction of travel of the incident wave is from the glottis towards the lips. rf(1) is the reflection coefficient at the glottis and rf(p+1) is the reflection coefficient at the lips: both of these coefficients are normally close to 1. Reversing the order of the reflection coefficients leaves the tube transfer function unchanged. |
| rr | p+1 | ar, pf, rf | am, ar | The autocorrelation coefficients of the autoregressive filter's impulse response when extended to an infinite number of terms. |
| ss | p | zz | zz | The s-plane autoregressive poles are the roots of the AR coefficient polynomial mapped onto the s-plane and expressed in normalised Hz. If ss() is multiplied by the sample frequency, a formant with frequency f and bandwidth b will give an s-plane pole of approximately -b/2 ± jf. |
| zz | p | ar, cw, ss | ar, cc, ss | The z-plane autoregressive poles are the roots of the AR coefficient polynomial. |
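The relationship between the ar, zz and ss representations can be checked numerically in base Matlab. The sketch below builds a single formant resonance and recovers its frequency and bandwidth from the poles. The mapping ss = log(zz)/(2*pi) is an assumption about how the s-plane poles are defined, chosen to be consistent with the description of ss in the table; the formant values and the sample frequency are arbitrary.

```matlab
fs = 8000;                 % assumed sample frequency in Hz
f = 500;                   % formant frequency in Hz
b = 80;                    % formant bandwidth in Hz

% AR polynomial with one conjugate pole pair at radius rad, angle theta.
rad = exp(-pi * b / fs);
theta = 2 * pi * f / fs;
ar = [1, -2 * rad * cos(theta), rad^2];   % normalised so that ar(1) = 1

% zz: z-plane poles are the roots of the AR polynomial (one frame = row).
zz = roots(ar).';

% ss: map onto the s-plane in normalised Hz (assumed definition).
ss = log(zz) / (2 * pi);

% Scaling by fs gives poles at approximately -b/2 +/- j*f.
ss_hz = ss * fs;
est_bandwidth = -2 * real(ss_hz(1));
est_frequency = abs(imag(ss_hz(1)));
```

With these values the estimates return the bandwidth of 80 Hz and the frequency of 500 Hz to within rounding error.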