We disclose a new method and apparatus for encoding and decoding signals and for performing high resolution spectral estimation. Many systems used in communications employ such devices for data compression, data transmission and for the analysis and processing of signals. The basic capabilities of the invention pertain to all areas of signal processing, especially spectral analysis based on short data records or when increased resolution over desired frequency bands is required. One filter frequently used in the art for these purposes is the Linear Predictive Code (LPC) filter. Indeed, the use of LPC filters in devices for digital signal processing (see, e.g., U.S. Pat. Nos. 4,209,836 and 5,048,088 and D. Quarmby, Signal Processing Chips, Prentice Hall, 1994, and L. R. Rabiner, B. S. Atal, and J. L. Flanagan, Current methods of digital speech processing, Selected Topics in Signal Processing (S. Haykin, editor), Prentice Hall, 1989, 112-132) is pertinent prior art to the alternative which we shall disclose.
We now describe this available art, the difference between the disclosed invention and this prior art, and the principal advantages of the disclosed invention. FIG. 1 depicts the power spectrum of a sample signal, plotted in logarithmic scale.
We have used standard methods known to those of ordinary skill in the art to develop a 4th order LPC filter from a finite window of this signal. The power spectrum of this LPC filter is depicted in FIG. 2.
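By way of illustration only, one standard route to such an LPC filter is the autocorrelation method followed by the Levinson-Durbin recursion. The following is a minimal sketch under that assumption; the function names `autocorrelation`, `levinson_durbin` and `lpc_power_spectrum` are hypothetical and are not taken from this disclosure, and the sketch is not asserted to be the procedure used to produce FIG. 2.

```python
import cmath


def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a finite window x."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an all-pole model of given order.

    Returns (a, err), where a[0] == 1, the model is 1 / A(z) with
    A(z) = sum_k a[k] z^-k, and err is the final prediction-error variance.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = sum(a[k] * r[m - k] for k in range(m))
        k_m = -acc / err
        prev = a[:]
        for k in range(1, m + 1):
            a[k] = prev[k] + k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a, err


def lpc_power_spectrum(a, err, theta):
    """Power spectral density err / |A(e^{j theta})|^2 of the all-pole model."""
    A = sum(a[k] * cmath.exp(-1j * theta * k) for k in range(len(a)))
    return err / abs(A) ** 2
```

For a 4th order filter one would call `levinson_durbin(autocorrelation(x, 4), 4)` on the finite window `x` and evaluate `lpc_power_spectrum` over a grid of frequencies. Because the model has no numerator polynomial, the resulting spectrum has poles only.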
One disadvantage of the prior art LPC filter is that its power spectral density cannot match the “valleys,” or “notches,” in a power spectrum, or in a periodogram. For this reason encoding and decoding devices for signal transmission and processing which utilize LPC filter design result in a synthesized signal which is rather “flat,” reflecting the fact that the LPC filter is an “all-pole model.” Indeed, in the signal and speech processing literature it is widely appreciated that regeneration of human speech requires the design of filters having zeros, without which the speech will sound flat or artificial; see, e.g., [C. G. Bell, H. Fujisaki, J. M. Heinz, K. N. Stevens and A. S. House, Reduction of Speech Spectra by Analysis-by-Synthesis Techniques, J. Acoust. Soc. Am. 33 (1961), page 1726], [J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer Verlag, Berlin, 1976, pages 271-272], [L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978, pages 105, 76-78]. Indeed, while all-pole filters can reproduce much of human speech sounds, the acoustic theory teaches that nasals and fricatives require both zeros and poles [J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer Verlag, Berlin, 1976, pages 271-272], [L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978, page 105]. This is related to the technical fact that the LPC filter only has poles and has no transmission zeros. To say that a filter has a transmission zero at a frequency ζ is to say that the filter, or corresponding circuit, will absorb damped periodic signals which oscillate at a frequency equal to the phase of ζ and with a damping factor equal to the modulus of ζ. This is the well-known blocking property of transmission zeros of circuits, see for example [L. O. Chua, C. A. Desoer and E. S. Kuh, Linear and Nonlinear Circuits, McGraw-Hill, 1989, page 659]. This is reflected in the fact, illustrated in FIG. 2, that the power spectral density of the estimated LPC filter will not match the power spectrum at “notches,” that is, frequencies where the observed signal is at its minimum power. Note that in the same figure the true power spectrum is indicated by a dotted line for comparison.
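The blocking property of a transmission zero can be checked numerically. The following is a minimal sketch, assuming a second order filter with numerator only whose zeros form a complex conjugate pair of modulus `r` and phase `theta`; the function name `notch_fir` and the numerical values are hypothetical and chosen for illustration only.

```python
import math


def notch_fir(x, r, theta):
    """Apply y[n] = x[n] - 2 r cos(theta) x[n-1] + r^2 x[n-2].

    The transfer function has zeros at r * exp(+j theta) and r * exp(-j theta).
    Samples before the start of the record are taken to be zero.
    """
    b1 = -2.0 * r * math.cos(theta)
    b2 = r * r
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(x[n] + b1 * x1 + b2 * x2)
    return y


# Hypothetical zero location: modulus 0.95, phase 0.8 rad/sample.
r, theta = 0.95, 0.8

# Damped oscillation at exactly the zero: absorbed after the start-up samples.
blocked_in = [r ** n * math.cos(theta * n + 0.3) for n in range(50)]
blocked_residual = max(abs(v) for v in notch_fir(blocked_in, r, theta)[2:])

# Damped oscillation at a different frequency (1.5 rad/sample): passes through.
passed_in = [r ** n * math.cos(1.5 * n + 0.3) for n in range(50)]
passed_peak = max(abs(v) for v in notch_fir(passed_in, r, theta)[2:])
```

Here `blocked_residual` is at the level of floating point rounding, while `passed_peak` is not. An all-pole filter has no such numerator and therefore cannot absorb any input in this way.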
Another feature of linear predictive coding is that the LPC filter reproduces a random signal with the same statistical parameters (covariance sequence) estimated from the finite window of observed data. For longer windows of data this is an advantage of the LPC filter, but for short data records relatively few of the terms of the covariance sequence can be computed robustly. This is a limiting factor of any filter which is designed to match a window of covariance data. The method and apparatus we disclose here incorporate two features which are improvements over these prior art limitations: the ability to include “notches” in the power spectrum of the filter, and the design of a filter based instead on the more robust sequence of first covariance coefficients obtained by passing the observed signal through a bank of first order filters. The desired notches and the sequence of (first-order) covariance data uniquely determine the filter parameters. We refer to such a filter as a tunable high resolution estimator, or THREE filter, since the desired notches and the natural frequencies of the bank of first order filters are tunable. A choice of the natural frequencies of the bank of filters corresponds to the choice of a band of frequencies within which one is most interested in the power spectrum, and can also be automatically tuned. FIG. 3 depicts the power spectrum estimated from a particular choice of 4th order THREE filter for the same data used to generate the LPC estimate depicted in FIG. 2, together with the true power spectrum, depicted in FIG. 1, which is marked with a dotted line.
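The covariance data obtained from a bank of first order filters can be sketched as follows. This is an illustration under stated assumptions, not the disclosed design procedure: each filter is assumed to have the form u[t] = p u[t-1] + y[t] with a real pole p of modulus less than one, and the statistic collected is the zero-lag sample covariance (average power) of each output. The names `first_order_filter` and `filter_bank_powers` and the pole values are hypothetical, and the step that maps these statistics and the chosen notches to the filter parameters is not shown.

```python
def first_order_filter(y, pole):
    """Run u[t] = pole * u[t-1] + y[t] from a zero initial state."""
    u = []
    state = 0.0
    for sample in y:
        state = pole * state + sample
        u.append(state)
    return u


def filter_bank_powers(y, poles):
    """Zero-lag sample covariance of the output of each first order filter.

    One number is produced per pole, so a bank of n filters yields n
    statistics, each averaged over the whole data record.
    """
    powers = []
    for p in poles:
        u = first_order_filter(y, p)
        powers.append(sum(v * v for v in u) / len(u))
    return powers


# Hypothetical tuning: real poles spread over (-1, 1).
example_poles = [-0.8, -0.4, 0.0, 0.4, 0.8]
```

Each statistic is an average over the entire record rather than a high-lag covariance estimate, which is one way to read the robustness remark above. Moving the poles changes which part of the frequency axis each filter emphasizes.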
We expect that this invention will have application as an alternative for the use of LPC filter design in other areas of signal processing and statistical prediction. In particular, many devices used in communications, radar, sonar and geophysical seismology contain a signal processing apparatus which embodies a method for estimating how the total power of a signal, or (stationary) data sequence, is distributed over frequency, given a finite record of the sequence. One common type of apparatus embodies spectral analysis methods which estimate or describe the signal as a sum of harmonics in additive noise [P. Stoica and R. Moses, Introduction to Spectral Analysis, Prentice-Hall, 1997, page 139]. Traditional methods for estimating such spectral lines are designed for either white noise or no noise at all, and they can be used to illustrate the comparative effectiveness of THREE filters with respect to both non-parametric and parametric spectral estimation methods for the problem of line spectral estimation. FIG. 4 depicts five runs of a signal composed of the superposition of two sinusoids with colored noise, the number of sample points for each being 300. FIG. 5 depicts the five corresponding periodograms computed with state-of-the-art windowing technology. The smooth curve represents the true power spectrum of the colored noise, and the two vertical lines the position of the sinusoids.
FIG. 6 depicts the five corresponding power spectra obtained through LPC filter design, while FIG. 7 depicts the corresponding power spectra obtained through the THREE filter design. FIGS. 8, 9 and 10 show similar plots for power spectra estimated using state-of-the-art periodogram, LPC, and our invention, respectively. It is apparent that the invention disclosed herein is capable of resolving the two sinusoids, clearly delineating their position by the presence of two peaks. We also disclose that, even under ideal noise conditions, the periodogram cannot resolve these two frequencies. In fact, the theory of spectral analysis [P. Stoica and R. Moses, Introduction to Spectral Analysis, Prentice-Hall, 1997, page 33] teaches that the separation of the sinusoids is smaller than the theoretically possible distance that can be resolved by the periodogram using a 300 point record under ideal noise conditions, conditions which are not satisfied here. This example represents a typical situation in applications.
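The resolution argument can be made concrete with the usual rule of thumb that a periodogram computed from N samples cannot separate spectral lines closer than about 1/N cycles per sample, that is, 2π/N radians per sample. The following is a minimal sketch under that assumption; the function names and the two frequencies are hypothetical and are not the values used for the figures.

```python
import cmath
import math


def periodogram(x, omega):
    """Periodogram |sum_t x[t] e^{-j omega t}|^2 / N at one frequency."""
    n = len(x)
    s = sum(x[t] * cmath.exp(-1j * omega * t) for t in range(n))
    return abs(s) ** 2 / n


def periodogram_resolution_limit(n_samples):
    """Rule-of-thumb resolution limit in radians per sample: 2*pi / N."""
    return 2.0 * math.pi / n_samples


def resolvable_by_periodogram(omega1, omega2, n_samples):
    """True if the two frequencies are farther apart than the limit."""
    return abs(omega1 - omega2) > periodogram_resolution_limit(n_samples)


# Hypothetical pair of sinusoid frequencies (rad/sample) and a 300 point record.
limit_300 = periodogram_resolution_limit(300)
close_pair_ok = resolvable_by_periodogram(0.42, 0.43, 300)
```

With a 300 point record the limit is roughly 0.021 radians per sample, so the hypothetical pair above, separated by 0.01, falls below it and `close_pair_ok` is false. A longer record lowers the limit, which is the only remedy available to the periodogram.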
The broader technology of the estimation of sinusoids in colored noise has been regarded as difficult [B. Porat, Digital Processing of Random Signals, Prentice-Hall, 1994, pages 285-286]. The estimation of sinusoids in colored noise using autoregressive moving-average filters, or ARMA models, is desirable in the art. As an ARMA filter, the THREE filter therefore possesses “super-resolution” capabilities [P. Stoica and R. Moses, Introduction to Spectral Analysis, Prentice-Hall, 1997, page 136].
We therefore disclose that the THREE filter design leads to a method and apparatus, which can be readily implemented in hardware or hardware/software with ordinary skill in the art of electronics, for spectral estimation of sinusoids in colored noise. This type of problem also includes time delay estimation [M. A. Hasan and M. R. Azimi-Sadjadi, Separation of multiple time delays using new spectral estimation schemes, IEEE Transactions on Signal Processing 46 (1998), 2618-2630] and detection of harmonic sets [M. Zeytinoğlu and K. M. Wong, Detection of harmonic sets, IEEE Transactions on Signal Processing 43 (1995), 2618-2630], such as in identification of submarines and aerospace vehicles. Indeed, those applications where tunable resolution of a THREE filter will be useful include radar and sonar signal analysis, and identification of spectral lines in Doppler-based applications [P. Stoica and R. Moses, Introduction to Spectral Analysis, Prentice-Hall, 1997, page 248]. Other areas of potential importance include identification of formants in speech, data decimation [M. A. Hasan and M. R. Azimi-Sadjadi, Separation of multiple time delays using new spectral estimation schemes, IEEE Transactions on Signal Processing 46 (1998), 2618-2630], and nuclear magnetic resonance.
We also disclose that the basic invention could be used as a part of any system for speech compression and speech processing. In particular, in certain applications of speech analysis, such as speaker verification and speech recognition, high quality spectral analysis is needed [Joseph P. Campbell, Jr., Speaker Recognition: A tutorial, Proceedings of the IEEE 85 (1997), 1436-1463], [Jayant M. Naik, Speaker Verification: A tutorial, IEEE Communications Magazine, January 1990, 42-48], [Sadaoki Furui, Recent advances in Speaker Recognition, Lecture Notes in Computer Science 1206, 1997, 237-252], [Hiroaki Sakoe and Seibi Chiba, Dynamic Programming Algorithm Optimization for Spoken Word Recognition, IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-26 (1978), 43-49]. The tuning capabilities of the device should prove especially suitable for such applications. The same holds for analysis of biomedical signals such as EMG and EKG signals.