This invention relates to the generation of a compressed digital representation of a digital sample input typically representative of speech. The transformation utilizes a digital lattice filter approach which can be implemented on a single silicon chip. The filter has a single multiplier, preferably an M-stage pipeline multiplier, and a single adder. The filter permits the input of voiced speech and the subsequent output of digital data so as to accomplish the analysis of human speech.
Several methods are currently being used and experimented with to digitize human speech. For example, pulse code modulation, differential pulse code modulation, adaptive predictive coding, delta modulation, channel vocoders, cepstrum vocoders, formant vocoders, voice excited vocoders, and linear predictive coding methods of speech digitalization are well known. These methods are briefly explained in "Voice Signals: Bit by Bit" (pages 28-34 in the Oct. 1973 issue of IEEE Spectrum).
Once the human speech is digitized, it can be synthesized at a later desired time through the use of various electronics. Computer simulations of the various speech digitalization methods have generally shown that the linear predictive methods of digitizing speech can produce speech having greater voice naturalness than the previous vocoder systems (i.e. channel vocoders) and at a lower data rate than the pulse code modulation systems. The more stages the digital filter has, the more natural the generated speech will sound. A device which utilizes linear predictive coding in implementing a lattice filter is disclosed in U.S. Pat. No. 4,209,844 issued to Brantingham et al. on June 24, 1980, and a speech synthesis system relying upon the selective connection of speech sound waveforms extracted from natural voice is disclosed in U.S. Pat. No. 3,892,919 issued to Ichikawa on July 1, 1975, both incorporated hereinto by reference.
Prior to any synthesis of speech, the proper data must be collected and formatted. Perhaps the simplest and most commonly used method for obtaining the speech parameters is to analyze actual speech signals. In this approach, speech is recorded so as to allow a short-time spectral analysis of the signal many times each second to obtain the appropriate spectral parameters as a function of time. A second analysis is then performed to determine the appropriate excitation parameters. This process decides whether the speech is voiced or unvoiced and, when it is voiced, computes the appropriate pitch values. When the parameters controlling the synthesizer have been carefully determined, the resulting synthetic speech sounds identical to the original. For a good analysis of speech synthesis refer to the SAE Technical Paper Series No. 800197 entitled "Low Cost Voice Response Systems Based On Speech Synthesis" by Richard Wiggins given in Detroit on Feb. 25-29, 1980, incorporated hereinto by reference.
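The excitation analysis described above can be illustrated with a minimal sketch. It assumes a normalized-autocorrelation pitch detector, which is one common textbook technique and not necessarily the method used in the cited paper. The function name `estimate_excitation`, the 50-400 Hz search range, and the 0.3 voicing threshold are hypothetical choices made only for illustration.

```python
import math


def estimate_excitation(frame, sample_rate, min_hz=50.0, max_hz=400.0,
                        threshold=0.3):
    """Classify one frame as voiced or unvoiced and estimate its pitch.

    Returns (voiced, pitch_hz); pitch_hz is None for unvoiced frames.
    The search range and threshold are illustrative assumptions.
    """
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return False, None  # a silent frame carries no pitch information

    # Candidate pitch periods, expressed as lags in samples.
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)

    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(frame[n] * frame[n - lag]
                for n in range(lag, len(frame))) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    if best_r < threshold:
        return False, None  # no strong periodicity: treat as unvoiced
    return True, sample_rate / best_lag


if __name__ == "__main__":
    # A 30 ms frame of a 100 Hz sine sampled at 8 kHz stands in for voiced speech.
    rate = 8000
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(240)]
    print(estimate_excitation(tone, rate))
```

In a real analyzer this decision would be repeated for every frame, many times each second, alongside the spectral analysis.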
Other methods exist which do not utilize the linear predictive coding approach. The efficient representation of speech signals in terms of a small number of slowly varying parameters is a problem of considerable importance in speech research. Most methods for analyzing speech start by transforming the acoustic data into a spectral form by performing a short-time analysis on the speech wave. Although spectral analysis is a well known technique for stationary signals, its application to speech signals suffers from a number of serious limitations arising from the non-stationary as well as the quasi-periodic properties of the speech wave. For this reason, methods based on spectral analysis do not always provide an accurate description of the speech articulation.
Efficient speech analysis also makes possible a recognition system that can serve as a security mechanism. Speech analysis permits voice entry into a computer or other processor and thereby controls access. In such systems, the analysis parameters are matched with reference data in order to verify or reject the claimed identity of a speaker. A speech analysis system may also be used in a speech recognition system which permits, for example, the entry of data into a computer by means of voice.
It has been recognized that the use of linear prediction performs extremely well in the analysis of signals. Linear prediction frequently utilizes what is referred to as a lattice filter in which a moving average is computed through the use of multipliers, adders, and delays to result in a single output signal. For a good review of linear prediction analysis techniques refer to the article by John Makhoul entitled "Linear Prediction: A Tutorial Review" found in Proceedings of the IEEE, Vol. 63, pp. 561-580 (April 1975), incorporated hereinto by reference.
A handicap which has hindered the development of sequential linear prediction analysis is that for an N-stage filter, 2N additions, 2N multiplications, and N delay operations must be performed on each speech data sample, where N is the order of analysis. This limitation requires the use of a large digital computer or the pacing of the input so that analysis may be performed. Through the use of pacing, a slower computer is fed the sample data at a much lower rate than real-time processing would require. This technique reduces the required speed of operation but increases the amount of memory required to buffer the sampled speech signal.
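The per-sample operation count can be seen in a minimal software sketch of an N-stage lattice analysis filter. This assumes the conventional two-multiplier lattice stage found in textbooks, with one common sign convention for the reflection coefficients; it is not the single-multiplier, single-adder hardware arrangement of the invention. The function name `lattice_analyze` and the coefficient values are hypothetical.

```python
def lattice_analyze(samples, k):
    """Pass samples through an N-stage lattice analysis filter.

    k holds the N reflection coefficients (assumed sign convention:
    f_i = f_{i-1} + k_i * b_{i-1}[n-1]). Returns the forward residual
    and a tally of the arithmetic and delay operations performed.
    """
    n_stages = len(k)
    delay = [0.0] * n_stages  # one delay element per stage
    ops = {"mul": 0, "add": 0, "delay": 0}
    residual = []

    for x in samples:
        f = b = x  # stage 0: forward and backward errors equal the input
        for i in range(n_stages):
            b_prev = delay[i]  # backward error from the previous sample
            delay[i] = b       # store the current one for the next sample
            f_next = f + k[i] * b_prev  # 1 multiply, 1 add
            b_next = b_prev + k[i] * f  # 1 multiply, 1 add
            f, b = f_next, b_next
            ops["mul"] += 2
            ops["add"] += 2
            ops["delay"] += 1
        residual.append(f)

    return residual, ops


if __name__ == "__main__":
    # A unit impulse through a 2-stage filter with illustrative coefficients.
    out, ops = lattice_analyze([1.0, 0.0, 0.0], [0.5, -0.25])
    print(out)  # impulse response of the equivalent inverse filter
    print(ops)  # 2N multiplies, 2N adds, N delays per sample, with N = 2
```

Each stage costs two multiplications, two additions, and one delay, so the work per input sample grows linearly with the order of analysis N.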
Accordingly, linear prediction has not been used by small computers or with portable units for analyzing speech, since this technique requires either the bulk and mass of a large computer or the unrealistic technique of pacing the input data.