Various techniques and circuits for speech synthesis and for speech recognition are known in the prior art, and an extensive literature describes numerous designs and the products that use them. An excellent reference is L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978, although this area of technology is developing rapidly and no all-inclusive reference exists.
These prior approaches treat synthesis and recognition as separate functions, requiring one apparatus for speech synthesis and a separate apparatus for speech analysis and recognition.
One prominent class of synthesis techniques uses an excitation to drive a filter whose resonances are those of the formants of the speech signal to be synthesized. The excitation driving the filter simulates either the periodic pulses produced at the glottis for voiced sounds or the turbulent noise produced at constrictions formed by the glottis or tongue for unvoiced sounds. The characteristics of the speech arise from the interplay between this excitation and the time-varying filter resonances. The resonances of natural speech vary not only from one speaker to another but also from one instant to the next for the same speaker, as the size and shape of the physical structures that produce them, such as the cavities of the mouth and pharynx, change. Because both the resonances and the excitation change slowly compared with the bandwidth of the speech signal, characterizing the speech by these time-varying parameters, rather than by the waveform itself, saves information in this type of synthesizer. This makes it attractive for low-cost synthesis or for synthesis at a low data rate.
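As a rough illustration of this source-filter idea, the Python sketch below drives a cascade of two-pole resonators with either an impulse train or pseudo-random noise. Taken together, the resonators form an all-pole filter. The formant frequencies and bandwidths are updated only once per frame. The frame length, pitch period, sampling rate, formant values, and helper names (`resonator_coeffs`, `synthesize`) are hypothetical choices for illustration and are not taken from this document.

```python
import math
import random

def resonator_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Feedback coefficients (a1, a2) of one two-pole resonance (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius set by bandwidth
    theta = 2.0 * math.pi * freq_hz / fs_hz        # pole angle set by frequency
    return 2.0 * r * math.cos(theta), -r * r

def synthesize(frames, frame_len=80, pitch_period=64, fs_hz=8000.0, seed=0):
    """frames: list of (voiced, [(formant_hz, bandwidth_hz), ...]), one per frame."""
    rng = random.Random(seed)
    n_res = max(len(formants) for _, formants in frames)
    state = [[0.0, 0.0] for _ in range(n_res)]  # (y[n-1], y[n-2]) per resonator
    out = []
    n = 0
    for voiced, formants in frames:
        # Filter parameters change only once per frame (slowly time-varying).
        coeffs = [resonator_coeffs(f, bw, fs_hz) for f, bw in formants]
        for _ in range(frame_len):
            if voiced:
                x = 1.0 if n % pitch_period == 0 else 0.0  # glottal pulse train
            else:
                x = rng.uniform(-1.0, 1.0)                  # turbulent noise
            for i, (a1, a2) in enumerate(coeffs):         # cascade of resonators
                y1, y2 = state[i]
                y = x + a1 * y1 + a2 * y2
                state[i] = [y, y1]
                x = y
            out.append(x)
            n += 1
    return out

frames = [
    (True, [(700.0, 60.0), (1200.0, 80.0)]),      # voiced frame (assumed values)
    (False, [(2500.0, 200.0), (3500.0, 250.0)]),  # unvoiced frame (assumed values)
]
signal = synthesize(frames)
```

In this sketch, each 80-sample frame (10 ms at 8 kHz) is specified by a voicing flag and four formant parameters. This illustrates the information saving described above.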
A key feature of this type of speech synthesizer is that a filter having only a limited number of resonances (i.e., poles) and no antiresonances (i.e., zeros) can be used to reproduce a speech signal accurately. Antiresonances do in fact occur in speech (due, for example, to the parallel resonance of the nasal cavity). However, they can either be modeled adequately by extra poles or be introduced into the excitation function that drives the filter. It is therefore possible to use an "all-pole" filter to model the behavior of the vocal tract for the purpose of speech synthesis.
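For reference, an all-pole filter of order p can be written in the conventional form below. Here e(n) is the excitation, G is a gain, and the a_k are the predictor coefficients. The sign convention shown is one common choice and is assumed here rather than taken from this document.

```latex
H(z) = \frac{G}{A(z)}, \qquad A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k},
\qquad s(n) = G\,e(n) - \sum_{k=1}^{p} a_k\, s(n-k)
```

Each complex-conjugate pair of roots of A(z) produces one resonance. Apart from trivial zeros at z = 0, the filter has no zeros.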
The techniques of linear predictive coding (LPC) provide means for analyzing a speech signal to produce the appropriate filter coefficients for controlling an all-pole filter.
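One standard way to obtain these coefficients is the autocorrelation method, solved by the Levinson-Durbin recursion. This recursion yields both the direct-form coefficients and the PARCOR (reflection) coefficients. The sketch below makes two assumptions:

- The analysis filter is A(z) = 1 + Σ a_k z^{-k}.
- The reflection coefficients use a sign convention under which a first-order signal with correlation ρ gives k_1 = -ρ. Other texts use the opposite sign.

Windowing of the analysis frame is omitted for brevity, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation R[0..max_lag] of one (unwindowed) analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + sum(a_j z^-j).

    Returns (a, parcor, err): a holds a_1..a_p, parcor holds the reflection
    (PARCOR) coefficients k_1..k_p, and err is the final prediction-error power.
    """
    a = []          # a_1 .. a_m for the current order m
    parcor = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j - 1] * r[m - j] for j in range(1, m))
        k = -acc / err
        # Order update: a_j <- a_j + k * a_{m-j} for j < m, then a_m = k
        a = [a[j - 1] + k * a[m - j - 1] for j in range(1, m)] + [k]
        parcor.append(k)
        err *= 1.0 - k * k
    return a, parcor, err

a, parcor, err = levinson_durbin([1.0, 0.5, 0.25, 0.125], order=3)
```

The example autocorrelation [1, 0.5, 0.25, 0.125] is that of a first-order process with ρ = 0.5. For it, the recursion returns PARCOR coefficients [-0.5, 0.0, 0.0], and the prediction-error power drops from 1.0 to 0.75. The zero coefficients show that nothing beyond the first stage remains predictable.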
A speech analyzer may make use of the same "all-pole" model of the vocal tract which speech synthesizers use, but in reverse. If a speech analyzer can "remove" resonances in a speech signal by introducing antiresonances which cancel them out, then it can as a consequence derive features characterizing the resonances of the speech signal, which can then be used for the purpose of speech recognition by matching to similar features stored for a library of the words to be recognized.
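The "reverse" relationship can be checked numerically. Passing an excitation through the all-pole filter 1/A(z) and then through the all-zero filter A(z) returns the excitation. The sketch below uses direct-form filters and assumes A(z) = 1 + Σ a_k z^{-k}. The coefficient values are an arbitrary stable choice, not speech data, and the function names are illustrative.

```python
import random

def all_pole(e, a):
    """Synthesis, 1/A(z): s(n) = e(n) - sum_k a_k s(n-k)."""
    s = []
    for n, x in enumerate(e):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]
        s.append(acc)
    return s

def all_zero(s, a):
    """Analysis (inverse filter), A(z): e(n) = s(n) + sum_k a_k s(n-k)."""
    e = []
    for n, x in enumerate(s):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        e.append(acc)
    return e

rng = random.Random(0)
excitation = [rng.gauss(0.0, 1.0) for _ in range(500)]
a = [-1.2, 0.8]   # A(z) = 1 - 1.2 z^-1 + 0.8 z^-2: one resonance, pole radius ~0.89
speech_like = all_pole(excitation, a)
recovered = all_zero(speech_like, a)
max_error = max(abs(u - v) for u, v in zip(excitation, recovered))
```

In an analyzer the coefficients a_k are not known in advance. They are what the analysis must estimate, and the estimates then serve as the features for recognition.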
LPC analysis techniques such as the autocorrelation or covariance methods can be used to derive parameters for such a filter. However, adaptive LPC techniques (i.e., LPC techniques that operate continuously, sample by sample, rather than on whole blocks of data) are also known. An adaptive filtering technique has the advantages that (1) large blocks of input data need not be stored, and (2) relatively simple feedback schemes can replace explicit numerical computation. Adaptive techniques are known under the names of adaptive array theory, correlation cancelling loops, the least mean-squares algorithm, and, most recently, adaptive LPC. The basic idea is an all-zero filter that adapts so as to remove correlations (due to resonances) in the incoming signal. Because the resonances are cancelled, the output of the filter is essentially the original excitation of the all-pole filter that produced the signal. Cancelling the resonances is mathematically equivalent to eliminating the correlation between the forward and backward prediction errors for each stage of the filter.
The aim of the control scheme in an adaptive LPC analyzer is to eliminate such correlation. This can be done by generating a correlation signal, either positive or negative, which is used as an error signal in a feedback loop to adjust the filter coefficients in the right direction. This technique is prior art and is known under a variety of names, such as the least mean-squares algorithm and the correlation cancelling loop. Stated otherwise, the idea is to adjust the current PARCOR (i.e., partial correlation) coefficient by an amount proportional to the negative of the product of two residuals: the forward residual out of the stage and the backward residual for that stage.
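A minimal sketch of this correlation-cancelling update is given below for an all-zero lattice. It assumes the stage equations f_m(n) = f_{m-1}(n) + k_m b_{m-1}(n-1) and b_m(n) = b_{m-1}(n-1) + k_m f_{m-1}(n). At every sample, each PARCOR coefficient is changed by -μ times the product of the stage's forward output residual and its delayed backward input residual.

The following are illustrative assumptions, not details from this document:

- the fixed step size μ,
- the clamp that keeps |k_m| < 1,
- the first-order test signal,
- the function name.

Practical analyzers often normalize the step by a running estimate of signal power.

```python
import random

def adaptive_lattice(x, order, mu=0.002):
    """Sample-by-sample all-zero lattice whose PARCOR coefficients adapt.

    Returns (k, residual): the final coefficients and the forward residual
    out of the last stage for every input sample.
    """
    k = [0.0] * order
    b_delay = [0.0] * order        # b_delay[m] holds b_m(n-1), backward input to stage m+1
    residual = []
    for sample in x:
        f = b = sample                         # f_0(n) = b_0(n) = x(n)
        for m in range(order):
            b_old = b_delay[m]                 # b_m(n-1)
            b_delay[m] = b                     # save b_m(n) for the next sample
            f_out = f + k[m] * b_old           # forward residual out of the stage
            b_out = b_old + k[m] * f           # backward residual out of the stage
            # Correlation-cancelling update: step against f_out * b_old.
            k[m] -= mu * f_out * b_old
            k[m] = max(-0.999, min(0.999, k[m]))  # assumed safeguard: keep |k| < 1
            f, b = f_out, b_out
        residual.append(f)
    return k, residual

rng = random.Random(1)
x, prev = [], 0.0
for _ in range(40000):
    prev = 0.8 * prev + rng.uniform(-1.0, 1.0)  # first-order test signal, rho = 0.8
    x.append(prev)
k, residual = adaptive_lattice(x, order=2)
```

For this input, the first coefficient should settle near -0.8 and the second near 0, because one stage suffices to remove the correlation in a first-order signal. The residual power should then fall toward the power of the driving noise.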
The same filter transfer function may be realized in a variety of filter forms. One form with desirable properties for both speech synthesis and analysis is the so-called "lattice" form. An example of the all-pole form of this filter is shown in FIG. 12, while the all-zero form is shown in FIG. 11. The all-pole lattice filter for synthesis is relatively insensitive to quantization of the filter coefficients, so relatively coarse quantization may be used in a low-cost synthesizer with this form of filter. An adaptive all-zero analyzer of the lattice form adapts faster than other filter forms. In addition, the coefficients it derives, the PARCOR coefficients, are orthogonal. Unlike, for example, the coefficients obtained from a direct-form adaptive filter, they may be used successfully in their original form for pattern matching in speech recognition.
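The sketch below pairs an all-zero lattice (analysis) with the matching all-pole lattice (synthesis). Both use the textbook stage equations f_m(n) = f_{m-1}(n) + k_m b_{m-1}(n-1) and b_m(n) = b_{m-1}(n-1) + k_m f_{m-1}(n).

The sketch also quantizes the PARCOR coefficients coarsely. A well-known property of the all-pole lattice is that it remains stable whenever every |k_m| < 1. Any quantizer that keeps each coefficient inside that range therefore cannot make the filter unstable.

The quantizer, bit width, coefficient values, and function names are assumptions for illustration. They do not describe the circuits of FIG. 11 or FIG. 12.

```python
import random

def lattice_all_zero(x, k):
    """Analysis lattice: input s(n), output the forward residual f_p(n)."""
    p = len(k)
    b_delay = [0.0] * p            # b_delay[m] holds b_m(n-1)
    out = []
    for sample in x:
        f = b = sample             # f_0(n) = b_0(n) = s(n)
        for m in range(p):
            b_old = b_delay[m]
            b_delay[m] = b
            f, b = f + k[m] * b_old, b_old + k[m] * f
        out.append(f)
    return out

def lattice_all_pole(e, k):
    """Synthesis lattice: input excitation f_p(n), output s(n) = f_0(n)."""
    p = len(k)
    b_delay = [0.0] * p            # b_delay[m] holds b_m(n-1)
    out = []
    for sample in e:
        f = sample                 # f_p(n): excitation enters at the top stage
        for m in range(p, 0, -1):  # stages p, p-1, ..., 1
            f -= k[m - 1] * b_delay[m - 1]                  # f_{m-1}(n)
            if m < p:
                b_delay[m] = b_delay[m - 1] + k[m - 1] * f  # b_m(n)
        b_delay[0] = f             # b_0(n) = f_0(n)
        out.append(f)
    return out

def quantize(k, bits):
    """Uniform quantizer on (-1, 1); every result satisfies |k| < 1."""
    step = 2.0 / (2 ** bits)
    limit = 1.0 - step
    return [max(-limit, min(limit, round(v / step) * step)) for v in k]

parcor = [-0.93, 0.71, -0.42, 0.25]   # arbitrary example values with |k| < 1
coarse = quantize(parcor, bits=5)     # -> [-0.9375, 0.6875, -0.4375, 0.25]
rng = random.Random(0)
excitation = [rng.uniform(-1.0, 1.0) for _ in range(400)]
speech_like = lattice_all_pole(excitation, coarse)
recovered = lattice_all_zero(speech_like, coarse)
```

Because the analysis and synthesis lattices use the same coefficients, running the synthesized signal back through the analysis lattice recovers the excitation up to floating-point rounding. This holds even though the coefficients have been quantized to 5 bits.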
A significant problem with adaptive lattice type filters for speech analysis circuits, however, is their size. In addition to needing a multiplicity of filter stages and other signal processing circuitry, such filters have needed extensive circuitry for adaptively generating the filter coefficients. This coefficient-generating circuitry has required multiplier circuits which occupy substantial area on an integrated circuit, in addition to the multipliers needed by the filter itself. Thus, so far as we are informed, no one has succeeded in producing an adaptive lattice-filter type speech analyzer on a single integrated circuit "chip" using LSI technology. Of course, it follows that no one has put both such an analyzer and a synthesizer on the same chip.