This invention relates to speech coding and more particularly to linear prediction speech pattern coders.
Linear predictive coding (LPC) is used extensively in digital speech transmission, speech recognition and speech synthesis systems which must operate at low bit rates. The efficiency of LPC arrangements results from the encoding of the speech information rather than the speech signal itself. The speech information corresponds to the shape of the vocal tract and its excitation and as is well known in the art, its bandwidth is substantially less than the bandwidth of the speech signal. The LPC coding technique partitions a speech pattern into a sequence of time frame intervals 5 to 20 millisecond in duration. The speech signal is quasi-stationary during such time intervals and may be characterized as a relatively simple vocal tract model specified by a small number of parameters. For each time frame, a set of linear predictive parameters are generated which are representative of the spectral content of the speech pattern. Such parameters may be applied to a linear filter which models the human vocal tract along with signals representative of the vocal tract excitation to reconstruct a replica of the speech pattern. A system illustrative of such an arrangement is described in U.S. Pat. No. 3,624,302 issued to B. S. Atal, Nov. 30, 1971, and assigned to the same assignee.
Vocal tract excitation for LPC speech coding and speech synthesis systems may take the form of pitch period signals for voiced speech, noise signals for unvoiced speech and a voiced-unvoiced signal corresponding to the type of speech in each successive LPC frame. While this excitation signal arrangement is sufficient to produce a replica of a speech pattern at relatively low bit rates, the resulting replica has limited quality. A significant improvement in speech quality is obtained by using a predictive residual excitation signal corresponding to the difference between the speech pattern of a frame and a speech pattern produced in response to the LPC parameters of the frame. The predictive residual, however, is noiselike since it corresponds to the unpredicted portion of the speech pattern. Consequently, a very high bit rate is needed for its representation. U.S. Pat. No. 3,631,520 issued to B. S. Atal, Dec. 28, 1971, and assigned to the same assignee discloses a speech coding system utilizing predictive residual excitation.
An arrangement that provides the high quality of predictive residual coding at a relatively low bit rate is disclosed in the copending application Ser. No. 326,371, filed by B. S. Atal et al on Dec. 1, 1981, now U.S. Pat. No. 4,472,382, and assigned to the same assignee and in the article, "A new model of LPC excitation for producing natural sounding speech at low bit rates," appearing in the Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Paris, France, 1982, pp. 614-617. As described therein, a signal corresponding to the speech pattern for a frame is generated as well as a signal representative of its LPC parameters responsive speech pattern for the frame. A prescribed format multipulse signal is formed for each successive LPC frame responsive to the differences between the frame speech pattern signal and the frame LPC derived speech pattern signal. Unlike the predictive residual excitation whose bit rate is not controlled, the bit rate of the multipulse excitation signal may be selected to conform to prescribed transmission and storage requirements. In contrast to the predictive vocoder type arrangement, intelligibility and naturalness is improved, partially voiced intervals are accurately encoded and classification of voiced and unvoiced speech intervals is eliminated.
While the aforementioned multipulse excitation provides high quality speech coding at relatively low bit rates, it is desirable to reduce the code bit rate further in order to provide greater economy. In particular, the reduced bit rate coding permits economic storage of vocabularies in speech synthesizers and more economical usage of transmission facilities. In pitch excited vocoders of the type described in aforementioned U.S. Pat. No. 3,624,302, the excitation bit rate is relatively low. Further reduction of total bit rate can be accomplished in voiced segments by repeating the spectral parameter signals from frame to frame since the excitation spectrum is independent of the spectral parameter signal spectrum.
Multipulse excitation utilizes a plurality of different value pulses for each time frame to achieve higher quality speech transmission. The multipulse excitation code corresponds to the predictive residual so that there is a complex interdependence between the predictive parameter spectra and excitation signal spectra. Thus, simple respacing of the multipulse excitation signal adversely affects the intelligibility of the speech pattern. Changes in speaking rate and inflections of a speech pattern may also be achieved by modifying the excitation and spectral parameter signals of the speech pattern frames. This is particularly important in applications where the speech is derived from written text and it is desirable to impart distinctive characteristics to the speech pattern that are different from the recorded coded speech elements.
It is an object of the invention to provide an improved predictive speech coding arrangement that produces high quality speech at a reduced bit rate. It is another object of the invention to provide an improved predictive coding arrangement adapted to modify the characteristics of speech messages.