This invention relates to a data converter for use in a speech synthesizer system, wherein encoded formant frequency data as received by the data converter is decoded and converted to reflection coefficients in real time. More specifically, the data converter is employed in a speech synthesizer system which generates speech from quantized reflection coefficients. The data converter includes circuitry implementing a Taylor series type approximation for transforming encoded formant frequency data stored in memory into reflection coefficients in real time for utilization by the speech synthesizer, thereby significantly reducing the bit rate that the speech synthesizer would otherwise require to produce speech of acceptable quality when the speech data stored in memory is representative of reflection coefficients.
Speech synthesizers are known in the prior art. It is common for speech synthesizers to model the human vocal tract by means of a digital filter, with reflection coefficients being utilized to control the characteristics of the digital filter. Examples include U.S. Pat. Nos. 3,975,587 and 4,058,676. While the utilization of reflection coefficients as filter controls will allow fairly accurate speech synthesis, the bit rates required are typically 2400-5000 bits per second. Recently, an integrated circuit device manufactured by Texas Instruments Incorporated of Dallas, Tex., demonstrated the ability to synthesize speech utilizing reflection coefficient-type data at a rate of 1200 bits per second. The aforementioned device is disclosed in U.S. patent application Ser. No. 901,393, which was filed Apr. 28, 1978, now U.S. Pat. No. 4,209,836 issued June 24, 1980.
Reflection coefficient-type data can be derived by extensive mathematical analysis of certain formant frequencies and bandwidths of human speech. However, the analysis required is quite time consuming and is not suitable for real time calculation without the use of a high-level computer system. Therefore, although formant frequency data contains more inherent speech intelligence than reflection coefficient data, the inability to convert formant frequency data to reflection coefficient data on a real time basis has been an obstacle to low bit rate speech synthesis systems which utilize formant frequency data.
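By way of illustration only, the kind of mathematical analysis referred to above can be sketched in software as follows. The sketch assumes one common textbook route, which is not necessarily the analysis contemplated here: each formant is treated as a conjugate pole pair of an all-pole filter, the pole pairs are multiplied out into a predictor polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the step-down (backward Levinson) recursion recovers the reflection coefficients. The function names, the 8000 Hz sampling rate, the example formant and bandwidth values, and the sign convention k_m = a_m are all assumptions made for the example.

```python
import math


def formants_to_polynomial(formants_hz, bandwidths_hz, sample_rate_hz):
    """Multiply one second-order section per formant into A(z).

    Each formant (center frequency F, bandwidth B) is modeled as a pole pair
    with radius r = exp(-pi*B/fs) and angle theta = 2*pi*F/fs.
    """
    a = [1.0]
    for f, b in zip(formants_hz, bandwidths_hz):
        r = math.exp(-math.pi * b / sample_rate_hz)
        theta = 2.0 * math.pi * f / sample_rate_hz
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        # Polynomial multiplication (discrete convolution) of a and section.
        product = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):
            for j, sj in enumerate(section):
                product[i + j] += ai * sj
        a = product
    return a


def polynomial_to_reflection(a):
    """Step-down recursion: predictor coefficients -> reflection coefficients.

    Convention assumed here: k_m is the last coefficient of the order-m
    polynomial. Other texts use the opposite sign.
    """
    a = list(a)
    order = len(a) - 1
    k = [0.0] * order
    for m in range(order, 0, -1):
        km = a[m]
        k[m - 1] = km
        denom = 1.0 - km * km
        # Reduce the polynomial from order m to order m - 1.
        a = [1.0] + [(a[i] - km * a[m - i]) / denom for i in range(1, m)]
    return k


# Hypothetical example values: three formants give six reflection coefficients.
example_k = polynomial_to_reflection(
    formants_to_polynomial([500.0, 1500.0, 2500.0], [60.0, 90.0, 150.0], 8000.0)
)
```

Even in this compact form the computation requires trigonometric and exponential evaluations, polynomial multiplication, and a division at every recursion step for each frame of speech, which suggests why a direct calculation was considered unsuitable for simple real time hardware.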
It is, therefore, one object of this invention to implement a low bit rate speech synthesizer system which utilizes reflection coefficient data.
It is another object of this invention to provide an improved apparatus for converting formant frequency data to reflection coefficient data, in real time.
In accordance with the present invention, a data converter is provided for use in a speech synthesizer system which relies upon quantized reflection coefficients for the generation of speech, wherein the data converter accepts encoded formant frequency speech data, decodes the formant frequency speech data, and transforms the decoded data into reflection coefficients in real time via circuitry implementing a Taylor series type approximation. The speech synthesizer of the system utilizes the reflection coefficients as derived from the encoded formant frequency data by the data converter in producing speech of acceptable quality while operating at a bit rate significantly lower than that which it would normally require when the digitized speech data stored in memory for use by the speech synthesizer is representative of reflection coefficients. The reduced bit rate operation is achievable because formant frequency data contains more speech intelligence for a comparable string of data bits than reflection coefficient data. Thus, the speech synthesizer utilizing quantized reflection coefficients to generate speech as disclosed in U.S. Pat. No. 4,209,836, which ordinarily operates at a rate of 1200 bits per second, can be operated at the significantly reduced rate of approximately 300 bits per second when employing encoded formant frequency speech data and the data converter as constructed in accordance with the present invention. A bit sequence of approximately 300 bits per second, consisting of coded pitch, energy and formant center frequencies, is decoded by the data converter, and the formant center frequency data is transformed in real time into reflection coefficients which are then quantized and input to the speech synthesizer.
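As a purely illustrative sketch of what a Taylor series type approximation means in this context, the following example expands a single reflection coefficient to first order about a reference formant frequency. It is deliberately reduced to one resonance with a fixed bandwidth, for which the first reflection coefficient has the closed form k1 = -2 r cos(theta) / (1 + r^2) under the sign convention k_m = a_m. The sampling rate, the bandwidth, the function names, and the choice of a first-order expansion are assumptions of the example and are not taken from the circuitry described herein.

```python
import math

SAMPLE_RATE_HZ = 8000.0  # assumed sampling rate
BANDWIDTH_HZ = 60.0      # assumed fixed formant bandwidth


def k1_exact(formant_hz):
    """Closed-form first reflection coefficient of a single resonance."""
    r = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE_HZ)
    theta = 2.0 * math.pi * formant_hz / SAMPLE_RATE_HZ
    return -2.0 * r * math.cos(theta) / (1.0 + r * r)


def k1_slope(formant_hz):
    """Derivative of k1 with respect to the formant frequency in Hz."""
    r = math.exp(-math.pi * BANDWIDTH_HZ / SAMPLE_RATE_HZ)
    theta = 2.0 * math.pi * formant_hz / SAMPLE_RATE_HZ
    dtheta_df = 2.0 * math.pi / SAMPLE_RATE_HZ
    return 2.0 * r * math.sin(theta) * dtheta_df / (1.0 + r * r)


def k1_taylor(formant_hz, reference_hz):
    """First-order Taylor estimate of k1 about a stored reference frequency.

    In hardware, k1_exact(reference_hz) and k1_slope(reference_hz) would be
    precomputed constants, leaving one subtraction, one multiplication and
    one addition per coefficient at run time.
    """
    return k1_exact(reference_hz) + k1_slope(reference_hz) * (formant_hz - reference_hz)
```

For a formant 20 Hz away from a 500 Hz reference, the neglected second-order term in this sketch is on the order of 1e-4, which indicates how a small set of stored values and slopes can stand in for the full calculation over a limited frequency range.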
In another, more specific aspect of the speech synthesis system, formant frequency data is encoded in memory for only the voiced speech regions and reflection coefficient data is encoded in memory for the unvoiced speech regions. The speech synthesis system reads the encoded bit sequence from memory and decodes it to obtain the speech synthesis filter parameters as needed. During voiced speech, the decoded formant center frequencies and bandwidths are transformed by the data converter into reflection coefficients, the conversion being effected through a table look-up transformation wherein values for each reflection coefficient are stored in a ROM table for a suitable number of combinations of the first three formant center frequencies. Linear interpolation is employed to approximate the reflection coefficients for formant center frequencies which are not included in the look-up table. The decoded unvoiced speech data is already in the form of reflection coefficients and, together with the reflection coefficients converted from the formant center frequencies and bandwidths, is processed as quantized reflection coefficients and input to the speech synthesizer for generating speech.
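The table look-up transformation with linear interpolation described above may be sketched in software as follows. This is an illustration under stated assumptions rather than a description of the ROM contents: the grid frequencies, the fixed per-formant bandwidths, the 8000 Hz sampling rate, and the names `GRID_HZ`, `TABLE`, `exact_reflection` and `lookup` are hypothetical. Table entries are computed offline from formant pole pairs by the step-down recursion, and values between grid points are obtained by linear interpolation along each of the three formant axes.

```python
import bisect
import itertools
import math

SAMPLE_RATE_HZ = 8000.0              # assumed sampling rate
BANDWIDTHS_HZ = (60.0, 90.0, 150.0)  # assumed fixed bandwidth per formant


def exact_reflection(formants_hz):
    """Offline reference: formants -> predictor polynomial -> step-down recursion."""
    a = [1.0]
    for f, b in zip(formants_hz, BANDWIDTHS_HZ):
        r = math.exp(-math.pi * b / SAMPLE_RATE_HZ)
        sec = [1.0, -2.0 * r * math.cos(2.0 * math.pi * f / SAMPLE_RATE_HZ), r * r]
        prod = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):
            for j, sj in enumerate(sec):
                prod[i + j] += ai * sj
        a = prod
    k = [0.0] * (len(a) - 1)
    for m in range(len(a) - 1, 0, -1):
        km = a[m]
        k[m - 1] = km
        a = [1.0] + [(a[i] - km * a[m - i]) / (1.0 - km * km) for i in range(1, m)]
    return k


# Hypothetical grid of stored center frequencies for F1, F2 and F3 (Hz).
GRID_HZ = (
    (250.0, 450.0, 650.0, 850.0),
    (900.0, 1300.0, 1700.0, 2100.0),
    (2300.0, 2700.0, 3100.0, 3500.0),
)

# Stand-in for the ROM: one reflection coefficient vector per grid combination.
TABLE = {
    idx: exact_reflection([GRID_HZ[d][i] for d, i in enumerate(idx)])
    for idx in itertools.product(*(range(len(axis)) for axis in GRID_HZ))
}


def lookup(formants_hz):
    """Interpolate linearly along each formant axis between stored entries."""
    lows, fracs = [], []
    for axis, f in zip(GRID_HZ, formants_hz):
        f = min(max(f, axis[0]), axis[-1])  # clamp to the table range
        lo = min(bisect.bisect_right(axis, f) - 1, len(axis) - 2)
        lows.append(lo)
        fracs.append((f - axis[lo]) / (axis[lo + 1] - axis[lo]))
    result = [0.0] * len(TABLE[(0, 0, 0)])
    # Weighted sum over the eight table entries surrounding the query point.
    for corner in itertools.product((0, 1), repeat=3):
        weight = 1.0
        for c, t in zip(corner, fracs):
            weight *= t if c else 1.0 - t
        entry = TABLE[tuple(lo + c for lo, c in zip(lows, corner))]
        for n, value in enumerate(entry):
            result[n] += weight * value
    return result
```

One property of this arrangement is that each interpolated coefficient is a weighted average of stored coefficients whose magnitudes are all below one, so the interpolated set also remains below one in magnitude and the synthesis filter stays stable.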