1. Field of the Invention
The present invention relates generally to signal generators, and more specifically to a digital speech synthesizer suitable for integration within a single semiconductor chip.
2. Description of the Prior Art
Speech synthesis is a well-developed art in which an electrical speech signal, suitable for operating a telephone, a loudspeaker, or other electrical transducer, and representing a spoken word or message, is generated by an electronic apparatus in response to an analog or a digital electrical control signal. The controlling signal represents a coded form of the speech signal and may be derived continuously from an analog speech signal by some form of electrical spectrum analysis, such as that performed by the channel vocoder, or by mathematical analysis of a sampled speech signal, such as linear prediction coding.
A practical aim of speech synthesis is to reduce the information content needed to represent the speech signal, either to reduce the bandwidth or capacity required of a channel transmitting the speech signal or, in the case of fixed-message systems such as annunciators, to reduce the size of the apparatus used for storing the speech signal.
In most forms of speech synthesis, the synthesizer is an electrical model of the human vocal apparatus containing filters which represent the acoustic resonances of various cavities in the vocal tract, such as the pharynx and the mouth, and energy sources, such as an impulse generator representing the glottis in the case of vowel sounds, or a broadband noise generator representing a turbulent constriction in the vocal tract for fricative sounds. By separating speech production into its component parts, consisting of an energy source (e.g., the glottis) and a spectral filter (the vocal cavities), the information required to recreate speech synthetically is reduced (a) because the parameters of the component parts change relatively slowly compared to the rapidly varying speech waveform, and (b) because the separate parameters of amplitude and period or energy spectrum of the source and resonances of the spectral filter can be specified compactly.
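By way of illustration only, the separation of energy source and spectral filter described above may be sketched as follows. The sketch is not taken from any particular prior-art apparatus: the function names, the 8000 Hz sample rate, the pitch period of 80 samples, and the three resonance frequencies and bandwidths are all assumed values chosen for the example, and the two-pole resonator is merely one common way of modeling a vocal-tract resonance.

```python
import math
import random

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of y[n] = x[n] + b1*y[n-1] + b2*y[n-2] (two-pole resonator)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius
    theta = 2.0 * math.pi * freq_hz / sample_rate_hz        # pole angle
    return 2.0 * r * math.cos(theta), -r * r

def resonate(signal, b1, b2):
    """Run one two-pole resonator over a list of samples."""
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(n_samples, period):
    """Voiced source: one unit impulse every `period` samples (models the glottis)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def noise(n_samples, seed=0):
    """Unvoiced source: broadband noise (models a turbulent constriction)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def synthesize(source, formants, sample_rate_hz):
    """Pass the source through a cascade of resonators, one per resonance."""
    signal = source
    for freq_hz, bandwidth_hz in formants:
        b1, b2 = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate_hz)
        signal = resonate(signal, b1, b2)
    return signal

# Assumed example values, not taken from the source text.
SAMPLE_RATE_HZ = 8000
FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2500.0, 150.0)]

voiced = synthesize(impulse_train(400, 80), FORMANTS, SAMPLE_RATE_HZ)
unvoiced = synthesize(noise(400), FORMANTS, SAMPLE_RATE_HZ)
```

In this sketch, 400 output samples are controlled by only a pitch period and three frequency and bandwidth pairs, which illustrates why the slowly changing source and filter parameters can be specified more compactly than the waveform itself.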
In older art, the filter bands of the channel vocoder or the variable resonators of the formant vocoder require discrete components and precise adjustments which are not amenable to the size and cost reductions, or to the improved reliability, afforded by digital integrated circuits. More recent art employs the digital filter, which is ideally suited to integration and which may be used to implement the spectral shaping functions of the channel or formant vocoder or the more complex shaping functions of the linear prediction synthesizer. In addition, the art uses similar combinations of energy source and spectral shaping function to produce signals representing animal sounds, such as dog barks, as well as machinery noises and a variety of sound effects. The types of filters, and in particular the digital filter, that may be used for the spectral shaping required in speech synthesis are also used in current art for more general filtering applications, such as tone detection or selection, or the separation or isolation of specified frequency bands from a broad signal spectrum.
Current art employs the digital filter in a configuration called the "direct filter" for linear prediction synthesis. While the direct filter may be controlled by the coefficients derived from linear prediction analysis without mathematical conversion of the coefficients, it is well known that the direct filter requires highly accurate coefficients and intermediate data, and thus a complex circuit implementation, if it is to remain stable. Another configuration, called the "lattice filter," uses coefficients similar to those derived by linear prediction analysis. This configuration retains stability with a lower order of accuracy in the coefficients and intermediate data. However, the lattice filter is not amenable to general-purpose filter formulations which may be specified by frequency and gain functions, such as the channel and formant vocoders and a variety of signal processing applications.
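By way of illustration only, the relationship between the lattice and direct configurations may be sketched as follows. The sketch assumes an all-pole synthesis filter; the function names and the four reflection coefficients are hypothetical values chosen for the example. It relies on two standard results: the step-up recursion, which converts lattice (reflection) coefficients into direct-filter coefficients, and the property that an all-pole lattice is stable whenever every reflection coefficient has magnitude less than one.

```python
def step_up(reflection):
    """Convert reflection coefficients k_1..k_p into direct coefficients a_1..a_p,
    where the direct filter is y[n] = x[n] - a_1*y[n-1] - ... - a_p*y[n-p]."""
    a = []
    for m, k in enumerate(reflection, start=1):
        # a_m[j] = a_{m-1}[j] + k_m * a_{m-1}[m-j] for j < m, and a_m[m] = k_m
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return a

def direct_all_pole(signal, a):
    """Direct-form all-pole filter."""
    hist = [0.0] * len(a)  # hist[i] holds y[n-1-i]
    out = []
    for x in signal:
        y = x - sum(c * h for c, h in zip(a, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out

def lattice_all_pole(signal, reflection):
    """All-pole lattice filter driven by the same input."""
    p = len(reflection)
    b = [0.0] * (p + 1)  # b[m] holds the delayed backward signal of stage m
    out = []
    for x in signal:
        f = x
        for m in range(p, 0, -1):
            k = reflection[m - 1]
            f = f - k * b[m - 1]      # forward signal leaving stage m
            b[m] = k * f + b[m - 1]   # backward signal; b[m-1] is still the old value
        b[0] = f
        out.append(f)
    return out

def quantize(value, fraction_bits):
    """Round a coefficient to a fixed number of fractional bits."""
    scale = 2 ** fraction_bits
    return round(value * scale) / scale

def lattice_is_stable(reflection):
    """Stability test for the lattice: every |k| must be below one."""
    return all(abs(k) < 1.0 for k in reflection)

# Hypothetical reflection coefficients, not taken from the source text.
REFLECTION = [-0.9, 0.8, -0.5, 0.3]
impulse = [1.0] + [0.0] * 99

y_lattice = lattice_all_pole(impulse, REFLECTION)
y_direct = direct_all_pole(impulse, step_up(REFLECTION))
max_difference = max(abs(u - v) for u, v in zip(y_lattice, y_direct))

coarse = [quantize(k, 4) for k in REFLECTION]  # only 4 fractional bits
coarse_is_stable = lattice_is_stable(coarse)
```

The two impulse responses agree to within rounding error, and the stability of the coarsely quantized lattice can be confirmed by inspecting each coefficient individually. No comparably simple per-coefficient test exists for the direct filter, which is one way of seeing why it demands greater coefficient accuracy.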
It is known in the art that a polynomial expression derived from linear prediction analysis of a speech waveform, typically of 12th or 14th order, may be resolved into second-order factors by well-known computation techniques, and thus a cascade of six or seven second-order filter sections may be used to provide filtering identical to that of a 12th or 14th-order direct filter. It is also known in the art that cascaded second-order sections are markedly less sensitive to coefficient accuracy than the equivalent direct filter, thereby allowing equivalent performance with a smaller number of bits per coefficient, and consequently permitting a lower operating speed and smaller arithmetic elements in circuit integration of the filter, as well as reduced size of data storage for the synthesizer control signals.
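By way of illustration only, the equivalence between a cascade of six second-order sections and a 12th-order direct filter may be sketched as follows. For simplicity the sketch works in the reverse direction from that described above: it starts from six assumed second-order factors and multiplies them out to obtain the direct polynomial, rather than factoring a polynomial obtained from linear prediction analysis, which would additionally require a root-finding routine. The function names and the pole radii and angles are hypothetical values chosen for the example.

```python
import math

def section_from_pole(radius, angle_rad):
    """Second-order factor 1 + a1*z^-1 + a2*z^-2 with a complex-conjugate
    pole pair at radius * exp(+/- j * angle_rad)."""
    return (-2.0 * radius * math.cos(angle_rad), radius * radius)

def poly_multiply(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def filter_all_pole(signal, a):
    """y[n] = x[n] - a[1]*y[n-1] - ... - a[p]*y[n-p]; a[0] is assumed to be 1."""
    hist = [0.0] * (len(a) - 1)  # hist[i] holds y[n-1-i]
    out = []
    for x in signal:
        y = x - sum(c * h for c, h in zip(a[1:], hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out

def cascade(signal, sections):
    """Pass the signal through each second-order section in turn."""
    for a1, a2 in sections:
        signal = filter_all_pole(signal, [1.0, a1, a2])
    return signal

# Hypothetical pole positions (radius, angle in radians), not from the source text.
POLES = [(0.95, 0.3), (0.90, 0.8), (0.92, 1.4), (0.88, 1.9), (0.90, 2.4), (0.85, 2.9)]
SECTIONS = [section_from_pole(r, w) for r, w in POLES]

# Multiply the six second-order factors into one 12th-order direct polynomial.
direct = [1.0]
for a1, a2 in SECTIONS:
    direct = poly_multiply(direct, [1.0, a1, a2])

impulse = [1.0] + [0.0] * 199
y_direct = filter_all_pole(impulse, direct)
y_cascade = cascade(impulse, SECTIONS)

peak = max(abs(v) for v in y_direct)
max_difference = max(abs(u - v) for u, v in zip(y_direct, y_cascade))
```

The direct form carries 12 coefficients in a single recursion, whereas the cascade carries two coefficients in each of six sections; the two impulse responses agree to within rounding error. The sketch does not measure the difference in coefficient sensitivity, which depends on the particular coefficients and word length.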