1. Field of the Invention
The invention relates in general to an LSP (Line Spectrum Pair) speech synthesis device, and more particularly to a speech synthesis ASIC (Application Specific IC) based on an LSP scheme. LSP speech synthesis is based on an improved algorithm derived from PARCOR (Partial Correlation). It requires only 60% of the bit rate required for PARCOR synthesis while maintaining the same level of quality. The invention uses an LSP synthesis digital filter to perform the operations of the algorithm. The LSP synthesis digital filter consists of only one serial shift multiplier, four serial adders, four multiplexers, and some registers. In addition, the sampling rate needed to perform the operations is lower, so that the area of the speech synthesis ASIC needed for data storage, for example, is smaller.
2. Description of the Related Art
In the past several years, semiconductor companies have developed many speech synthesis chips and have found a great number of applications for them, including, for example, toys, personal computers, car electronics, and home electronics. In these chips, the PARCOR algorithm of LPC (linear predictive coding) is widely used. The functions of LPC are described as follows:
A speech data output signal s(n) is produced by passing an excitation signal e(n) through a digital filter having a transfer function H(z). That is to say: s(n) = H(z) × e(n).
The transfer function of the filter H(z) can be described as: ##EQU1##
The linear predictive error ##EQU2## has coefficients {a_i}, called linear predictive coefficients. The parameter p is called the linear predictive order. In the time domain, the speech data signal s(n) can be described as follows: ##EQU3##
The speech data signal s(n) can be considered to be a linear combination of the past p speech data signal values s(n-i) and the excitation signal e(n). In LPC, the excitation signal e(n) is "white noise," and the coefficients {a_i} and G represent the speech data, wherein the coefficients {a_i} are the frequency data and G is the energy.
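The time-domain relation described above can be sketched as a short all-pole filter loop. This is a minimal illustration only, not the circuit of the invention. It assumes the sign convention s(n) = a_1 s(n-1) + ... + a_p s(n-p) + G e(n), which the text does not confirm (the opposite sign on the coefficients is equally common). The function name `lpc_synthesize` and all numeric values are hypothetical.

```python
import random


def lpc_synthesize(a, gain, excitation):
    """All-pole synthesis under an ASSUMED sign convention:
    s(n) = sum_{i=1..p} a[i-1] * s(n-i) + gain * e(n).
    Samples before n = 0 are taken to be zero.
    """
    p = len(a)
    out = []
    for n, e_n in enumerate(excitation):
        acc = gain * e_n
        for i in range(1, p + 1):
            if n - i >= 0:
                acc += a[i - 1] * out[n - i]
        out.append(acc)
    return out


# Hypothetical order-2 filter driven by pseudo-random "white noise".
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
speech = lpc_synthesize([0.5, -0.25], 2.0, noise)
```

With a unit impulse as the excitation, the same function returns the impulse response of the filter, which is a convenient way to check a coefficient set by hand.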
If the coefficients {a_i} are directly encoded, then to ensure the stability of the filter, each coefficient requires more than 10 bits. That is to say, high precision of the coefficients {a_i} is necessary. In practice, therefore, the PARCOR algorithm is widely used. The reflection coefficients {k_i} of that algorithm represent the frequency data. On the condition that |k_i| < 1, the stability of the filter can be ensured and the number of bits can be reduced. There is therefore a need to lower the bit rate of the widely used speech synthesis ASIC in order to form a smaller chip.
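The role of the condition |k_i| < 1 can be illustrated with a short sketch. It is given under stated assumptions and is not the coding scheme of any particular chip. The step-up recursion used here is the standard textbook one for the convention s(n) = a_1 s(n-1) + ... + a_p s(n-p) + G e(n); sign conventions for reflection coefficients vary between references. The uniform quantizer `quantize_reflection` is hypothetical and serves only to show that a coarsely quantized k_i still satisfies the stability condition.

```python
import math


def is_stable(k):
    """PARCOR stability condition: every reflection coefficient has |k_i| < 1."""
    return all(abs(k_i) < 1.0 for k_i in k)


def quantize_reflection(k_i, bits):
    """Hypothetical uniform quantizer on (-1, 1).
    Output levels sit at cell centers, so |result| < 1 always holds.
    """
    half = 2 ** (bits - 1)
    idx = min(max(math.floor(k_i * half), -half), half - 1)
    return (idx + 0.5) / half


def step_up(k):
    """Convert reflection coefficients to predictor coefficients a_1..a_p
    (assumed convention: a_i(m) = a_i(m-1) - k_m * a_{m-i}(m-1), a_m(m) = k_m).
    """
    a = []
    for k_m in k:
        m = len(a)  # filter order before adding this stage
        a = [a[i] - k_m * a[m - 1 - i] for i in range(m)] + [k_m]
    return a


k_coarse = [quantize_reflection(k_i, 4) for k_i in (0.99, -0.3, 0.0)]
```

Because the check is applied to each k_i independently, stability survives coarse quantization, whereas a rounded set of {a_i} offers no such per-coefficient test.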
The PARCOR analysis-synthesis method is superior to any previously developed method, but it has a lower bit rate limit of 2400 bps. If the bit rate falls below this value, the synthesized voice rapidly becomes unclear and unnatural. The LSP method was thus investigated as a way to maintain voice quality at lower bit rates (Itakura, 1975). The PARCOR coefficients are essentially parameters operating in the time domain, as are the auto-correlation coefficients, whereas the LSPs are parameters functioning in the frequency domain. Therefore, the LSP parameters are advantageous in that the distortion they produce is smaller than that of the PARCOR coefficients, even when they are roughly quantized and linearly interpolated.
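Linear interpolation of LSP parameters between two frames can be sketched as follows. This is a minimal illustration with hypothetical frame values, not data from any real analysis. It relies on a well-known property of LSPs that the text above does not state: the synthesis filter is stable when the LSP frequencies are strictly increasing between 0 and pi. The function names `interpolate_lsp` and `is_ordered` are assumptions introduced for this example.

```python
import math


def interpolate_lsp(lsp_start, lsp_end, t):
    """Linearly interpolate two LSP frames (radians); t = 0 gives lsp_start."""
    if len(lsp_start) != len(lsp_end):
        raise ValueError("LSP frames must have the same order")
    return [(1.0 - t) * w0 + t * w1 for w0, w1 in zip(lsp_start, lsp_end)]


def is_ordered(lsp):
    """Check 0 < w_1 < w_2 < ... < w_p < pi."""
    bounded = [0.0] + list(lsp) + [math.pi]
    return all(lo < hi for lo, hi in zip(bounded, bounded[1:]))


# Hypothetical order-4 frames, in radians.
frame_a = [0.3, 0.7, 1.2, 2.0]
frame_b = [0.4, 0.9, 1.5, 2.4]
midpoint = interpolate_lsp(frame_a, frame_b, 0.5)
```

A weighted average of two ordered frames is itself ordered, so every interpolated frame passes the same check as its endpoints.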
Optimum coding of LSP parameters can be realized by means of the same subjective and objective evaluation methods used for PARCOR analysis-synthesis systems (Sugamura and Itakura, 1981). Experimental studies on quantization characteristics have confirmed that if the distribution range of LSP parameters is considered in the quantization, the same spectral distortion can be realized with roughly 80% of the quantization bit rate of the PARCOR systems. As for the interpolation characteristics, the interpolation distortion has been demonstrated as being maintainable. As a result of the combination of these two effects, the LSP method produces the same synthesized sound quality using only roughly 60% of the bit rate needed by the PARCOR method. (See "Digital Speech Processing, Synthesis, and Recognition," Sadaoki Furui, ISBN 0-8247-7965-7, pages 126, 133.)