1. Field of the Invention
The present invention relates to a formant emphasis method that emphasizes the spectral peaks (formants) of an input speech signal and attenuates its spectral valleys, for use in the decoder of a speech coding/decoding system or in a preprocessor for speech processing.
2. Description of the Related Art
Highly efficient speech coding at low bit rates is an important technique for making efficient use of radio waves and reducing communication costs in mobile communications (e.g., automobile telephones) and local area networks. The CELP (Code Excited Linear Prediction) scheme is known as a speech coding method capable of high-quality speech synthesis at bit rates of 8 kbps or less. This scheme was introduced by M. R. Schroeder and B. S. Atal of AT&T Bell Laboratories in "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates", Proc. ICASSP, 1985, pp. 937-939 (Reference 1), and has received a great deal of attention as a technique capable of synthesizing high-quality speech. Many studies have since aimed to improve its quality and reduce its computational load. Nevertheless, degradation of the synthesized speech is still perceptible at very low bit rates of 8 kbps or less, and the quality is not yet satisfactory.
Under these circumstances, P. Kroon and B. S. Atal of AT&T Bell Laboratories reported a post-processing technique that emphasizes the spectral peaks (formants) of synthesized speech and attenuates its spectral valleys to improve subjective quality, in "Quantization Procedures for the Excitation in CELP Coders", Proc. ICASSP, 1987, pp. 1649-1652 (Reference 2). In Reference 2, the post-processing uses an all-pole filter that scales the LPC (Linear Prediction Coding) coefficients supplied by the decoder by powers of a constant $\beta$, thereby smoothing the spectral envelope and improving quality. In the z-transform domain, this all-pole filter is expressed by equation (1):

$$Q_1(z) = \frac{1}{A(z/\beta)} \qquad (1)$$

where $A(z/\beta)$ is given by equation (2):

$$A(z/\beta) = \sum_{i=0}^{P} \alpha_i \beta^{i} z^{-i}, \qquad \alpha_0 = 1 \qquad (2)$$

($\alpha_i$: LPC coefficient, $P$: filter order, $0 < \beta < 1$)
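The all-pole filter of equation (1) can be sketched in a few lines. This is an illustrative sketch, not the method of Reference 2: it assumes the monic convention $A(z) = 1 + \sum_{i \ge 1} a_i z^{-i}$, a zero initial filter state, and a hypothetical helper name `q1_postfilter`.

```python
def bandwidth_expand(a, beta):
    """Coefficients of A(z/beta): a_i -> a_i * beta**i (a_0 stays 1)."""
    return [coef * beta ** i for i, coef in enumerate(a)]


def all_pole_filter(x, a):
    """Filter x through 1/A(z), assuming A(z) = 1 + sum_{i>=1} a[i] z^-i.

    Direct form with zero initial state:
        y[n] = x[n] - sum_{i=1}^{P} a[i] * y[n-i]
    """
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def q1_postfilter(x, lpc, beta=0.8):
    """Hypothetical helper: Q1(z) = 1 / A(z/beta)."""
    return all_pole_filter(x, bandwidth_expand(lpc, beta))


if __name__ == "__main__":
    lpc = [1.0, -0.9]            # toy first-order predictor, illustration only
    impulse = [1.0, 0.0, 0.0]
    print(q1_postfilter(impulse, lpc, beta=0.5))
```

Because each coefficient is scaled by $\beta^i$, the roots of $A(z/\beta)$ are the roots of $A(z)$ multiplied by $\beta$, so the formant peaks are broadened. Nothing in this filter counteracts the overall spectral tilt, which is the drawback discussed next.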
The all-pole filter $Q_1(z)$, however, imparts an excessive spectral tilt to the synthesized speech, which makes the synthesized sound unclear. A formant emphasis filter that solves this problem is disclosed in Jpn. Pat. Appln. KOKAI Publication No. 64-13200, entitled "Improvement in Method of Compressing Digitally Coded Speech" (Reference 3). Reference 3 proposes cascading a zero-pole filter designed with spectral tilt compensation in mind and a first-order filter having fixed characteristics. The transfer function $Q_2(z)$ of this formant emphasis filter is expressed in the z-transform domain by equation (3):

$$Q_2(z) = \frac{A(z/\gamma)}{A(z/\beta)}\,\bigl(1 - \mu z^{-1}\bigr) \qquad (3)$$

($\gamma$, $\mu$: fixed constants)
In this formant emphasis filter, the numerator term $A(z/\gamma)$ and the first-order term $(1 - \mu z^{-1})$ compensate for the excessive spectral tilt introduced by $1/A(z/\beta)$, which solves the problem of unclear synthesized sound. However, the filter order becomes $2P+1$, which undesirably increases the processing load.
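The cascade $Q_2(z) = A(z/\gamma)\,(1-\mu z^{-1})/A(z/\beta)$ can be sketched as three filter stages applied in turn. This is a minimal sketch under stated assumptions, not Reference 3's implementation: the monic convention $A(z) = 1 + \sum_{i \ge 1} a_i z^{-i}$, zero initial state, and the default values $\gamma = 0.5$, $\beta = 0.8$, $\mu = 0.5$ are illustrative choices, and `q2_postfilter` and `q2_order` are hypothetical names.

```python
def expand(a, factor):
    """Coefficients of A(z/factor): a_i -> a_i * factor**i."""
    return [c * factor ** i for i, c in enumerate(a)]


def fir_filter(x, b):
    """y[n] = sum_i b[i] * x[n-i], zero initial state."""
    return [sum(b[i] * x[n - i] for i in range(min(len(b), n + 1)))
            for n in range(len(x))]


def all_pole_filter(x, a):
    """y[n] = x[n] - sum_{i>=1} a[i] * y[n-i], i.e. 1/A(z) with a[0] = 1."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def q2_postfilter(x, lpc, gamma=0.5, beta=0.8, mu=0.5):
    """Q2(z) = A(z/gamma) / A(z/beta) * (1 - mu z^-1), applied as a cascade."""
    y = fir_filter(x, expand(lpc, gamma))       # P zeros: A(z/gamma)
    y = all_pole_filter(y, expand(lpc, beta))   # P poles: 1 / A(z/beta)
    return fir_filter(y, [1.0, -mu])            # fixed first-order term


def q2_order(p):
    """Total order of the cascade: P (zeros) + P (poles) + 1."""
    return 2 * p + 1
```

Setting $\gamma = \beta$ and $\mu = 0$ makes the zeros cancel the poles, so the cascade reduces to the identity, which is a convenient sanity check. The `q2_order` helper simply restates the $2P+1$ count from the text.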
Another formant emphasis filter is disclosed in Jpn. Pat. Appln. KOKAI Publication No. 2-82710, entitled "Post-Processing Filter" (Reference 4). Reference 4 uses a zero-pole filter whose numerator is a spectral tilt compensation term of lower filter order. The transfer function $Q_3(z)$ of this formant emphasis filter is expressed in the z-transform domain by equation (4):

$$Q_3(z) = \frac{A^{(M)}(z/\beta)}{A(z/\beta)} \qquad (4)$$

($A^{(M)}(z/\beta)$: $M$th-order polynomial of the same form as equation (2); $M$, $P$: filter orders with $M < P$; $0 < \beta < 1$)
The numerator term $A^{(M)}(z/\beta)$ in equation (4) compensates for the spectral tilt. A lower order $M$ keeps the processing load small, but $M$ must be reasonably large to compensate the spectral tilt sufficiently; with $M = 1$, the formant emphasis filter still produces unclear synthesized speech.
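A sketch of the filter $Q_3(z) = A^{(M)}(z/\beta)/A(z/\beta)$ follows. The source does not say how $A^{(M)}$ is obtained; here it is assumed, as a hypothetical choice, to be the order-$M$ prediction-error polynomial produced by stopping the Levinson-Durbin recursion at order $M$, computed from an autocorrelation sequence `r`. The monic sign convention, zero initial state, and the names `levinson_durbin` and `q3_postfilter` are likewise assumptions for illustration.

```python
def levinson_durbin(r, order):
    """Prediction-error polynomials of every order 0..order from autocorrelation r.

    Each polynomial is [1, a_1, ..., a_m] with A^(m)(z) = 1 + sum a_i z^-i.
    """
    a = [1.0]
    err = r[0]
    polys = [a[:]]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # m-th reflection coefficient
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        err *= 1.0 - k * k                  # updated prediction-error power
        polys.append(a[:])
    return polys


def expand(a, factor):
    """Coefficients of A(z/factor): a_i -> a_i * factor**i."""
    return [c * factor ** i for i, c in enumerate(a)]


def fir_filter(x, b):
    """y[n] = sum_i b[i] * x[n-i], zero initial state."""
    return [sum(b[i] * x[n - i] for i in range(min(len(b), n + 1)))
            for n in range(len(x))]


def all_pole_filter(x, a):
    """y[n] = x[n] - sum_{i>=1} a[i] * y[n-i], i.e. 1/A(z) with a[0] = 1."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, min(len(a), n + 1)):
            acc -= a[i] * y[n - i]
        y.append(acc)
    return y


def q3_postfilter(x, r, m_order, p_order, beta=0.8):
    """Q3(z) = A^(M)(z/beta) / A(z/beta), with A = A^(P)."""
    polys = levinson_durbin(r, p_order)
    num = expand(polys[m_order], beta)      # M zeros: tilt compensation
    den = expand(polys[p_order], beta)      # P poles: formant emphasis
    return all_pole_filter(fir_filter(x, num), den)
```

With $M = 1$ the numerator has a single zero, which can only correct a first-order (monotone) tilt; this matches the observation above that $M = 1$ leaves the tilt insufficiently compensated.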
The problem common to equations (3) and (4) is that the filter coefficients of the formant emphasis filter are controlled by the fixed values $\beta$ and $\gamma$, or by the fixed value $\beta$ alone. As a result, the filter characteristics cannot be finely adjusted, which limits how much the formant emphasis filter can improve sound quality. In addition, because the formant emphasis filter is always controlled by these fixed values, adaptive processing, in which formants are emphasized in one portion of the input speech and attenuated in another, cannot be performed.
As described above, with the conventional all-pole filter of equation (1), the synthesized speech becomes unclear and subjective quality is degraded. Cascading the zero-pole filter with the first-order filter, as in equation (3), removes this unclearness and improves subjective quality, but the processing load undesirably increases. With the zero-pole filter of equation (4), reducing the processing load by setting the numerator order to $M = 1$ leaves the spectral tilt insufficiently compensated, so the synthesized sound remains unclear.
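The processing-load comparison can be made concrete by counting one multiply-accumulate per delay tap of each filter, that is, of $1/A(z/\beta)$, $A(z/\gamma)(1-\mu z^{-1})/A(z/\beta)$, and $A^{(M)}(z/\beta)/A(z/\beta)$. The leading coefficient 1 of each polynomial needs no multiplication, and the per-frame cost of scaling the coefficients is ignored. Taking $P = 10$, a common LPC order in narrowband coders, purely as an illustration:

$$
\begin{aligned}
Q_1(z)&:\ P = 10\\
Q_2(z)&:\ P + P + 1 = 2P + 1 = 21\\
Q_3(z)&:\ M + P = 11 \quad (M = 1)
\end{aligned}
$$

Under this count, equation (4) stays close to the cost of equation (1) only when $M$ is small, which is the regime in which its tilt compensation is inadequate.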
Moreover, because the filter coefficients of each conventional formant emphasis filter are controlled by the fixed values $\beta$ and $\gamma$, or by the fixed value $\beta$ alone, the filter cannot be finely adjusted, and its capability to improve sound quality is limited. Nor can such a filter perform adaptive processing that emphasizes formants in some portions of the input speech while attenuating others.
Furthermore, in a conventional postfilter, when the pitch period between pitch harmonic peaks of voiced speech varies greatly, or is erroneously detected as double or half the true pitch, the pitch harmonics of the decoded speech become disturbed. The pitch emphasis filter then amplifies this disturbance, severely degrading speech quality.
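The claim that a pitch emphasis filter amplifies pitch errors can be checked on a simple model. The sketch below assumes, purely hypothetically, a comb-type pitch emphasis filter $H(z) = 1/(1 - g z^{-T})$ with gain $g$ and lag $T$ in samples, and an 8 kHz sampling rate; the prior-art filter need not have this form, and `comb_gain` is an illustrative name. With a true pitch period of 80 samples (100 Hz), the filter boosts the fundamental at 100 Hz and attenuates the inter-harmonic valley at 250 Hz.

```python
import cmath
import math


def comb_gain(freq_hz, lag, g, fs=8000.0):
    """|H(e^{jw})| of the hypothetical pitch emphasis filter H(z) = 1 / (1 - g z^-lag)."""
    w = 2.0 * math.pi * freq_hz / fs
    return 1.0 / abs(1.0 - g * cmath.exp(-1j * w * lag))


if __name__ == "__main__":
    g = 0.5                       # illustrative gain, not from the source
    true_lag = 80                 # 100 Hz pitch at fs = 8 kHz
    for lag, label in [(true_lag, "correct"), (2 * true_lag, "double"),
                       (true_lag // 2, "half")]:
        print(label,
              "harmonic 100 Hz:", round(comb_gain(100.0, lag, g), 3),
              "valley 250 Hz:", round(comb_gain(250.0, lag, g), 3))
```

In this model, doubling the lag raises the 250 Hz valley to the same gain as a true harmonic, while halving it attenuates the 100 Hz fundamental; in both cases the emphasis reinforces the pitch error rather than the true harmonic structure.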