1. Field of the Invention
The present invention relates to a voice signal coding apparatus, and more particularly to a voice signal coding apparatus for converting a voice signal into compressed digital information and recording or transmitting the resultant information.
2. Related Art Statement
One widely used technique for compressing a voice signal with high efficiency is to code the voice signal using a linear predictive parameter representing a spectral envelope together with a sound source parameter corresponding to a linear predictive residual signal. With such a voice coding technique based on linear prediction, it is possible to obtain a synthesized voice of relatively high quality via a transmission channel with a rather small capacity. Because of this advantage, together with recent advances in hardware technology, research and development on a wide range of applications is being actively pursued.
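As an illustration only, the split between a linear predictive parameter and a residual signal can be sketched as follows. The autocorrelation method with the Levinson-Durbin recursion is assumed here as one common way of obtaining the predictor; the function names are hypothetical and no particular coder is implied.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n - k]).
    Returns (coefficients, final prediction error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep what we have
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def residual(frame, coeffs):
    """Linear predictive residual: the part of the frame the predictor misses."""
    out = []
    for n in range(len(frame)):
        pred = sum(c * frame[n - k - 1]
                   for k, c in enumerate(coeffs) if n - k - 1 >= 0)
        out.append(frame[n] - pred)
    return out
```

For a strongly predictable frame the residual carries much less energy than the frame itself, which is why coding the predictor and the residual separately can be efficient.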
Among various techniques based on linear prediction, a well-known technique is CELP (code excited linear predictive coding), disclosed in a paper entitled "Improved speech quality and efficient vector quantization in SELP" (Kleijn et al., ICASSP '88, S4.4, pp. 155-158, 1988), in which an adaptive code book obtained from a repetition of past sound source signals is used.
The voice signal coding apparatus based on linear prediction analysis has the advantage that high-quality coding performance can be obtained at rather low bit rates. This type of voice signal coding apparatus is based on the assumption that voice generated by a human generally has the property of periodicity, and thus that a voice signal can generally be analyzed well if the length of one frame is set to about 20 ms.
The conventional voice signal coding apparatus, however, has the disadvantage that although high quality is obtained for voice signal periods, high-quality coding cannot be obtained for non-voice signal periods. In particular, great degradation in the voice quality occurs if there is background noise greater than a certain level.
To achieve more efficient compression, it is known in the art to employ a variable rate coding technique in which the bit rate is varied in accordance with the status of a given voice signal. It is also known to mix a high-efficiency voice signal coding technique with a non-voice signal compression technique as disclosed, for example, in Japanese Examined Patent Publication No. 2-35996.
In the technique disclosed in Japanese Examined Patent Publication No. 2-35996, however, coding is performed in extremely different ways depending on whether an input signal is a voice signal or a non-voice signal, and thus reproduced sound becomes very unnatural at a transition between voice and non-voice periods.
The voice signal coding apparatus is thought of as having applications in a mobile telephone, a voice recording apparatus, etc. In these applications, the voice signal coding apparatus is expected to be used in various environments wherein there is background noise in many cases. Therefore, the problem of voice quality degradation has to be solved to realize a more attractive product.
In view of the above, the inventor of the present invention has proposed a high-performance voice signal coding apparatus capable of always providing high sound quality regardless of whether the signal is a voice signal or a non-voice signal, as disclosed in Japanese Patent Application No. 7-268756. This coding apparatus includes: voice status detecting means for detecting whether an input signal divided into predetermined frame intervals is a voice signal or a non-voice signal; linear predictive analyzing means for outputting a spectrum parameter associated with the input signal; control means for controlling the linear predictive analyzing means such that when the detection result by the voice status detecting means indicates that the input signal is a non-voice signal over a predetermined number of successive frames, the linear predictive analyzing means continuously outputs the spectrum parameter employed for the predetermined number of previous frames as a spectrum parameter for the input signal; driving sound signal generating means for generating a driving sound source signal corresponding to a residual linear predictive signal; and a synthesizing filter for synthesizing a voice from the driving sound source signal in accordance with the spectrum parameter.
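One possible reading of the control means described above is sketched below. The class name, the parameter hold_after, and the choice to hold the parameter from the last frame before the hold begins are all assumptions made for illustration; the cited application may define these details differently.

```python
class SpectrumParameterHold:
    """Hypothetical control logic: once hold_after successive frames have
    been judged non-voice, keep outputting the previously used spectrum
    parameter instead of the newly analyzed one."""

    def __init__(self, hold_after):
        self.hold_after = hold_after
        self.non_voice_run = 0   # length of the current non-voice run
        self.held = None         # spectrum parameter most recently output

    def select(self, is_voice, analyzed):
        if is_voice:
            # Voice frame: reset the run and use the fresh analysis.
            self.non_voice_run = 0
            self.held = analyzed
            return analyzed
        self.non_voice_run += 1
        if self.non_voice_run >= self.hold_after and self.held is not None:
            # Long non-voice run: repeat the held parameter.
            return self.held
        # Short non-voice run: still follow the analysis.
        self.held = analyzed
        return analyzed


if __name__ == "__main__":
    hold = SpectrumParameterHold(hold_after=3)
    frames = [(True, "p0"), (False, "p1"), (False, "p2"),
              (False, "p3"), (False, "p4"), (True, "p5")]
    print([hold.select(v, p) for v, p in frames])
    # prints ['p0', 'p1', 'p2', 'p2', 'p2', 'p5']
```

In this sketch the parameter stops changing from the third successive non-voice frame onward and resumes tracking the analysis as soon as a voice frame arrives.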
In the above technique proposed in Japanese Patent Application No. 7-268756, however, although it is possible to suppress the sound quality degradation which occurs when the spectrum parameter is switched at a transition between voice and non-voice periods, sound quality degradation still occurs and no improvement is obtained if a non-voice signal continues over a long period.
One conventional technique to achieve a higher efficiency in compression of voice data is to mix a high-efficiency voice signal coding technique with a non-voice signal compression technique. One well-known non-voice signal compression technique is a technique called VAD (voice activity detection) in which it is judged whether a given input signal is a voice signal or a non-voice signal and recording on a recording medium or data transmission is stopped if the judgement indicates that the input signal is a non-voice signal.
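The gating behaviour of such a VAD scheme can be sketched as follows. The judgement itself is passed in as a function; the fixed energy threshold used here is only an assumed stand-in, since the text does not specify how the judgement is made, and all names are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def energy_detector(threshold):
    """Assumed detector: a frame is 'voice' if its energy exceeds threshold."""
    return lambda frame: frame_energy(frame) > threshold


def transmit_voice_frames(frames, is_voice):
    """Yield (index, frame) only for frames judged to be voice.
    Frames judged non-voice are simply not recorded or transmitted."""
    for index, frame in enumerate(frames):
        if is_voice(frame):
            yield index, frame
```

Keeping the frame index alongside each transmitted frame lets a receiver know how many frames were skipped in between.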
Another well-known technique is the variable-rate voice signal coding technique in which the bit rate is varied depending on the status of an input signal.
A specific example of the technique is disclosed in a paper entitled "QCELP: The North American CDMA Digital Cellular Variable Rate Speech Coding Standard," (A. DeJaco, W. Gardner, P. Jacobs, and C. Lee, Proceedings IEEE Workshop on Speech Coding for Telecommunications, pp. 5-6, 1993).
In this technique, the threshold value used for detecting the status of the input signal is adapted over a wide range, from an extremely low background noise level to a rather high background noise level, by gradually increasing the threshold value starting from a small value, thereby ensuring that the status of the input signal can be accurately detected regardless of the presence of background noise.
However, in the above technique, the time required for the detection means to reach a state in which the status of a given input voice signal can be correctly detected increases with the input signal level or the background noise level, and it is impossible to obtain a desirable coding efficiency before the detection means reaches the above state.
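The behaviour described in the two preceding paragraphs can be illustrated with a simplified detector whose noise estimate starts small and rises by a fixed factor per frame. This is a sketch under assumed values only: the constants start, growth and margin, and the class and function names, are hypothetical and are not taken from the cited paper.

```python
class RisingThresholdDetector:
    """Simplified detector with a threshold that rises gradually from a
    small starting value toward the background noise level."""

    def __init__(self, start=1.0, growth=1.05, margin=4.0):
        self.start = start      # small initial noise estimate (also the floor)
        self.noise = start
        self.growth = growth    # per-frame multiplicative increase
        self.margin = margin    # threshold = margin * noise estimate

    def is_voice(self, energy):
        # The estimate rises slowly but never exceeds the observed energy.
        self.noise = max(self.start, min(energy, self.noise * self.growth))
        return energy > self.noise * self.margin


def misclassified_noise_frames(noise_energy, limit=10000):
    """Count how many frames of steady background noise are judged to be
    voice before the threshold has risen far enough."""
    detector = RisingThresholdDetector()
    count = 0
    while count < limit and detector.is_voice(noise_energy):
        count += 1
    return count
```

With these assumed constants, steady noise at a higher energy is judged to be voice for more frames than noise at a lower energy, which corresponds to the adaptation delay noted above.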