In recent years, voice coders have used code-excited linear prediction (CELP), a voice analysis/synthesis method, and conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) for voice coding processing.
In a CS-ACELP system conforming to ITU-T Recommendation G.729, an excitation pulse is passed successively through a short-term synthesis filter and a long-term synthesis filter, and the position and polarity of the pulse that yield the decoded voice closest to the input signal are coded and transmitted.
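The filter cascade described above can be illustrated with a minimal sketch. It is not the G.729 reference algorithm: the single-tap long-term filter, the first-order short-term filter, and every name and coefficient value below are assumptions chosen only to show an excitation pulse passing through the two synthesis filters. For these toy linear filters with fixed coefficients and zero initial state, the order of the cascade does not change the output.

```python
def short_term_synthesis(signal, lpc, memory=None):
    """All-pole filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    history = list(memory) if memory is not None else [0.0] * len(lpc)
    out = []
    for x in signal:
        y = x - sum(a * history[-1 - k] for k, a in enumerate(lpc))
        history.append(y)
        out.append(y)
    return out


def long_term_synthesis(signal, lag, gain, memory=None):
    """Single-tap pitch filter: y[n] = x[n] + gain * y[n-lag]."""
    # The memory must hold at least `lag` past output samples.
    history = list(memory) if memory is not None else [0.0] * lag
    out = []
    for x in signal:
        y = x + gain * history[-lag]
        history.append(y)
        out.append(y)
    return out


# Hypothetical values: one unit pulse, pitch lag of 2 samples.
pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
lpc = [-0.5]  # A(z) = 1 - 0.5 z^-1 (assumed)
decoded = long_term_synthesis(short_term_synthesis(pulse, lpc), lag=2, gain=0.5)
swapped = short_term_synthesis(long_term_synthesis(pulse, lag=2, gain=0.5), lpc)
```

Both filters carry a `memory` of past output samples. That memory is the "internal state" that the coding side and the decoding side must keep identical.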
In silence suppression, a voice coding apparatus combines the coding system with a voice detector so that coded data is transmitted only during the speech period. A mismatch between the internal state of the voice coding side and that of the voice decoding side arises where the signal changes from the no-voice state to the voice state. As a result, voice quality deteriorates at the beginning of the speech period. Several voice coding/decoding systems have been proposed to solve this problem.
For example, a first conventional voice coding/decoding system interrupts the operation of the coder and the decoder during a silent period. The operation of the coder and the decoder is resumed simultaneously when a speech period begins. This keeps the internal state on the voice coding side coincident with the internal state on the voice decoding side, which reduces the deterioration of voice quality. (See, for example, Japanese Patent Laid-Open Nos. 064235/1991 and 272850/1990.)
A second conventional coding/decoding system attains the same object by saving the delay elements of the coding filter and the decoding filter in a memory during the silent period and reloading them from the memory at the beginning of speech. (See, for example, Japanese Patent Laid-Open No. 0210845/1991.)
A third conventional coding/decoding system resets or initializes the coder and the decoder each to a specified value in the silent period so that their internal states coincide at the beginning of speech, thereby preventing deterioration of voice quality (see, for example, 292121/1993, 167635/1992, and 244935/1990).
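The three conventional approaches differ only in which internal state the filters hold when speech resumes. The sketch below models that bookkeeping and nothing else. The `Strategy` names, the function, and the state values are hypothetical and do not come from the cited publications.

```python
from enum import Enum


class Strategy(Enum):
    HALT = "halt"                # first system: coder/decoder stopped in silence
    SAVE_RELOAD = "save_reload"  # second system: delay elements saved, then reloaded
    RESET = "reset"              # third system: state forced to a specified value


def state_at_speech_onset(strategy, state_at_silence_start, reset_value=0.0):
    """Return the filter memory in effect when the speech period begins."""
    if strategy is Strategy.HALT:
        # Nothing runs during silence, so the state is simply left as it was.
        return list(state_at_silence_start)
    if strategy is Strategy.SAVE_RELOAD:
        saved = list(state_at_silence_start)  # copy into memory at silence start
        return saved                          # reload the copy at speech onset
    if strategy is Strategy.RESET:
        return [reset_value] * len(state_at_silence_start)
    raise ValueError(f"unknown strategy: {strategy}")


# Coder and decoder are assumed to hold the same state when silence starts.
state_before_silence = [0.3, -0.7, 0.2]
onset_states = {
    s: (state_at_speech_onset(s, state_before_silence),   # coding side
        state_at_speech_onset(s, state_before_silence))   # decoding side
    for s in Strategy
}
```

In every case the two sides agree at speech onset, but the state they agree on is either left over from the previous speech period or an arbitrary preset value. Neither is derived from the speech that is about to start.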
The conventional voice coding/decoding systems described above have the following problems. The first conventional coding/decoding system interrupts the operation of the coder and the decoder during the silence period so that the internal state on the voice coding side and the internal state on the voice decoding side coincide. The second conventional coding/decoding system saves the internal state at the time of switching from a speech period to a silence period in a memory so that the two internal states coincide. In both the first and second systems, the normal coding and decoding processes resume when voice input begins. Because the held internal state has no correlation with the internal state that coding and decoding would derive from the input voice, the internal state does not transition smoothly, and voice quality deteriorates.
In particular, consider applying the first and second voice coding/decoding systems to a coding system that combines a short-term predictive filter and a long-term predictive filter (corresponding to a short-term synthesis filter and a long-term synthesis filter on the decoding side), as adopted in recent highly efficient voice coding systems such as CS-ACELP. Because the impulse response of the short-term predictive filter is relatively short, its held internal state causes no significant deterioration in voice quality.
However, the impulse response of the long-term predictive filter is considerably longer, so it persists for a significant time after the speech period begins. Because the held internal state is used as the initial value, the response to that state overlaps the internal state of the normal coding/decoding processing until it dies out. Voice quality is therefore significantly deteriorated until the impulse response has ended.
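The difference in how long a held state lingers can be illustrated by feeding both filter types zero input and watching the stale memory ring out. This is a rough sketch under assumed values (a first-order short-term filter with coefficient 0.5, and a single-tap long-term filter with a lag of 40 samples and a gain of 0.8), not a measurement of any real codec.

```python
def zero_input_short_term(memory, lpc, n):
    """Zero-input response of y[n] = -sum_k lpc[k] * y[n-1-k]."""
    history = list(memory)
    out = []
    for _ in range(n):
        y = -sum(a * history[-1 - k] for k, a in enumerate(lpc))
        history.append(y)
        out.append(y)
    return out


def zero_input_long_term(memory, lag, gain, n):
    """Zero-input response of y[n] = gain * y[n-lag]."""
    history = list(memory)
    out = []
    for _ in range(n):
        y = gain * history[-lag]
        history.append(y)
        out.append(y)
    return out


def settle_time(response, threshold=0.01):
    """Number of samples until the response stays below the threshold."""
    last = -1
    for i, y in enumerate(response):
        if abs(y) >= threshold:
            last = i
    return last + 1


# Stale state of magnitude 1.0 held over from the previous speech period.
short_settle = settle_time(zero_input_short_term([1.0], [-0.5], 2000))
long_settle = settle_time(zero_input_long_term([1.0] * 40, 40, 0.8, 2000))
print(short_settle, long_settle)
```

With these assumed values the short-term memory fades within a handful of samples, while the long-term memory keeps recirculating once per pitch lag and takes hundreds of samples to fall below the same threshold.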
The long-term predictive filter exploits the periodicity of the stationary portion of a vowel in speech, so a satisfactory effect can be expected in that portion. In the no-voice/silence portion, on the other hand, no predictive effect can be expected, and the predictive gain approaches 0 (zero).
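A small numerical sketch makes the contrast concrete. It estimates the least-squares gain of a single-tap long-term predictor, once for a synthetic periodic signal standing in for a stationary vowel and once for white noise standing in for a no-voice portion. The signals, the lag of 40 samples, and the function name are assumptions for illustration only.

```python
import math
import random


def long_term_gain(signal, lag):
    """Least-squares gain b that minimizes sum (s[n] - b * s[n-lag])^2."""
    num = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    den = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
    return num / den if den > 0.0 else 0.0


lag = 40  # assumed pitch period in samples

# Periodic stand-in for a stationary vowel: fundamental plus one harmonic.
voiced = [
    math.sin(2 * math.pi * n / lag) + 0.5 * math.sin(4 * math.pi * n / lag)
    for n in range(2000)
]

# White noise stand-in for a no-voice portion (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

voiced_gain = long_term_gain(voiced, lag)  # close to 1 for a periodic signal
noise_gain = long_term_gain(noise, lag)    # close to 0 for uncorrelated noise
```

A long-term filter state captured during a vowel therefore reflects a gain near 1, which is the wrong starting point for a speech onset where the appropriate gain is near 0.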
Therefore, when the first or second conventional method is applied to a long-term predictive filter having the above characteristics, the initial value of the long-term predictive filter at the speech onset is an unfavorable one that corresponds to the stationary portion of a vowel or the like.
According to the third conventional coding/decoding system, during the silence period, the coder and the decoder are reset or initialized to a specified value to achieve coincidence in the internal state at the beginning of speech.
As described above, however, the normal coding and decoding processes resume when voice input begins, and the internal state given by the initial value has no correlation with the internal state that coding and decoding would derive from the input voice. The internal state therefore does not transition smoothly, and voice quality deteriorates.
As described above, in a coding system that combines a short-term predictive filter and a long-term predictive filter (corresponding to a short-term synthesis filter and a long-term synthesis filter on the decoding side), as adopted in highly efficient voice coding systems such as CS-ACELP, effective coding at the beginning of speech depends on the predictive gain of the short-term predictive filter.
The long-term predictive filter, on the other hand, cannot predict effectively unless it starts from a predictive gain of 0 (zero) and the input signal gradually transitions to a stationary voice signal.
For this reason, when the third coding/decoding system is applied to a coding system comprising a short-term predictive filter and a long-term predictive filter, it suits the long-term predictive filter, from which no effect can be expected at the speech onset in the first place. With the third coding/decoding system, however, the expected effect of the short-term predictive filter cannot be attained. As a result, voice quality deteriorates.
Therefore, the conventional voice coding/decoding systems operate effectively in a silence-suppression voice coding/decoding system that combines a voice activity detector with a coding system relying on short-term prediction alone, such as ADPCM (adaptive differential PCM) or APC (adaptive predictive coding). Combined with a recent coding system that uses both short-term prediction and long-term prediction to enhance coding efficiency, however, they unfavorably result in deteriorated voice quality at the speech onset.