1. Field of the Invention
The present invention relates to a method and an apparatus for high-quality coding or decoding not only of a wideband speech signal but also of a narrowband speech signal.
2. Description of the Related Art
In the digital transmission of speech signals in conventional cellular telephone communication and voice over Internet Protocol (VoIP) communication, speech signals have conventionally been sampled at a sampling frequency (or sampling rate) of 8 kHz, and then coded and transmitted by a coding system adapted to that sampling rate. By the sampling theorem, a signal sampled at 8 kHz cannot contain frequency components above 4 kHz, which is half the sampling frequency. Accordingly, in the field of speech coding, a speech signal containing no frequency components at or above 4 kHz is referred to as narrowband speech (or telephone-band speech).
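As a brief illustration of the sampling-theorem statement above, the following Python sketch computes the apparent frequency of a sinusoid after sampling. The function name `alias_frequency` is introduced here for illustration only. Any component above 4 kHz in an 8 kHz-sampled system folds back into the 0 to 4 kHz range, which is why an 8 kHz-sampled signal carries no independent information above 4 kHz.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a real sinusoid at f_hz after sampling at fs_hz.

    The sampled spectrum repeats every fs and is mirror-symmetric about fs/2,
    so every input frequency folds into the range [0, fs/2].
    """
    f = f_hz % fs_hz            # fold by the spectral period fs
    return min(f, fs_hz - f)    # mirror about the Nyquist frequency fs/2


if __name__ == "__main__":
    fs = 8000.0
    print("Nyquist frequency:", fs / 2)
    for f in (1000.0, 3500.0, 5000.0, 7000.0):
        print(f"{f:.0f} Hz -> {alias_frequency(f, fs):.0f} Hz")
    # A 5 kHz cosine and a 3 kHz cosine yield identical samples at 8 kHz.
    same = all(math.isclose(math.cos(2 * math.pi * 5000 * n / fs),
                            math.cos(2 * math.pi * 3000 * n / fs), abs_tol=1e-9)
               for n in range(64))
    print("identical samples:", same)
```

For example, a 5 kHz tone sampled at 8 kHz is indistinguishable from a 3 kHz tone, so a narrowband system must band-limit its input to below 4 kHz before sampling.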
Narrowband speech is coded and decoded by a system designed for narrowband speech. For example, ITU-T Recommendation G.729 and the 3GPP Adaptive Multi-Rate Narrowband (AMR-NB) standard are narrowband speech coding/decoding systems, and both define the sampling rate of the input speech signal as 8 kHz.
On the other hand, a speech signal sampled at a higher rate of about 16 kHz can represent speech over a wide frequency band of about 50 Hz to 7 kHz. In the field of speech coding, a speech signal represented at a sampling frequency sufficiently higher than 8 kHz in this manner (usually about 16 kHz, although about 12.8 kHz, or 16 kHz or higher, may also be used depending on the situation) is referred to as wideband speech. Such wideband speech is coded by a wideband speech coding system that differs from the usual narrowband speech coding systems and is adapted to wideband speech.
For example, ITU-T Recommendation G.722.2 is a coding/decoding system for wideband speech, and it defines both the sampling frequency of the speech signal input to the coder and that of the speech signal output from the decoder as 16 kHz. The wideband speech coding system described in G.722.2 is referred to as the Adaptive Multi-Rate Wideband (AMR-WB) system, and its objective is to code and decode, with high quality, a wideband speech signal sampled at 16 kHz. AMR-WB provides nine bit rates (6.60 to 23.85 kbit/s). In general, speech coded and decoded at a high bit rate is of comparatively good quality, whereas speech coded and decoded at a low bit rate suffers large coding distortion, so its quality tends to deteriorate.
The wideband speech coding system of ITU-T Recommendation G.722.2 (AMR-WB) thus codes and decodes on the assumption that a wideband speech signal with a bandwidth of 50 Hz to 7 kHz is handled. Accordingly, both the input signal of the coder and the output signal of the decoder are sampled at 16 kHz.
However, where a narrowband speech communication system, which handles speech signals with no components at or above 4 kHz such as ordinary telephone speech, coexists with a wideband speech communication system, a narrowband speech signal may have to be handled by the wideband system. In that case, coded data produced by coding the narrowband speech signal with the wideband speech coder are decoded by the corresponding wideband speech decoder, using the same process as for an ordinary wideband speech signal.
Because the encoded signal originally had no components at or above 4 kHz, the decoded signal, although sampled at the wideband rate, is expected to be a narrowband speech signal with few components at or above 4 kHz. Even so, coding distortion, or a band-extension process or the like in the decoder, can leave the coded and decoded narrowband signal with some components at or above 4 kHz.
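The expectation stated above, that a decoded narrowband signal carries little energy at or above 4 kHz even at a 16 kHz sampling rate, can be checked numerically. The following sketch is not a procedure taken from this document: the helper `highband_energy_ratio` and its 4 kHz default split are assumptions for illustration, and a plain O(n²) DFT is used for clarity where a practical implementation would use an FFT.

```python
import cmath
import math
from typing import Sequence


def highband_energy_ratio(x: Sequence[float], fs_hz: float,
                          split_hz: float = 4000.0) -> float:
    """Fraction of the energy of frame x lying at or above split_hz.

    Uses a direct DFT over the non-negative frequency bins. Bins other than
    DC and (for even n) the Nyquist bin are counted twice, so the weighted
    sum equals the energy of the full two-sided spectrum.
    """
    n = len(x)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        weight = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        energy = weight * abs(coeff) ** 2
        total += energy
        if k * fs_hz / n >= split_hz:      # bin centre frequency in Hz
            high += energy
    return high / total if total > 0.0 else 0.0


if __name__ == "__main__":
    fs, n = 16000, 160                     # one 10 ms frame; DFT bins every 100 Hz
    low = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
    high = [math.sin(2 * math.pi * 6000 * t / fs) for t in range(n)]
    mixed = [a + 0.1 * b for a, b in zip(low, high)]
    for name, frame in (("1 kHz tone", low), ("6 kHz tone", high), ("mixture", mixed)):
        print(name, round(highband_energy_ratio(frame, fs), 4))
```

A band-limited 1 kHz tone gives a ratio of essentially zero, while adding a 0.1-amplitude 6 kHz component gives 0.01/1.01, about 0.0099: a small but nonzero amount of energy above 4 kHz, of the kind coding distortion can introduce.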
Thus, when a narrowband speech signal with no components at or above 4 kHz is transmitted in a conventional wideband coding system, the speech is coded by the wideband speech coder on the transmitting side and decoded by the ordinary wideband speech decoder on the receiving side. In conventional systems such as AMR-WB, both the coding and the decoding are specialized for wideband speech signals.
Accordingly, even coded data that yield a narrowband speech signal with few components at or above 4 kHz are decoded by a process specialized for wideband speech signals, and the quality of the resulting narrowband speech deteriorates. This tendency is especially pronounced at low bit rates, where high compression efficiency is required.
For example, when wideband speech coding/decoding is applied to a narrowband speech signal whose band has been limited by a narrowband communication path or storage system, or by a narrowband codec, the speech quality is markedly degraded at low bit rates of around 6 to 10 kbit/s compared with narrowband speech coding/decoding. The problem is not limited to narrowband speech signals: a similar problem arises with any speech signal having very little energy above 4 kHz. Conventional wideband speech decoding has therefore been unable to provide high-quality speech at low bit rates.
Moreover, in the conventional AMR-WB system, the wideband speech decoding unit comprises a lower-band section, which produces the lower-band speech signal up to about 6 kHz, and a higher-band section, which produces the higher-band speech signal from about 6 kHz to 7 kHz. The lower-band section is based on code-excited linear prediction (CELP), and the higher-band speech signal produced by the higher-band section is always added to the lower-band speech signal decoded by the lower-band section to form the output signal of the wideband speech decoding unit.
The decoding unit of the AMR-WB system is thus specialized for wideband speech. Consequently, even when coded data that yield narrowband speech are input, an unnecessary higher-band signal produced by the higher-band section is added to the speech output from the speech decoding unit.
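The unconditional output stage described above can be sketched as follows. The helper names `combine_bands` and `bin_magnitude` are hypothetical, and the signals are stand-ins: a 1 kHz tone plays the role of a lower-band signal decoded from narrowband speech, and a small 6.5 kHz tone plays the role of the higher-band section's output. The actual AMR-WB higher-band generation is not modeled.

```python
import cmath
import math
from typing import List, Sequence


def combine_bands(lower_band: Sequence[float],
                  higher_band: Sequence[float]) -> List[float]:
    """Output stage of the conventional decoder as described: the higher-band
    signal is added to the lower-band signal unconditionally."""
    if len(lower_band) != len(higher_band):
        raise ValueError("band signals must have the same length")
    return [lo + hi for lo, hi in zip(lower_band, higher_band)]


def bin_magnitude(x: Sequence[float], k: int) -> float:
    """Magnitude of DFT bin k of x, computed as a single-bin DFT."""
    n = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))


if __name__ == "__main__":
    fs, n = 16000, 160                     # one 10 ms frame at 16 kHz; 100 Hz per bin
    # Stand-in for a lower-band frame decoded from narrowband speech.
    lower = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
    # Hypothetical stand-in for the higher-band section's output.
    higher = [0.05 * math.sin(2 * math.pi * 6500 * t / fs) for t in range(n)]
    out = combine_bands(lower, higher)
    k = 6500 * n // fs                     # DFT bin of 6.5 kHz (bin 65)
    print("lower band at 6.5 kHz:", bin_magnitude(lower, k))
    print("decoder output at 6.5 kHz:", bin_magnitude(out, k))
```

Because the addition does not depend on the content of the coded data, the output frame contains a 6.5 kHz component that the narrowband lower-band signal itself lacks.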
Various methods have been proposed for improving coding/decoding efficiency at low bit rates. For example, Jpn. Pat. Appln. KOKAI Publication No. 2001-318698 (pages 2 to 4, FIG. 1) describes a technique in which a plurality of sets of pulse positions expressing the excitation signal are prepared, the set that minimizes the distortion with respect to the input speech signal is selected, and information identifying the selected set is transmitted to the receiving side, thereby coping with a reduced bit rate.
Jpn. Pat. Appln. KOKAI Publication No. 11-259099 (pages 2, 5 and 6, FIG. 1) describes a method in which the structure of a coding and decoding apparatus is switched according to whether the input signal is identified as speech or non-speech. In this method, some function blocks of the coder or decoder are provided in two structures, one optimized for processing speech signals and one optimized for processing non-speech signals, and the structures are switched based on the speech/non-speech identification information.
However, the technique described in Jpn. Pat. Appln. KOKAI Publication No. 2001-318698 must calculate the distortion for every stored set of pulse positions, so the amount of calculation required to select a set becomes enormous.
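To show why selecting among stored pulse-position sets by distortion is costly, here is a simplified analysis-by-synthesis search in Python. It is a sketch under stated assumptions, not the method of the cited publication: the synthesis filter is approximated by a short hypothetical FIR impulse response `h`, pulses have unit amplitude with signs taken from the target, and the gain is optimized separately for each set. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def filtered_excitation(positions: Sequence[int], signs: Sequence[float],
                        h: Sequence[float], length: int) -> List[float]:
    """Zero-state response of the FIR filter h to signed unit pulses at the
    given positions, truncated to the frame length."""
    y = [0.0] * length
    for p, s in zip(positions, signs):
        for i, hv in enumerate(h):
            if p + i >= length:
                break                      # tail beyond the frame is discarded
            y[p + i] += s * hv
    return y


def select_pulse_set(target: Sequence[float],
                     pulse_sets: Sequence[Sequence[int]],
                     h: Sequence[float]) -> Tuple[int, float]:
    """Evaluate every stored pulse-position set and return (index, distortion)
    of the set with the smallest gain-optimized error
    ||x - g*y||^2 = ||x||^2 - <x, y>^2 / <y, y>."""
    length = len(target)
    energy_x = sum(v * v for v in target)
    best_index, best_distortion = -1, float("inf")
    for index, positions in enumerate(pulse_sets):
        # Simplification: each pulse takes the sign of the target at its position.
        signs = [1.0 if target[p] >= 0.0 else -1.0 for p in positions]
        y = filtered_excitation(positions, signs, h, length)  # about P*M operations
        corr = sum(a * b for a, b in zip(target, y))           # about L operations
        energy_y = sum(v * v for v in y)                       # about L operations
        if energy_y == 0.0:
            continue
        distortion = energy_x - corr * corr / energy_y
        if distortion < best_distortion:
            best_index, best_distortion = index, distortion
    return best_index, best_distortion


if __name__ == "__main__":
    h = [1.0, 0.5, 0.25]                               # hypothetical impulse response
    target = filtered_excitation((2, 9), (1.0, -1.0), h, 16)
    candidates = [(0, 5), (2, 9), (3, 12)]             # hypothetical stored sets
    print(select_pulse_set(target, candidates, h))     # (1, 0.0)
```

With S stored sets, P pulses per set, an impulse response of length M and a frame of length L, this search costs on the order of S·(P·M + L) multiply-adds per frame, so the selection cost grows in direct proportion to the number of stored sets.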
Moreover, none of the above-described methods considers the mismatch between the speech coding system and the bandwidth of the input signal. They therefore cannot remedy the degradation in speech quality that occurs, as described above, when narrowband speech coded at a low bit rate by a wideband coder is decoded by a wideband speech decoder.