The present invention relates to a method for encoding speech at a low bit rate and, more particularly, to a method for encoding speech and a method for decoding speech wherein a speech signal including background noise is efficiently compressed and encoded such that the reproduced signal is as close to the original speech as possible.
Further, the present invention relates to a method for encoding speech wherein a speech signal is compressed and encoded and, more particularly, to a method for encoding speech used for digital telephones and the like and to a method for encoding speech for speech synthesis used for text read-out software and the like.
Conventional low-bit-rate speech coding is directed to efficient coding of a speech signal and is carried out according to speech coding methods which employ a model of a speech production process. Among such methods for speech coding, methods based on a CELP (Code Excited Linear Prediction) system have recently come into remarkably widespread use. When such a CELP-based method for encoding speech is used, a speech signal input in an environment having little background noise can be encoded efficiently because the signal matches the model for encoding, and this allows encoding with relatively little deterioration of speech quality.
However, it is known that when a CELP-based method for encoding speech is used for a speech signal input under a condition where background noise is at a high level, the background noise included in the reproduced output signal sounds very different from the background noise in the original signal, resulting in speech which is very unstable and unpleasant to listen to. This tendency is especially significant at encoding bit rates of 8 kbps or less.
In order to mitigate this problem, a method has been proposed wherein, for a time window which has been determined to contain background noise, the CELP encoding is performed using a more noise-like excitation signal so as to reduce deterioration of speech quality in that window. Although such a method provides some improvement of speech quality in the background noise window, the improvement is insufficient. The tendency to produce a noise that sounds different from the background noise in the original speech still remains, because the method continues to use a model of a speech production process in which speech is synthesized by passing the excitation signal through a synthesis filter.
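The model of a speech production process referred to above, in which an excitation signal is passed through a synthesis filter, may be illustrated by the following minimal sketch. It is given for explanation only and is not the encoder of any particular CELP system. The function names, the sign convention of the filter, the one-pole coefficient and the pitch period are all assumptions introduced for illustration. The sketch shows that a pulse-like excitation and a noise-like excitation are both shaped by the same all-pole filter, so that the output in a background noise window is still constrained by the model.

```python
import random


def synthesis_filter(excitation, coeffs):
    """All-pole synthesis filter (assumed convention):
    s[n] = e[n] + sum_k coeffs[k] * s[n - 1 - k]
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, c in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += c * out[n - 1 - k]
        out.append(acc)
    return out


def pulse_excitation(length, pitch_period, gain=1.0):
    """Periodic pulse train, a crude stand-in for a voiced excitation."""
    return [gain if n % pitch_period == 0 else 0.0 for n in range(length)]


def noise_excitation(length, gain=1.0, seed=0):
    """Gaussian noise, a crude stand-in for a noise-like excitation."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]


# Hypothetical one-pole filter; a real codec derives its coefficients
# from linear prediction analysis of each frame.
coeffs = [0.9]

voiced_frame = synthesis_filter(pulse_excitation(8, pitch_period=4), coeffs)
noise_frame = synthesis_filter(noise_excitation(8), coeffs)
```

In both cases the reproduced frame is the excitation coloured by the synthesis filter. Replacing the excitation with a more noise-like one changes the input to the filter but not the structure of the model itself.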
As described above, the conventional method for encoding speech has a problem in that, when a speech signal input under a condition where background noise is at a high level is encoded, the background noise included in the reproduced output signal sounds very different from the background noise in the original signal, resulting in speech which is very unstable and unpleasant to listen to.