1. Field
One or more embodiments relate to a method and apparatus for encoding/decoding a speech signal, and more particularly, to a method and apparatus for improving a sound quality of a speech signal by encoding and decoding the speech signal based on a variable bit rate.
2. Description of the Related Art
Speech transmission using digital technologies is widespread, and this trend is particularly noticeable in long distance and digital wireless telephone applications. Consequently, there has been increased interest in determining the minimum amount of information that needs to be transmitted via a channel while maintaining sufficient quality of the restored speech. When speech is transmitted using simple sampling and digitizing, a data transmission rate of 64 kbps is required to achieve speech quality matching that of a conventional analog telephone. However, with speech analysis followed by adequate coding in a transmission unit and restoration in a receiving unit, the data transmission rate may be significantly reduced.
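The 64 kbps figure is consistent with conventional telephone-band sampling, assuming 8,000 samples per second and 8 bits per sample. Neither value is stated in the text, so both are assumptions here, as is the 8 kbps coder rate used only for comparison. A minimal sketch of the arithmetic:

```python
def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of plain sampled-and-digitized speech, in bits per second."""
    return sample_rate_hz * bits_per_sample


def reduction_factor(pcm_bps, coder_bps):
    """How many times smaller the coded stream is than the plain PCM stream."""
    return pcm_bps / coder_bps


# Assumed telephone-band values (not given in the text).
ASSUMED_SAMPLE_RATE_HZ = 8000
ASSUMED_BITS_PER_SAMPLE = 8
# Hypothetical parametric coder rate, for illustration only.
HYPOTHETICAL_CODER_BPS = 8000

pcm_bps = pcm_bit_rate_bps(ASSUMED_SAMPLE_RATE_HZ, ASSUMED_BITS_PER_SAMPLE)
print(pcm_bps / 1000, "kbps for plain sampling and digitizing")
print(reduction_factor(pcm_bps, HYPOTHETICAL_CODER_BPS), "times reduction")
```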
Accordingly, there have been attempts to reduce the data transmission rate by the use of speech coders that utilize speech compression techniques based on extracting parameters related to a model of human speech generation, rather than straight sampling and digitizing of a speech signal. Such speech coders divide input speech signals into time blocks or analysis frames. In general, a speech coder includes an encoder and a decoder. The encoder analyzes input speech frames by extracting the relevant parameters, and performs quantization so that the parameters may be expressed in binary form, for example as sets of bits or binary data packets. The data packets are transmitted over the communication channel to a receiving unit including the decoder. The decoder processes the data packets, performs dequantization of the data packets to generate the parameters, and restores speech frames using the generated parameters.
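The encoder and decoder roles described above may be sketched as follows. This is only an illustration under stated assumptions: frame energy stands in for the extracted parameters, a uniform scalar quantizer stands in for the quantization step, and all function names and numeric values are hypothetical rather than taken from any particular coder.

```python
def split_into_frames(samples, frame_len):
    """Split samples into consecutive frames, dropping any partial tail."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def frame_energy(frame):
    """Stand-in parameter: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def quantize(value, lo, hi, bits):
    """Map a parameter in [lo, hi] to an integer index that fits in `bits` bits."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(index, lo, hi, bits):
    """Inverse mapping performed by the decoder."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)


def encode(samples, frame_len, lo, hi, bits):
    """Encoder side: one quantized parameter index (a 'packet') per frame."""
    return [quantize(frame_energy(f), lo, hi, bits)
            for f in split_into_frames(samples, frame_len)]


def decode(packets, lo, hi, bits):
    """Decoder side: recover the approximate parameter for each frame."""
    return [dequantize(index, lo, hi, bits) for index in packets]
```

The recovered parameter differs from the original by at most half a quantization step, which is the price of expressing it in a fixed number of bits.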
One such speech coder is the Code Excited Linear Predictive (CELP) coder, cited as a reference in L. B. Rabiner & R. W. Schafer, "Digital Processing of Speech Signals", 396-453 (1978). In the CELP coder, short-term correlations or redundancies in the speech signals are removed by linear predictive (LP) analysis, which finds the coefficients of a short-term formant filter. By applying the short-term predictive filter to input speech frames, LP residual signals are generated, and these signals are further modeled and quantized using long-term predictive filter parameters and a stochastic codebook.
Consequently, CELP coding separates the task of encoding a time-domain speech waveform into an encoding of the short-term filter coefficients and an encoding of the LP residual signals.
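The short-term prediction step may be illustrated with a minimal sketch of LP analysis. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to obtain the short-term filter coefficients and is not necessarily the method of any specific CELP coder. The prediction order, the test frame, and all function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a[1..order] so that x[n] is predicted by sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or degenerate frame: keep remaining coefficients at zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, coeffs):
    """What remains of each sample after subtracting its short-term prediction."""
    residual = []
    for n, x in enumerate(frame):
        prediction = sum(a * frame[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x - prediction)
    return residual


def demo_frame(length=160):
    """Hypothetical test frame: a single sinusoid, which is highly predictable."""
    return [math.sin(0.3 * n) for n in range(length)]


frame = demo_frame()
coeffs = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, coeffs)
print("signal energy:  ", sum(x * x for x in frame))
print("residual energy:", sum(e * e for e in residual))
```

For a strongly correlated frame such as this one, the residual carries far less energy than the waveform itself, which is why the coefficients and the residual are encoded as two separate tasks.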
CELP coding may be performed at a fixed rate (for example, the same number of bits for every frame). However, this may not be efficient, because the same number of bits is allocated both to frames that require a larger number of bits because speech signals are present and to frames that require a smaller number of bits because speech signals are absent, such as during silence.
Also, CELP coding may be operated at variable rates (different bit rates applied to different types of frame contents). A variable bit rate coder attempts to use only the number of bits needed to encode the codec parameters at a level adequate to achieve a target quality. However, the variable bit rate coding methods presently in use merely select, from among several predefined bit rates, the one appropriate for the circumstances, and thus the applicable bit rates are limited.
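The difference between fixed-rate coding and variable-rate coding that selects from a small set of rates may be sketched as follows. The frame duration, the energy threshold, the two-entry rate table, and the function names are all hypothetical values chosen for illustration; they do not correspond to any particular coder.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds

# Hypothetical rate table: bits per frame for each frame type.
RATE_TABLE_BITS = {"speech": 160, "silence": 20}


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def classify(frame, threshold):
    """Crude energy-based decision between speech and silence."""
    return "speech" if frame_energy(frame) > threshold else "silence"


def fixed_rate_bits(frames, bits_per_frame):
    """Fixed-rate coding: every frame costs the same number of bits."""
    return len(frames) * bits_per_frame


def variable_rate_bits(frames, threshold):
    """Variable-rate coding limited to the rates listed in RATE_TABLE_BITS."""
    return sum(RATE_TABLE_BITS[classify(f, threshold)] for f in frames)


def average_kbps(total_bits, n_frames):
    """Bits per millisecond, which is numerically equal to kbps."""
    return total_bits / (n_frames * FRAME_MS)
```

Because each frame can only be assigned one of the table's entries, a frame that would be served adequately by an intermediate number of bits still receives the next available rate, which illustrates the limit on applicable bit rates noted above.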