Technologies for digitally transmitting and storing speech and audio are widely used in wireless communication and voice over IP (VoIP) services, as well as in wired communication including the conventional telephone network. If speech and audio signals are transmitted after simply being sampled and digitized, a data rate of, for example, 64 kbps is required (when the signal is sampled at 8 kHz and each sample is encoded with 8 bits). However, speech can be transmitted at a lower data rate if a signal analysis technique and a suitable coding technique are used. Waveform coding, code-excited linear prediction (CELP) coding, and transform coding are widely used for speech and audio compression. The waveform coding scheme is very simple: it encodes the amplitude of each sample, or the difference between each sample and the previous sample, with a predetermined number of bits, but it requires a higher bit rate. The CELP coding scheme is based on a speech production model and models speech with a linear prediction filter and an excitation signal. It can compress speech at a relatively low bit rate, but its performance degrades on audio signals. The transform coding scheme transforms time-domain speech signals into frequency-domain signals and then encodes the transform coefficients corresponding to each frequency component. Typically, it can encode each frequency component by exploiting the characteristics of human hearing.
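The 64 kbps figure and the sample-difference form of waveform coding described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for illustration, and the difference coder below omits the limit to a predetermined number of bits per value, so it is lossless, unlike a practical waveform coder.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate, in bits per second, of uncompressed single-channel PCM."""
    return sample_rate_hz * bits_per_sample


def delta_encode(samples):
    """Encode each sample as its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for delta in deltas:
        previous += delta
        samples.append(previous)
    return samples


# 8 kHz sampling with 8 bits per sample, as in the example above.
print(pcm_bit_rate(8000, 8) / 1000, "kbps")  # 64.0 kbps

# Hypothetical sample values; the differences are smaller than the samples.
print(delta_encode([100, 102, 101, 105]))  # [100, 2, -1, 4]
```

Because neighbouring samples tend to be close in value, the differences usually span a smaller range than the samples themselves, which is why a difference can be encoded with fewer bits.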
Speech codecs for communication have evolved from narrowband coding of the conventional telephone bandwidth to wideband or super-wideband coding capable of providing better naturalness and clarity. Multi-rate codecs, which support multiple bit rates in a single codec, are widely used to accommodate a variety of network environments. Furthermore, embedded variable bit rate codecs have been developed to provide, in an embedded manner, both bandwidth scalability for accommodating signals of various bandwidths and bit-rate scalability. An embedded variable bit rate codec is configured such that the bit stream of a higher bit rate contains the bit stream of a lower bit rate, and it usually adopts a hierarchical coding scheme. As the signal bandwidth increases, the quality of the codec for audio signals such as music also becomes an important factor. Accordingly, a hybrid coding scheme is used in which the overall signal bandwidth is divided into two subband signals, the waveform coding scheme or the CELP coding scheme is applied to the lower-band signal, and the transform coding scheme is applied to the higher-band signal. As such, the transform coding scheme is widely used in speech codecs for communication that support wideband or super-wideband signals, as well as in conventional audio codecs.
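The embedded property described above, in which a higher-rate bit stream contains the lower-rate one, can be sketched as follows. The sketch assumes that the layers are simply concatenated with the core layer first, so that the lower-rate stream is a prefix of the higher-rate stream; the layer payloads, sizes, and function names are hypothetical.

```python
def pack_layers(layers):
    """Concatenate layer payloads: core layer first, then enhancement layers."""
    return b"".join(layers)


def decodable_layers(stream_len, layer_sizes):
    """Count the leading layers fully contained in the first stream_len bytes."""
    count = 0
    used = 0
    for size in layer_sizes:
        if used + size > stream_len:
            break
        used += size
        count += 1
    return count


# Hypothetical payloads for one frame: a core layer and two enhancement layers.
layers = [b"CORE", b"E1", b"E22"]
sizes = [len(layer) for layer in layers]

high_rate = pack_layers(layers)      # all layers
low_rate = pack_layers(layers[:1])   # core layer only

# The lower-rate stream is contained in (here, is a prefix of) the higher-rate one.
print(high_rate.startswith(low_rate))  # True

# A stream truncated to 6 bytes still carries the core and the first enhancement layer.
print(decodable_layers(6, sizes))  # 2
```

Under this assumed layout, a network node can lower the bit rate by dropping trailing bytes without re-encoding, and the decoder reconstructs from whichever layers arrive complete.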
In the transform coding scheme, the time-domain signal must be transformed into a frequency-domain signal. In most cases, the Modified Discrete Cosine Transform (MDCT) is used. The quality of a transform codec suffers from quantization errors in the MDCT coefficients caused by the limited bit rate of the codec. To address this problem, an enhancement layer with a relatively low bit rate can be added to reduce the MDCT quantization error.
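As a rough illustration of the two ideas above, the following sketch computes a direct (unwindowed, O(N^2)) MDCT of one frame, quantizes the coefficients coarsely as a stand-in for a core layer, and then quantizes the remaining error with a finer step as a stand-in for an enhancement layer. The uniform quantizer, the step sizes, and the test frame are assumptions made for illustration; they are not taken from any particular codec.

```python
import math


def mdct(frame):
    """Direct MDCT without windowing: 2N time samples -> N coefficients."""
    two_n = len(frame)
    n = two_n // 2
    return [
        sum(
            frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(two_n)
        )
        for k in range(n)
    ]


def quantize(value, step):
    """Uniform scalar quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


# Hypothetical frame of 16 samples, giving 8 MDCT coefficients.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
coeffs = mdct(frame)

# Core layer: coarse quantization (assumed step of 1.0).
core = [quantize(c, 1.0) for c in coeffs]

# Enhancement layer: quantize the core-layer error with a finer step (assumed 0.25).
enhancement = [quantize(c - q, 0.25) for c, q in zip(coeffs, core)]
refined = [q + e for q, e in zip(core, enhancement)]

core_err = max(abs(c - q) for c, q in zip(coeffs, core))
refined_err = max(abs(c - r) for c, r in zip(coeffs, refined))
print(refined_err <= core_err)  # True
```

With these step sizes the core-layer error per coefficient is at most 0.5, and adding the enhancement layer bounds it by 0.125, at the cost of the extra bits needed to transmit the enhancement values.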
In this case, since the number of bits dynamically allocated to each MDCT coefficient depends only on the absolute value of the quantized MDCT coefficient, the overall quantization performance of the core layer and the enhancement layer is determined by the MDCT quantization performance of the core layer. However, when a large quantization error occurs in a certain MDCT coefficient and the magnitude of its quantized value is smaller than the magnitudes of the other coefficients, fewer bits are allocated to that coefficient, so the large quantization error cannot be effectively compensated.
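The limitation described above can be made concrete with a toy example. The allocation rule below, which splits a bit budget in proportion to the magnitude of each quantization index, is a hypothetical stand-in for a magnitude-only allocation and is not the rule of any specific codec; the coefficient values, step size, and bit budget are likewise assumed.

```python
def quantize_index(value, step):
    """Uniform scalar quantizer: return the integer quantization index."""
    return round(value / step)


def allocate_bits(indices, total_bits):
    """Hypothetical rule: share total_bits in proportion to |index| (floored)."""
    magnitudes = [abs(i) for i in indices]
    total = sum(magnitudes)
    if total == 0:
        return [0] * len(indices)
    # Any bits left over after flooring are simply unused in this sketch.
    return [total_bits * m // total for m in magnitudes]


step = 4.0                            # assumed coarse core-layer step
original = [12.1, 1.9, 8.2, 0.1]      # hypothetical MDCT coefficients
indices = [quantize_index(c, step) for c in original]
errors = [abs(c - i * step) for c, i in zip(original, indices)]
bits = allocate_bits(indices, total_bits=10)

print(indices)  # [3, 0, 2, 0]
print(bits)     # [6, 0, 4, 0]

# The coefficient with the largest core-layer error is the second one.
worst = errors.index(max(errors))
print(worst, bits[worst])  # 1 0
```

Here the second coefficient has by far the largest quantization error (about 1.9) but a quantized magnitude of zero, so it receives no enhancement bits, while the well-quantized first coefficient receives the most.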