1. Field of the Invention
The present invention relates generally to an adaptive transform coding and/or decoding system. More specifically, the invention relates to a system for efficiently coding and decoding speech and audio signals while maintaining high quality.
2. Description of the Related Art
Conventionally, MPEG (Moving Picture Experts Group)/Audio Layer 3 and similar systems have been used as adaptive transform coding and decoding systems for efficiently coding and decoding speech and audio signals while maintaining high quality. The technology of MPEG/Audio Layer 3 is described in ISO/IEC 11172-3 (1993), "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mb/s" (hereinafter simply referred to as reference No. 1).
FIG. 3 is a block diagram showing one example of the conventional adaptive transform coding system. The conventional adaptive transform coding system comprises an input terminal 1, a transform means 2, an analysis means 3, a quantization parameter determining means 4, a quantizing means 5, a coding means 7, a parameter coding means 9, an adder 22, a multiplexer 23 and an output terminal 12.
Digitized audio signal samples are inputted at the input terminal 1 and outputted to the transform means 2 and the analysis means 3.
In the transform means 2, each time N time-domain audio samples are inputted, N frequency-domain samples are generated from the input audio samples by a hybrid analysis filter bank. The N frequency-domain samples, arranged in ascending order of frequency, are referred to as a "frame". The derived frequency-domain samples are outputted to the quantizing means 5 and the analysis means 3. N is a positive integer; in MPEG/Audio Layer 3, N is 576. The hybrid analysis filter bank is discussed in detail in reference No. 1.
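For illustration only, the framing can be sketched with a plain MDCT standing in for the hybrid filter bank of reference No. 1 (which, in Layer 3, cascades a polyphase filter bank with an MDCT). The hypothetical `mdct_frame` below applies an assumed sine window to a 2N-sample block (the previous N samples followed by the N new ones) and returns N frequency-domain samples, so every N new time-domain samples yield one frame of N frequency-domain samples. It is a direct O(N²) evaluation, not an efficient or standard-conformant implementation.

```python
import math
from typing import List, Sequence


def mdct_frame(block: Sequence[float]) -> List[float]:
    """Return N MDCT coefficients for a 2N-sample block (direct O(N^2) sum).

    Illustrative stand-in for the hybrid analysis filter bank; the sine
    window is an assumption of this sketch.
    """
    two_n = len(block)
    if two_n == 0 or two_n % 2 != 0:
        raise ValueError("block length must be a positive even number 2N")
    n = two_n // 2
    # Sine window over the 2N input samples.
    windowed = [s * math.sin(math.pi * (i + 0.5) / two_n) for i, s in enumerate(block)]
    # X[k] = sum_i w[i] x[i] cos(pi/N * (i + 1/2 + N/2) * (k + 1/2)), k = 0..N-1
    return [
        sum(
            w * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, w in enumerate(windowed)
        )
        for k in range(n)
    ]
```

With N = 576, a frame would be obtained as `mdct_frame(prev + new)`, where `prev` and `new` are hypothetical lists holding the previous and the newly inputted 576 samples.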
In the analysis means 3, an allowable quantization error is derived for each frequency-domain sample in the frame and outputted to the quantization parameter determining means 4. In audio coding, subjective quality is important. Therefore, the allowable quantization error is determined so that the degradation of the frequency-domain signal is not easily perceptible to human hearing. The manner of determining the allowable quantization error is also discussed in detail in reference No. 1; for example, one method analyzes a frequency spectrum obtained by Fourier transform of the input audio samples.
In the quantizing means 5, the frequency domain signal X is quantized on the basis of a quantization step size QS derived by the quantization parameter determining means 4. The quantized value Y is obtained by rounding the (3/4)th power of the frequency domain signal divided by the quantization step size. Namely, the quantized value Y is expressed by:
Y=nint(pow(X/QS, 3/4))
wherein nint( ) represents rounding to the nearest integer, and pow(a, b) represents a to the (b)th power. The quantized values in each frame are arranged in ascending order of frequency and fed to the coding means 7. On the other hand, the quantizing means 5 calculates a quantization error YZ and outputs it to the quantization parameter determining means 4. An inverse-quantized value YY of the quantized value Y is expressed by:
YY=pow(Y, 4/3)
Therefore, the quantization error YZ is expressed as:
YZ=X-pow(Y, 4/3)
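A minimal sketch of these three formulas follows. Because pow(X/QS, 3/4) is not real-valued for negative X, the sketch quantizes |X| and reattaches the sign. That choice, reading nint as rounding halves away from zero, and multiplying the inverse by QS (so that YZ is in the units of X; with QS = 1 it reduces to the formulas exactly as written) are assumptions of this example.

```python
import math


def quantize(x: float, qs: float) -> int:
    """Y = nint(pow(X/QS, 3/4)), applied to |X| with the sign of X reattached."""
    magnitude = int(math.floor((abs(x) / qs) ** 0.75 + 0.5))  # nint of a non-negative value
    return -magnitude if x < 0 else magnitude


def dequantize(y: int, qs: float = 1.0) -> float:
    """YY = pow(Y, 4/3) on |Y|, scaled by QS (assumption; QS = 1 gives the formula as written)."""
    magnitude = abs(y) ** (4.0 / 3.0) * qs
    return -magnitude if y < 0 else magnitude


def quantization_error(x: float, qs: float) -> float:
    """YZ = X - YY."""
    return x - dequantize(quantize(x, qs), qs)
```

For example, with QS = 1 an input of 10.0 quantizes to 6, which reconstructs to about 10.90, so YZ is about -0.90.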
In the coding means 7, as set out in detail later, each quantized value in the frame is encoded. Then, a code C1 and a code amount L1 of the code C1 are derived. The code C1 is outputted to the multiplexer 23, and the code amount L1 is outputted to the adder 22.
In the parameter coding means 9, the quantization step size QS inputted from the quantization parameter determining means 4 is encoded. Then, a code C2 and a code amount L2 of the code C2 are derived. The code C2 is inputted to the multiplexer 23 and the code amount L2 is inputted to the adder 22.
In the adder 22, the sum of the code amounts outputted from the coding means 7 and the parameter coding means 9, namely L1+L2, is derived and outputted to the quantization parameter determining means 4 as the total code amount.
The total code amount outputted from the adder 22 varies with the quantization step size QS. Generally, the smaller the quantization step size QS, the larger the total code amount, and the larger the quantization step size QS, the smaller the total code amount. In the quantization parameter determining means 4, the quantization step size QS is controlled so that the total code amount is maintained at or below the allowable code amount, which is determined on the basis of the coding bit rate, and so that the quantization error is proportional to the allowable quantization error. In one example of this control, the quantization step size QS is first set to a sufficiently small value, and the coding means 7 and the parameter coding means 9 are operated to derive the total code amount. Then, the following two operations are repeated until the total code amount becomes equal to or less than the allowable code amount. As the first operation, the quantization step size QS is set to a greater value in proportion to the allowable quantization error. As the second operation, the coding means 7 and the parameter coding means 9 are operated to derive the total code amount.
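The two-operation loop above can be sketched as follows. This is a minimal sketch under stated assumptions: one step size per frequency-domain sample, all kept proportional to that sample's allowable quantization error through a single scale factor; a hypothetical `total_code_amount` callback standing in for the coding means 7, the parameter coding means 9 and the adder 22; and an assumed geometric growth factor for each increase.

```python
from typing import Callable, List, Sequence


def fit_step_sizes(
    allowable_error: Sequence[float],
    total_code_amount: Callable[[List[float]], int],
    allowable_code_amount: int,
    initial_scale: float = 1e-3,
    growth: float = 1.25,
    max_iterations: int = 200,
) -> List[float]:
    """Enlarge step sizes (kept proportional to the allowable error) until the budget is met."""
    scale = initial_scale  # "sufficiently small" starting value (assumed)
    for _ in range(max_iterations):
        step_sizes = [scale * e for e in allowable_error]
        # Operate the (hypothetical) coders and compare against the budget.
        if total_code_amount(step_sizes) <= allowable_code_amount:
            return step_sizes
        scale *= growth  # set QS to a greater value, still proportional to the allowable error
    raise RuntimeError("allowable code amount not reached within max_iterations")
```

The starting scale, growth factor and iteration cap are arbitrary choices for the sketch; a finer growth factor lands closer to the budget at the cost of more coding passes.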
In the multiplexer 23, the codes C1 and C2 are multiplexed to generate a bit stream.
The bit stream is outputted from the output terminal 12.
In the coding means 7, the quantized values of the frame are divided into three regions on the frequency axis, i.e. a type 1 region, a type 2 region and a type 3 region. The quantized values in the type 1 region and the type 2 region are Huffman-coded.
First, a method for dividing the quantized values in the frame into three regions will be discussed. The N quantized values, arranged in ascending order of frequency, compose the vector X as follows:
Vector X=[x(1), x(2), . . . , x(N)]
Each element x(1), x(2), . . . , x(N) of the vector X represents a quantized value. The type 1 region contains the quantized values of the low-frequency signal and consists of the (2×big_values) elements x(1), x(2), . . . , x(2×big_values). The type 2 region contains quantized values whose absolute values are 0 or 1 and consists of the (4×count1) elements x(2×big_values+1), x(2×big_values+2), . . . , x(2×big_values+4×count1). The type 3 region contains elements whose values are zero and consists of the (2×rzero) elements x(2×big_values+4×count1+1), x(2×big_values+4×count1+2), . . . , x(N). Here,
2×big_values+4×count1+2×rzero=N.
The value rzero is calculated by
rzero=(N-t-(t mod 2))/2
where t is the maximum value satisfying
x(t)≠0, (t=1, 2, . . . , N)
(x1 mod x2) represents the remainder in division of x1 by x2.
The value count1 is calculated by
count1=(N-rzero×2-t2-((N-rzero×2-t2) mod 4))/4
where t2 is the maximum value satisfying |x(t2)|>1.
The value big_values is derived from
big_values=(N-rzero×2-count1×4)/2
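The three formulas can be checked with a short sketch. The function name is hypothetical; the list index is zero-based while the formulas are one-based, and taking t = 0 (or t2 = 0) when no element is nonzero (or none has |x| > 1) is an assumption, since the text does not cover those cases.

```python
from typing import Sequence, Tuple


def partition_regions(x: Sequence[int]) -> Tuple[int, int, int]:
    """Return (big_values, count1, rzero) for one frame of quantized values.

    x[0] holds x(1) of the text; t and t2 below are 1-based, as in the formulas.
    """
    n = len(x)
    if n % 2 != 0:
        raise ValueError("frame length N must be even")
    # t: largest 1-based index with x(t) != 0 (0 if the frame is all zero).
    t = max((i + 1 for i, v in enumerate(x) if v != 0), default=0)
    rzero = (n - t - t % 2) // 2
    # t2: largest 1-based index with |x(t2)| > 1 (0 if there is none).
    t2 = max((i + 1 for i, v in enumerate(x) if abs(v) > 1), default=0)
    span = n - 2 * rzero - t2
    count1 = (span - span % 4) // 4
    big_values = (n - 2 * rzero - 4 * count1) // 2
    return big_values, count1, rzero
```

For example, `partition_regions([3, 1, 0, 1, 1, 0, 0, 0])` returns `(1, 1, 1)`: x(1) and x(2) form the type 1 region, x(3) through x(6) the type 2 region, and x(7) and x(8) the type 3 region.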
Each element included in the type 1 and type 2 regions is Huffman-coded using a table selected, for each region, from prepared Huffman tables. The Huffman table is selected so that the total amount of Huffman code is minimized.
The Huffman tables prepared for coding the elements of the type 1 region differ in the assumed frequency of occurrence of each element value and in the range of quantized values they can code. The range of quantized values covered by the Huffman table selected for the type 1 region becomes larger as the maximum absolute value of the elements in the type 1 region increases, and at the same time each code in the table generally becomes longer. On the other hand, since the type 2 region includes only elements having absolute values of 0 or 1, the average code amount per element in the type 2 region is smaller than in the type 1 region.
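The selection rule can be illustrated with toy tables. The tables below are hypothetical code-length maps over absolute values, each covering values up to its largest key, and the one sign bit per nonzero value is likewise an assumption; the actual Layer 3 tables are defined in reference No. 1 and code pairs or quadruples of values. The sketch keeps only the tables whose range covers every value and picks the one with the smallest total code amount.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical tables: absolute value -> code length in bits.
HYPOTHETICAL_TABLES: List[Dict[int, int]] = [
    {0: 1, 1: 2},
    {0: 2, 1: 2, 2: 3, 3: 3},
    {0: 3, 1: 3, 2: 3, 3: 4, 4: 4, 5: 4, 6: 5, 7: 5},
]


def select_table(values: Sequence[int], tables: Sequence[Dict[int, int]]) -> Tuple[int, int]:
    """Return (table index, total bits) of the cheapest table whose range covers all values."""
    best: Optional[Tuple[int, int]] = None
    for index, table in enumerate(tables):
        if any(abs(v) not in table for v in values):
            continue  # range too small for the largest absolute value
        # Code length of |v| plus one assumed sign bit for each nonzero value.
        bits = sum(table[abs(v)] + (1 if v != 0 else 0) for v in values)
        if best is None or bits < best[1]:
            best = (index, bits)
    if best is None:
        raise ValueError("no table covers the values")
    return best
```

With these tables, `select_table([0, 1, 1, 0], HYPOTHETICAL_TABLES)` returns `(0, 8)`, while appending a single 3 rules out table 0 and forces the longer codes of table 1, which is the effect described above.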
The big_values, rzero and information on the Huffman tables used in the type 1 region and the type 2 region are coded as side information. The Huffman code and the side information are multiplexed and outputted as the code C1.
FIG. 4 is a block diagram showing one example of the conventional adaptive transform decoding system. The conventional adaptive transform decoding system includes an input terminal 13, a demultiplexer 24, a decoding means 15, a parameter decoding means 17, an inverse quantizing means 19, an inverse transform means 20 and an output terminal 21.
The bit stream is inputted at the input terminal 13 and outputted to the demultiplexer 24.
In the demultiplexer 24, the bit stream is separated into the code C1 and the code C2. The code C1 is outputted to the decoding means 15 and the code C2 is outputted to the parameter decoding means 17. In the parameter decoding means 17, the quantization step size is derived by decoding the code C2. The derived quantization step size is outputted to the inverse quantizing means 19.
In the decoding means 15, at first, the code C1 is separated into the Huffman codes and the side information. Next, the quantized values of the type 1 region and the type 2 region are derived by decoding the Huffman codes using the Huffman table indicated by the side information. The quantized values thus obtained are fed to the inverse quantizing means 19.
In the inverse quantizing means 19, an inverse quantized value is derived by the inverse quantization of the quantized value. The inverse quantized value YY is derived from the quantized value Y through the following equation:
YY=pow(Y, 4/3)
The inverse quantized values thus derived are outputted to the inverse transform means 20.
The inverse transform means 20 derives a time domain signal from the inverse quantized values through a hybrid synthesis filter bank. The hybrid synthesis filter bank has been discussed in detail in the foregoing reference 1.
Then, the time domain signal is outputted from the output terminal 21.
A first problem encountered in the foregoing adaptive transform coding and decoding systems is low coding efficiency when coding the elements of the type 1 region in the vicinity of its boundary with the type 2 region.
Most elements of the type 1 region in the vicinity of its boundary with the type 2 region have an absolute value of 0 or 1, like the elements of the type 2 region, and could therefore be coded using the Huffman code table for the type 2 region. However, because a small number of elements having an absolute value of 2 or more are present near that boundary, the elements having an absolute value of 0 or 1 there must be coded as elements of the type 1 region. Since the average code amount per element in the type 1 region is larger than that in the type 2 region, the presence of a small number of elements having an absolute value of 2 or more in the type 1 region near the boundary with the type 2 region degrades the coding efficiency.
A second problem is that the coding efficiency is degraded when the type 1 region includes a small number of elements having a large absolute value.
The size of the Huffman table selected for coding the elements in the type 1 region becomes larger as the maximum absolute value of the elements in the type 1 region increases, and at the same time each code in the table becomes longer. Therefore, when the type 1 region includes a small number of elements having a large absolute value, the average code amount per element becomes large and the coding efficiency is degraded.
It is therefore an object of the present invention to provide an adaptive transform coding system, an adaptive transform decoding system and an adaptive transform coding and decoding system, which can improve the coding efficiency by performing a special process for the elements having a large absolute value.
According to the first aspect of the invention, an adaptive transform coding system comprises:
a transform means for transforming a set of input signal samples into a frequency domain signal;
an analysis means for analyzing the input signal and the frequency domain signal to derive an allowable quantization error;
a quantizing means for quantizing the amplitude value of the frequency domain signal on the basis of a quantization step size to derive a quantized value and a quantization error;
a quantization parameter determining means for determining the quantization step size with reference to the allowable quantization error and the quantization error and a total code amount;
a selector for analyzing the quantized value of the frequency domain signal to derive a first signal and a second signal;
a first coding means for coding the quantized value of the first signal with reference to the second signal to derive a first code and a first code amount;
a second coding means for coding the quantized value of the second signal to derive a second code and a second code amount;
a parameter coding means for coding the quantization step size to derive a third code and a third code amount;
an adder for deriving the total code amount of the first code amount, the second code amount and the third code amount; and
a multiplexer for multiplexing the first code, the second code and the third code to generate a bit stream.
In the construction set forth above, the small number of quantized values having large absolute values and the other quantized values are coded by different means. Therefore, the Huffman code table in the coding means for the other quantized values can be smaller than in the prior art, which reduces the average code amount per quantized value and thus improves the coding efficiency.
The second coding means may divide the quantized values of the frequency domain signal into a first signal and a third signal and generate a fourth signal in which the absolute value of each quantized value of the first signal is replaced with a smaller quantized value, and the second signal may be generated by combining the third signal and the fourth signal. Also, the selector may derive the first signal and the second signal so that the total code amount is minimized. The first coding means may generate the first code by coding the absolute value of the quantized value of the first signal, the polarity of the quantized value of the first signal and the frequency of the first signal. In this case, the first coding means may derive a threshold for the quantized value of the first signal and code the value obtained by subtracting the threshold from the absolute value of the quantized value of the first signal, in place of that absolute value itself. For each sample of the first signal, the threshold may be the absolute value of the quantized value of the second signal at the same frequency plus one. Alternatively, the range of quantized values that the second coding means can code may be limited; then, for each sample of the first signal, the threshold may be the maximum absolute value of that range, when the second coding means codes the signal at the same frequency as the sample, plus one.
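The splitting and threshold coding can be sketched as below. It follows the variant in which the second coding means can only code absolute values up to a fixed limit, so the threshold is that limit plus one. Several details are assumptions of this sketch rather than part of the description: the limit is fixed instead of being chosen to minimize the total code amount, escaped samples are replaced by 0 in the second signal, frequencies are represented by sample indices, and the name `split_large_values` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_large_values(
    quantized: Sequence[int], limit: int
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Split a frame into a first signal of large values and a second signal.

    Each sample with |value| > limit goes to the first signal as
    (frequency index, |value| - threshold, polarity), threshold = limit + 1.
    In the second signal that sample is replaced by 0 (assumed choice of the
    "smaller quantized value"), so the second coder sees only |value| <= limit.
    """
    threshold = limit + 1
    first: List[Tuple[int, int, int]] = []
    second: List[int] = []
    for freq, value in enumerate(quantized):
        if abs(value) > limit:
            first.append((freq, abs(value) - threshold, -1 if value < 0 else 1))
            second.append(0)
        else:
            second.append(value)
    return first, second
```

With a limit of 1, `split_large_values([0, 1, 9, -1, 0, -5, 1, 0], 1)` leaves a second signal containing only 0 and ±1, which under the region partition of the prior art needs no type 1 region at all, while the two large values travel separately as `(2, 7, 1)` and `(5, 3, -1)`.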
Alternatively, the first coding means may code the frequency of each sample of the first signal in ascending order of frequency, coding, for each sample other than the one having the lowest frequency, the difference between its frequency and that of its preceding sample. The frequency domain signal may be divided into a plurality of regions; in that case, in place of the frequency of the lowest-frequency sample, the first coding means may code the number of region boundaries lower than that frequency and the difference between that frequency and the highest region boundary frequency lower than it.
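A sketch of this position coding follows, assuming a hypothetical ascending list of region-boundary frequencies given as sample indices. `bisect_left` counts the boundaries strictly below the lowest frequency; measuring the offset from frequency 0 when no boundary lies below it is an assumption the text does not address, and `code_positions` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def code_positions(
    freqs: Sequence[int], boundaries: Sequence[int]
) -> Tuple[int, int, List[int]]:
    """Return (boundary count, offset, deltas) for ascending frequency indices."""
    if list(freqs) != sorted(freqs):
        raise ValueError("frequencies must be in ascending order")
    if not freqs:
        return 0, 0, []
    lowest = freqs[0]
    count = bisect_left(boundaries, lowest)  # boundaries strictly below `lowest`
    base = boundaries[count - 1] if count > 0 else 0  # assumed base when none lie below
    # Each later frequency is coded as the difference from its predecessor.
    deltas = [b - a for a, b in zip(freqs, freqs[1:])]
    return count, lowest - base, deltas
```

For boundaries `[0, 4, 8, 12]`, the frequencies `[9, 10, 14]` become `(3, 1, [1, 4])`: three boundaries lie below 9, the highest of them is 8, and the offset 1 plus the small gaps are typically cheaper to code than the raw indices.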
According to the second aspect of the invention, an adaptive transform decoding system comprises:
a demultiplexer for separating an input signal into a first code, a second code and a third code;
a first decoding means for decoding the first code with reference to the second code to derive a first signal;
a second decoding means for decoding the second code to derive a second signal;
a parameter decoding means for decoding the third code to derive a quantization step size;
a synthesis means for synthesizing the first signal and the second signal for deriving a synthesized signal;
an inverse quantizing means for inverse quantizing the quantized value of the synthesized signal to derive an inverse quantized signal; and
an inverse transform means for transforming the inverse quantized signal into a time domain signal.
The first decoding means may decode the first code to derive the frequency, the absolute value and the polarity of each quantized value, and set them as the frequency, the absolute value and the polarity of the corresponding quantized value of the first signal, respectively. The first decoding means may also derive a threshold and take the value obtained by adding the threshold to the absolute value derived by decoding the first code as the absolute value of the quantized value of the first signal, in place of the decoded absolute value itself. For each sample of the first signal, the threshold may be obtained by quantizing the second signal at the same frequency and taking its absolute value. The second decoding means may have a restriction on the range of values it decodes; in that case, for each sample of the first signal, the threshold may be the maximum absolute value of that range, when the second decoding means decodes the signal at the same frequency as the sample, plus one.
The first decoding means may decode the frequency of the lowest-frequency sample and the frequency differences, and derive the frequency of each other sample by adding its frequency difference to the frequency of its preceding sample. In this case, the frequency domain signal is divided into a plurality of regions; the first decoding means derives the number of region boundaries and the frequency difference by decoding, and takes the frequency of the region boundary indicated by that number plus the frequency difference as the frequency of the lowest-frequency sample.
The synthesis means may generate the synthesized signal by replacing, at the frequency of each sample of the first signal, the quantized value of the second signal with the quantized value of the first signal.
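On the decoding side, position decoding and synthesis can be sketched as the inverse of the coding steps described for the first aspect. The sketch assumes the same hypothetical boundary list (with offsets measured from frequency 0 when the boundary count is zero) and the fixed-limit threshold of limit plus one; `decode_positions` and `synthesize` are hypothetical names, and at least one first-signal sample is assumed.

```python
from typing import List, Sequence


def decode_positions(
    count: int, offset: int, deltas: Sequence[int], boundaries: Sequence[int]
) -> List[int]:
    """Rebuild ascending frequency indices from (boundary count, offset, deltas)."""
    base = boundaries[count - 1] if count > 0 else 0  # same assumed base as the encoder
    freqs = [base + offset]
    for delta in deltas:
        freqs.append(freqs[-1] + delta)  # add each difference to its predecessor
    return freqs


def synthesize(
    second: Sequence[int],
    freqs: Sequence[int],
    offsets: Sequence[int],
    signs: Sequence[int],
    limit: int,
) -> List[int]:
    """Overwrite the second signal at each first-signal frequency with sign * (offset + threshold)."""
    threshold = limit + 1
    out = list(second)
    for freq, offset, sign in zip(freqs, offsets, signs):
        out[freq] = sign * (offset + threshold)
    return out
```

For example, with boundaries `[0, 4, 8]`, `decode_positions(1, 2, [3], [0, 4, 8])` yields `[2, 5]`, and `synthesize([0, 1, 0, -1, 0, 0, 1, 0], [2, 5], [7, 3], [1, -1], 1)` restores `[0, 1, 9, -1, 0, -5, 1, 0]`.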
According to the third aspect of the invention, an adaptive transform coding and decoding system comprises:
a transform means for transforming an input signal into a frequency domain signal;
an analysis means for analyzing the input signal and the frequency domain signal to derive an allowable quantization error;
a quantization means for quantizing the amplitude value of the frequency domain signal on the basis of a quantization step size to derive a quantized value and a quantization error;
a quantization parameter determining means for determining the quantization step size with reference to the allowable quantization error and the quantization error and a total code amount;
a selector for analyzing the quantized value of the frequency domain signal to derive a first signal and a second signal;
a first coding means for coding the quantized value of the first signal with reference to the second signal to derive a first code and a first code amount;
a second coding means for coding the quantized value of the second signal to derive a second code and a second code amount;
a parameter coding means for coding the quantization step size to derive a third code and a third code amount;
an adder portion for deriving the total code amount of the first code amount, the second code amount and the third code amount;
a multiplexer for multiplexing the first code, the second code and the third code to generate a bit stream;
a demultiplexer for separating an input signal into a first code, a second code and a third code;
a first decoding means for decoding the first code with reference to the second code to derive a first signal;
a second decoding means for decoding the second code to derive a second signal;
a parameter decoding means for decoding the third code to derive a quantization step size;
a synthesis means for synthesizing the first signal and the second signal for deriving a synthesized signal;
an inverse quantizing means for inverse quantizing the quantized value of the synthesized signal to derive an inverse quantized signal; and
an inverse transform means for transforming the inverse quantized signal into a time domain signal.