The present invention relates to methods for coding and decoding signals of plural channels as a single unit or collectively and a coder and a decoder using such methods, respectively.
The signals of plural channels herein mentioned are such as audio signals of right and left channels, multichannel signals, combinations of acoustic and image signals, plural sequence signals obtained by distributing a single channel signal into a plurality of sequences at regular time intervals, or plural signal sequences obtained by splitting a single channel signal into a plurality of frequency bands; the present invention is applicable to any signals as long as they are signal sequences that may develop power imbalance therebetween.
A known typical method for high efficiency coding of an acoustic signal, such as a speech or musical signal, is a transform coding method according to which frequency domain coefficients (sample values at respective frequencies of the frequency characteristic of the acoustic signal), obtained by a time-to-frequency transformation (a Fourier transform) of the acoustic signal on a frame-by-frame basis, are normalized using the envelope (or spectral envelope) of the frequency characteristics of the acoustic signal and the resulting residual coefficients are vector quantized. Another typical coding method is a CELP (Code-Excited Linear Prediction Coding) method according to which a speech signal is subjected to an LPC (Linear Predictive Coding) analysis in the time domain and prediction coefficients thus obtained are used as filter coefficients to synthesize speech from an excitation signal by a synthesis filter; the excitation signal is coded with a frequency component vector and a noise component vector so that the distortion of the synthesized speech is minimized.
In FIGS. 1A and 1B there are shown a coder 10 and a decoder 50 employing the conventional transform coding method. In the coder 10, a musical, speech or similar acoustic signal A.sub.T, fed as a digital signal sequence from an input terminal 11, is input into an MDCT (Modified Discrete Cosine Transform) part 23, wherein it is transformed frame by frame, for example, at time intervals of 16 to 64 msec or so in the case of a musical signal and about 10 to 20 msec in the case of a speech signal, into frequency domain coefficients A.sub.F. At the same time, the acoustic signal A.sub.T from the input terminal 11 is fed into a spectral envelope calculating part 24, wherein the spectral envelope of the input acoustic signal A.sub.T is calculated. The envelope is quantized in a quantization part 25, from which an envelope index I.sub.E is provided, and in a normalization part 26 the frequency domain coefficients A.sub.F from the MDCT part 23 are divided by the quantized envelope Eq from the quantization part 25 to yield less fluctuant residual coefficients X. In a scalar quantization part 27 the residual coefficients X are scalar quantized; in this case, bits are allocated to respective frequency bands in accordance with the frequency characteristic of the input acoustic signal A.sub.T. This bit allocation takes place in a bit allocation calculating part 28. The allocation information B is coded in a coding part 29, from which an allocation index I.sub.B is provided, and the residual coefficients X are scalar quantized in accordance with the bit allocation in the scalar quantization part 27, from which quantized residual coefficients Xq are provided.
In the decoder 50, as depicted in FIG. 1B, the indexes I.sub.E and I.sub.B input thereinto are decoded in decoding parts 62 and 63 into the spectral envelope Eq and the bit allocation information B, respectively. In a decoding part 64 the quantized residual coefficients Xq are decoded into the residual coefficients X' on the basis of the bit allocation information B. The decoded envelope Eq is provided to a de-normalization part 65, wherein the residual coefficients X' are de-normalized by being multiplied by the envelope Eq, whereby the frequency domain coefficients are restored. The frequency domain coefficients, identified by A.sub.F ', are provided to an IMDCT (Inverse Modified Discrete Cosine Transform) part 66, wherein they are restored into an acoustic signal A.sub.T ' in the time domain by an inverse modified discrete cosine transformation; the acoustic signal A.sub.T ' is fed to an output terminal 51.
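By way of illustration only, the normalization, scalar quantization and de-normalization steps of FIGS. 1A and 1B may be sketched as follows. The sketch starts from frequency domain coefficients that are already available, so the MDCT and IMDCT parts are omitted. The per-band RMS envelope, its quantization to powers of two, the uniform quantizer with a fixed range and all function names are assumptions made for this example; the description above does not specify them.

```python
import math

def band_envelope(coeffs, band_size):
    """Per-band RMS, used here as a crude stand-in for the spectral envelope."""
    env = []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band))
        env.append(max(rms, 1e-9))  # avoid log2(0) for silent bands
    return env

def quantize_envelope(env):
    """Assumed envelope quantizer: nearest power of two; the exponent is the index."""
    return [round(math.log2(e)) for e in env]

def decode_envelope(indexes):
    """Recover the quantized envelope Eq from the envelope indexes."""
    return [2.0 ** i for i in indexes]

def scalar_quantize(x, bits, limit=4.0):
    """Uniform quantizer on [-limit, limit) with 2**bits levels; returns a level index."""
    levels = 1 << bits
    step = 2 * limit / levels
    idx = int(math.floor((x + limit) / step))
    return min(max(idx, 0), levels - 1)

def scalar_dequantize(idx, bits, limit=4.0):
    """Map a level index back to the midpoint of its quantization cell."""
    step = 2 * limit / (1 << bits)
    return -limit + (idx + 0.5) * step

def encode_frame(coeffs, band_size, bits_per_band):
    """Normalize by the quantized envelope, then quantize each residual coefficient."""
    env_idx = quantize_envelope(band_envelope(coeffs, band_size))
    eq = decode_envelope(env_idx)
    xq = [scalar_quantize(c / eq[k // band_size], bits_per_band[k // band_size])
          for k, c in enumerate(coeffs)]
    return env_idx, xq

def decode_frame(env_idx, xq, band_size, bits_per_band):
    """Dequantize the residuals and multiply them by the decoded envelope."""
    eq = decode_envelope(env_idx)
    return [scalar_dequantize(q, bits_per_band[k // band_size]) * eq[k // band_size]
            for k, q in enumerate(xq)]
```

In this sketch the band with more allocated bits is reconstructed with a proportionally finer step, and the decoder needs both the envelope indexes and the per-band bit counts to interpret the quantized residuals.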
FIG. 2A shows the configuration of a speech signal coder utilizing the CELP method which is basically equivalent to the one disclosed, for instance, in U.S. Pat. No. 5,195,137. A speech signal fed to the input terminal 11 is subjected to a linear predictive coding analysis in an LPC analysis part 12 for each frame of a fixed length to obtain linear prediction coefficients .alpha., which are provided as filter coefficients to an LPC synthesis filter 13. In an adaptive codebook 14 is held an excitation vector E determined in the previous frame and provided to the synthesis filter 13. A segment of a length S is cut out of the excitation vector; such a segment is repeatedly connected until a frame length T is reached, by which an adaptive code vector (referred to also as pitch component vector) corresponding to a speech period component is generated. By changing the cutout length S, an adaptive code vector corresponding to a different pitch component can also be obtained. In a random codebook 16 there are recorded a plurality of random code vectors each of the frame length; when an index C is specified, the corresponding random code vector is read out from the random codebook 16. The adaptive code vector and the random code vector read out of the adaptive codebook 14 and the random codebook 16 are provided to multipliers 15 and 17, wherein they are multiplied by weighting factors (gains) g.sub.0 and g.sub.1 fed from distortion calculating/codebook search part 21. The multiplied outputs are added by an adder 18 and the adder output is applied as an excitation vector E to the synthesis filter 13 to synthesize a speech signal.
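The generation of the adaptive code vector and of the excitation vector E described above may be sketched as follows. This is a minimal illustration only; it assumes that the segment of length S is taken from the most recent samples of the previous excitation vector, which the description does not state, and the function names are hypothetical.

```python
def adaptive_code_vector(prev_excitation, cutout_len, frame_len):
    """Repeat a segment of length S (cutout_len) until the frame length T is reached.

    Assumption: the segment is the last `cutout_len` samples of the
    previous-frame excitation vector.
    """
    if not 0 < cutout_len <= len(prev_excitation):
        raise ValueError("cutout length must be between 1 and the stored vector length")
    segment = prev_excitation[-cutout_len:]
    return [segment[i % cutout_len] for i in range(frame_len)]

def excitation_vector(adaptive_vec, random_vec, g0, g1):
    """Weight the two code vectors by the gains g0 and g1 and add them."""
    return [g0 * a + g1 * r for a, r in zip(adaptive_vec, random_vec)]
```

For example, with a stored vector [1, 2, 3, 4, 5], S = 3 and T = 8, the segment [3, 4, 5] is repeated to give [3, 4, 5, 3, 4, 5, 3, 4]; a different S yields a vector with a different period.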
At first, the weighting factor g.sub.1 is set to zero and a segment cutout length S is selected. The difference between the synthesized speech signal (vector) from the synthesis filter 13 excited by the adaptive code vector corresponding to the selected cutout length S and the input speech signal (vector) is calculated by a subtractor 19. The error vector thus obtained is provided to the distortion calculating/codebook search part 21 after being assigned a psycho-acoustic weight, as required, in a psycho-acoustic weighting part 20. In the distortion calculating/codebook search part 21, the sum of the squares of elements of the weighted error vector (an intersymbol distance) is calculated and stored as distortion of the synthesized speech signal. By changing the cutout length S over a predetermined range of values while repeating the foregoing processing, the distortion calculating/codebook search part 21 determines the cutout length S of a particular value that minimizes the synthesized speech distortion. The excitation vector E, which is generated by such a manipulation, is fed to the synthesis filter 13 to synthesize a sound, which in turn is subtracted by the subtractor 19 from the input signal A.sub.T to obtain a noise component. Then, the random code vector that would minimize the distortion of the synthesized noise is selected from the random codebook 16, using the noise component as a target value of the synthesized noise when using the random code vector from the random codebook 16 as the excitation vector E; and the index C is obtained which corresponds to the thus selected random code vector. The thus determined random code vector is used to calculate the weighting factor g.sub.1 which would minimize the synthesized speech distortion. The weighting factors g.sub.0 and g.sub.1 determined in this way are coded as a weighting code G=(g.sub.0, g.sub.1) in a coding part 22.
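The search for the cutout length S with g.sub.1 set to zero may be sketched as follows. The sketch is illustrative only and rests on several assumptions not given in the description: the synthesis filter is modeled as an all-pole filter with zero initial state and the sign convention shown in the comment, the psycho-acoustic weighting is omitted, the segment is taken from the end of the stored excitation, and the gain g.sub.0 for each candidate is taken as the least-squares value. All function names are hypothetical.

```python
def synthesize(excitation, alpha):
    """All-pole synthesis, y[n] = e[n] + sum_k alpha[k] * y[n-1-k], zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(alpha):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def repeat_segment(prev_excitation, cutout_len, frame_len):
    """Adaptive code vector: the last `cutout_len` samples repeated to the frame length."""
    segment = prev_excitation[-cutout_len:]
    return [segment[i % cutout_len] for i in range(frame_len)]

def search_cutout_length(target, prev_excitation, alpha, candidates):
    """Return (S, g0, distortion) for the candidate S with the smallest squared error."""
    best = None
    for s in candidates:
        synth = synthesize(repeat_segment(prev_excitation, s, len(target)), alpha)
        energy = sum(y * y for y in synth)
        # Least-squares gain for this candidate (an assumption of this sketch).
        g0 = sum(t * y for t, y in zip(target, synth)) / energy if energy > 0 else 0.0
        # Distortion: sum of squared elements of the error vector.
        dist = sum((t - g0 * y) ** 2 for t, y in zip(target, synth))
        if best is None or dist < best[2]:
            best = (s, g0, dist)
    return best
```

The same exhaustive pattern, with the residual left after the adaptive contribution as the target, could be applied to the random codebook to obtain the index C and the gain g.sub.1.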
The linear prediction coefficients .alpha., the cutout length S, the random code vector index C and the weighting code G, thus determined for each frame of the input speech signal, are outputted from the coder of FIG. 2A as codes corresponding to the input speech.
In a decoder, as shown in FIG. 2B, the linear prediction coefficients .alpha. fed thereto are set as filter coefficients in an LPC synthesis filter 52. On the basis of the cutout length S and the index C, an adaptive code vector and a random code vector are read out from an adaptive codebook 54 and a random codebook 56 in the same fashion as in the coder of FIG. 2A; these vectors are provided to multipliers 55 and 57, wherein they are multiplied by the weighting factors g.sub.0 and g.sub.1 from a weight recovery or decoding part 53. The multiplied outputs are added by an adder 58; the adder output is applied as an excitation vector to the LPC synthesis filter 52, from which synthesized speech is provided to the output terminal 51.
The coder of FIG. 2A has been described above to produce the adaptive code vectors by repeatedly connecting a segment cut out from the excitation vector of the immediately previous frame stored in the adaptive codebook; however, as disclosed, for example, in M. R. Schroeder and B. S. Atal, "CODE-EXCITED LINEAR PREDICTION (CELP): HIGH-QUALITY SPEECH AT VERY LOW BIT RATES," IEEE ICASSP '85, pp. 937-940, the CELP scheme may vector quantize the excitation signal in such a manner as to minimize the synthesized speech distortion through use of a codebook having a number of predetermined waveform vectors as excitation vectors. Hence, it is not always necessary to use a codebook which adaptively varies as described above in respect of FIG. 2A. According to another CELP scheme, the prediction coefficients may be obtained by an LPC analysis of previous synthesized speech instead of calculating them by the LPC analysis of the input speech signal A.sub.T as in FIG. 2A, as disclosed in Juin-Hwey Chen, "HIGH-QUALITY 16 KB/S SPEECH CODING WITH A ONE-WAY DELAY LESS THAN 2 MS," IEEE ICASSP '90, p. 543, for instance. This scheme avoids the necessity of coding and providing the prediction coefficients to the decoding side.
For example, in the case of right and left two-channel audio signals, a quantization error for the respective signal level is 1/2.sup.5 when the signals of the right and left channels are each fixedly coded into 5-bit information. However, when the signal power is extremely unbalanced between the right and left channels, the quantization error (distortion) could be reduced down to 1/2.sup.8 without changing the total amount of information for signal coding, by using the same resolution in common to the right and left channel levels and allocating, for example, eight bits to the channel signal of the larger power and two bits to the channel signal of the smaller power.
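One way to read the arithmetic of this example is sketched below. It assumes uniform quantizers and a full scale normalized to 1 for the larger channel; both are assumptions of this illustration, as are the function names. With a total of ten bits, the fixed 5+5 split gives a step of 1/2.sup.5 per channel, while the 8+2 split with a shared step gives 1/2.sup.8, provided the smaller channel stays within the range that two bits can span at that step.

```python
from fractions import Fraction

def quantization_step(bits, full_scale=Fraction(1)):
    """Step of a uniform quantizer that spans `full_scale` with 2**bits levels."""
    return full_scale / (1 << bits)

def shared_step_allocation(total_bits, bits_large):
    """Split `total_bits` between two channels that share one step size.

    Returns (bits_small, step, range_small): the bits left for the smaller
    channel, the common step, and the amplitude range those bits can span.
    """
    bits_small = total_bits - bits_large
    step = quantization_step(bits_large)
    range_small = step * (1 << bits_small)
    return bits_small, step, range_small

# Fixed allocation: 5 bits per channel.
fixed_step = quantization_step(5)
# Adaptive allocation: 8 bits to the larger channel, the remaining 2 to the smaller.
bits_small, shared_step, range_small = shared_step_allocation(10, 8)
```

Under these assumptions the shared step is eight times finer than the fixed step, and the two-bit channel is represented without overload only while its amplitude does not exceed 1/64 of the larger channel's full scale.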
In the case of coding stereo signals of right and left channels with a predetermined amount of information by using two sets of such coders of FIG. 2A, it is impossible, by merely coding the signal of each channel with just one half the total amount of information, to reduce the distortion through effective utilization of properties of such stereo signals which sometimes have severe power imbalance between the channels.
A known method of implementing optimum coding to suit power imbalance between two channels is to adaptively allocate bits on the basis of the index read out of the codebook. With this technique, the number of kinds of possible bit allocation is large, which leads to the necessity of using a codebook of a size corresponding to the number of indexes representing the bit allocations. In practice, however, since the codebook size and the amount of processing involved increase in proportion to 2 raised to the power of the number of bits, the allocation of many bits is impractical. Furthermore, a code error in the gain information would produce ambiguities in the boundaries of indexes, causing severe errors in the reconstruction of all vectors.
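The exponential growth mentioned above can be made concrete with a short calculation. The sketch below assumes an exhaustive search that costs one multiply-add per vector element per codebook entry; this cost model, the vector dimension of 16 and the function name are assumptions for illustration only.

```python
def exhaustive_search_cost(index_bits, vector_dim):
    """Return (entries, multiply_adds) for a full search of a codebook with a b-bit index."""
    entries = 1 << index_bits           # codebook size is 2**index_bits
    return entries, entries * vector_dim

# Hypothetical vector dimension of 16; each added index bit doubles both figures.
for bits in (8, 12, 16):
    entries, macs = exhaustive_search_cost(bits, vector_dim=16)
    print(bits, entries, macs)
```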
Also in the case of using two sets of coders of FIG. 1A to code right and left two-channel stereo signals which sometimes have severe power imbalance therebetween, the quantization distortion could be reduced, without changing the total amount of information for both channels, by allocating bits to the right and left channels in the quantization parts 25 and the bit allocation calculating parts 28 of both channels in accordance with the power imbalance between the channels. Since this scalar quantization method requires the generation of a bit allocation code (the index I.sub.B) which is closely correlated with the spectral envelope, the efficiency of the coder will be impaired by a detailed bit assignment to many narrow subbands divided from the frequency band. On the other hand, when the frequency band is divided into relatively large subbands, it is impossible to sufficiently respond to imbalance of the frequency characteristic of the input signal; hence, the quantization distortion increases and the efficiency of utilization of input signal redundancy decreases accordingly. When a code error occurs in the bit allocation index I.sub.B, the partitioning of a bit train into the quantized residual coefficients Xq becomes confused, resulting in the decoded residual coefficients X' being highly distorted on the decoding side. As is the case with the CELP schemes, an error in the decoding of the bit allocation code will seriously distort reconstructed speech.
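The effect of a code error in the bit allocation on the partitioning of the bit train can be illustrated with a minimal sketch. The bit-string packing format and the function names below are assumptions made for this example; they are not taken from the coder of FIG. 1A.

```python
def pack(values, bits):
    """Concatenate each value as a fixed-width binary field of its allocated width."""
    chunks = []
    for v, b in zip(values, bits):
        if not 0 <= v < (1 << b):
            raise ValueError("value does not fit in its allocated bits")
        if b:
            chunks.append(format(v, "0{}b".format(b)))
    return "".join(chunks)

def unpack(bit_train, bits):
    """Split the bit train into fields according to the bit allocation."""
    out, pos = [], 0
    for b in bits:
        out.append(int(bit_train[pos:pos + b], 2) if b else 0)
        pos += b
    return out

bit_train = pack([5, 1, 2], [3, 1, 2])           # "101" + "1" + "10"
correct = unpack(bit_train, [3, 1, 2])           # allocation received intact
corrupted = unpack(bit_train, [2, 1, 3])         # allocation received in error
```

Although the bit train itself is unchanged, the erroneous allocation moves every field boundary, so the values recovered in `corrupted` differ from those in `correct` for the first and last fields.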
The above description has been given of the acoustic signals of two channels; also in the multiplex transmission of speech and image signals, it is customary to code each of them with a fixed amount of information. Also in such an instance, when imbalance of information occurs between speech and image, it is desirable to effectively utilize its property. For example, speech has silent durations at very short intervals; substantially no information needs to be transmitted in silent durations. Also in the case where an image undergoes an interframe prediction to compress information, the amount of information to be sent is very small when the image does not move. Where a combined amount of speech and image information is fixed, the overall distortion can be reduced by an adaptive bit allocation between the two kinds of information. As is the case with the above-described stereo signals, however, there are serious problems in the vector quantization processing and the robustness against code errors.