1. Field of the Invention
The present invention relates to a stereo audio encoding apparatus suited to encoding digital audio signal data for digital transmission or for storage on a digital data storage medium, and also to a method therefor.
2. Description of the Prior Art
While many digital audio compression coding methods have existed for the last two decades, efforts to standardize digital compression source coding methods for wideband audio signals of 15 kHz or 20 kHz bandwidth have taken place only recently. The Near Instantaneous Companding Audio Multiplex (NICAM) was adopted as a broadcast standard in the mid-1980s by various countries to produce sound of a quality comparable to FM stereo broadcast. In 1991, subband coding (SBC) using a feedforward quantization scheme, in conjunction with psychoacoustic modelling, formed the core method of the audio coding standard to be adopted by ISO/WG11/MPEG (Moving Picture Experts Group). The subband coding scheme would be the audio coding algorithm for the coded representation of moving picture information and associated audio at a total data rate of 1.5 Mbps (megabits per second). The bit rates at which the audio coding algorithm must work range from 64 kbps (kilobits per second) to 192 kbps per single audio channel.
A description of the subband coding scheme, using a quadrature mirror filter for the subband filterbank and psychoacoustics for the dynamic bit allocation, can be found in U.S. Pat. No. 4,972,484, dated Nov. 20, 1990. A detailed description of a similar subband coding method can be found in the document "Second Draft of Proposed Standard of Information Technology--Coding of Moving Pictures and Associated Audio for Digital Storage Media up to about 1.5 Mbps", Part 3: Audio Coding Standard ISO/IEC JTC1/SC2/WG11 N0043 MPEG 90/001, September 1990. In the latter document, the subband coding is implemented using a polyphase filterbank. In the stereo coding mode of this prior art, the subband encoder partitions the audio samples of each audio channel into 32 subbands via a polyphase filterbank, performs FFT analysis to determine psychoacoustic parameters, uses these parameters for adaptive bit allocation to the subbands, applies mid-tread quantization to the subband samples, and transmits the essential side information. The essential side information includes the bit allocation and scale factor data. This is illustrated in FIG. 5. At the decoder, the side information is used for dequantization, and the output samples are reconstructed by passing the dequantized subband samples through an inverse filterbank.
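The mid-tread quantization step named above can be sketched as follows. This is a minimal illustration of a generic uniform mid-tread quantizer normalized by a scale factor, not the quantizer defined in the cited draft standard. The function names, the `bits` parameter, and the choice of 2^bits - 1 levels are assumptions made for the example.

```python
import math


def midtread_quantize(sample, scale_factor, bits):
    """Map a sample to an integer index on a symmetric grid that includes zero.

    Assumption: the grid has 2**bits - 1 levels, i.e. indices run from
    -(2**(bits-1) - 1) to +(2**(bits-1) - 1).
    """
    max_index = 2 ** (bits - 1) - 1
    # Normalize by the scale factor and clamp to [-1, 1].
    normalized = max(-1.0, min(1.0, sample / scale_factor))
    # Round to the nearest grid index (ties round upward).
    return math.floor(normalized * max_index + 0.5)


def midtread_dequantize(index, scale_factor, bits):
    """Reconstruct a sample value from its index and the scale factor."""
    max_index = 2 ** (bits - 1) - 1
    return index / max_index * scale_factor


# Hypothetical usage with a scale factor of 10 and 3 bits per sample.
for x in (10.0, 1.0, 0.0, -10.0):
    idx = midtread_quantize(x, 10.0, 3)
    print(x, idx, midtread_dequantize(idx, 10.0, 3))
```

Because zero is itself a reconstruction level, small samples such as 1.0 in the usage above are decoded as exactly zero, which is the property that distinguishes a mid-tread quantizer from a mid-rise one.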
In order to obtain better sound quality at lower bit rates, an option of joint stereo coding has been proposed in the ISO/MPEG audio algorithm. Joint stereo coding exploits the interchannel irrelevancy in a stereo pair of audio channels for bit rate reduction. The joint stereo coding used in ISO/MPEG is termed intensity stereo coding. The purpose of this technique is to raise the sound quality toward that obtained at a higher bit rate and/or to reduce the bit rate for stereophonic signals. The intensity stereo technique makes use of psychoacoustical results which show that, at frequencies above 2 kHz, the localization of the stereophonic image within a critical band is determined by the temporal envelope and not by the temporal fine structure of the audio signal. The technique involves transmitting the summed signals instead of the individual left and right signals for subbands that are to be in the stereo irrelevancy mode. The stereophonic image is preserved by transmitting the scale factors of both channels. Quantization of the common summed samples, coding of these summed samples and coding of the common bit allocation are performed in the same manner as in independent coding of each audio signal.
The intensity stereo scheme suggested in the MPEG document MPEG 90/011 recommends that the left and right subband samples be added. These added values, serving as common subband samples, are scaled in the normal way. The originally determined scale factors of the left and right channel subband signals are transmitted according to the bitstream syntax. Quantization of the common subband samples and coding of the common bit allocation are performed in the same way as in independent coding. This scheme works when the two channels have a very high positive correlation. However, for channels with negative correlation, the reproduced sound quality deteriorates severely.
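The favorable, positively correlated case can be sketched as follows. This is an illustration only: it assumes the common signal is the average of the two channels and that each channel is reproduced by rescaling the common signal with its own scale factor, which is the reconstruction used in the illustration later in this section. The function names and the sample values are hypothetical, and quantization is omitted.

```python
def scale_factor(samples):
    """Maximum absolute sample value in the frame."""
    return max(abs(s) for s in samples)


def intensity_reconstruct(left, right):
    """Reproduce both channels from one common signal plus two scale factors.

    Assumes the common signal is not all zeros, since sf_m is a divisor.
    """
    common = [(l + r) / 2 for l, r in zip(left, right)]
    sf_l, sf_r, sf_m = scale_factor(left), scale_factor(right), scale_factor(common)
    left_rec = [sf_l * c / sf_m for c in common]
    right_rec = [sf_r * c / sf_m for c in common]
    return left_rec, right_rec


# Hypothetical in-phase frame: the right channel is the left at half amplitude.
left = [8, -4, 6, 2]
right = [4, -2, 3, 1]
left_rec, right_rec = intensity_reconstruct(left, right)
print(left_rec)   # [8.0, -4.0, 6.0, 2.0]
print(right_rec)  # [4.0, -2.0, 3.0, 1.0]
```

When one channel is a positive multiple of the other, as in this frame, the common signal has the same shape as both channels and the two scale factors restore each amplitude exactly.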
An illustration is provided below using opposite phase left and right signals.
Suppose the original (broadcast) left and right signals L and R in one frame have the following sample values:
$$L = \{10, 9, 8, 9, 6, -7, 5, -6, 8, 5\}$$
$$R = \{-10, -9, -7, -7, -6, 8, -5, 6, -10, -5\}$$
The maximum absolute sample values $SF_l$ and $SF_r$ of the left and right signals in the frame are then:
$$SF_l = 10$$
$$SF_r = 10$$
These values $SF_l$ and $SF_r$ are referred to as the left and right scale factors.
The power $P_l$ in the left channel is as follows:
$$P_l = \frac{1}{n} \sum_{i=1}^{n} l_i^2$$
wherein $l_i$ is a sample of signal L and $n$ is the total number of samples (10 in this example). The power $P_r$ in the right channel is as follows:
$$P_r = \frac{1}{n} \sum_{i=1}^{n} r_i^2$$
wherein $r_i$ is a sample of signal R.
According to the prior art, the left and right sampled signals L and R are reproduced as signals L' and R', using the left and right scale factors $SF_l$ and $SF_r$, as follows.
The average of the left and right channel signals is given as follows:
$$\{(l_i + r_i)/2\} = \{0, 0, 0.5, 1, 0, 0.5, 0, 0, -1, 0\}$$
Let $SF_m$, the maximum absolute magnitude of the signal obtained by averaging the left and right channel signals, be termed the combined scale factor. In this example, $SF_m = 1$. The left and right signals are reproduced according to the following equations:
$$L' = SF_l \cdot \{(L + R)/2\} / SF_m$$
$$R' = SF_r \cdot \{(L + R)/2\} / SF_m$$
Thus,
$$L' = \{0, 0, 5, 10, 0, 5, 0, 0, -10, 0\}$$
$$R' = \{0, 0, 5, 10, 0, 5, 0, 0, -10, 0\}$$
are obtained and are used as the audio signals supplied to the left and right speakers.
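The reconstructed powers $P_l'$ and $P_r'$ for the left and right channels are as follows:
$$P_l' = \frac{1}{n} \sum_{i=1}^{n} l_i'^2$$
$$P_r' = \frac{1}{n} \sum_{i=1}^{n} r_i'^2$$
wherein $l_i'$ and $r_i'$ are samples of the reproduced signals L' and R'.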
Thus, when the signals L' and R' produced by the reconstruction system of the prior art are used, about 50% of the original power is lost.
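The illustration above can be checked numerically with a short sketch. It uses the sample values, scale factors, and reconstruction equations given in the text. The one assumption is the definition of power as the mean of the squared samples, since the original power equations did not survive in this copy; the function and variable names are hypothetical.

```python
# Sample values of one frame, taken from the illustration in the text.
L = [10, 9, 8, 9, 6, -7, 5, -6, 8, 5]
R = [-10, -9, -7, -7, -6, 8, -5, 6, -10, -5]


def scale_factor(samples):
    """Maximum absolute sample value in the frame."""
    return max(abs(s) for s in samples)


def power(samples):
    """Assumed definition: mean of the squared samples over the frame."""
    return sum(s * s for s in samples) / len(samples)


# Common signal: average of the two channels, and its combined scale factor.
common = [(l + r) / 2 for l, r in zip(L, R)]
sf_l, sf_r, sf_m = scale_factor(L), scale_factor(R), scale_factor(common)

# Prior-art reconstruction: rescale the common signal per channel.
L_rec = [sf_l * c / sf_m for c in common]
R_rec = [sf_r * c / sf_m for c in common]

print("common:", common)
print("L':", L_rec)
print("R':", R_rec)
print("P_l, P_r:", power(L), power(R))
print("P_l', P_r':", power(L_rec), power(R_rec))
print("retained fraction, left: %.3f" % (power(L_rec) / power(L)))   # 0.446
print("retained fraction, right: %.3f" % (power(R_rec) / power(R)))  # 0.442
```

Under this definition the original powers are 56.1 and 56.5 and both reconstructed powers are 25, so a little under half of the power survives in each channel. Both reproduced channels are also identical, so the opposite-phase relationship between L and R is lost entirely.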