1. Field of the Invention
The present invention relates to an encoding apparatus, method and program for simultaneously recording or reproducing audio data or the like from many channels by so-called high-efficiency coding.
This application claims the priority of the Japanese Patent Application No. 2003-105642 filed on Apr. 9, 2003, the entirety of which is incorporated by reference herein.
2. Description of the Related Art
Heretofore, various techniques have been proposed for high-efficiency coding of audio or sound signals. They include, for example, so-called subband coding (SBC), in which a time-axial audio signal or the like is encoded by dividing its frequency band into a plurality of frequency bands without blocking or framing the signal, and so-called transform coding, in which a time-axial signal is blocked or framed in units of a predetermined time, transformed frame by frame into a frequency-axial signal (by spectrum transform), divided into a plurality of frequency bands, and encoded band by band. Also, a high-efficiency coding technique that combines the subband coding and transform coding techniques has been proposed. In this case, the frequency band of a signal is divided into subbands by the subband coding technique, the signal in each subband is then orthogonal- or spectrum-transformed into a frequency-axial signal, and the spectrum-transformed signal is encoded band by band.
Note that the aforementioned subband coding (SBC) uses a subband filter such as a quadrature mirror filter (QMF). The QMF is described in “Digital Coding of Speech in Subbands” (R. E. Crochiere, Bell Syst. Tech. J., Vol. 55, No. 8, 1976). Also, an equal-bandwidth filtering technique is disclosed in “Polyphase Quadrature Filters: A New Subband Coding Technique” (Joseph H. Rothweiler, ICASSP 83, Boston). Further, in the aforementioned orthogonal or spectrum transform, an input audio signal is blocked or framed in units of a predetermined time, and the time-axial signal is transformed block by block or frame by frame into a frequency-axial signal by the discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT) or the like. The MDCT is disclosed in “Subband/Transform Coding Using Filter Bank Design Based on Time Domain Aliasing Cancellation” (J. P. Princen and A. B. Bradley, Univ. of Surrey, Royal Melbourne Inst. of Tech., ICASSP, 1987).
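By way of illustration only, the frame-by-frame MDCT with time domain aliasing cancellation mentioned above can be sketched as follows. The sketch is not taken from the cited references. It assumes a sine window, a 50% overlap between frames, a frame length of 2N samples yielding N coefficients, and a direct (slow) evaluation of the cosine sums; the function names are hypothetical.

```python
import math

def sine_window(n2):
    """Sine window of length 2N (assumed here); w[t]^2 + w[t+N]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / n2) for t in range(n2)]

def mdct(frame):
    """Forward MDCT: 2N windowed time samples -> N coefficients."""
    n2 = len(frame)
    n = n2 // 2
    w = sine_window(n2)
    n0 = 0.5 + n / 2
    return [
        sum(w[t] * frame[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
            for t in range(n2))
        for k in range(n)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed samples that still
    contain time-domain aliasing (cancelled later by overlap-add)."""
    n = len(coeffs)
    n2 = 2 * n
    w = sine_window(n2)
    n0 = 0.5 + n / 2
    return [
        w[t] * (2.0 / n) * sum(
            coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
            for k in range(n))
        for t in range(n2)
    ]

def analysis_synthesis(signal, n):
    """Transform and reconstruct a signal whose length is a multiple of n,
    using frames of 2n samples that advance by n samples each time."""
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        recon = imdct(mdct(padded[start:start + 2 * n]))
        for t, value in enumerate(recon):
            out[start + t] += value  # overlap-add cancels the aliasing
    return out[n:n + len(signal)]
```

In this sketch each frame of 2N samples produces only N coefficients, and the aliasing left in each inverse-transformed frame is cancelled when adjacent frames are overlapped and added, so that the input is recovered up to rounding error when no quantization is applied.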
Note here that, in many cases where each frequency subband component is quantized, the frequency band is divided into bandwidths determined in consideration of human hearing characteristics. That is, an audio signal is divided into a plurality of bands (25 bands, for example) called “critical bands,” whose width normally increases with frequency. Also, in encoding the data band by band, either a predetermined (fixed) bit allocation or an adaptive bit allocation is made for each band in some cases. For example, in encoding coefficient data resulting from the aforementioned MDCT with bit allocation, the MDCT coefficient data in each band, obtained from the MDCT performed frame by frame, is encoded with an adaptively allocated number of bits. The following two bit allocation techniques are well known.
One of the two techniques is described in “Adaptive Transform Coding of Speech Signals” (IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 4, August 1977), in which bits are allocated based on the magnitude of the signal in each band. With this technique, the quantization noise spectrum is flat and the noise energy is minimized, but the perceived noise is not acoustically optimal because no masking effect is utilized. The other technique, a fixed bit allocation in which acoustic masking is utilized to provide the necessary signal-to-noise ratio for each band, is disclosed in “The Critical Band Coder: Digital Encoding of the Perceptual Requirements of the Auditory System” (M. A. Krasner, MIT, ICASSP, 1980).
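As a rough illustration of allocating bits according to the magnitude of the signal in each band, the following sketch uses the textbook rule that gives each band the average number of bits plus half the base-2 logarithm of the ratio of its energy to the geometric mean of all band energies. This rule is an assumption adopted here for illustration and is not quoted from the cited paper. The function name is hypothetical, the band energies are assumed to be strictly positive, and the result is left real-valued (no rounding and no clipping of negative values).

```python
import math

def energy_based_allocation(band_energies, avg_bits):
    """Real-valued bits per band:
    avg_bits + 0.5 * log2(energy / geometric mean of all energies)."""
    log_energies = [math.log2(e) for e in band_energies]
    mean_log = sum(log_energies) / len(log_energies)
    return [avg_bits + 0.5 * (le - mean_log) for le in log_energies]

def relative_noise(band_energies, bits):
    """Quantization noise per band, up to a common constant factor,
    under the usual model noise ~ energy * 2^(-2 * bits)."""
    return [e * 2.0 ** (-2.0 * b) for e, b in zip(band_energies, bits)]

energies = [16.0, 4.0, 1.0, 0.25]
bits = energy_based_allocation(energies, avg_bits=3.0)
print(bits)                            # [4.5, 3.5, 2.5, 1.5]
print(relative_noise(energies, bits))  # the same value in every band
```

Under this model the noise comes out equal in every band, which corresponds to the flat quantization noise spectrum noted above; nothing in the rule accounts for masking.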
The orthogonal transform is often used for encoding video information as well as audio information. A typical transform used for such coding is the discrete cosine transform (DCT). For example, the DCT is applied to each block of 8×8 pixels to yield transform coefficients, and bits are allocated with priority given to the low-frequency band. For a higher coding efficiency, entropy coding is frequently used. A larger prediction gain can be obtained by using an inter-frame prediction signal as the input to the DCT. Also, the inter-frame prediction gain can be raised further by using motion compensation.
Japanese Patent Application Laid-Open No. H08-123488 proposes a high-efficiency coding technique using a bit allocation that can further raise the efficiency of simultaneously writing and reading multimedia information or multiple-content information, including video and audio information, and that allows the available recording time to be determined before the information is encoded.
The high-efficiency coding technique disclosed in Japanese Patent Application Laid-Open No. H08-123488 is used in a system with video and audio channels. It provides a bit allocation in which the total bit rate of all channels, including the video and audio channels, is variable but does not exceed a constant maximum value, so as to assure as long a recording time as possible. Assuming, for example, that the number of bits representing the MDCT coefficients and usable in transmission or recording is 800 bps, the tonality of the spectrum information of the information signal and the change of the information signal over time are first used to determine how many of those usable bits are to be used in a first bit allocation. Also, the ratio in which bits are divided between the first bit allocation pattern and at least one other bit allocation appended to it depends upon the temporal change characteristic of the information signal. This bit division ratio is determined according to how the information signal increases in amplitude in a time domain in which the signal suddenly becomes large; such an increase is detected by comparing the peak values of the information signal between adjacent time sections obtained by subdividing the orthogonal transform time block.
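The comparison of peak values between adjacent time sections described above might be sketched as follows. This is an illustration only and does not reproduce the procedure of the cited publication: the function name, the number of sections, the ratio threshold and the small floor value are all assumptions, and the mapping from the detected increase to a bit division ratio is not shown because the description above does not specify it.

```python
def find_attack_section(block, num_sections, ratio_threshold, floor=1e-12):
    """Split one transform block into equal time sections and return the
    index of the first section whose peak exceeds the preceding section's
    peak by more than ratio_threshold, or None if there is no such section.

    All parameters are hypothetical; len(block) is assumed to be a
    multiple of num_sections.
    """
    size = len(block) // num_sections
    peaks = [
        max(abs(x) for x in block[i * size:(i + 1) * size])
        for i in range(num_sections)
    ]
    for i in range(1, num_sections):
        # The floor keeps a silent preceding section from yielding a zero
        # reference level.
        if peaks[i] > ratio_threshold * max(peaks[i - 1], floor):
            return i
    return None

quiet_then_loud = [0.01] * 24 + [0.9] * 8
steady = [0.5] * 32
print(find_attack_section(quiet_then_loud, num_sections=4, ratio_threshold=4.0))  # 3
print(find_attack_section(steady, num_sections=4, ratio_threshold=4.0))           # None
```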
With the above high-efficiency coding technique, it is possible to assure a sufficiently long recording time by providing a bit allocation in which the total bit rate of all channels including the video and audio channels is variable and will not exceed a constant maximum value.
However, in a multi-channel coding system in which information from a plurality of signal channels is encoded together and entropy coding is used to reduce the code length, if the bit allocation is made before the information is encoded, as in Japanese Patent Application Laid-Open No. H08-123488, an optimum inter-channel bit allocation cannot be calculated simply, because of the data compression ratio of the entropy coding.
Also, once the bit allocation to each channel is fixed, in an audio coding system for example, the energy balance between the channels is not taken into consideration, which leads to an extremely low coding efficiency.
On the other hand, even if an attempt is made to calculate the number of bits to allocate to a channel on the basis of the proportional relation of spectral power energy between blocks, it is difficult to predict the number of bits produced by the entropy coding and to assure an appropriate bit allocation, because the data compression ratio of the entropy coding varies depending upon the signal properties.
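The difficulty can be illustrated with a small sketch in which two sequences of quantized values have exactly the same energy but very different entropies. The empirical zeroth-order entropy is used here as a stand-in for the average code length of an ideal memoryless entropy coder; that substitution, the sample values and the function names are assumptions made for illustration and do not represent any particular entropy coder.

```python
import math
from collections import Counter

def energy(symbols):
    """Sum of squared quantized values."""
    return sum(s * s for s in symbols)

def empirical_entropy(symbols):
    """Zeroth-order entropy in bits per symbol, from symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Two hypothetical blocks of 16 quantized values with identical energy.
spread = [2, -2] * 8        # energy spread evenly over all samples
peaky = [8] + [0] * 15      # energy concentrated in one sample

print(energy(spread), energy(peaky))  # 64 64
print(empirical_entropy(spread))      # 1.0 bit per symbol
print(empirical_entropy(peaky))       # well below 1 bit per symbol
```

An allocation that is proportional to energy alone would treat the two blocks identically, although an entropy coder would need a noticeably different number of bits for each.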