This invention relates to a low bit rate encoder and a low bit rate encoding method for compression-encoding audio signals of a multi-channel system, to a low bit rate decoder and a low bit rate decoding method for decoding the compression-coded signals, and to recording media on which signals encoded by such an encoder or encoding method are recorded, for use in cinema film projection systems or in stereo or multi-sound acoustic systems such as video tape recorders or video disc players.
Various efficient encoding techniques and devices for audio or speech signals, etc. are known.
One example of an efficient encoding technique is the blocking frequency band division system known as transform coding. In this system an audio signal or the like in the time domain is divided into blocks of a predetermined unit time, the time-domain signal of each block is transformed into a signal in the frequency domain (orthogonal transform), the result is divided into signal components in a plurality of frequency bands, and the signal components are encoded for each frequency band.
Another example is sub-band coding (SBC), a non-blocking frequency band division system in which an audio signal or the like in the time domain is divided into signal components in a plurality of frequency bands without first being blocked into unit-time segments, and the components are then encoded.
Further, efficient coding techniques and devices have been proposed in which the sub-band coding and the transform coding described above are combined. In this case, for example, an input signal is divided into signal components in a plurality of frequency bands by sub-band coding, the signal of each frequency band is then orthogonally transformed into a signal in the frequency domain, and the orthogonally transformed signal components are encoded.
As a filter for the frequency band division in the above-described sub-band coding there is, for example, the quadrature mirror filter (QMF). Such a filter is described in, e.g., the literature "Digital coding of speech in subbands", R. E. Crochiere, Bell Syst. Tech. J., Vol. 55, No. 8, 1976. The QMF halves the frequency band into two bands of equal bandwidth, and is characterized in that so-called aliasing does not arise when the divided frequency bands are synthesized at a later processing stage.
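The band-halving and alias-free synthesis property of a QMF pair can be illustrated with a minimal sketch. It assumes the shortest possible filter pair (2 taps, the Haar pair), chosen only for brevity; practical QMFs use longer filters, and the function names qmf_analysis and qmf_synthesis are hypothetical, not taken from the cited literature.

```python
import math

SQRT2 = math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into a low band and a high band.

    Assumption: 2-tap filters h0 = [1, 1]/sqrt(2) and h1 = [1, -1]/sqrt(2),
    which satisfy the mirror relation h1[n] = (-1)**n * h0[n].
    Each output band is decimated by 2, so it has half the input length.
    """
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing terms of the bands cancel."""
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) / SQRT2)
        x.append((lo - hi) / SQRT2)
    return x

if __name__ == "__main__":
    signal = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, -1.0, 2.0]
    low, high = qmf_analysis(signal)
    restored = qmf_synthesis(low, high)
    print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each band carries half as many samples as the input, so the total sample count is unchanged by the split, and the printed reconstruction error is at the level of floating-point rounding.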
Moreover, the literature "Polyphase Quadrature filters-A new subband coding technique", Joseph H. Rothweiler, ICASSP 83, BOSTON, describes a technique of filter division into bands of equal bandwidth. This polyphase quadrature filter is characterized in that a signal can be divided into signal components in a plurality of frequency bands of equal bandwidth in a single operation.
Further, as the above-described orthogonal transform processing there is, for example, an orthogonal transform system in which an input audio signal is divided into blocks of a predetermined unit time (frame) and a Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Modified Discrete Cosine Transform (MDCT), or the like is carried out on each block, thereby transforming signals in the time domain into signals in the frequency domain.
This MDCT is described in the literature "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation", J. P. Princen and A. B. Bradley, Univ. of Surrey Royal Melbourne Inst. of Tech. ICASSP 1987.
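As an illustration of block-wise MDCT processing with time domain aliasing cancellation, the following is a minimal sketch. It assumes 50% overlapped blocks, a sine window applied at both analysis and synthesis, and a direct (slow) evaluation of the transform sums; the function names and the zero padding at the signal ends are illustrative choices, not details taken from the cited literature.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + length//2]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Transform 2N windowed time samples into N coefficients."""
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(coeffs):
    """Transform N coefficients back into 2N time-aliased samples."""
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                      for k in range(n))
        for i in range(2 * n)
    ]

def mdct_round_trip(signal, n):
    """Analyse and resynthesise a signal whose length is a multiple of n."""
    padded = [0.0] * n + list(signal) + [0.0] * n
    window = sine_window(2 * n)
    out = [0.0] * len(padded)
    # Blocks of 2N samples advance by N, so neighbouring blocks overlap by half.
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        rec = imdct(mdct(block))
        # Overlap-add: the aliasing of adjacent blocks cancels here.
        for i in range(2 * n):
            out[start + i] += rec[i] * window[i]
    return out[n:len(padded) - n]

if __name__ == "__main__":
    sig = [float((i * 7) % 5 - 2) for i in range(16)]
    rec = mdct_round_trip(sig, 4)
    print(max(abs(a - b) for a, b in zip(sig, rec)))
```

Each block of 2N samples yields only N coefficients, so the transform is critically sampled even though the blocks overlap; a single block cannot be inverted on its own, and the original samples reappear only after the overlap-add step.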
Further, as the frequency division width used in encoding (quantizing) the respective frequency components divided into frequency bands, there is band division in which, for example, the characteristics of human hearing are taken into consideration. Namely, there are instances where an audio signal is divided into signal components in plural (for example, 25) bands whose bandwidth becomes broader as the frequency becomes higher; such bands are generally called critical bands.
In addition, in encoding the data of each band at this time, coding is carried out either by a predetermined bit allocation for each band or by an adaptive bit allocation for each band.
For example, in encoding coefficient data obtained by the MDCT processing using the above-mentioned bit allocation, the MDCT coefficient data of each band, obtained by the MDCT processing of each block, are coded with an adaptively allocated number of bits.
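A minimal sketch of such a band division follows. The band edges are the commonly tabulated critical-band (Bark) edges, which are an assumption here and do not appear in this text; the 25th band, extending to an assumed upper limit of 22050 Hz, and the names BAND_EDGES_HZ, band_index and band_widths are likewise hypothetical.

```python
from bisect import bisect_right

# Assumed band edges in Hz (commonly tabulated critical-band edges),
# plus an assumed top edge of 22050 Hz to form a 25th band.
BAND_EDGES_HZ = [
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000,
    15500, 22050,
]

def band_index(freq_hz):
    """Return the 0-based index of the band containing freq_hz."""
    idx = bisect_right(BAND_EDGES_HZ, freq_hz) - 1
    # Clamp so that out-of-range frequencies fall into the first or last band.
    return min(max(idx, 0), len(BAND_EDGES_HZ) - 2)

def band_widths():
    """Return the width of every band in Hz."""
    return [hi - lo for lo, hi in zip(BAND_EDGES_HZ, BAND_EDGES_HZ[1:])]

if __name__ == "__main__":
    print(len(band_widths()), "bands")
    print(band_widths())
    print(band_index(1000.0))
```

With this table the widths never decrease as frequency rises, which is the property the text describes: the lowest bands are 100 Hz wide, while the highest span several kilohertz.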
As bit allocation techniques and devices therefor, the following two are known.
For example, in the literature "Adaptive Transform Coding of Speech Signals", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is carried out on the basis of the magnitudes of the signals in the respective bands.
Moreover, for example, the literature "The critical band coder--digital encoding of the perceptual requirements of the auditory system", M. A. Krasner, MIT, ICASSP 1980, describes a technique and a device in which the necessary signal-to-noise ratio is obtained for each frequency band by making use of auditory masking, and fixed bit allocation is carried out.
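The first of these two approaches, allocation according to the signal magnitude of each band, can be sketched as follows. This is a generic greedy scheme under stated assumptions, not the procedure of the cited paper: each additional bit is assumed to lower the quantization noise power of a band by a factor of 4 (about 6 dB), and the function name allocate_bits is hypothetical.

```python
def allocate_bits(band_energies, total_bits):
    """Distribute total_bits over the bands, one bit at a time.

    Assumption: with zero bits the noise of a band equals its energy,
    and every extra bit divides that noise power by 4.
    """
    bits = [0] * len(band_energies)
    noise = list(band_energies)
    for _ in range(total_bits):
        # Give the next bit to the band whose noise is currently largest.
        worst = max(range(len(noise)), key=lambda i: noise[i])
        bits[worst] += 1
        noise[worst] /= 4.0
    return bits

if __name__ == "__main__":
    energies = [100.0, 1.0, 0.01]
    print(allocate_bits(energies, 6))
```

Bands with larger magnitude receive more bits, and a band whose energy is already below the noise level reached in the other bands receives none.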
Meanwhile, efficient compression encoding systems for audio signals using sub-band coding and the like as described above, which exploit the characteristics of human hearing to compress audio data to about 1/5 of its original quantity, have already been put into practice.
One such efficient encoding system, which compresses audio data to about 1/5 of its original quantity, is ATRAC (Adaptive Transform Acoustic Coding, a trademark of SONY Corporation), used in, e.g., the MD (Mini Disc, a trademark of SONY Corporation).
However, in an efficient coding system utilizing the characteristics of human hearing, there are cases where the sound of a musical instrument, the human voice, or the like, obtained by compression-coding a speech signal and then decoding the coded signal, differs from the original sound, albeit only slightly. In particular, where such an efficient coding system is used for the recording format of recording media that require faithful reproduction of the original sound, higher sound quality is required.
On the other hand, a format of such an efficient coding system (the ATRAC system, etc.), which compresses an audio signal to about 1/5 of its original data quantity, has already been put into practice, and hardware employing this format is becoming widespread.
Accordingly, a change or expansion that is not compatible with the format is disadvantageous not only to the manufacturers that have used the format but also to general users.
For this reason, it is desired that higher sound quality be attained through improvements in encoding or decoding without changing the format itself.
As another method of realizing higher sound quality, it is conceivable to mix linear PCM sound into the ordinary compressed data. However, since the compressed data of the efficient coding system and the linear PCM data differ in frame length and in the time length of each frame, it is difficult to synchronize them at the time of reproduction. Accordingly, it is very difficult to use data of these two formats at the same time.
Furthermore, not only in ordinary audio equipment but also in, for example, cinema film projection systems, high definition television, and stereo or multi-sound acoustic systems such as video tape recorders or video disc players, audio signals of plural channels (4 to 8 channels) are being handled. In this case as well, efficient coding to reduce the bit rate is desired.
Particularly, with cinema film, there are instances where digital audio signals of 8 channels are recorded, namely a left channel, left center channel, center channel, right center channel, right channel, surround left channel, surround right channel, and sub-woofer channel. In this case, the above-mentioned efficient coding to reduce the bit rate is required.
In particular, it is difficult to secure on the cinema film an area capable of recording 8 channels of linearly quantized audio data with a sampling frequency of 44.1 kHz and 16 bits per sample, as used in the so-called CD (Compact Disc), etc. Accordingly, compression of the audio data is required.
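The scale of the problem can be checked with a short calculation. The sketch below uses only the figures given in the text (8 channels, 44.1 kHz, 16 bits) and, as an illustrative assumption, applies the compression ratio of about 1/5 mentioned earlier; the function name linear_pcm_bit_rate is hypothetical, and bits for error correction are not included.

```python
def linear_pcm_bit_rate(channels, sampling_hz, bits_per_sample):
    """Return the raw bit rate of linear PCM audio in bits per second."""
    return channels * sampling_hz * bits_per_sample

if __name__ == "__main__":
    raw = linear_pcm_bit_rate(channels=8, sampling_hz=44100, bits_per_sample=16)
    compressed = raw // 5  # assumed compression to about 1/5
    print(raw)         # 5644800 bits per second
    print(compressed)  # 1128960 bits per second
```

Uncompressed, the 8 channels require about 5.6 Mbit/s; at a ratio of about 1/5 this falls to roughly 1.1 Mbit/s before any error correcting code is added.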
It should be noted that the 8 channels of data recorded on the cinema film correspond respectively to a left speaker, left center speaker, center speaker, right center speaker, right speaker, surround left speaker, surround right speaker, and sub-woofer speaker, which are disposed on the screen side where, for example, pictures reproduced from the picture recording areas of the cinema film are projected by a projector.
The center speaker is disposed in the center on the screen side, and serves to output reproduced sound by audio data of center channel. This center speaker outputs the most important reproduced sound, e.g. speech of an actor, etc.
The sub-woofer speaker serves to output sound reproduced from the audio data of the sub-woofer channel. This speaker outputs low-frequency sound that is perceived as vibration rather than as sound, for example the sound of an explosion, and is frequently used to good effect in explosion scenes.
The left speaker and the right speaker are disposed on left and right sides of the screen, and serve to output reproduced sound by audio data of left channel and reproduced sound by audio data of right channel, respectively. These left and right speakers exhibit stereo sound effect.
The left center speaker is disposed between the left speaker and the center speaker, and the right center speaker is disposed between the center speaker and the right speaker. The left center speaker outputs reproduced sound by audio data of left center channel, and the right center speaker outputs reproduced sound by audio data of right center channel. These left and right center speakers perform auxiliary roles of the left and right speakers, respectively.
Particularly, in a movie theater having a large screen and a large seating capacity, there is the drawback that the localization of the sound image becomes unstable depending on the seat position. The addition of the above-mentioned left and right center speakers is effective in creating a more realistic localization of the sound image.
Further, the surround left and right speakers are disposed so as to surround the spectators' seats. These speakers output sound reproduced from the audio data of the surround left channel and the surround right channel, respectively, and provide reverberation or the impression of being surrounded by applause or cheering. Thus, it is possible to create sound images in a more three-dimensional manner.
In addition, since a defect, etc. is apt to take place on the surface of a medium of cinema film, if digital data is recorded as it is, missing data takes place to a great degree. Such a recording system cannot be employed from a practical point of view. For this reason, the capability of an error correcting code is very important.
Accordingly, with respect to the data compression, it is necessary to carry out compression processing to such a degree that recording can be made in the recording area on the film by taking bits for a correcting code into consideration.
In view of the facts described above, the efficient coding system (e.g., the ATRAC system), which attains sound quality comparable to that of the CD by carrying out optimum bit allocation in consideration of the characteristics of human hearing as described above, is applied as the method of compressing the digital audio data of 8 channels.
However, with this efficient coding system, the sound of a general musical instrument, the human voice, or the like differs from the original sound in the same way as described above, albeit only slightly. For this reason, where such a system is employed in a recording format that requires reproduction faithful to the original sound, some means of realizing higher sound quality is required.
This problem exists even where a system other than the above-mentioned efficient coding system is used as the multi-channel recording format on cinema film, so long as an irreversible compression system is employed in order to secure the recording area.
Moreover, in a system that applies efficient coding to the audio signals of a multi-channel system as described above, the data of the respective channels undergo compression processing independently of one another.
For this reason, even if, for example, a certain channel is in an unvoiced sound state, a fixed bit (byte) allocation amount is allocated to that channel.
Giving a fixed bit allocation amount to a channel in an unvoiced sound state in this way is redundant.
Moreover, since the bit allocation amount is the same for a channel carrying a low-level signal as for a channel carrying a high-level signal, redundant bits exist when the bit allocation amounts are evaluated across the channels.
This redundancy is considered to be particularly conspicuous where the bit allocation amount is fixed for each channel.
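The redundancy can be made concrete with a small comparison. The sketch below is only an illustration under assumptions: the signal level of each channel is a single non-negative number, and the pooled scheme weights each channel by log2(1 + level), a weighting chosen for simplicity. The function names are hypothetical, and the pooled scheme is not presented as the allocation method of any particular system.

```python
import math

def fixed_allocation(total_bits, levels):
    """Give every channel the same share, regardless of its signal level."""
    share = total_bits // len(levels)
    return [share] * len(levels)

def pooled_allocation(total_bits, levels):
    """Share one bit pool among the channels according to their levels.

    Assumption: the weight of a channel is log2(1 + level), so a silent
    channel (level 0) has weight 0 and receives no bits.
    """
    weights = [math.log2(1.0 + lv) for lv in levels]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return fixed_allocation(total_bits, levels)
    alloc = [int(total_bits * w / total_weight) for w in weights]
    # Bits lost to rounding down go to the highest-level channel.
    alloc[weights.index(max(weights))] += total_bits - sum(alloc)
    return alloc

if __name__ == "__main__":
    levels = [0.0, 1023.0, 15.0, 255.0]  # first channel is silent
    print(fixed_allocation(800, levels))
    print(pooled_allocation(800, levels))
```

Under the fixed scheme the silent channel consumes a quarter of the budget; under the pooled scheme those bits are available to the channels that carry signal, while the total remains the same.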
The present Assignee has proposed, in JP Patent Application No. 6-206702 (not laid open as yet), corresponding to U.S. patent application Ser. No. 08/327,282, a technique of determining the channel bit allocation on the basis of the amplitude information of the respective channels or of time changes in the sum of scale factors.
The present Assignee has also proposed, in PCT/JP94/00880 (International Publication No. WO94/28633, date of international publication Dec. 8, 1994), a technique of separating an input acoustic signal into tonal components, in which the energy is concentrated at specific frequencies, and noisy or non-tonal components, in which the energy is distributed smoothly over a broad frequency range, and encoding the respective components to achieve high encoding efficiency.