1. Field of the Invention
The present invention relates to a signal compression and encoding apparatus for compressing and multiplexing, for example, audio or video signals and a compressed signal decoding apparatus for decoding the compressed and multiplexed audio or video signals. The present invention also relates to a system including these apparatuses.
2. Description of the Related Art
Transmitting or recording digitized audio or acoustic signals requires a large volume of data, and hence a huge transmission or storage capacity. Therefore, there has been a strong demand for techniques for compressing such data.
Typically, the human perceptual system does not receive all stimuli equally: only part of the stimuli presented at one time is perceived, while the remainder is not. Discrimination between the perceived and non-perceived components is known to be made effectively in the frequency domain. With this in mind, highly efficient compression algorithms using this principle have been developed. More specifically, a time sample sequence is converted into a frequency sample sequence by using an orthogonal transform technique, and the data is compressed in this frequency sample sequence. Examples of such time-to-frequency transforms include the Discrete Cosine Transform (DCT) and the Modified Discrete Cosine Transform (MDCT). The DCT algorithm is similar to the Fast Fourier Transform (FFT) in that it converts data into a set of frequency components. In addition, subband filter banks, in which a large number of bandpass filters are aligned, are also known.
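The energy-compaction property that makes such transforms useful for compression can be illustrated with a naive DCT-II (a minimal sketch; the function name is illustrative, and real encoders use fast FFT-based algorithms or the windowed, overlapped MDCT rather than this O(N^2) form):

```python
import math

def dct_ii(block):
    # Naive O(N^2) DCT-II: N time samples in, N frequency samples out.
    n = len(block)
    return [
        sum(x * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
            for t, x in enumerate(block))
        for k in range(n)
    ]

# A tone aligned with one DCT basis function concentrates all of its
# energy in a single frequency sample -- the property exploited by
# frequency-domain compression.
tone = [math.cos(math.pi * (2 * t + 1) * 5 / 64) for t in range(32)]
spectrum = dct_ii(tone)
```

Because the tone matches the k = 5 basis function, its entire energy appears in a single coefficient, so the remaining coefficients need few or no bits.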
One technique for compressing and encoding audio data was developed by the Moving Picture Experts Group (MPEG) and standardized by the International Organization for Standardization (ISO) as part 3 (audio) of the ISO/IEC 11172 standard for coding of moving pictures and associated audio (ISO/IEC 11172-3). FIG. 11 is a simplified block diagram of this MPEG encoding. The input signals are digital audio signals sampled typically at 48 kHz, 44.1 kHz, or 32 kHz. The output bit rate can be selected from 32 kbps to 448 kbps. Three layers, Layers I, II, and III, are defined according to compression efficiency. Layers I and II, which are based on the same fundamental algorithm, are now described.
The input is supplied to a block 1 called mapping, where it passes through a subband filter bank which divides the frequency range into thirty-two bands. Each subband filter output is decimated so that only one sample is kept for every thirty-two input samples. The subband filter bank thus ensures that the number of samples after filtering equals the number before filtering (i.e., the signal is critically sampled) and that the original time sample sequence can be reconstructed completely by the inverse transform. Accordingly, it can be understood that, in the approach using subband filters, the input signal is transformed into a frequency sample sequence for a certain number of time samples, just as in the approach using an orthogonal transform technique.
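The critical-sampling and perfect-reconstruction properties described above can be sketched with an orthonormal block transform standing in for the 32-band filter bank (an illustrative simplification with hypothetical function names; the MPEG polyphase filter bank uses overlapping, windowed cosine-modulated filters rather than a plain block transform):

```python
import math

def analysis(block):
    # Orthonormal DCT-II standing in for the 32-band analysis filter bank:
    # 32 time samples in, exactly 32 subband samples out (critical sampling).
    n = len(block)
    out = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(c * sum(x * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                           for t, x in enumerate(block)))
    return out

def synthesis(coeffs):
    # Inverse transform (DCT-III): recovers the original time samples exactly.
    n = len(coeffs)
    return [
        sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * x * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
            for k, x in enumerate(coeffs))
        for t in range(n)
    ]

block = [((t * 37) % 11) - 5.0 for t in range(32)]   # arbitrary test data
subband = analysis(block)
restored = synthesis(subband)
```

The sample count is unchanged by the transform, and applying the inverse transform restores the original time samples to within rounding error.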
In parallel with the mapping block 1, the input signal is also supplied to a psychoacoustic model block 2. In this block 2, a frequency spectrum is obtained from the input sample sequence by means of a time-to-frequency transform (e.g., an FFT) over a certain frame, independently of the mapping block 1. The length of this frame corresponds to 12 transform blocks (Layer I) or 36 transform blocks (Layer II) of the above-mentioned subband filter. In schemes that use a long orthogonal transform instead of subband filters, the transform output itself provides a fine frequency spectrum; accordingly, this output may be used directly for the psychoacoustic model calculation.
A signal-to-mask ratio (SMR) is then obtained between the signal level (S) of each subband, calculated from the frequency spectrum using the psychoacoustic model, and the mask level (M) below which sound is not perceived due to masking effects. A quantization and coding block 3 resolves each output sample of the mapping into a product of the signal level (S) and a sample value D(i), namely:

output sample = S * D(i).
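The decomposition of each output sample into S * D(i) can be sketched as follows (`decompose` is a hypothetical helper; S is taken here as the magnitude of the largest sample in the subband):

```python
def decompose(subband_samples):
    # Express each sample as S * d_i: S (the scale factor) is the magnitude
    # of the largest sample in the subband, so each d_i lies in [-1, 1].
    s = max(abs(x) for x in subband_samples) or 1.0
    return s, [x / s for x in subband_samples]

s, d = decompose([0.5, -1.0, 0.25])
```

Normalizing by S bounds every D(i) to [-1, 1], so the subsequent quantizer can operate on a fixed range regardless of the subband's absolute level.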
The signal level (S) is referred to as a scale factor; it indicates the level class into which the maximum-magnitude sample within the subband of the frame falls. The frequency samples D(i) are quantized into D'(i) with a number of bits determined according to the SMR. This assignment of a bit number to each frequency sample is referred to as bit allocation. The quantization noise (N), an artifact generated as a result of quantization, will not be perceived even with a smaller bit number as long as the noise remains below the masking level. The data is thus compressed.
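Quantization under a given bit allocation can be sketched as below (a uniform midtread quantizer for illustration only; the MPEG standard's own quantization step tables are not reproduced, and the function names are hypothetical):

```python
def quantize_subband(samples, bits):
    # Uniform midtread quantizer: each normalized sample is mapped to an
    # integer code using the number of bits allocated to this subband.
    scale_factor = max(abs(x) for x in samples) or 1.0
    half = ((1 << bits) - 1) // 2        # e.g. 7 levels each side for 4 bits
    codes = [round((x / scale_factor) * half) for x in samples]
    return scale_factor, codes

def dequantize_subband(scale_factor, codes, bits):
    half = ((1 << bits) - 1) // 2
    return [scale_factor * c / half for c in codes]

sf, codes = quantize_subband([0.9, -0.45, 0.2, 0.05], 4)
restored = dequantize_subband(sf, codes, 4)
```

The reconstruction error stays within half a quantizer step; allocating more bits shrinks the step, so the bit number can be chosen just large enough that this quantization noise falls below the mask level.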
The bit allocation, the scale factors, and the sample data are formatted into a bit stream having a frame structure by a frame packing block 4. At the decoding side, this bit stream is received and frame synchronization is achieved by a frame unpacking block 5. The bit allocation, scale factors, and sample data are then separated and extracted. The frequency sample sequence is reproduced by a quantization decoding block 6, and the time sample sequence is reconstituted by an inverse mapping block 7. As a result, a reproduced acoustic signal is obtained.
FIG. 12 shows an exemplary structure of the bit stream. The head of each frame is a header 21 containing a frame synchronization code and other information regarding, for example, the mode. This is followed by the bit allocation 22, the scale factors 23, and the sample data 24. Supplementary data may follow thereafter. One frame comprises one or more blocks of the time-to-frequency transform.
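The frame layout of FIG. 12 can be sketched as follows (byte-aligned fields and a hypothetical sync code are used for clarity; real MPEG frames pack these fields at the bit level with the standard's own header layout):

```python
import struct

SYNC = 0xFFF   # hypothetical 12-bit sync code; the real header differs

def pack_frame(bit_alloc, scale_factors, samples):
    # Header (sync code + subband count), then bit allocation, scale
    # factors, and sample data, mirroring the field order of FIG. 12.
    header = struct.pack(">HB", SYNC, len(bit_alloc))
    return header + bytes(bit_alloc) + bytes(scale_factors) + bytes(samples)

def unpack_frame(frame):
    sync, n = struct.unpack(">HB", frame[:3])
    assert sync == SYNC                      # frame synchronization
    bit_alloc = list(frame[3:3 + n])
    scale_factors = list(frame[3 + n:3 + 2 * n])
    samples = list(frame[3 + 2 * n:])
    return bit_alloc, scale_factors, samples
```

At the decoding side, the sync code is located first; the bit allocation then determines how the scale factors and sample data that follow are to be parsed.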
In a system for transmitting and storing audio signals associated with video images, the acoustic and audio signals should be provided on two or more channels when two or more different languages are used. More specifically, an acoustic signal common to all languages and separate channels for the respective languages are transmitted or stored. At the receiving or reproducing side, the common channel and one language channel are selected, summed, and output. Such a multi-channel acoustic signal may be compressed into bit streams by using one of the following two techniques.
The first method is to compress all channels into a single bit stream. For MPEG, this approach has been realized in a newer standardization process as MPEG-2.
This method is, however, disadvantageous when the number of languages increases. The bit rate of the bit stream (bits per second; bps) must be increased to maintain adequate sound quality. Yet the bit rate of the compressed bit stream is subject to certain limitations where, as in MPEG, two-channel compression is fundamental. These limitations are required to avoid a prolonged frame synchronization time when one audio frame carries an increased number of bits in the two-channel mode. They also ensure that the number of input buffer registers need not be changed to accommodate a higher bit rate of the compressed bit stream during decoding. As is apparent from the above, the standardized bit stream limits the number of channels available.
The second method is to construct a plurality of bit stream systems, with the signals of the two or more channels carried on separate bit streams. This allows a system for many languages (channels) within the standard bit stream format. In this case, only the bit stream of the desired channel is decoded at the decoding side. It is, however, necessary to decode the bit streams of the common acoustic channel and of the selected language channel separately, doubling the amount of decoding processing.
FIG. 13 shows an example of such a multi-channel system. Each audio input may be monaural or multi-channel, i.e., stereo. In this system, a number of audio input signals are first compressed and encoded by a number of audio encoders 31, 32, 33, and 34. The resulting bit streams are multiplexed by a bit stream multiplexer/formatter 41 into a system bit stream. A video signal may also be multiplexed if the system includes video signals. In addition, system information required for synchronization between the video and audio signals may be multiplexed.
At the reproduction side, the video and audio bit streams and the system information are separated from each other by a demultiplexer 42. Audio processing is performed by selecting desired bit streams from the reproduced audio bit streams and decoding them with audio decoders 43 and 44. When a plurality of audio bit streams are selected, the audio decoders 43 and 44 decode the compressed audio bit streams separately and independently. The decoded audio signals are mixed by a mixer/summer 45 and then output.
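The mixing of the common acoustic channel with a selected language channel at the mixer/summer 45 can be sketched as below (16-bit PCM samples and an equal-gain sum are assumptions made for illustration):

```python
def mix(common, language, gain=0.5):
    # Sum the common acoustic channel with the selected language channel,
    # clipping the result to the 16-bit PCM sample range.
    out = []
    for c, l in zip(common, language):
        s = int(gain * c + gain * l)
        out.append(max(-32768, min(32767, s)))
    return out
```

Because the two channels are decoded independently before this summation, both decoders must run in full, which is the doubled processing burden noted above.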
As described above, when a plurality of bit streams are conventionally formed and multiplexed for input signals of plural systems, such as audio and acoustic signals, each of the multiplexed bit streams must be decoded by a separate system upon reproduction. It is thus necessary to provide a plurality of decoding circuits having the same structure and function, disadvantageously increasing the scale and size of the resultant circuit.