There are a variety of techniques for high efficiency encoding of audio or speech signals. Examples of these techniques include transform coding, in which a frame of digital signals representing the audio signal on the time axis is converted by an orthogonal transform into a block of spectral coefficients representing the audio signal on the frequency axis, and sub-band coding, in which the frequency band of the audio signal is divided by a filter bank into a plurality of sub-bands without forming the signal into frames along the time axis prior to coding. There is also known a combination of sub-band coding and transform coding, in which digital signals representing the audio signal are divided into a plurality of frequency ranges by sub-band coding, and transform coding is applied to each of the frequency ranges.
Among the filters for dividing a frequency spectrum into a plurality of equal-width frequency ranges is the quadrature mirror filter (QMF), as discussed in R. E. Crochiere, Digital Coding of Speech in Sub-bands, Bell Syst. Tech. J., Vol. 55, No. 8 (1976). With such a QMF, the frequency spectrum of the signal is divided into two equal-width bands. With the QMF, no aliasing is produced when the frequency bands resulting from the division are subsequently combined.
In Joseph H. Rothweiler, "Polyphase Quadrature Filters--A New Subband Coding Technique", ICASSP '83, Boston, there is shown a technique of dividing the frequency spectrum of a signal into equal-width frequency bands. With this polyphase QMF, the frequency spectrum of the signal can be divided at a time into a plurality of equal-width frequency bands.
There is also known a technique of orthogonal transform in which the digital input audio signal is divided into frames of a predetermined time duration, and the resulting frames are processed using a discrete Fourier transform (DFT), a discrete cosine transform (DCT) or a modified DCT (MDCT) for converting the signal from the time axis to the frequency axis. A discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation", ICASSP 1987.
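The MDCT mentioned above can be illustrated by the following sketch (the function name and the direct computation are merely illustrative; practical encoders apply a window, overlap successive frames, and use a fast algorithm):

```python
import math

def mdct(frame):
    """Modified DCT of a frame of 2N time samples, yielding N spectral
    coefficients (direct-form sketch; windowing and overlap omitted)."""
    n2 = len(frame)          # 2N samples in
    n = n2 // 2              # N coefficients out
    return [
        sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(n2))
        for k in range(n)
    ]
```

Note that the transform is lapped: each frame of 2N samples produces only N coefficients, with time-domain aliasing cancelled by the overlap of adjacent frames.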
By quantizing the signals divided on the band basis by the filter or by the orthogonal transform, it becomes possible to control the bands in which quantization noise appears, so that psychoacoustically more efficient coding may be performed by exploiting the so-called masking effects. If the signal components are normalized from band to band with the maximum of the absolute values of the signal components in each band, still more efficient coding becomes possible.
In a technique of quantizing the spectral coefficients resulting from an orthogonal transform, it is known to use sub-bands that take advantage of the psychoacoustic characteristics of the human auditory system. That is, spectral coefficients representing an audio signal on the frequency axis may be divided into a plurality of critical frequency bands. The width of the critical bands increases with increasing frequency. Normally, about 25 critical bands are used to cover the audio frequency spectrum of 0 Hz to 20 kHz. In such a quantizing system, bits are adaptively allocated among the various critical bands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from the MDCT, the spectral coefficient data generated by the MDCT within each of the critical bands is quantized using an adaptively allocated number of bits.
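The grouping of spectral coefficients into bands whose width grows with frequency may be sketched as follows (the band edges below are hypothetical placeholders; actual critical-band edges follow the Bark scale):

```python
# Hypothetical band edges, in coefficient indices, whose spacing widens
# toward higher frequencies, mimicking critical bands.
BAND_EDGES = [0, 4, 8, 16, 32, 64]

def group_into_bands(coeffs, edges=BAND_EDGES):
    """Split a list of spectral coefficients into bands of increasing width."""
    return [coeffs[lo:hi] for lo, hi in zip(edges, edges[1:])]
```

Each resulting band would then be quantized with its own adaptively allocated number of bits.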
There are presently known the following two bit allocation techniques. In the first, described in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes the noise energy, but the noise level perceived by the listener is not optimum because the technique does not effectively exploit the psychoacoustic masking effect.
In the bit allocation technique described in M. A. Krassner, The Critical Band Encoder-Digital Encoding of the Perceptual Requirements of the Auditory System, ICASSP 1980, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if the signal-to-noise ratio of such a system is measured using a strongly tonal signal, for example, a 1 kHz sine wave, non-optimum results are obtained because of the fixed allocation of bits among the critical bands.
For overcoming these inconveniences, a high efficiency encoding apparatus has been proposed in which the total number of bits available for bit allocation is divided between a fixed bit allocation pattern pre-set for each small block and a bit allocation dependent on the signal magnitude in each block. The division ratio is set in dependence upon a signal relevant to the input signal, such that the smoother the signal spectrum, the higher becomes the division ratio for the fixed bit allocation pattern.
With this technique, if the energy is concentrated in a particular spectral component, as in the case of a sine wave input, a larger number of bits is allocated to the block containing that spectral component, significantly improving the overall signal-to-noise characteristics. Since the human auditory system is highly sensitive to signals having sharp spectral components, this technique improves not only the measured signal-to-noise ratio but also the quality of the sound as perceived by the ear.
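The division of the bit budget between a fixed pattern and a signal-magnitude-dependent allocation may be sketched as follows (the function and the energy-proportional rule are illustrative assumptions, not the proposed apparatus itself):

```python
def allocate_bits(band_energies, total_bits, fixed_ratio):
    """Split a bit budget between a fixed per-band pattern and a
    signal-dependent allocation (illustrative sketch).

    fixed_ratio rises toward 1.0 for smooth spectra, so more bits follow
    the fixed pattern; for peaky spectra it falls, so more bits chase
    the high-energy bands.
    """
    n = len(band_energies)
    fixed_bits = total_bits * fixed_ratio
    adaptive_bits = total_bits - fixed_bits
    total_energy = sum(band_energies) or 1.0
    return [
        fixed_bits / n + adaptive_bits * e / total_energy
        for e in band_energies
    ]
```

For a sine-wave-like input with all energy in one band and a low fixed ratio, almost the entire budget goes to that band, matching the behavior described above.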
In addition to the above techniques, a variety of other techniques have been proposed, and the model simulating the human auditory system has been refined, so that, as the capability of the encoding device improves, encoding may be performed with higher efficiency in consideration of the human auditory system.
With the above-described conventional methods, the bandwidth within which frequency components are quantized is fixed. Consequently, if spectral components are concentrated in the vicinity of several specific frequencies, and these spectral components have to be quantized with a sufficient number of quantization steps, a larger number of bits needs to be allocated to all spectral components belonging to the same band as the concentrated spectral components, resulting in lower efficiency.
In general, noise contained in tonal acoustic signals, in which the energy of the spectral components is concentrated at particular frequencies, is a serious obstruction to the sense of hearing, since it is more readily perceived than noise added to acoustic signals whose energy is smoothly distributed over a broad frequency range. In addition, if spectral components having a large energy, that is, tonal components, are not quantized with sufficient quantization steps, frame-to-frame distortion becomes significant when these spectral components are restored into waveform signals on the time axis and synthesized with the preceding and succeeding frames. That is, significant connection distortion occurs when the waveform signal on the time axis is combined with the waveform signals of adjacent frames, which again seriously obstructs hearing. Thus it has been difficult with the conventional methods to improve the encoding efficiency for tonal components without deteriorating the sound quality.
The present Assignee has already proposed, in PCT/JP94/00880 (International Publication No. WO94/28633, date of international publication Dec. 8, 1994), a technique of separating the input acoustic signal into tonal components, having the energy concentrated in specific frequency components, and components having the energy smoothly distributed over a broader frequency range, that is, noisy or non-tonal components, and encoding the respective components separately for achieving a high encoding efficiency.
With the previously proposed method, the input audio signal is transformed into frequency-domain components, which are then grouped into, for example, critical bands. The spectral components are then divided into tonal components and noisy or non-tonal components. The tonal components, that is, the spectral components within an extremely narrow range on the frequency spectrum, are encoded with high efficiency by normalization and quantization. The above-mentioned extremely narrow range on the frequency axis where the tonal components exist may be exemplified by a range consisting of a pre-set number of spectral components, themselves tonal, centered about a spectral component having a locally maximum energy.
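The extraction of such narrow tonal ranges centered on local energy maxima may be sketched as follows (the thresholding rule, the fixed window width and the names are assumptions for illustration only):

```python
def extract_tonal(coeffs, threshold, width=3):
    """Separate spectral coefficients into tonal and noise parts.

    A coefficient is treated as tonal when its magnitude is a local
    maximum exceeding `threshold`; the `width` coefficients centered
    on it are moved to the tonal part, with their start index kept so
    the decoder can restore their positions.
    """
    noise = list(coeffs)
    tonal = []                       # list of (start index, samples) pairs
    half = width // 2
    for i in range(1, len(coeffs) - 1):
        mag = abs(coeffs[i])
        if mag > threshold and mag >= abs(coeffs[i - 1]) and mag >= abs(coeffs[i + 1]):
            lo = max(i - half, 0)
            hi = min(i + half + 1, len(coeffs))
            tonal.append((lo, noise[lo:hi]))
            for j in range(lo, hi):   # remove the tonal range from the noise part
                noise[j] = 0.0
    return tonal, noise
```

The tonal pairs correspond to the position information and tonal component data appended to the code string; the residual list corresponds to the noise components.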
FIG. 1 shows a configuration of an encoder for adaptively encoding tonal components and noise components separated from the spectral components of audio signals.
In FIG. 1, an audio waveform signal is fed to a terminal 600. The audio waveform signal is converted by a transform circuit 601 into signal frequency components which are fed to a signal component separating circuit 602.
The signal component separating circuit 602 separates the signal frequency components from the transform circuit 601 into tonal components, having a steep spectral distribution, and the remaining signal frequency components, that is, noise components having a flatter spectral distribution. The tonal components and the noise components are encoded by normalization and quantization by a tonal component encoding circuit 603 and by a noise component encoding circuit 604, respectively.
Outputs of the tonal component encoding circuit 603 and the noise component encoding circuit 604 are converted by a code string generating circuit 605 into a code string which is outputted at an output terminal 607. The code string generating circuit 605 appends to the code string the information specifying the number of the tonal components supplied from the signal component separating circuit 602 and the position information thereof.
An output signal of the output terminal 607 has an error correction code appended by an ECC encoder and is modulated by eight-to-fourteen modulation (EFM) before being recorded by a recording head on, for example, a disc-shaped recording medium or a motion picture film.
FIG. 2 shows a decoder as a counterpart of the encoder shown in FIG. 1.
Referring to FIG. 2, a code string reproduced from a recording medium, such as a disc-shaped recording medium or a motion picture film, not shown, by a playback head, demodulated and corrected for errors, is supplied to an input terminal 700.
The code string, thus supplied to the input terminal 700, is supplied to a code string resolving circuit 701, which recognizes, based upon the information specifying the number of tonal components in the error-corrected code string, which portion of the code string is the tonal component code, and separates the input code string into a tonal component code portion and a noise component code portion. The code string resolving circuit 701 also separates the position information of the tonal components from the input code string and outputs the position information to a downstream synthesis circuit 704.
The tonal component code portion and the noise component code portion are fed to a tonal component decoding circuit 702 and a noise component decoding circuit 703, respectively, so as to be dequantized and denormalized by way of decoding. Decoded signals from the tonal component decoding circuit 702 and the noise component decoding circuit 703 are routed to a synthesis circuit 704 which effects synthesis as a counterpart operation of the separation by the signal component separating circuit 602 of FIG. 1.
The synthesis circuit 704 adds the decoded signals of the tonal components at pre-set positions of the decoded signal of the noise components, based upon the position information of the tonal components supplied from the code string resolving circuit 701, thereby synthesizing the noise components and the tonal components on the frequency axis.
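The re-insertion performed by the synthesis circuit 704 may be sketched as follows (the function name and the (position, samples) data layout are illustrative assumptions):

```python
def synthesize(tonal, noise):
    """Add decoded tonal components back into the decoded noise spectrum
    at their transmitted positions, the counterpart of the separation."""
    out = list(noise)
    for start, samples in tonal:
        for offset, s in enumerate(samples):
            out[start + offset] += s   # add tonal data at its pre-set position
    return out
```

The result is the synthesized spectrum on the frequency axis, which is then passed to the inverse transform.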
The synthesized decoded signal is transformed by an inverse transform circuit 705 which effects an inverse operation to that of the transform circuit 601 of FIG. 1 so as to be restored from the frequency axis to the time axis. An output waveform signal is outputted at a terminal 707.
FIG. 3 shows an illustrative configuration of the transform circuit 601 of FIG. 1.
Referring to FIG. 3, a signal supplied via a terminal 300, that is, the signal via the terminal 600 of FIG. 1, is split into three bands by two-stage band-dividing filters 301, 302. The signal at the terminal 300 is decimated to 1/2 by the band-dividing filter 301, and the signal thus decimated is further decimated to 1/2 by the band-dividing filter 302, so that the signal at the terminal 300 is decimated to 1/4. That is, the bandwidth of the two signals from the band-dividing filter 302 is one-fourth that of the signal at the terminal 300.
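The two-stage band division may be sketched as follows, with the 2-tap Haar pair standing in for the actual band-dividing filters (illustrative only; practical QMFs use longer filters to obtain sharp band edges):

```python
import math

def split_band(x):
    """Split a signal into low and high bands, each at half the input
    sample rate, using the 2-tap Haar pair as a stand-in for a QMF."""
    r = math.sqrt(0.5)
    low = [(a + b) * r for a, b in zip(x[0::2], x[1::2])]
    high = [(a - b) * r for a, b in zip(x[0::2], x[1::2])]
    return low, high

def two_stage_split(x):
    """Two-stage split as in FIG. 3: one band at half the input rate and
    two bands at one quarter of the input rate."""
    low, high = split_band(x)            # first stage
    low_low, low_high = split_band(low)  # second stage on the low band
    return low_low, low_high, high
```

For a constant (purely low-frequency) input, all the energy ends up in the lowest band, as expected of a band-dividing filter.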
The signals of the three bands from the band-dividing filters 301, 302 are converted into spectral signal components by forward orthogonal transform circuits 303, 304 and 305, such as MDCT circuits. Outputs of these transform circuits 303, 304, 305 are fed via terminals 306, 307, 308 to the signal component separating circuit 602.
FIG. 4 shows the basic configuration of the tonal component encoding circuit 603 and the noise component encoding circuit 604 of FIG. 1. These circuits are collectively termed signal component encoding circuits 603, 604.
Referring to FIG. 4, an output of the signal component separating circuit 602 of FIG. 1, fed to a terminal 310, is normalized by a normalization circuit 311 from one pre-set band to another and thence supplied to a quantization circuit 313. For normalization, a scale factor is determined for each pre-set band of the frequency components (termed herein an encoding unit, since it is the unit of encoding). The scale factor is set so as to be equal to the amplitude of the maximum sample (frequency component) in the encoding unit, and each of the samples in the encoding unit is divided by the scale factor by way of normalization. The signal supplied to the terminal 310 is also fed to a quantization step decision circuit 312.
The quantization circuit 313 quantizes the signal from the normalization circuit 311 based upon the quantization step information calculated by the quantization step decision circuit 312. An output of the quantization circuit 313 is outputted at a terminal 314 and thence supplied to the code string generating circuit 605 of FIG. 1. The output signal at the terminal 314 contains, in addition to the signal components quantized by the quantization circuit 313, the normalization coefficient information of the normalization circuit 311 and the quantization step information of the quantization step decision circuit 312.
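The scale-factor normalization followed by quantization may be sketched as follows (the uniform mid-tread quantizer and the word-length parameter are assumptions for illustration):

```python
def encode_unit(samples, n_bits):
    """Normalize one encoding unit by its scale factor (the largest
    magnitude in the unit), then quantize with a uniform mid-tread
    quantizer of the given word length (illustrative sketch)."""
    scale = max(abs(s) for s in samples) or 1.0   # avoid division by zero
    levels = (1 << (n_bits - 1)) - 1              # e.g. 7 levels per side for 4 bits
    quantized = [round(s / scale * levels) for s in samples]
    return scale, quantized
```

The scale factor and the quantized values together correspond to the normalization coefficient information and quantized signal components carried in the code string.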
FIG. 5 shows an illustrative configuration of the inverse transform circuit 705 of FIG. 2.
The configuration of FIG. 5 corresponds to the configuration of the circuit shown in FIG. 3. The signals supplied from the synthesis circuit 704 of FIG. 2 via terminals 501, 502 and 503 are transformed by inverse orthogonal transform circuits 504, 505, 506 which perform the inverse of the forward orthogonal transform shown in FIG. 3. The signals of the respective bands, obtained by the inverse orthogonal transform circuits 504, 505, 506, are synthesized by two-stage band-synthesizing filters.
That is, outputs of the inverse orthogonal transform circuits 505, 506 are sent to and synthesized by a band-synthesizing filter 507, an output of which is synthesized by a band-synthesizing filter 508 with an output of the inverse orthogonal transform circuit 504. An output of the band-synthesizing filter 508 is outputted at a terminal 509 (the terminal 707 of FIG. 2).
In a majority of cases, acoustic signals are processed as plural-channel signals. Referring to FIG. 6, a configuration for encoding plural-channel signals is explained.
Referring to FIG. 6, audio signals of plural channels (ch.sub.1, ch.sub.2, . . . , ch.sub.n) are fed via input terminals 30.sub.1 to 30.sub.n associated with respective channels to sampling and quantization units, that is analog/digital converters 31.sub.1 to 31.sub.n similarly associated with respective channels. These sampling and quantization units 31.sub.1 to 31.sub.n convert the audio signals of the respective channels into quantized signals. The quantized signals from these sampling and quantization units 31.sub.1 to 31.sub.n are routed to encoding units 32.sub.1 to 32.sub.n. The signals encoded by the encoding units 32.sub.1 to 32.sub.n are routed to a formatter 33 which then assembles the encoded plural-channel signals into a bitstream for transmission or recording on a recording medium in accordance with a pre-set format. The bitstream is outputted at an output terminal 34 so as to be recorded on a recording medium or transmitted.
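The assembling of the encoded channel signals into a bitstream by the formatter 33 may be sketched as follows (the channel-count header and length-prefixed layout are a hypothetical format, not the pre-set format of the apparatus):

```python
def format_frames(channel_frames):
    """Assemble per-channel encoded frames into one bitstream payload:
    a 1-byte channel count followed by each channel's frame, each
    prefixed with its 2-byte big-endian length (hypothetical layout)."""
    out = bytes([len(channel_frames)])
    for frame in channel_frames:
        out += len(frame).to_bytes(2, "big") + frame
    return out
```

A deformatter, as in FIG. 7, would read the channel count and the length prefixes to resolve the bitstream back into channel-based encoded signals.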
FIG. 7 shows a configuration of a decoder for decoding the encoded multi-channel signals.
Referring to FIG. 7, the encoded signals, reproduced from the recording medium or transmitted, are routed via an input terminal 40 to a deformatter 41. The deformatter 41 resolves the bitstream supplied thereto into channel-based encoded signals in accordance with a pre-set format. The channel-based encoded signals are routed to decoding units 42.sub.1 to 42.sub.n associated with respective channels.
These decoding units 42.sub.1 to 42.sub.n decode the channel-based encoded signals. The signals decoded by the decoding units 42.sub.1 to 42.sub.n are converted into analog signals by D/A converters 43.sub.1 to 43.sub.n. These analog signals are outputted at associated output terminals 44.sub.1 to 44.sub.n as decoded signals of channels ch.sub.1 to ch.sub.n.
There exist a number of encoding methods for encoding plural-channel signals in addition to the encoding method explained with reference to FIG. 6. For example, JP Patent Kokai Publication JP-A-4-360331 discloses a method for efficient compression of the sub-band signals of the left and right channels of stereo (2-channel) signals by exploiting the characteristic of the human hearing mechanism that the waveform of the monaural signal, rather than the phase difference between the channels, plays the important role. There is also disclosed in International Publication No. WO92/12607 a technique of encoding and decoding subbands of signals representing a sound field in connection with the recording, transmission and reproduction of a multi-dimensional sound field intended to be heard by the listener. The decoded signals of these subbands are transported as multiplexed individual or synthesized signals along with a control signal conveying the relative levels of the encoded signals or the definite azimuth of the sound field represented by the encoded signals. These techniques compress the signals by using characteristics among the respective channels.
If the above-described technique of converting the signal into frequency components and separating the resulting frequency components into tonal components and noise components for encoding could be applied to the encoding of multi-channel signals using characteristics across the respective channels, the information volume could be compressed further when recording or transmitting multi-channel signals on a recording medium of limited recording capacity or over a transmission medium of limited transmission capacity. However, no concrete proposal in connection with such a technique has so far been made.
In view of the foregoing, it is an object of the present invention to provide a signal encoding method and apparatus whereby the data volume in encoding plural-channel signals may be diminished while deterioration of the decoded signals is prevented, a corresponding signal decoding method and apparatus, a recording medium on which the encoded signals are recorded, and a method for transmitting the encoded signals.