MPEG-2 AAC (Advanced Audio Coding), an ISO/IEC international standard, is widely known as an audio coding/decoding process for coding an audio signal with high sound quality at a low bit rate. In conventional audio coding/decoding processes typified by MPEG-2 AAC, a plurality of samples of a time-domain PCM signal are grouped into a frame, which is converted into a frequency-domain signal by a mapping transform such as the MDCT (Modified Discrete Cosine Transform). The frequency-domain signal is then quantized and subjected to Huffman coding to produce a bit stream. In quantizing the frequency-domain signal, the hearing characteristics of human beings are taken into account: the quantizing accuracy is increased for more perceptible frequency components and reduced for less perceptible frequency components, thus achieving a high sound-quality level with a limited amount of coding. For example, MPEG-2 AAC at a bit rate of about 96 kbps can provide the same sound-quality level as CDs for a stereophonic signal sampled at 44.1 kHz.
If a stereophonic signal sampled at a sampling frequency of 44.1 kHz is coded at a lower bit rate, e.g., a bit rate of about 48 kbps, then efforts are made to maximize the subjective sound quality at the limited bit rate by not coding high-frequency components that are of less auditory importance, i.e., by setting their quantized values to zero. However, since the high-frequency components are not coded, the sound quality deteriorates, and the reproduced sound generally sounds muffled.
Attention has been drawn to the band expansion technology for solving the problem of the sound quality deterioration at low bit rates. According to the band expansion technology, a high-frequency bit stream as auxiliary information in a slight amount of coding (generally several kbps) is added to a low-frequency bit stream representative of an audio signal that has been coded at a low bit rate by a coding process such as the MPEG-2 AAC process or the like, thus producing a combined bit stream. The combined bit stream is decoded by an audio decoder as follows: The audio decoder decodes the low-frequency bit stream according to a decoding process such as the MPEG-2 AAC process or the like, producing a low-frequency audio signal that is free of high-frequency components. The audio decoder then processes the low-frequency audio signal based on the auxiliary information represented by the high-frequency bit stream according to the band expansion technology, thus generating high-frequency components. The high-frequency components thus generated and the low-frequency audio signal produced by decoding the low-frequency bit stream are combined into a decoded audio signal that contains the high-frequency components.
One example of a conventional audio decoder based on the band expansion technology is a combination of an MPEG-2 AAC decoder and a band expansion technology called SBR as described in document 1, section 5.6 shown below. FIG. 1 of the accompanying drawings illustrates a conventional audio decoder based on the band expansion technology described in document 1.
Document 1: “Digital Radio Mondiale (DRM); System Specification” (ETSI TS 101 980 V1.1.1), published September 2001, pp. 42-57.
The conventional audio decoder shown in FIG. 1 comprises bit stream separator 100, low-frequency decoder 101, subband divider 402, complex band expander 403, and complex subband combiner 404.
Bit stream separator 100 separates an input bit stream and outputs separated bit streams to low-frequency decoder 101 and complex band expander 403. Specifically, the input bit stream comprises a multiplexed combination of a low-frequency bit stream representing a low-frequency signal that has been coded by a coding process such as the MPEG-2 AAC process and a high-frequency bit stream including information that is required for complex band expander 403 to generate a high-frequency signal. The low-frequency bit stream is output to low-frequency decoder 101, and the high-frequency bit stream is output to complex band expander 403.
Low-frequency decoder 101 decodes the input low-frequency bit stream into a low-frequency audio signal, and outputs the low-frequency audio signal to subband divider 402. Low-frequency decoder 101 decodes the input low-frequency bit stream according to an existing audio decoding process such as the MPEG-2 AAC process or the like.
Subband divider 402 has a complex subband dividing filter that divides the input low-frequency audio signal into a plurality of low-frequency subband signals in respective frequency bands, which are output to complex band expander 403 and complex subband combiner 404. The complex subband dividing filter may comprise a 32-band complex QMF (Quadrature Mirror Filter) bank, which is widely known in the art. The complex low-frequency subband signals divided into the respective 32 subbands are output to complex band expander 403 and complex subband combiner 404. The 32-band complex QMF bank processes the input low-frequency audio signal according to the following equation:
$$X_k(m) = \sum_{n=-\infty}^{\infty} h(mM - n)\, x(n)\, W_{K1}^{-(k+k_0)(n+n_0)}, \qquad k = 0, 1, \ldots, K1-1 \qquad (402.1)$$

$$W_{K1} = e^{j\,2\pi / K1} \qquad (402.2)$$

where x(n) represents the low-frequency audio signal, Xk(m) the kth-band low-frequency subband signal, and h(n) the analytic low-pass filter. In this example, K1=64.
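As an illustration only, equation 402.1 can be evaluated directly for a finite input. The sketch below is not an efficient filter bank implementation; the function name `qmf_analysis` is hypothetical, the decimation factor M must be supplied by the caller because the text does not state it, and the default values of the offsets k0 and n0 are placeholders rather than values taken from the text. The input x(n) and the filter h(n) are assumed to be zero outside the supplied lists.

```python
import cmath

def qmf_analysis(x, h, K1, M, k0=0.5, n0=0.0):
    """Direct evaluation of X_k(m) = sum_n h(m*M - n) x(n) W_K1^{-(k+k0)(n+n0)}.

    x  : input samples x(0..len(x)-1), assumed zero elsewhere
    h  : low-pass filter taps h(0..len(h)-1), assumed zero elsewhere
    K1 : number of complex channels (64 in the example of the text)
    M  : decimation factor (assumption: supplied by the caller)
    k0, n0 : offsets of equation 402.1 (placeholder defaults)
    Returns a list of time slots; each slot holds K1 complex samples.
    """
    L = len(h)
    # Slots m for which h(m*M - n) overlaps at least one input sample.
    num_slots = (len(x) + L - 2) // M + 1
    X = []
    for m in range(num_slots):
        lo = max(0, m * M - L + 1)      # first n with m*M - n <= L - 1
        hi = min(len(x) - 1, m * M)     # last n with m*M - n >= 0
        slot = []
        for k in range(K1):
            acc = 0j
            for n in range(lo, hi + 1):
                # W_K1^{-(k+k0)(n+n0)} with W_K1 = exp(j*2*pi/K1)
                w = cmath.exp(-2j * cmath.pi * (k + k0) * (n + n0) / K1)
                acc += h[m * M - n] * x[n] * w
            slot.append(acc)
        X.append(slot)
    return X
```

With the placeholder offsets k0=0.5 and n0=0 and a real-valued input, channel k and channel K1-1-k of this sketch are complex conjugates of each other, so only half of the K1 channels carry independent information.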
Complex band expander 403 generates a high-frequency subband signal representing a high-frequency audio signal from the high-frequency bit stream and the low-frequency subband signals that have been input thereto, and outputs the generated high-frequency subband signal to complex subband combiner 404. As shown in FIG. 2 of the accompanying drawings, complex band expander 403 comprises complex high-frequency generator 500 and complex amplitude adjuster 501. Complex band expander 403 is supplied with the high-frequency bit stream from input terminal 502 and with the low-frequency subband signals from input terminal 504, and outputs the high-frequency subband signal from output terminal 503.
Complex high-frequency generator 500 is supplied with the low-frequency subband signals and the high-frequency bit stream, and copies the signal in the subband that is specified among the low-frequency subband signals by the high-frequency bit stream, to a high-frequency subband. When copying the signal, complex high-frequency generator 500 may perform a signal processing process specified by the high-frequency bit stream. For example, it is assumed that there are 64 subbands ranging from subband 0 to subband 63 in the ascending order of frequencies, and complex subband signals from subband 0 to subband 19, of those 64 subbands, are supplied as the low-frequency subband signals to input terminal 504. It is also assumed that the high-frequency bit stream contains copying information indicative of which one of the low-frequency subbands (subband 0 to subband 19) a signal is to be copied from to generate a subband A (A>19), and signal processing information representing a signal processing process (selected from a plurality of processes including a filtering process) to be performed on the signal. In complex high-frequency generator 500, a complex-valued signal in a high-frequency subband (referred to as “copied/processed subband signal”) is identical to a complex-valued signal in a low-frequency subband indicated by the copying information. If the signal processing information indicates any signal processing need for better sound quality, then complex high-frequency generator 500 performs the signal processing process indicated by the signal processing information on the copied/processed subband signal. The copied/processed subband signal thus generated is output to complex amplitude adjuster 501.
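The copying step described above can be sketched as follows. This is a minimal illustration under assumed data structures: the dictionary `copy_info` (high-frequency subband index mapped to source low-frequency subband index) and the dictionary `processing` (high-frequency subband index mapped to a callable) are hypothetical stand-ins for the copying information and signal processing information carried by the high-frequency bit stream, whose actual syntax is not given here.

```python
def generate_copied_subbands(low_subbands, copy_info, processing=None):
    """Build copied/processed subband signals from low-frequency subbands.

    low_subbands : dict {subband index: list of complex samples}
    copy_info    : dict {high subband index A: source low subband index}
                   (hypothetical representation of the copying information)
    processing   : optional dict {high subband index: callable} applied
                   after copying (hypothetical representation of the
                   signal processing information)
    """
    processing = processing or {}
    high = {}
    for target, source in copy_info.items():
        signal = list(low_subbands[source])   # copy, do not alias the source
        process = processing.get(target)
        if process is not None:
            signal = process(signal)          # e.g. a filtering process
        high[target] = signal
    return high
```

For example, with low-frequency subbands 0 to 19 available, `copy_info = {20: 4, 21: 5}` would generate subbands 20 and 21 from subbands 4 and 5.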
One example of signal processing performed by complex high-frequency generator 500 is a linear predictive inverse filter that is generally well known for audio coding. Generally, it is known that the filter coefficients of a linear predictive inverse filter can be calculated by linearly predicting an input signal, and that the linear predictive inverse filter using the filter coefficients operates to whiten the spectral characteristics of the input signal. The linear predictive inverse filter is used for signal processing in order to make the spectral characteristics of the high-frequency subband signal flatter than the spectral characteristics of the low-frequency subband signal from which it is copied. A comparison between the spectral characteristics of low- and high-frequency subband signals of an audio signal, for example, indicates that the spectral characteristics of the high-frequency subband signal are often flatter than the spectral characteristics of the low-frequency subband signal. Therefore, a high-quality band expansion technology can be realized by using the above flattening technique.
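The whitening effect of a linear predictive inverse filter can be illustrated with a first-order predictor applied to one complex subband signal. The order, the least-squares estimate of the coefficient, and the function name `lp_inverse_filter` are assumptions chosen for brevity; the text does not specify how the filter in complex high-frequency generator 500 is configured.

```python
def lp_inverse_filter(signal):
    """First-order linear predictive inverse filter (illustrative sketch).

    Predicts signal[n] from signal[n-1] with one complex coefficient a
    (least-squares estimate) and returns (a, residual), where
    residual[n] = signal[n] - a * signal[n-1] for n >= 1.
    """
    # Energy of the predictor input and its correlation with the target.
    r0 = sum(abs(signal[n - 1]) ** 2 for n in range(1, len(signal)))
    if r0 == 0.0:
        return 0j, list(signal)               # nothing to predict
    r1 = sum(signal[n] * signal[n - 1].conjugate()
             for n in range(1, len(signal)))
    a = r1 / r0
    # The first sample has no predecessor and is passed through unchanged.
    residual = [signal[0]] + [signal[n] - a * signal[n - 1]
                              for n in range(1, len(signal))]
    return a, residual
```

For a strongly predictable input such as a decaying complex exponential, the residual after the first sample is close to zero, which is the sense in which the inverse filter removes spectral structure from the copied signal.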
Complex amplitude adjuster 501 performs a correction specified by the high-frequency bit stream on the amplitude of the input copied/processed subband signal, generating a high-frequency subband signal. Specifically, complex amplitude adjuster 501 performs an amplitude correction on the copied/processed subband signal in order to equalize the signal energy (referred to as “target energy”) of high-frequency components of the input signal on the coding side and the high-frequency signal energy of the signal generated by complex band expander 403 with each other. The high-frequency bit stream contains information representative of the target energy. The generated high-frequency subband signal is output to output terminal 503. The target energy described by the high-frequency bit stream may be considered as being calculated in the unit of a frame for each subband, for example. Alternatively, in view of the characteristics in the time and frequency directions of the input signal, the target energy may be calculated in the unit of a time divided from a frame with respect to the time direction and in the unit of a band made up of a plurality of subbands with respect to the frequency direction. If the target energy is calculated in the unit of a time divided from a frame with respect to the time direction, then time-dependent changes in the energy can be expressed in further detail. If the target energy is calculated in the unit of a band made up of a plurality of subbands with respect to the frequency direction, then the number of bits required to code the target energy can be reduced. The unit of divisions in the time and frequency directions used for calculating the target energy is represented by a time frequency grid, and its information is described by the high-frequency bit stream.
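The basic amplitude correction can be sketched as a single gain per region of the time frequency grid. In this sketch the energy is taken as the mean squared magnitude of the samples in the region, which is an assumption about the energy definition, and the function name `scale_to_target_energy` is hypothetical.

```python
import math

def scale_to_target_energy(samples, target_energy):
    """Scale complex subband samples so their mean energy equals the target.

    samples       : list of complex samples of the copied/processed subband
                    signal within one region of the time frequency grid
    target_energy : target energy R for that region (assumed to use the
                    same mean-squared-magnitude definition)
    """
    if not samples:
        return []
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    if energy == 0.0:
        return list(samples)                  # a silent input cannot be scaled
    gain = math.sqrt(target_energy / energy)
    return [gain * s for s in samples]
```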
According to another arrangement of complex amplitude adjuster 501, an additional signal is added to the copied/processed subband signal, generating a high-frequency subband signal. The amplitude of the copied/processed subband signal and the amplitude of the additional signal are adjusted such that the energy of the high-frequency subband signal serves as a target energy. An example of the additional signal is a noise signal or a tone signal. Gains for adjusting the amplitudes of the copied/processed subband signal and the additional signal, on the assumption that either one of the copied/processed subband signal and the additional signal serves as a main component of the generated high-frequency subband signal, and the other as an auxiliary component thereof, are calculated as follows: If the copied/processed subband signal serves as a main component of the generated high-frequency subband signal, then
Gmain=sqrt(R/E/(1+Q))
Gsub=sqrt(R×Q/N/(1+Q))
where Gmain represents the gain for adjusting the amplitude of the main component, Gsub the gain for adjusting the amplitude of the auxiliary component, and E, N the respective energies of the copied/processed subband signal and the additional signal. If the energy of the additional signal is normalized to 1, then N=1. In the above equations, R represents the target energy, Q the ratio of the energies of the main and auxiliary components, R, Q being described by the high-frequency bit stream, and sqrt( ) the square root. If the additional signal serves as a main component of the generated high-frequency subband signal, then
Gmain=sqrt(R/N/(1+Q))
Gsub=sqrt(R×Q/E/(1+Q))
The high-frequency subband signal can be calculated by weighting the copied/processed subband signal and the additional signal using the amplitude adjusting gains thus calculated and adding the copied/processed subband signal and the additional signal which are thus weighted.
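The gain calculation and the weighted addition can be sketched as follows. The function names and the `copied_is_main` flag are hypothetical; the formulas are those given for Gmain and Gsub, with E and N passed in by the caller.

```python
import math

def adjusting_gains(R, Q, E, N=1.0, copied_is_main=True):
    """Return (gain for copied/processed signal, gain for additional signal).

    R : target energy, Q : energy ratio of the main and auxiliary components,
    E : energy of the copied/processed subband signal,
    N : energy of the additional signal (1 if normalized).
    """
    if copied_is_main:
        g_copied = math.sqrt(R / E / (1.0 + Q))          # Gmain
        g_additional = math.sqrt(R * Q / N / (1.0 + Q))  # Gsub
    else:
        g_additional = math.sqrt(R / N / (1.0 + Q))      # Gmain
        g_copied = math.sqrt(R * Q / E / (1.0 + Q))      # Gsub
    return g_copied, g_additional

def mix_high_band(copied, additional, g_copied, g_additional):
    """Weight both signals with their gains and add them sample by sample."""
    return [g_copied * c + g_additional * a
            for c, a in zip(copied, additional)]
```

In both cases the weighted energies add up to the target, since Gmain² times the main energy equals R/(1+Q) and Gsub² times the auxiliary energy equals R×Q/(1+Q). The energy of the mixed signal equals R only to the extent that the two signals are uncorrelated.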
Operation of complex amplitude adjuster 501 for amplitude adjustment and advantages thereof will be described in detail with reference to FIG. 3. The signal phase (phase A in FIG. 3) of high-frequency components of the input signal on the coding side and the signal phase (phase B in FIG. 3) of the high-frequency subband signal derived from the low-frequency subband signal are entirely different from each other as shown in FIG. 3. However, since the amplitude of the high-frequency subband signal is adjusted such that its signal energy is equalized to the target energy, the sound quality as it is heard is prevented from being degraded. This is because the human auditory sense is more sensitive to signal energy variations than to signal phase variations.
Complex subband combiner 404 has a complex subband combining filter that combines the bands of the low-frequency subband signal and the high-frequency subband signal that have been input thereto. An audio signal generated by combining the bands is output from the audio decoder. The complex subband combining filter that is used corresponds to the complex subband dividing filter used in subband divider 402. That is, these filters are selected such that when a signal is divided by the complex subband dividing filter into subband signals and those subband signals are combined by the complex subband combining filter, the original signal (the signal input to the complex subband dividing filter) is fully reconstructed. For example, if the 32-band complex QMF dividing filter bank (K1=64) represented by the equation 402.1 is used as the complex subband dividing filter, then the following equation 404.1 can be employed for the complex subband combining filter:
$$x(n) = \sum_{m=-\infty}^{\infty} f(n - mM)\, \frac{1}{K2} \sum_{k=0}^{K2-1} X_k(m)\, W_{K2}^{(k+k_0)(n+n_0)} \qquad (404.1)$$

where f(n) represents the combining low-pass filter. In this example, K2=64.
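As an illustration only, equation 404.1 can also be evaluated directly for a finite number of time slots. The function name `qmf_synthesis` is hypothetical, M must be supplied by the caller because the text does not state it, and the defaults for k0 and n0 are placeholders. The sketch returns complex values exactly as the equation produces them; taking the real part for a real-valued audio signal is left to the caller.

```python
import cmath

def qmf_synthesis(X, f, M, k0=0.5, n0=0.0, num_samples=None):
    """Direct evaluation of
    x(n) = sum_m f(n - m*M) (1/K2) sum_k X_k(m) W_K2^{(k+k0)(n+n0)}.

    X : list of time slots, each a list of K2 complex subband samples
    f : combining low-pass filter taps f(0..len(f)-1), assumed zero elsewhere
    M : interpolation factor (assumption: supplied by the caller)
    """
    K2 = len(X[0])
    L = len(f)
    if num_samples is None:
        num_samples = (len(X) - 1) * M + L    # last sample any slot can reach
    out = []
    for n in range(num_samples):
        acc = 0j
        for m, slot in enumerate(X):
            t = n - m * M
            if 0 <= t < L:                    # f(n - m*M) is non-zero only here
                inner = sum(
                    slot[k] * cmath.exp(2j * cmath.pi * (k + k0) * (n + n0) / K2)
                    for k in range(K2))
                acc += f[t] * inner / K2
        out.append(acc)
    return out
```

In the degenerate case M=1 with a single-tap filter f(0)=1, the modulation terms of equations 402.1 and 404.1 cancel and the input is recovered exactly; with practical values of M, reconstruction depends on how the filters h(n) and f(n) are designed.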
If the sampling frequency for the audio signal output from complex subband combiner 404 is higher than the sampling frequency for the audio signal output from low-frequency decoder 101 according to the band expansion technology, then the filters are selected such that a low-frequency part (down-sampled result) of the audio signal output from complex subband combiner 404 is equal to the audio signal output from low-frequency decoder 101. Complex subband combiner 404 may employ a 64-band complex QMF combining filter bank (K2=128 in the equation 404.1). In this case, the lower-frequency 32 bands employ the output of the 32-band complex QMF dividing filter bank as their signal values.
The conventional audio decoder has been problematic in that its subband divider and complex subband combiner require a large amount of calculation, and in that both the required amount of calculation and the apparatus scale are large because the band expansion process is carried out using complex numbers.