Band-division encoding is widely known as a technology capable of encoding an ordinary acoustic signal with a small amount of information while still yielding a high-quality reproduction signal. A representative example of encoding that utilizes such band division is MPEG-2 AAC (Moving Picture Experts Group 2 Advanced Audio Coding), an ISO/IEC International Standard, in which a wide-band stereo signal of 16 kHz or more can be encoded at a high quality at a bit rate of about 96 kbps.
However, when the bit rate is lowered, for example, to about 48 kbps, the band in which the acoustic signal can be encoded at a high quality narrows to about 10 kHz or less, and the reproduced sound is subjectively perceived as lacking in high-frequency-band signal components. As a method of compensating for the deterioration in sound quality caused by such a band restriction, there exists, for example, the technology described in Non-patent document 1, which is called SBR (Spectral Band Replication). A similar technology is also disclosed, for example, in Non-patent document 2.
The SBR aims at compensating for the signal of a high-frequency band (high-frequency-band component) that is lost due to an audio encoding process such as the AAC or a band restriction process accompanying it. Accordingly, the signal of the frequency band lower than the band compensated by the SBR (low-frequency-band component) has to be transmitted by other means. The information encoded by the SBR includes information for generating a pseudo-component of the high-frequency band based upon the low-frequency-band component transmitted by the other means, and adding this pseudo-component to the low-frequency-band component compensates for the deterioration in sound quality due to the band restriction.
Hereinafter, an operation of the SBR will be explained in detail with reference to FIG. 6. FIG. 6 is a view illustrating one example of a band expansion encoding/decoding device employing the SBR. The encoding side is configured of an input signal division unit 100, a low-frequency-band component encoding unit 101, a high-frequency-band component encoding unit 102, and a bit stream multiplexing unit 103, and the decoding side is configured of a bit stream separation unit 200, a low-frequency-band component decoding unit 201, a sub-band division unit 202, a band expansion unit 203, and a sub-band synthesization unit 204.
On the encoding side, the input signal division unit 100 analyzes an input signal 1000, and outputs a high-frequency-band sub-band signal 1001 divided into a plurality of high-frequency bands, and a low-frequency-band signal 1002 including a low-frequency-band component. The low-frequency-band signal 1002 is encoded by the low-frequency-band component encoding unit 101 into low-frequency-band component information 1004 by employing an encoding technique such as the foregoing AAC, and this information is transmitted to the bit stream multiplexing unit 103. Further, the high-frequency-band component encoding unit 102 extracts high-frequency-band energy information 1102 and additional signal information 1103 from the high-frequency-band sub-band signal 1001, and transmits them to the bit stream multiplexing unit 103. The bit stream multiplexing unit 103 multiplexes the low-frequency-band component information 1004 with high-frequency-band component information that is configured of the high-frequency-band energy information 1102 and the additional signal information 1103, and outputs the result as a multiplexing bit stream 1005.
Herein, the high-frequency-band energy information 1102 and the additional signal information 1103 are calculated, for example, frame by frame for each sub-band. Taking the characteristics of the input signal 1000 in the time direction and the frequency direction into consideration, both may instead be calculated in a time unit obtained by further subdividing the frame in the time direction, and in a band unit obtained by grouping a plurality of the sub-bands in the frequency direction. Calculating the high-frequency-band energy information 1102 and the additional signal information 1103 in a time unit obtained by subdividing the frame makes it possible to represent in finer detail how the high-frequency-band sub-band signal 1001 changes with time. Calculating them in a band unit obtained by grouping a plurality of the sub-bands makes it possible to reduce the total number of bits necessary for encoding the high-frequency-band energy information 1102 and the additional signal information 1103. The division unit in the time direction and the frequency direction that is utilized for calculating the high-frequency-band energy information 1102 and the additional signal information 1103 is referred to as a time/frequency grid, and information on the grid is included in the high-frequency-band energy information 1102 and the additional signal information 1103.
In such a configuration, the high-frequency-band component is represented only by energy information and additional signal information. For this reason, it demands only a small information amount (total number of bits) as compared with the low-frequency-band component information, which includes waveform information and spectrum information of a narrow-band signal. Thus, the configuration is suitable for low-bit-rate encoding of a wide-band signal.
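The grouping into a time/frequency grid can be illustrated with a minimal Python sketch. The function name grid_energies, the border lists, and the choice of the mean squared magnitude as the energy of a grid cell are assumptions made only for illustration; the text above does not specify how the energy within a cell is computed.

```python
def grid_energies(subband, time_borders, band_borders):
    """Mean energy per time/frequency grid cell (illustrative assumption).

    subband[k][t] is the sample of sub-band k at time slot t (real or complex).
    time_borders and band_borders are ascending index lists; consecutive
    entries delimit one grid cell in the time and frequency directions.
    """
    energies = []
    for t0, t1 in zip(time_borders, time_borders[1:]):
        row = []
        for k0, k1 in zip(band_borders, band_borders[1:]):
            # Sum the squared magnitudes of every sample inside the cell.
            total = sum(abs(subband[k][t]) ** 2
                        for k in range(k0, k1)
                        for t in range(t0, t1))
            count = (k1 - k0) * (t1 - t0)
            row.append(total / count)
        energies.append(row)
    return energies


# Four sub-bands, four time slots, grouped into a 2 x 2 grid.
example = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 0, 0],
    [3, 3, 0, 0],
]
cells = grid_energies(example, time_borders=[0, 2, 4], band_borders=[0, 2, 4])
```

In this sketch one energy value per cell replaces sixteen per-sample values, which reflects the bit saving that grouping sub-bands is said to provide, while the two time segments still capture the change between the first and second half of the frame.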
On the decoding side, the multiplexing bit stream 1005 is separated into low-frequency-band component information 1007, high-frequency-band energy information 1105, and additional signal information 1106 in the bit stream separation unit 200. The low-frequency-band component information 1007, which is, for example, information encoded by employing an encoding technique such as the AAC, is decoded in the low-frequency-band component decoding unit 201, and a low-frequency-band component decoding signal 1008 signifying the low-frequency-band component is generated. The low-frequency-band component decoding signal 1008 is divided into low-frequency-band sub-band signals 1009 in the sub-band division unit 202, and these are input into the band expansion unit 203. The low-frequency-band sub-band signal 1009 is simultaneously supplied to the sub-band synthesization unit 204 as well. The band expansion unit 203 copies the low-frequency-band sub-band signal 1009 into a high-frequency-band sub-band, thereby reproducing the high-frequency-band component lost due to the band restriction.
Energy information of the high-frequency-band sub-band being reproduced is included in the high-frequency-band energy information 1105 that is input into the band expansion unit 203. The copied low-frequency-band sub-band signal 1009 is utilized as a high-frequency-band component after its energy has been regulated by employing the high-frequency-band energy information 1105. Further, the band expansion unit 203 generates an additional signal according to the additional signal information 1106. Herein, a sine-wave tone signal or a noise signal is employed as the additional signal being generated. The band expansion unit 203 adds the foregoing additional signal to the high-frequency-band component for which the energy regulation has been made, and supplies the sum as a high-frequency-band sub-band signal 1010 to the sub-band synthesization unit 204. The sub-band synthesization unit 204 band-synthesizes the low-frequency-band sub-band signal 1009 supplied from the sub-band division unit 202 and the high-frequency-band sub-band signal 1010 supplied from the band expansion unit 203, and generates an output signal 1011.
Herein, an operation of the energy regulation in the band expansion unit 203 will be explained in detail. The band expansion unit 203 regulates the gain of the copied low-frequency-band sub-band signal 1009 and the gain of the additional signal, adds the two gain-regulated signals, and thereby generates the high-frequency-band sub-band signal 1010 so that the energy of the high-frequency-band sub-band signal 1010 assumes the energy value (hereinafter, referred to as target energy) that the high-frequency-band energy information 1105 signifies. The gains of the copied low-frequency-band sub-band signal 1009 and the additional signal can be decided, for example, with the following procedure.
At first, it is assumed that one of the copied low-frequency-band sub-band signal 1009 and the additional signal is a main component of the high-frequency-band sub-band signal 1010, and the other is a subsidiary component. In a case where the low-frequency-band sub-band signal 1009 is the main component and the additional signal is the subsidiary component, the gains are decided by the following equations.
Gmain=sqrt(R/E/(1+Q))
Gsub=sqrt(R*Q/N/(1+Q))
Where Gmain and Gsub signify a gain for regulating an amplitude of the main component and a gain for regulating an amplitude of the subsidiary component, respectively, and E and N signify energy of the low-frequency-band sub-band signal 1009 and energy of the additional signal, respectively. In a case where the energy of the additional signal has been normalized to 1 (one), N=1. Further, R signifies the target energy of the high-frequency-band sub-band signal 1010, Q signifies the energy ratio of the subsidiary component to the main component, and R and Q are included in the high-frequency-band energy information 1105 and the additional signal information 1106, respectively. Additionally, sqrt(•) is an operator for obtaining a square root. On the other hand, in a case where the additional signal is the main component and the low-frequency-band sub-band signal 1009 is the subsidiary component, the gains are decided by the following equations.
Gmain=sqrt(R/N/(1+Q))
Gsub=sqrt(R*Q/E/(1+Q))
The band expansion unit 203 employs the gains calculated in the above procedure to perform a weighted addition of the low-frequency-band sub-band signal 1009 and the additional signal, and calculates the high-frequency-band sub-band signal 1010.
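The gain equations above can be checked numerically with a short Python sketch. The function name sbr_gains and its copied_is_main flag are hypothetical, and the check that the total energy equals R additionally assumes that the two components are uncorrelated, which the text does not state.

```python
import math


def sbr_gains(R, Q, E, N=1.0, copied_is_main=True):
    """Return (Gmain, Gsub) following the gain equations in the text.

    R: target energy, Q: energy ratio of subsidiary to main component,
    E: energy of the copied low-frequency-band sub-band signal,
    N: energy of the additional signal (1.0 when normalized).
    """
    if copied_is_main:
        g_main = math.sqrt(R / E / (1.0 + Q))
        g_sub = math.sqrt(R * Q / N / (1.0 + Q))
    else:
        # The additional signal is the main component.
        g_main = math.sqrt(R / N / (1.0 + Q))
        g_sub = math.sqrt(R * Q / E / (1.0 + Q))
    return g_main, g_sub


# Illustrative values only.
R, Q, E, N = 8.0, 0.25, 2.0, 1.0
g_main, g_sub = sbr_gains(R, Q, E, N, copied_is_main=True)

# Energy after gain regulation, assuming uncorrelated components.
main_energy = g_main ** 2 * E
sub_energy = g_sub ** 2 * N
total_energy = main_energy + sub_energy
```

With these gains the main component carries R/(1+Q) and the subsidiary component carries R*Q/(1+Q), so their sum is R and their ratio is Q, which is the property the equations are constructed to satisfy.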
Encoding the audio signal at a high quality at a low bit rate necessitates compressing the high-frequency-band component into a small amount of information. It is therefore important that the high-frequency-band component encoding unit 102 extract accurate high-frequency-band energy information 1102 and additional signal information 1103. For example, in a case of encoding a signal in which the noise level of the high-frequency-band component is higher than that of the low-frequency-band component, as is the case with a signal of a stringed instrument, adding a noise signal of an appropriate magnitude to the signal obtained by copying the low-frequency-band sub-band signal 1009 into the high-frequency band makes it possible to enhance the quality. In order for the decoding side to add a noise signal of an appropriate magnitude, the encoding side must incorporate into the additional signal information 1103 a precise energy ratio Q between the low-frequency-band sub-band signal 1009 and the noise signal being added. For this purpose, the noise level of the high-frequency-band component in the input signal has to be calculated precisely in the high-frequency-band component encoding unit 102.
A first conventional example of the high-frequency-band component encoding unit 102 for calculating a noise level of the high-frequency-band component is disclosed in Non-patent document 3. The high-frequency-band component encoding unit shown in FIG. 7 is configured of a time/frequency grid generation unit 300, a spectrum envelope calculation unit 301, a noise level calculation unit 302, and a noise level unification unit 303.
The time/frequency grid generation unit 300 employs the high-frequency-band sub-band signal 1001, groups a plurality of the sub-band signals in the time direction and the frequency direction, and generates time/frequency grid information 1100. The spectrum envelope calculation unit 301 extracts the target energy R of the high-frequency-band sub-band signal in a time/frequency grid unit, and supplies it as the high-frequency-band energy information 1102 to the bit stream multiplexing unit 103. The noise level calculation unit 302 outputs, for each sub-band, a ratio of the noise component that is included in the sub-band signal as a noise level 1101. The noise level unification unit 303 employs an average of the foregoing noise levels over a plurality of the sub-bands, obtains the additional signal information 1103 signifying the foregoing energy ratio Q in a time/frequency grid unit, and supplies it to the bit stream multiplexing unit 103.
A method employing a prediction residual is known as a method of calculating the noise level 1101 in the noise level calculation unit 302, and a noise level T(k) of a sub-band k can be calculated according to the following equation.
$$T(k) = \frac{\sum_{l} \left| Y(k,l) \right|^{2}}{\sum_{l} \left| X(k,l) \right|^{2} - \sum_{l} \left| Y(k,l) \right|^{2}} \qquad \text{[Numerical equation 1]}$$
where X(k, l) and Y(k, l) signify a sub-band signal of the sub-band k and a prediction sub-band signal, respectively, and l is a time index. A method of making a linear prediction by employing a covariance method or an autocorrelation method is known as a method of calculating the prediction sub-band signal. When only a small amount of the noise component is included in the sub-band signal, the difference between the sub-band signal X and the prediction sub-band signal Y becomes small, and the value of the noise level T(k) becomes large. Conversely, when a large amount of the noise component is included, the difference between the sub-band signal X and the prediction sub-band signal Y becomes large, and the value of the noise level T(k) becomes small. In such a manner, the noise level T(k) can be calculated based upon the magnitude of the noise component that is included in the sub-band signal.
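The behavior of T(k) described above can be illustrated with a minimal Python sketch. It assumes a real-valued sub-band signal, a second-order linear predictor obtained by the autocorrelation method, and energies summed over the samples for which a prediction exists; the predictor order, the function names predict_order2 and noise_level, and the guard value eps are illustrative assumptions and are not taken from Non-patent document 3.

```python
import math
import random


def predict_order2(x):
    """Second-order linear prediction of a real sequence (autocorrelation method)."""
    n = len(x)
    # Autocorrelation at lags 0, 1 and 2.
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(3)]
    det = r[0] * r[0] - r[1] * r[1]
    if det <= 0.0:
        return [0.0] * n
    # Solve the 2 x 2 normal equations for the predictor coefficients.
    a1 = r[1] * (r[0] - r[2]) / det
    a2 = (r[0] * r[2] - r[1] * r[1]) / det
    return [0.0, 0.0] + [a1 * x[i - 1] + a2 * x[i - 2] for i in range(2, n)]


def noise_level(x, eps=1e-12):
    """T = sum|Y|^2 / (sum|X|^2 - sum|Y|^2), with a guard on the denominator."""
    y = predict_order2(x)
    ex = sum(v * v for v in x[2:])
    ey = sum(v * v for v in y[2:])
    return ey / max(ex - ey, eps)


# A pure tone is highly predictable; white noise is not.
tone = [math.cos(0.3 * i) for i in range(256)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
t_tone = noise_level(tone)
t_noise = noise_level(noise)
```

In this sketch the tonal input yields a large T and the noisy input yields a small T, matching the inverse relationship between T(k) and the amount of noise stated in the text.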
The noise level unification unit 303 calculates the energy ratio Q of the low-frequency-band sub-band signal and the noise signal in a unit of a plurality of the sub-bands based upon the time/frequency grid information 1100. The reason is that calculating the energy ratio Q in a unit of a plurality of the sub-bands, rather than in a unit of each sub-band, further reduces the number of bits necessary for the additional signal information 1103. For example, consider the case of representing N sub-bands, from a sub-band k0 to a sub-band k0+N−1, with an identical energy ratio Q(fNoise). The additional signal information 1103 is calculated by averaging the noise levels 1101 of the N sub-bands from the sub-band k0 to the sub-band k0+N−1. Q(fNoise) is expressed by the following equation.
$$Q(f_{\mathrm{Noise}}) = c \cdot \frac{N}{\sum_{k=k_0}^{k_0+N-1} T(k)} \qquad \text{[Numerical equation 2]}$$
where fNoise signifies a frequency index of the additional signal information 1103, and c is a constant.
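The unification step can be sketched in Python as follows, reading Numerical equation 2 as the constant c multiplied by N and divided by the sum of T(k) over the N grouped sub-bands, that is, c divided by the average noise level. The function name unify_noise_levels and the default c=1.0 are assumptions for illustration; the value of c is not given in the text.

```python
def unify_noise_levels(T, k0, n, c=1.0):
    """Energy ratio Q for sub-bands k0 .. k0+n-1 from per-sub-band noise levels T.

    Implements Q = c * n / sum(T[k] for k in k0 .. k0+n-1), i.e. c divided by
    the average of the noise levels over the grouped sub-bands.
    """
    total = sum(T[k] for k in range(k0, k0 + n))
    if total <= 0.0:
        raise ValueError("sum of noise levels must be positive")
    return c * n / total


# Illustrative per-sub-band noise levels; two sub-bands are grouped.
levels = [2.0, 4.0, 6.0, 8.0]
q = unify_noise_levels(levels, k0=1, n=2)
```

Because T(k) is large for tonal sub-bands, a group of tonal sub-bands yields a small Q and thus a small added noise component, whereas a group of noisy sub-bands yields a large Q.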
As a second conventional example of the high-frequency-band component encoding unit 102 for calculating a noise level of the high-frequency-band component, there exists the method disclosed in Patent document 1. In the second conventional example, a difference between a maximum value and a minimum value of a spectrum envelope, which is calculated by applying a high-resolution FFT to the input signal, is obtained, and the result of smoothing the calculated difference in time and frequency is taken as the noise level.
Patent document 1: JP-P2002-536679A
Non-patent document 1: “Digital Radio Mondiale (DRM); System Specification”, ETSI, TS 101 980 V1.1.1, paragraph 5.2.6, September, 2001
Non-patent document 2: “AES (Audio Engineering Society) Convention Paper 5553”, 112th AES Convention, May 2002
Non-patent document 3: “Enhanced aacPlus general audio codec; Enhanced aacPlus encoder SBR part”, 3GPP, TS 26.404 V6.0.0, September, 2004