1. Field of the Invention
The present invention relates to an apparatus and method for deciding an adaptive noise level for bandwidth extension and, more particularly, to an apparatus and method that, when deciding a high-band noise level to correct the tonality of the high band for bandwidth extension, measure the high-band noise level accurately and improve the quality of a high-band signal by adaptively controlling the high-band noise level according to a pitch frequency of an input signal.
2. Description of Related Art
FIG. 1 is a diagram illustrating an audio encoding/decoding apparatus for bandwidth extension according to the prior art.
As shown in FIG. 1, the audio encoding apparatus 110 for bandwidth extension according to the prior art includes a low-band signal extractor 111, a low-band encoder 112, and a high-band encoder 113. The audio decoding apparatus 120 for bandwidth extension according to the prior art includes a low-band decoder 121, a high-band decoder 122, and a mixer 123. The high-band decoder 122 includes a high-band generator 1221 and a high-band corrector 1222.
The audio encoding apparatus 110 for bandwidth extension according to the prior art compresses an input signal using a bandwidth extension technology.
To be specific, the low-band signal extractor 111 receives an input signal such as an audio signal and extracts a low-band signal using a low pass filter (LPF). For example, the low-band signal extractor 111 may be embodied using a low pass filter with a cutoff frequency of 8 kHz. In this case, frequency bands lower than 8 kHz are defined as the low band, and frequency bands higher than 8 kHz are defined as the high band.
The low-band encoder 112 encodes the low-band signal extracted by the low-band signal extractor 111 based on a transform coding scheme. The transform coding scheme is a coding technology that codes data by transforming an input signal to a frequency domain and quantizing the transformed signal according to the acoustic significance of each frequency coefficient. For example, Moving Picture Experts Group MPEG-1 Layer 3 Audio Codec (MP3) and MPEG-2/4 Advanced Audio Coding (AAC) are representative transform coding technologies.
The audio encoding apparatus 110 does not employ a typical coding scheme that directly quantizes and transmits a high-band signal according to a transform coding scheme. Instead, the high-band encoder 113 receives the input signal, which is an audio signal, extracts correction parameters from it by comparing its low-band frequency information with its high-band frequency information, quantizes the correction parameters, and transmits the results to the audio decoding apparatus 120.
Meanwhile, the audio decoding apparatus 120 restores an original signal using the compressed signal and the correction parameters from the audio encoding apparatus 110.
To be specific, the low-band decoder 121 restores a low-band signal by decoding the transform-coded low-band signal from the audio encoding apparatus 110 according to a transform coding scheme.
The high-band decoder 122 copies necessary frequency information from the low-band signal reconstructed by the low-band decoder 121 and restores a high-band signal using the copied frequency information. The high-band decoder 122 receives, from the high-band encoder 113, the correction parameters that describe the difference between the low-band frequency characteristic and the high-band frequency characteristic, and corrects the high-band signal based on the received correction parameters. The correction parameters are used to accurately restore the high-band signal.
In other words, the high-band generator 1221 copies necessary information from the reconstructed low-band signal of the low-band decoder 121 and generates the high-band signal using the copied information. The high-band corrector 1222 corrects the high-band signal from the high-band generator 1221 according to the correction parameters from the high-band encoder 113.
The mixer 123 mixes the reconstructed low-band signal from the low-band decoder 121 and the reconstructed high-band signal from the high-band decoder 122 and outputs the mixed signal as an output signal.
The representative correction parameters used for encoding and decoding an audio signal for bandwidth extension are energy and tonality. Since the low-band signal and the high-band signal do not match in frequency energy, the audio encoding apparatus 110 divides the high-band signal into a plurality of sub-band signals, quantizes the energy of each sub-band signal, and transmits the quantized results. The audio decoding apparatus 120 controls the energy of each sub-band signal of the high-band signal based on the energy of each sub-band signal received from the audio encoding apparatus 110.
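The energy correction described above can be sketched as follows. This is a minimal illustration rather than the procedure of any particular codec: the function names, the equal-width sub-band layout, and the omission of the quantization step are assumptions made for brevity.

```python
import math

def subband_energies(coeffs, band_size):
    """Energy (sum of squares) of each consecutive sub-band of band_size coefficients."""
    return [sum(c * c for c in coeffs[i:i + band_size])
            for i in range(0, len(coeffs), band_size)]

def match_energies(generated, target_energies, band_size):
    """Scale each sub-band of the generated high band to the transmitted target energy."""
    corrected = []
    for band_index, start in enumerate(range(0, len(generated), band_size)):
        band = generated[start:start + band_size]
        energy = sum(c * c for c in band)
        # A silent band cannot be scaled up to a target, so it is left at zero.
        gain = math.sqrt(target_energies[band_index] / energy) if energy > 0 else 0.0
        corrected.extend(gain * c for c in band)
    return corrected

# Encoder side: energies of the original high band (hypothetical values).
original_high = [1.0, 1.0, 2.0, 2.0]
targets = subband_energies(original_high, band_size=2)

# Decoder side: the high band copied from the low band has different energies.
copied_high = [2.0, 2.0, 1.0, 1.0]
corrected_high = match_energies(copied_high, targets, band_size=2)
```

After correction, each sub-band of `corrected_high` carries the same energy as the corresponding sub-band of `original_high`, although the fine spectral structure still comes from the copied low band.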
In general, the high-band signal has weaker tonality than the low-band signal. Therefore, when the audio decoding apparatus 120 generates the high-band signal from the low-band signal, it must weaken the tonality of the copied signal. To do so, the audio decoding apparatus 120 adds an appropriate amount of noise to each high-band frequency band. The audio encoding apparatus 110 calculates and quantizes a noise level of each band to correct the tonality of the high-band signal and transmits the results to the audio decoding apparatus 120.
Spectral band replication (SBR) is a well-known audio encoding/decoding technology using bandwidth extension. The SBR technology is generally used in an MPEG-4 HE-AAC encoding apparatus.
According to the SBR technology, the high-band encoder 113 transforms an input signal to a plurality of time-frequency domain signals using a quadrature mirror filter (QMF). The high-band encoder 113 measures tonality of each frequency channel based on prediction capability of a time domain signal corresponding to each frequency channel and decides a noise level to add based on the measured tonality. The high-band encoder 113 uses a 2nd-order linear predictor for predicting a time domain signal.
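A tonality measure based on 2nd-order linear prediction can be sketched as below. The exact formula used by the high-band encoder 113 is not given here, so this sketch makes several assumptions: tonality is taken as the ratio of predicted energy to prediction error energy, the channel signal is real-valued (QMF channel signals are typically complex), and the predictor is fitted by the autocorrelation method. The function names are illustrative.

```python
import math

def autocorr(x, lag):
    """Autocorrelation sum of a real sequence at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def tonality_lpc2(x):
    """Predicted-to-residual energy ratio of a 2nd-order linear predictor.

    Assumption: tonality = (r0 - err) / err, where err is the energy of the
    prediction error. A well-predicted (tonal) signal gives a large value.
    """
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    if r0 <= 0.0 or det <= 0.0:
        return 0.0  # silent or degenerate input: report no tonality
    # Solve the 2x2 normal equations for the predictor coefficients.
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    err = r0 - a1 * r1 - a2 * r2
    if err <= 0.0:
        return float("inf")
    return (r0 - err) / err

def tones(angular_freqs, length=256):
    """Sum of unit-amplitude cosines at the given angular frequencies (rad/sample)."""
    return [sum(math.cos(w * n) for w in angular_freqs) for n in range(length)]

single_tone = tonality_lpc2(tones([0.3 * math.pi]))
four_tones = tonality_lpc2(tones([0.2 * math.pi * k for k in range(1, 5)]))
```

A single sinusoid is almost perfectly predictable with two coefficients, so `single_tone` comes out large, whereas a channel containing four sinusoids cannot be modeled by a 2nd-order predictor and `four_tones` comes out small even though the signal is purely tonal.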
According to the SBR technology of the prior art, a noise level is decided by a fixed method that does not consider the characteristics of an input signal. As a result, the SBR technology according to the prior art cannot measure tonality accurately.
In particular, when a pitch frequency of an input signal is low, the SBR technology according to the prior art incorrectly decides a noise level that is higher than the noise level actually needed. If a high-band signal is reconstructed using this excessive noise level, the high-band performance deteriorates due to the excessively added noise, and the quality of the reconstructed signal deteriorates as a result. Therefore, there is a demand for a method of accurately deciding a noise level in consideration of the characteristics of an input signal in order to improve the performance of the SBR technology and the overall performance of the audio encoding apparatus 110.
FIG. 2 is a diagram illustrating a noise level deciding apparatus employing an SBR technology according to the prior art.
As shown in FIG. 2, the noise level deciding apparatus 200 according to the prior art includes a signal converter 210, a linear predictor 220, and a noise level decider 230.
The signal converter 210 converts an input signal to a 32×64 time-frequency domain. For example, if a sampling frequency of an input signal is 48 kHz, a bandwidth of each frequency channel is 375 Hz.
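The 375 Hz figure follows from dividing the band from 0 Hz to the Nyquist frequency evenly among the 64 frequency channels. The symbols below ($f_s$ for the sampling frequency, $\Delta f$ for the channel bandwidth) are introduced only for this illustration:

```latex
\Delta f = \frac{f_s / 2}{64} = \frac{24\,000\ \text{Hz}}{64} = 375\ \text{Hz}
```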
The linear predictor 220 performs 2nd-order linear prediction on the converted input signal from the signal converter 210 independently for each of the 64 frequency channels and measures the tonality of each channel based on the result of the 2nd-order linear prediction.
The noise level decider 230 compares the measured tonalities of the low-band channels and the high-band channels and decides a noise level to add to each high-band channel based on the comparison result. That is, noise is added so that the reconstructed high-band signal has the tonality of the original signal. Here, in order to reduce the number of parameters to transmit, the noise level decider 230 may combine a plurality of frequency channels into a block by reducing the frequency resolution and may allocate one noise level to each block.
Eq. 1 shows how the noise level is decided.
$$\text{noise level} = G\,\frac{1}{T[p]} \quad \text{or} \quad K\,\frac{T[q]}{T^{2}[p]} \qquad \text{(Eq. 1)}$$
In Eq. 1, T[p] denotes a tonality value for a frequency channel p. G and K are constants.
The high-band generator 1221 copies the frequency content of a q-th low-band channel to generate the frequency content of a p-th high-band channel. Here, the noise level to add to the p-th channel is calculated using Eq. 1.
However, since a pitch frequency of an input signal is not considered, the tonality value measured by the noise level deciding apparatus 200 may differ from the real tonality value. Therefore, the noise level deciding apparatus 200 of FIG. 2 may decide a noise level incorrectly. For example, if the measured tonality value is half the real tonality value, the noise level obtained from Eq. 1 is twice the ideal noise level for the real tonality value. In this case, the performance of the audio encoding apparatus 110 deteriorates significantly because too much high-band noise is added.
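The sensitivity of Eq. 1 to a tonality measurement error can be checked numerically. The sketch below implements both forms of Eq. 1; the function names, the default values of the constants G and K, and the sample tonality values are assumptions chosen for illustration.

```python
def noise_level_inverse(tonality_p, gain_g=1.0):
    """First form of Eq. 1: G / T[p]."""
    if tonality_p <= 0:
        raise ValueError("tonality must be positive")
    return gain_g / tonality_p

def noise_level_ratio(tonality_q, tonality_p, scale_k=1.0):
    """Second form of Eq. 1: K * T[q] / T[p]**2."""
    if tonality_p <= 0:
        raise ValueError("tonality must be positive")
    return scale_k * tonality_q / (tonality_p ** 2)

real_tonality = 40.0                    # hypothetical real tonality of channel p
measured_tonality = real_tonality / 2   # measured as half the real value

ideal_level = noise_level_inverse(real_tonality)
decided_level = noise_level_inverse(measured_tonality)
excess_factor = decided_level / ideal_level
```

With the first form, halving the measured T[p] doubles the decided noise level. With the second form, the same error in T[p] alone (with T[q] unchanged) would scale the level by a factor of four, because T[p] enters squared.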
FIG. 3 is a graph showing a spectrum that represents the harmonic components of an audio signal.
An audio signal having strong tonality includes a fundamental frequency and strong harmonic frequency components of the fundamental frequency. As shown in FIG. 3, the harmonic frequency components of the audio signal have a harmonic peak 301 at every peak interval 302 in the frequency domain. If the pitch frequency is low, the peak interval 302 is short.
In the SBR technology, a plurality of harmonic frequency components may be distributed in one frequency channel. Also, a frequency channel may include frequency components that do not belong to it because of the aliasing effect of the QMF used for time-frequency analysis.
If a frequency channel includes a plurality of harmonic frequency components as described above, the performance of the 2nd-order linear prediction used to measure the tonality of each channel may deteriorate, and the linear predictor 220 may incorrectly determine that an input signal has lower tonality than it really has. Therefore, the measured tonality value of an input signal that has strong tonality may be smaller than the real tonality value, depending on the relation between the pitch frequency of the input signal and the frequency channel configuration. In this case, the noise level decider 230 decides a noise level greater than the proper noise level, and the SBR performance and the overall performance of the audio encoding apparatus 110 deteriorate.
FIGS. 4A to 4D are graphs for describing harmonic components of a frequency channel which vary according to a pitch frequency and the corresponding tonality difference due to those harmonic components.
As shown in FIG. 4A, the symbol ▪ denotes harmonic frequency locations 421 to 426 for a pitch frequency of 105 Hz, and the symbol ● denotes harmonic frequency locations 411 to 415 for a pitch frequency of 225 Hz. The harmonic frequency locations for the pitch frequencies of 105 Hz and 225 Hz in the second and third frequency channels 41 and 42 will be described. Here, the sampling frequency is 48 kHz and the bandwidth of a channel is 375 Hz.
FIGS. 4B to 4D show the spectra of the second and third frequency channels for each pitch frequency.
FIG. 4B shows the spectrum and the harmonic components of the second channel 41 for a signal having a pitch frequency of 225 Hz. The second channel corresponds to frequencies from 375 Hz to 750 Hz. For a signal having a pitch frequency of 225 Hz, the harmonic component 411 at 450 Hz, which is located inside the second channel 41, corresponds to a second peak 431 in the spectrum, and the harmonic component 412 at 675 Hz, which is also located inside the second channel 41, corresponds to a first peak 432. As shown, the order of the harmonic locations is reversed due to the frequency folding caused by down-sampling by a factor of 64. Although the harmonic components 414 and 413 at 900 Hz and 225 Hz are not located inside the second channel 41, two corresponding small peaks 434 and 433 appear in the second channel signal due to aliasing. The second channel signal for a pitch frequency of 225 Hz can be accurately predicted using a 2nd-order predictor, and the tonality is measured as about 40.
FIG. 4C shows the spectrum and the harmonic components of the second channel 41 (375 Hz to 750 Hz) for a signal having a pitch frequency of 105 Hz. For a signal having a pitch frequency of 105 Hz, the harmonic components 421, 422, 423, and 424 at 420 Hz, 525 Hz, 630 Hz, and 735 Hz, which are all located inside the second channel 41, correspond to a fourth peak 441, a third peak 442, a second peak 443, and a first peak 444, respectively, in the spectrum. The second channel also includes two aliasing harmonic peaks 445 and 446 of 315 Hz and 840 Hz in addition to the four normal harmonic peaks 441 to 444. Therefore, the 2nd-order predictor cannot properly predict the second channel signal for a pitch frequency of 105 Hz, and the tonality is incorrectly measured as about 1.0.
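The harmonic locations listed for the second channel can be reproduced with a few lines of arithmetic. In the sketch below the function names are illustrative, and treating the channel as the half-open band from its lower edge up to but excluding its upper edge is an assumption about how edge cases are counted.

```python
def harmonics_up_to(pitch_hz, max_hz):
    """All integer multiples of pitch_hz up to and including max_hz."""
    return list(range(pitch_hz, max_hz + 1, pitch_hz))

def split_by_channel(pitch_hz, low_hz, high_hz):
    """Return (harmonics inside the band, nearest harmonic below, nearest harmonic above)."""
    candidates = harmonics_up_to(pitch_hz, high_hz + pitch_hz)
    inside = [f for f in candidates if low_hz <= f < high_hz]
    below = max((f for f in candidates if f < low_hz), default=None)
    above = min((f for f in candidates if f >= high_hz), default=None)
    return inside, below, above

# Second channel: 375 Hz to 750 Hz.
inside_225, below_225, above_225 = split_by_channel(225, 375, 750)
inside_105, below_105, above_105 = split_by_channel(105, 375, 750)
```

For 225 Hz the channel holds two harmonics (450 Hz and 675 Hz) with neighbors at 225 Hz and 900 Hz just outside it; for 105 Hz it holds four (420, 525, 630, and 735 Hz) with neighbors at 315 Hz and 840 Hz. The neighbors are the components that appear as aliasing peaks in FIGS. 4B and 4C.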
FIG. 4D shows the spectrum and the harmonic components of the third channel 42 (750 Hz to 1125 Hz) for a signal having a pitch frequency of 225 Hz. For a signal having a pitch frequency of 225 Hz, the harmonic components 414 and 415 at 900 Hz and 1125 Hz, which are located inside the third channel 42, correspond to a second peak 451 and a first peak 452, respectively, in the spectrum. The third channel includes two normal harmonic peaks 451 and 452 and one aliasing peak 453. In comparison with FIG. 4B, the number of harmonic peaks is smaller, but the aliasing harmonic peak 453 of 675 Hz is large. Thus, the tonality is measured as about 1.0, which is smaller than the real tonality. That is, since the harmonic component at 675 Hz, which lies outside the third channel, is close to the channel boundary, a strong aliasing effect occurs in the third channel.
As shown in FIGS. 4A to 4D, the measured tonality varies according to the pitch frequency and the relative location between the harmonic components and the channel boundaries. The aliasing itself cannot be prevented. However, when the pitch frequency is high, strong aliasing occurs only in some channels, depending on the relative location between a harmonic frequency and a channel boundary.
In order to mitigate the aliasing problem, the noise level deciding apparatus 200 of FIG. 2 may calculate an averaged tonality over a plurality of channels by reducing the channel resolution. In this case, a measured tonality that is smaller than the real tonality can be partially corrected, especially for a high pitch frequency.
However, if the pitch frequency is low, the number of normal harmonic components included in one channel increases, and this happens in all channels alike. Therefore, reducing the channel resolution does not reduce the effect of the increased number of normal harmonic components in one channel. In addition, the noise level increases further when the pitch frequency is low because channel aliasing occurs more often.
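The effect of averaging tonality over blocks of channels, and its limit for a low pitch frequency, can be illustrated with a short sketch. The function name and the per-channel tonality lists are hypothetical; the values 40 and 1.0 are borrowed from the measurements described for FIGS. 4B to 4D.

```python
def block_average(tonalities, block_size):
    """Average per-channel tonality over consecutive blocks of block_size channels."""
    blocks = [tonalities[i:i + block_size]
              for i in range(0, len(tonalities), block_size)]
    return [sum(block) / len(block) for block in blocks]

# High pitch frequency: only some channels are under-measured (hypothetical values).
high_pitch_channels = [40.0, 1.0, 40.0, 40.0]
# Low pitch frequency: every channel is under-measured (hypothetical values).
low_pitch_channels = [1.0, 1.0, 1.0, 1.0]

high_pitch_blocks = block_average(high_pitch_channels, block_size=2)
low_pitch_blocks = block_average(low_pitch_channels, block_size=2)
```

In the high-pitch case the under-measured channel is pulled up by its neighbor, so the block value moves toward the real tonality. In the low-pitch case every channel in a block is under-measured, so the average stays equally low.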
As described above, the accuracy of deciding a noise level varies according to the pitch frequency. Therefore, it is necessary to correct the noise level of each channel by calculating the boundaries of each frequency channel according to the sampling frequency and analyzing the relation between the locations of the harmonic components and the calculated boundaries.
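The two quantities named above, the channel boundaries derived from the sampling frequency and the location of each harmonic relative to those boundaries, can be tabulated as in the following sketch. It only illustrates the quantities involved and is not a correction method; the function names and the use of uniform channel widths are assumptions.

```python
def channel_boundaries_hz(sampling_rate_hz, num_channels=64):
    """Edges of num_channels uniform channels spanning 0 Hz to the Nyquist frequency."""
    width = sampling_rate_hz / 2 / num_channels
    return [k * width for k in range(num_channels + 1)]

def distance_to_nearest_boundary(freq_hz, sampling_rate_hz, num_channels=64):
    """Distance in Hz from freq_hz to the closest channel boundary."""
    width = sampling_rate_hz / 2 / num_channels
    offset = freq_hz % width
    return min(offset, width - offset)

def harmonic_boundary_distances(pitch_hz, max_hz, sampling_rate_hz, num_channels=64):
    """Map each harmonic of pitch_hz up to max_hz to its distance from a boundary."""
    return {
        k * pitch_hz: distance_to_nearest_boundary(k * pitch_hz, sampling_rate_hz, num_channels)
        for k in range(1, int(max_hz // pitch_hz) + 1)
    }

boundaries = channel_boundaries_hz(48000)
distances_225 = harmonic_boundary_distances(225, 1125, 48000)
```

For a 48 kHz sampling frequency and a 225 Hz pitch frequency, the harmonics at 450 Hz and 675 Hz lie 75 Hz from a boundary, while those at 225 Hz and 900 Hz lie 150 Hz away. This agrees with the observation that the 675 Hz component produces a large aliasing peak in the third channel.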