The objective of audio coding is to compress and transmit a digitized audio signal as efficiently as possible, and to decode the compressed signal at a decoder so that an audio signal of the highest possible quality is reproduced. FIG. 1 is a diagram showing the structures of a conventional encoder 200 and a conventional decoder 210, which apply typical compression encoding processing and typical decoding processing, respectively, to an audio signal. FIG. 1 thus shows the most typical compression method applied to an audio signal. The conventional encoder 200 includes a frame segmentation unit 201, a spectrum transformation unit 202, and a spectrum encoding unit 203. The frame segmentation unit 201 divides an input audio signal in the time domain into frames, each of which has a predetermined number of consecutive samples. The spectrum transformation unit 202 transforms the input audio signal samples in each frame into a spectrum signal in the frequency domain. The spectrum encoding unit 203 quantizes the spectrum signal up to a certain frequency, generally known as the bandwidth, and outputs the result as encoded data (a bitstream). The outputted bitstream is transmitted to the decoder 210 via, for example, a transmission channel or a recording medium. The decoder 210, which receives the encoded data from the encoder 200 as an input bitstream, includes a spectrum decoding unit 204, a spectrum inverse transformation unit 205, and a frame assembling unit 206. The spectrum decoding unit 204 obtains a spectrum signal by de-quantizing the encoded data of the input bitstream. The spectrum inverse transformation unit 205 inverse-transforms the obtained spectrum signal back into a time signal, so that the audio signal is generated on a frame-by-frame basis. The audio signals of the respective frames are then assembled by the frame assembling unit 206 to form an output audio signal.
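By way of illustration only, the processing chain of FIG. 1 may be sketched as follows. The sketch is not the conventional codec itself: a naive discrete Fourier transform stands in for the spectrum transformation (practical codecs typically use a lapped transform), uniform rounding stands in for the quantization, and the names segment, dft, idft, encode, decode, bandwidth_bins, and step are hypothetical. It assumes that the signal length is a multiple of the frame length.

```python
import cmath


def segment(signal, frame_len):
    """Frame segmentation (unit 201): consecutive frames of frame_len samples."""
    return [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]


def dft(frame):
    """Spectrum transformation (unit 202), here a naive DFT (assumption)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


def idft(spec):
    """Spectrum inverse transformation (unit 205); returns real samples."""
    n = len(spec)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(spec)).real / n
            for t in range(n)]


def encode(signal, frame_len, bandwidth_bins, step):
    """Spectrum encoding (unit 203): quantize only bins below the bandwidth."""
    if bandwidth_bins > frame_len // 2 + 1:
        raise ValueError("bandwidth exceeds the Nyquist bin")
    coded = []
    for frame in segment(signal, frame_len):
        spec = dft(frame)
        # Bins at or above bandwidth_bins are simply not transmitted.
        coded.append([(round(c.real / step), round(c.imag / step))
                      for c in spec[:bandwidth_bins]])
    return coded


def decode(coded, frame_len, step):
    """Units 204 to 206: de-quantize, inverse-transform, assemble frames."""
    out = []
    for frame_codes in coded:
        spec = [0j] * frame_len
        for k, (re, im) in enumerate(frame_codes):
            c = complex(re * step, im * step)
            spec[k] = c
            mirror = frame_len - k
            if k != 0 and mirror != k:
                # Restore conjugate symmetry so the output is real-valued.
                spec[mirror] = c.conjugate()
        out.extend(idft(spec))
    return out
```

Under these assumptions, a signal whose energy lies entirely above the encoded bandwidth decodes to silence, which is the loss of high-frequency components that motivates bandwidth extension.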
FIG. 2 is a graph showing one example of an audio signal whose high-frequency signal is lost due to conventional low-bitrate coding. As the bitrate, that is, the amount of encoded data per unit time available to represent the audio signal, decreases, more of a bandwidth 301 of the audio signal to be encoded has to be sacrificed. A high-frequency component (high-frequency signal) is perceptually less important than a low-frequency component (low-frequency signal), so that the bandwidth to be encoded is reduced starting from the high-frequency component. As a result, in low-bitrate coding, as shown in FIG. 2, a high-frequency tone signal 303 and a high-frequency component 304 which exists as harmonics of the low-frequency component are lost. In general, a range 302 to be decoded at the conventional decoder is equal to the bandwidth 301 of the signal to be encoded, so that perceptual audio quality is reduced. Bandwidth extension is a technology for recovering the high-frequency component which has been lost for the above reason, and one typical example of such a technology is the Spectral Band Replication (SBR) method, which is established as a standard method in ISO/IEC 14496-3 (MPEG-4 Audio). The technology is also described in Patent Reference 1.
The SBR method is described below as one example of the conventional technology related to the present invention. FIG. 3 is a block diagram showing a structure of a decoder 400 which decodes an encoded bitstream by the SBR method. The decoder 400 has a function of extending a bandwidth using the SBR method. The decoder 400 includes a bitstream de-multiplex unit 401, a core audio decoding unit 402, an analysis subband filter unit 403, a bandwidth extension unit 404, and a synthetic subband filter unit 405. Firstly, the bitstream de-multiplex unit 401 separates an input bitstream into a core audio part of the bitstream and a bandwidth extended part of the bitstream. The core audio part of the bitstream has been generated by encoding a low-frequency audio spectrum signal, and the bandwidth extended part of the bitstream has been generated by encoding bandwidth extension information for generating a high-frequency signal by using the low-frequency signal coded in the core audio part. The core audio decoding unit 402 decodes the core audio part of the bitstream to generate a time signal of the low-frequency component. The core audio decoding unit 402 may be any existing decoding unit; in the case of the MPEG-4 Audio standard, for example, the AAC method, which is also part of the MPEG-4 standard, is used. The decoded low-frequency component signal is then band-split into M-channel subband signals at the analysis subband filter unit 403. Subsequent bandwidth extension processing is performed on these subband signals (low-frequency subband signals). The bandwidth extension unit 404 processes the low-frequency subband signals using the bandwidth extension information in the bandwidth extended part, and generates new high-frequency subband signals which represent high-frequency component signals.
The generated high-frequency subband signals are inputted, together with the low-frequency subband signals, as N-channel subband signals into the synthetic subband filter unit 405, and are synthesized to form an output audio signal. In FIG. 3, the output audio signals from synthetic filters M to N−1 are shown as bandwidth extended signals. It is assumed that the subband signals used herein are obtained by segmenting an audio signal, which is a time signal, into subbands in the frequency direction, and are represented by two-dimensionally arranging the time samples included in each subband.
FIG. 4 is a diagram showing processing by which the bandwidth extension unit 404 shown in FIG. 3 processes the low-frequency subband signals to generate the high-frequency subband signals. The replicated high-frequency subband signal 501 is generated by replicating the low-frequency subband signal 502 at the high frequency. During the replication processing, the inverse filtering 503 restrains tonal characteristics of the low-frequency subband signal. The degree of the tonal restraint is controlled using a value called a chirp factor 504 (equivalent to an “adjustment coefficient” in the Claims of the present invention). A plurality of consecutive subbands are grouped, an identical chirp factor is applied to every subband in a group, and each such group is hereinafter referred to as a chirp factor band. Here, a typical D-dimensional inverse filter is calculated according to the following equation:
$$X_{\mathrm{high}}(t,k) = X_{\mathrm{low}}\bigl(t,p(k)\bigr) + \sum_{i=0}^{D-1} B_j^{\,i}\,\alpha_i\,X_{\mathrm{low}}\bigl(t-i,p(k)\bigr) \qquad \text{[Equation 1]}$$
where Xhigh(t,k) is a generated high-frequency subband signal, Xlow(t,k) is a low-frequency subband signal, t is a time sample position, k is a subband number, αi is a linear predictor coefficient calculated by linear prediction using Xlow(t,k), p(k) is a mapping function for determining the low-frequency subband signal corresponding to the k-th high-frequency subband signal, and Bj is the chirp factor corresponding to the chirp factor band bj set for the high-frequency subband signal Xhigh(t,k).
Technical details of the inverse filtering and a method of determining the mapping function p(k) are outside the scope of the disclosure of the present invention, and are therefore not described herein. Note that the chirp factor Bj is a value equal to or more than 0 and equal to or less than 1, and the effect of the tonal restraint is maximum when Bj=1 and minimum when Bj=0. The information on the grouping of the chirp factor bands and the chirp factors for the respective chirp factor bands are encoded, included in a bitstream, and then transmitted.
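Purely as a sketch, Equation 1 may be evaluated as shown below. The function name replicate_with_inverse_filter and the data layout (x_low indexed as [time][subband], alpha holding the D predictor coefficients per low-frequency subband, chirp holding the chirp factor Bj already looked up per high-frequency subband) are hypothetical. Treating samples before the first time sample as zero is likewise an assumption, since the boundary handling is not specified above.

```python
def replicate_with_inverse_filter(x_low, alpha, chirp, p, high_bands):
    """Evaluate Equation 1 for every high-frequency subband k in high_bands.

    x_low : x_low[t][m], low-frequency subband samples (time t, subband m)
    alpha : alpha[m][i], predictor coefficients alpha_i for subband m, i = 0..D-1
    chirp : chirp[k], chirp factor B_j of the chirp factor band containing k
    p     : mapping function, p(k) gives the source low-frequency subband
    """
    x_high = {}
    for k in high_bands:
        m = p(k)
        b = chirp[k]
        if not 0.0 <= b <= 1.0:
            raise ValueError("chirp factor must lie in [0, 1]")
        samples = []
        for t in range(len(x_low)):
            acc = x_low[t][m]  # the replicated term X_low(t, p(k))
            for i, a in enumerate(alpha[m]):
                if t - i >= 0:  # earlier samples treated as zero (assumption)
                    acc += (b ** i) * a * x_low[t - i][m]
            samples.append(acc)
        x_high[k] = samples
    return x_high
```

Because Bj is raised to the power i, a chirp factor of 0 leaves only the i=0 term of the sum, whereas a chirp factor of 1 applies every predictor coefficient at full weight.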
Subsequently, the envelope shape (a rough representation of the signal energy distribution) of the generated high-frequency subband signal is adjusted so that the generated high-frequency subband signal has frequency characteristics similar to those of the high-frequency subband signal of the original sound. One example of such a method of adjusting the envelope shape is disclosed in Patent Reference 2. A high-frequency subband signal represented in two-dimensional time/frequency representation is divided first in the time direction into “time segments” and then in the frequency direction into “frequency bands”. FIG. 5 is a graph showing one example of this segmentation method of dividing a high-frequency subband signal into time segments and frequency bands. Arrows 601 depict segmentation of the high-frequency subband signal in the time direction, and arrows 602 depict segmentation in the frequency direction. Each area of the high-frequency subband signal divided in the time and frequency directions is called an “energy band”, and is scaled so as to match the energy value given for that area. The information on the segmentation in the time/frequency directions used for the envelope shape adjustment, and the energy value for each divided area, are encoded at the encoder 200, included in a bitstream, and then transmitted.
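The scaling of one energy band may be sketched, for illustration only, as follows. The function name scale_energy_band and the half-open (start, stop) ranges are hypothetical, and the sketch assumes that the transmitted energy value is the mean energy per sample of the band; the text above does not state whether a mean or a total is meant.

```python
import math


def scale_energy_band(x, t_range, k_range, target_energy):
    """Scale the samples x[t][k] of one energy band in place.

    t_range, k_range : half-open (start, stop) index ranges of the band
    target_energy    : desired mean energy per sample (assumption)
    Returns the gain that was applied.
    """
    cells = [(t, k) for t in range(*t_range) for k in range(*k_range)]
    current = sum(abs(x[t][k]) ** 2 for t, k in cells) / len(cells)
    if current == 0.0:
        return 0.0  # a silent band cannot be scaled up; leave it unchanged
    gain = math.sqrt(target_energy / current)
    for t, k in cells:
        x[t][k] *= gain
    return gain
```

Samples outside the given time segment and frequency band are left untouched, so each energy band can be adjusted independently of its neighbors.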
Furthermore, in addition to the envelope shape adjustment of the energy, the tone-to-noise ratio of the generated high-frequency subband signal is also an important factor for improving the expressiveness of the generated signal and thereby realizing audio quality with higher fidelity to the input signal. When a noise component is partially lacking in the generated high-frequency subband signal, an artificial noise component is added in order to compensate for the lack. In the same manner, when a tonal component is partially lacking, an artificial tone component (sine wave) is added. The noise component is added in an area called a “noise band”, and the sine signal is added in an area called a “tone band”. FIG. 6(a) to (c) are graphs showing one example of segmentation of the high-frequency subband signal, in which the divided high-frequency areas shown in FIG. 5 are grouped as an energy-band group, a noise-band group, and a tone-band group, respectively, and thereby show the relationship among the energy bands, the noise bands, and the tone bands. The time-frequency space segmentation in FIG. 6(a) shows areas each of which is given the same energy value for the envelope shape adjustment of the high-frequency subband signal. In FIG. 6(a), in a time-frequency space segmentation method 701, areas indicated as ei (i=0, 1, . . . , 23) are energy bands. In FIG. 6(b), in a time-frequency space segmentation method 702, areas indicated as qi (i=0, 1, . . . , 23) are noise bands. Note that the noise band segmentation and the chirp factor segmentation are identical. Furthermore, in FIG. 6(c), in a time-frequency space segmentation method 703, areas indicated as hi (i=0, 1, . . . , 23) are tone bands. The artificial sine wave is added to the subband located at the center of the part of the high-frequency subband signal included in a tone band, for example the tone band h16, as indicated in FIG. 6(c) by the subband 704 to which a sine-wave tone signal is added.
The information on the noise band segmentation and the tone band segmentation, the amount of noise added to each noise band, and the information indicating whether a tone signal needs to be added at each tone band are encoded at the encoder, included in a bitstream, and then transmitted.
The following describes a method of calculating signal energy in each energy band, noise band (chirp factor band), and tone band. In the following description, B(t,k), E(t,k), Q(t,k), and H(t,k) refer to a chirp factor, an energy value, a ratio of noise component in a signal, and a flag indicating necessity of tone signal addition, respectively, for the signal point indicated by a time sample t and a frequency band k in the time/frequency representation of the high-frequency subband signal. As a rule of the notation, every signal point (sample) (t,k) included in a certain energy band ei satisfies E(t,k)=Ei, for example. For the chirp factor band bi, the noise band qi, and the tone band hi, the same mapping is performed for B(t,k), Q(t,k), and H(t,k), respectively. FIG. 7 is a table showing, for an identical energy band, the energy ratio of a high-frequency subband signal generated by replicating a low-frequency subband signal to an artificially added noise or tone component. The energy values of the high-frequency subband signal generated by replicating the low-frequency subband signal, of the artificially added noise component, and of the artificially added tone component are each calculated as shown in FIG. 7.
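The notation rule above, under which every sample of a band carries the value of that band, may be sketched as follows. The function name expand_band_values and the representation of a band as a tuple (t_start, t_stop, k_start, k_stop) with half-open ranges are hypothetical and serve only as an illustration.

```python
def expand_band_values(bands, values, num_t, num_k):
    """Build grid[t][k] such that grid[t][k] == values[i] inside band i.

    bands  : list of (t_start, t_stop, k_start, k_stop), half-open ranges
    values : one value per band (for example E_i, Q_i, B_i, or H_i)
    Cells not covered by any band are left as None.
    """
    grid = [[None] * num_k for _ in range(num_t)]
    for (t0, t1, k0, k1), value in zip(bands, values):
        for t in range(t0, t1):
            for k in range(k0, k1):
                grid[t][k] = value
    return grid
```

The same function can be applied to the energy bands, the noise bands (chirp factor bands), and the tone bands in turn to obtain E(t,k), Q(t,k), B(t,k), and H(t,k).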
An important point of the energy value calculation is that the sum of the three energy values, namely those of the high-frequency subband signal generated by replicating the low-frequency subband signal, the artificially added noise component, and the artificially added tone component, is always equal to E(t,k). Therefore, the ratio Q(t,k) of the noise component is used to divide the total signal energy E(t,k) between the replicated high-frequency subband signal and the artificially added noise or tone component.
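One possible split that satisfies this sum constraint is sketched below. The formulas are an assumption and are not taken from FIG. 7, which is not reproduced in this text: without a tone, the replicated signal receives E/(1+Q) and the noise receives E·Q/(1+Q); with a tone, the tone receives E/(1+Q), the replicated signal receives E·Q/(1+Q), and no noise is added. The function name split_energy is hypothetical.

```python
def split_energy(e, q, add_tone):
    """Divide the total energy e among (replicated, noise, tone).

    e        : total energy E(t,k) of the sample position
    q        : ratio Q(t,k) of the noise component, q >= 0
    add_tone : flag H(t,k), True when a tone signal is to be added
    The shares below are an assumed convention; only the property that the
    three returned values sum to e is stated in the text.
    """
    if e < 0.0 or q < 0.0:
        raise ValueError("energy and noise ratio must be non-negative")
    major = e / (1.0 + q)       # larger share when q < 1
    minor = e * q / (1.0 + q)   # share controlled by the noise ratio
    if add_tone:
        return minor, 0.0, major  # replicated, noise, tone
    return major, minor, 0.0      # replicated, noise, tone
```

In either case the three returned values add up to the given total, so the envelope set by E(t,k) is preserved regardless of how much noise or tone is introduced.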
The parameters necessary for the bandwidth extension processing described above need to be set appropriately at the encoder in order to generate a bitstream having high audio quality and proper syntax. In particular, in order to properly calculate the energy value of the high-frequency subband signal, the chirp factor, the existence of a tone signal, and the ratio of noise component, a technique for analyzing an input signal represented in the time/frequency representation is necessary. Without proper calculation of this information, for example, the reproduced sound becomes noisy because the ratio of noise component is too high, or, due to improper tone component addition or inverse filtering, the sound becomes unclear and, at worst, distorted. Among this information, an example of a method of calculating the chirp factor is disclosed in Patent Reference 3. According to the method, the tone-to-noise ratio of the high-frequency signal of an input signal is compared with the tone-to-noise ratio of a signal generated by replicating a low-frequency signal at high frequency, and the chirp factor is calculated from these ratios using a simple mathematical formula. Moreover, an example of a method of calculating the ratio of noise component is described in Patent Reference 4. According to the method, an input signal that is a time signal is divided into time frames, and then transformed into spectrum coefficients by Fourier transformation. Indicators called a “peak follower” and a “dip follower”, which represent the peaks and the dips, respectively, of the spectrum coefficients, are set for the calculated spectrum coefficients, and the ratio of noise component is determined from a spectrum energy value of a noise component derived from the two indicators.
Patent Reference 1: International Publication No. WO98/57436
Patent Reference 2: International Publication No. WO01/26095
Patent Reference 3: U.S. Publication No. US2002/0087304
Patent Reference 4: International Publication No. WO00/45379