1. Field of the Invention
The present invention relates to an encoder that performs a high-pass encoding process in which an input signal is divided into frames, each formed of a certain number of samples, and a plurality of parameters indicating characteristics of a high-frequency component in the input signal are calculated, thereby generating encoded data of the high-frequency component.
2. Description of the Related Art
With the popularization of mobile phones, personal computers, and the like, large music files and video images are now commonly transferred via networks such as the Internet.
Encoding techniques that reduce data volume by compression have been used to transmit such large music files and the like quickly over a line with a slow transmission speed (a low bit rate). Such encoding techniques are also used when music files and the like are stored and recorded on a digital versatile disk (DVD). Various techniques have been disclosed for encoding an original music file into a smaller volume without degrading its sound quality.
Generally, as shown in FIG. 9, an encoder combining a spectral band replication (SBR) encoding method and a core encoding method is used for such encoding. Specifically, as shown in FIG. 10, a low-frequency component in an input signal, obtained by down-sampling the input signal, is encoded by the core encoding method, and a plurality of pieces of characteristic parameter information (for example, spectral power information, noise information, frequency position information of tone components, and the like) required for generating a high-frequency component in the input signal are encoded by the SBR encoding method, using the encoded information of the low-frequency component.
With the SBR encoding method, for example, the file volume after encoding can be made much smaller than the original volume of the music file, and the encoded file can be played not only from the beginning but also from partway through (Japanese Patent Application Laid-open No. 2006-106475).
The core encoding method and the SBR encoding method are explained below. For the core encoding method, a transform coding method, which performs coding on the input signal after it has been transformed into the frequency domain, is generally used, and the quantization error and the number of encoding bits in coding can be arbitrarily controlled. Here, the quantization error and the number of encoding bits are in a trade-off relation. That is, if the number of encoding bits is small, the quantization error increases so that the sound quality is degraded, and if the number of encoding bits is large, the quantization error decreases so that the sound quality is improved.
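The trade-off between the number of encoding bits and the quantization error can be illustrated with a minimal sketch. The uniform quantizer, the test signal, and the names `quantize` and `mean_squared_error` below are assumptions made for illustration only; an actual core encoder quantizes transformed spectral coefficients with a far more elaborate scheme.

```python
import math

def quantize(value, num_bits):
    """Uniformly quantize a value in [-1.0, 1.0) using num_bits bits."""
    levels = 2 ** num_bits
    step = 2.0 / levels
    index = math.floor((value + 1.0) / step)
    index = max(0, min(levels - 1, index))
    # Reconstruct at the midpoint of the quantization cell.
    return -1.0 + (index + 0.5) * step

def mean_squared_error(samples, num_bits):
    """Average squared difference between the samples and their quantized values."""
    return sum((s - quantize(s, num_bits)) ** 2 for s in samples) / len(samples)

# Hypothetical test signal standing in for the coefficients to be coded.
samples = [0.9 * math.sin(2.0 * math.pi * n / 64.0) for n in range(64)]

# Fewer bits give a larger error; more bits give a smaller error.
errors = {bits: mean_squared_error(samples, bits) for bits in (4, 8, 12)}
for bits, err in errors.items():
    print(f"{bits:2d} bits -> mean squared error {err:.3e}")
```

In this sketch the error shrinks as the bit count grows, which is the relation the core encoder exploits when it is given a larger or smaller bit budget.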
According to the SBR encoding method, the plurality of pieces of characteristic parameter information for generating the high-frequency component in the input signal are obtained from an input spectrum, which is produced by passing the input signal through a filter bank, and are then encoded. In the SBR encoding method, as shown in FIG. 11, each parameter is obtained for each segment (hereinafter referred to as a “time/frequency grid”) into which the input spectrum signal (with a fixed length) for one frame is divided in a time direction and a frequency direction.
In the SBR encoding method, the time/frequency grid width is adaptively changed according to the input signal to improve encoding performance. For example, in a variable part where the change of the input signal is large (where a spectral change in the time direction is large), time resolution is increased: the time grid width is made small (the number of time divisions increases), and the frequency grid width is made large (the number of frequency divisions decreases). Conversely, in a stationary part where the change of the input signal is small (where a spectral change in the time direction is small), frequency resolution is increased: the time grid width is made large (the number of time divisions decreases), and the frequency grid width is made small (the number of frequency divisions increases).
As the grid width becomes smaller (as the number of divisions increases), the number of parameters obtained for each frame increases; therefore, the amount of information increases. As a result, the number of encoding bits increases. Further, the number of encoding bits of each parameter obtained for each grid changes according to the property of the input signal. That is, in the SBR encoding method, the number of encoding bits fluctuates according to the property of the input signal.
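The adaptive grid and its effect on the parameter count can be sketched as follows. The threshold, the division counts, and the assumption of one parameter value per grid cell are all hypothetical and are not taken from any SBR specification; the sketch only shows how the number of parameters per frame follows from the grid that is chosen.

```python
def choose_grid(spectral_change, threshold=0.5):
    """Return (time_divisions, frequency_divisions) for one frame.

    All numbers are hypothetical illustration values.
    """
    if spectral_change > threshold:
        # Variable part: finer time grid, coarser frequency grid.
        return 4, 7
    # Stationary part: coarser time grid, finer frequency grid.
    return 1, 14

def parameters_per_frame(time_divisions, frequency_divisions):
    """Assume one parameter value per time/frequency grid cell."""
    return time_divisions * frequency_divisions

for change in (0.1, 0.9):
    grid = choose_grid(change)
    print(change, grid, parameters_per_frame(*grid))
```

With these assumed values, the variable frame yields 28 parameter values and the stationary frame yields 14, so the number of bits needed for the SBR data differs from frame to frame.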
Therefore, in an encoder combining the SBR encoding method and the core encoding method, when it is assumed that the available number of encoding bits per frame is “X,” the number of bits used in the core encoding method is “Y,” and the number of bits used in the SBR encoding method is “Z,” the number of bits is controlled so that the sum of “Y” and “Z” does not exceed “X.” That is, the sum of “Y” and “Z” satisfies the encoding condition Y + Z ≤ X.
Specifically, the encoder first determines the number of bits “Z” used in the SBR encoding method, sets “Y” to the number of bits obtained by subtracting “Z” from the total number of bits “X,” and controls the number of bits used in the core encoding method to be equal to or less than “Y.” That is, the encoder performs core encoding with the number of bits “Y,” which is the number of bits remaining after subtracting the bits “Z” for the SBR encoding from the available number of bits “X,” and keeps the total number of bits within “X” by controlling the number of bits “Y.”
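The allocation order described above can be sketched in a few lines. The function names and the per-frame bit counts are hypothetical; only the relation between “X,” “Y,” and “Z” comes from the description.

```python
def core_bit_budget(total_bits, sbr_bits):
    """Return Y, the bits left for core encoding, given X and Z for one frame."""
    if sbr_bits > total_bits:
        raise ValueError("SBR bits Z exceed the available bits X")
    return total_bits - sbr_bits

def satisfies_condition(core_bits_used, sbr_bits, total_bits):
    """Check the encoding condition Y + Z <= X."""
    return core_bits_used + sbr_bits <= total_bits

X = 800  # hypothetical available bits per frame
Z = 120  # hypothetical bits consumed by SBR encoding for this frame
Y = core_bit_budget(X, Z)  # core encoding must stay within this budget

print(Y, satisfies_condition(Y, Z, X))
```

Because “Z” is fixed first, any growth in “Z” is taken directly out of “Y” in this scheme.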
In the conventional technique described above, since the total number of encoding bits “X” is fixed, the number of core encoding bits “Y,” indicating the number of bits of encoded data of the low-frequency component, is automatically determined once the number of SBR encoding bits “Z,” indicating the number of bits of encoded data of the high-frequency component, is set. Accordingly, there is a problem in that if the value of “Z” increases locally, the value of “Y” decreases considerably.
To explain the above-described problem in more detail, in a one-segment broadcasting system or the like, the number of SBR encoding bits varies according to the property of the input signal when a stereo signal sampled at 48 kHz is encoded under an ultra-low bit rate (high compression) condition of 40 kilobits per second (kbps) or less, that is, under a condition in which the available number of bits for each frame is small. Therefore, the number of SBR encoding bits cannot be controlled to an arbitrary number of bits for each frame. While the average bit rate of SBR encoded bits is generally about 3 to 5 kbps, the bit rate can locally be 20 kbps or higher according to the property of the input signal.
In such a case, the number of encoding bits allocated to the core encoding becomes considerably small, namely, 20 kbps or less. Therefore, the quantization error in the core encoding increases due to insufficient bits. That is, as shown in FIG. 13, the distortion of the low-frequency spectrum component increases relative to the input signal. Further, because the high-frequency spectrum component is generated by the SBR encoding based on the low-frequency spectrum component with a large distortion, the low-frequency distortion propagates to the high-frequency side. As a result, the spectral distortion over the whole frequency range increases, thereby causing a large degradation of sound quality.
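As a worked illustration of these figures, assume the total rate “X” sits at the 40 kbps upper limit mentioned above, and compare an SBR rate “Z” at the upper end of the typical average (5 kbps) with a local peak of 20 kbps. The subscripts are labels introduced here for illustration only.

```latex
\begin{align*}
Y_{\text{typical}} &= X - Z_{\text{avg}} = 40 - 5 = 35\ \text{kbps} \\
Y_{\text{local}} &= X - Z_{\text{peak}} = 40 - 20 = 20\ \text{kbps}
\end{align*}
```

Under these assumptions, a local peak in the SBR rate removes 15 kbps from the core encoding budget, and the loss is larger still when “X” is below 40 kbps or “Z” exceeds 20 kbps.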