Conventional systems for encoding time series signals, such as speech signals and acoustic signals, with a small number of bits include an encoding system that obtains the pitch periods of the targets to be encoded and performs encoding (see Non-patent literature 1, for example). A code-excited linear prediction (CELP) system, which is used for mobile phones and the like, will be described as an example of the conventional encoding system in which the pitch periods are obtained and encoding is performed.
FIG. 1 shows a block diagram illustrating an example of the conventional CELP system.
An encoder 91 receives time series signals x(n) (n=0, . . . , L−1; L is an integer equal to 2 or larger), such as speech signals and acoustic signals, divided in units of frames, which are predetermined time intervals. A linear prediction analysis unit 911 performs linear prediction analysis of the time series signals x(n) (n=0, . . . , L−1) at respective points in time n=0, . . . , L−1 included in the current frame to generate linear prediction information LPC info for identifying an all-pole synthesis filter 915 used for the current frame. For example, the linear prediction analysis unit 911 calculates linear prediction coefficients α(m) (m=1, . . . , P; P is a linear prediction order, which is a positive integer) for the time series signals x(n) (n=0, . . . , L−1) in the current frame, converts the linear prediction coefficients α(m) (m=1, . . . , P) to line spectrum pair coefficients LSP, and outputs the quantized values of the line spectrum pair coefficients LSP as the linear prediction information LPC info.
A fixed codebook 914 outputs signal components c(n) (n=0, . . . , L−1) formed of one or more signals each having a value formed of a non-zero individual pulse and its positive or negative sign and one or more signals each having a value of zero, under the control of a search unit 913. An adaptive codebook 912 stores excitation signals generated at past points in time, and the adaptive codebook 912 outputs adaptive signal components v(n) (n=0, . . . , L−1) obtained by using excitation signals delayed in accordance with pitch periods T obtained by the search unit 913. The excitation signals of the current frame corresponding to the signal components c(n) (n=0, . . . , L−1) from the fixed codebook 914 and the adaptive signal components v(n) (n=0, . . . , L−1) from the adaptive codebook 912 can be expressed as follows:
u(n)=gp·v(n)+gc·c(n) (n=0, . . . , L−1)  (1)
Here, gp is a pitch gain given to the adaptive signal components v(n), and gc is a fixed-codebook gain given to the signal components c(n).
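As an illustration of Expression (1), the following minimal Python sketch builds v(n) by delaying a buffer of past excitation samples by an integer pitch period T and then mixes it with a sparse pulse vector c(n). The function names are hypothetical, and the handling of T smaller than L (repeating the part of v(n) already produced) is an assumption not stated in the text above.

```python
def adaptive_component(past_excitation, T, L):
    """Return v(n), n = 0..L-1: the past excitation delayed by T samples.

    Assumption: when T < L, the samples of v(n) already produced are reused,
    so the delayed signal is extended periodically. Requires
    1 <= T <= len(past_excitation).
    """
    buf = list(past_excitation)
    v = []
    for _ in range(L):
        sample = buf[len(buf) - T]  # sample located T points before the current one
        v.append(sample)
        buf.append(sample)
    return v


def excitation(v, c, gp, gc):
    """Expression (1): u(n) = gp*v(n) + gc*c(n)."""
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]


past = [1.0, 2.0, 3.0, 4.0]      # illustrative contents of the adaptive codebook
v = adaptive_component(past, T=2, L=4)
c = [0.0, 1.0, 0.0, -1.0]        # two signed unit pulses, remaining samples zero
u = excitation(v, c, gp=0.5, gc=2.0)
```

Here v repeats the last two stored samples, and u is the gain-weighted sum of the two codebook contributions.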
The search unit 913 searches for pitch periods T, signal components c(n) (n=0, . . . , L−1), pitch gains gp, and fixed-codebook gains gc so as to minimize values obtained by applying a perceptual weighting filter 916 to the differences between the input time series signals x(n) (n=0, . . . , L−1; n will be referred to as a sample point) and synthesis signals x′(n) (n=0, . . . , L−1) obtained by applying the all-pole synthesis filter 915 identified with the linear prediction information LPC info to the excitation signals u(n) (n=0, . . . , L−1). The search unit 913 outputs excitation parameters that include the pitch periods T, code indexes Cf identifying the signal components c(n) (n=0, . . . , L−1), the pitch gains gp, and the fixed-codebook gains gc.
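The search over pitch periods can be sketched as follows. This is a heavily simplified illustration and not the search procedure of any particular codec: the perceptual weighting filter 916 is treated as the identity, the fixed-codebook contribution is ignored, the synthesis filter starts from zero state, and the sign convention x′(n)=u(n)+Σα(m)·x′(n−m) is an assumption. All function names are hypothetical.

```python
def adaptive_component(past, T, L):
    """Past excitation delayed by integer T (periodic extension assumed for T < L)."""
    buf = list(past)
    v = []
    for _ in range(L):
        s = buf[len(buf) - T]
        v.append(s)
        buf.append(s)
    return v


def synthesize(u, alpha):
    """All-pole synthesis, zero initial state: y(n) = u(n) + sum_m alpha[m-1]*y(n-m)."""
    P = len(alpha)
    hist = [0.0] * P              # hist[0] is the most recent output sample
    out = []
    for un in u:
        y = un + sum(a * h for a, h in zip(alpha, hist))
        out.append(y)
        hist = ([y] + hist)[:P]
    return out


def search_pitch(x, past, alpha, t_min, t_max):
    """Return (T, gp, err) minimizing the squared error between x and gp*y_T."""
    best = None
    for T in range(t_min, t_max + 1):
        y = synthesize(adaptive_component(past, T, len(x)), alpha)
        energy = sum(s * s for s in y)
        if energy == 0.0:
            continue
        gp = sum(a * b for a, b in zip(x, y)) / energy   # least-squares gain for this T
        err = sum((a - gp * b) ** 2 for a, b in zip(x, y))
        if best is None or err < best[2]:
            best = (T, gp, err)
    return best


past = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, -0.5, 4.0, 1.5, -3.0]
alpha = [0.5]
# Target built from a known period (5) and gain (0.8), so the search should recover both.
target = synthesize([0.8 * s for s in adaptive_component(past, 5, 4)], alpha)
T_best, gp_best, err_best = search_pitch(target, past, alpha, 3, 8)
```

For each candidate T, the gain is chosen in closed form, so only the period itself has to be enumerated.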
Here, the linear prediction information LPC info is updated in each frame, and the pitch periods T, the code indexes Cf, the pitch gains gp, and the fixed-codebook gains gc are updated in each subframe included in the frame. If each frame has a single subframe, the amount of information, such as the excitation parameters, is small, but the temporal changes of the time series signals x(n) (n=0, . . . , L−1) cannot be followed, causing large coding distortion. The opposite effect is produced if each frame has a large number of subframes. Too many subframes cause the improvement in quality to become saturated, and increase the amount of information only. In an example described below, a single frame is divided into four equal subframes. Code indexes Cf obtained in first, second, third, and fourth subframes counted from the top of the frame (referred to as the first, second, third, and fourth subframes) are expressed as Cf1, Cf2, Cf3, and Cf4. Pitch gains gp and fixed-codebook gains gc obtained in the first, second, third, and fourth subframes are expressed respectively as gp1, gp2, gp3, and gp4 and gc1, gc2, gc3, and gc4, and the pitch gains and fixed-codebook gains are collectively called excitation gains. The pitch periods T obtained in the first, second, third, and fourth subframes are expressed as T1, T2, T3, and T4. The pitch period T is expressed simply by an integral multiple of the interval between sample points n (integer resolution) or by a combination of an integral multiple of the interval between sample points n and a fractional value (fractional resolution). With a fractional resolution in which a fractional value is expressed with two bits, for example, there are four expressions of pitch periods T: Tint−¼, Tint, Tint+¼, Tint+½ (Tint is an integer). 
When the adaptive signal components v(n) are expressed by using pitch periods T at fractional resolution, an interpolation filter for performing weighted averaging of a plurality of excitation signals delayed in accordance with the pitch periods T is used.
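The weighted averaging can be illustrated with the simplest possible interpolation filter, a two-tap linear interpolator between the two neighboring integer delays. This is only a sketch under that assumption; practical codecs typically use longer interpolation filters, and the function name is hypothetical.

```python
import math


def adaptive_component_fractional(past_excitation, T, L):
    """v(n) for a fractional pitch period T, by two-tap linear interpolation.

    T is split as T = Ti + f with integer Ti and 0 <= f < 1, and each output
    sample is the weighted average (1-f)*e(n-Ti) + f*e(n-Ti-1).
    Assumptions: linear interpolation, periodic extension for short periods,
    and 1 <= Ti, Ti + 1 <= len(past_excitation).
    """
    Ti = math.floor(T)
    f = T - Ti
    buf = list(past_excitation)
    v = []
    for _ in range(L):
        near = buf[len(buf) - Ti]        # excitation delayed by Ti samples
        far = buf[len(buf) - Ti - 1]     # excitation delayed by Ti + 1 samples
        sample = (1.0 - f) * near + f * far
        v.append(sample)
        buf.append(sample)
    return v


ramp = [0.0, 4.0, 8.0, 12.0]             # straight line, slope 4 per sample
v_plus = adaptive_component_fractional(ramp, 2.25, 1)    # Tint + 1/4 with Tint = 2
v_minus = adaptive_component_fractional(ramp, 1.75, 1)   # Tint - 1/4 with Tint = 2
```

On a straight-line buffer the interpolated samples fall exactly on the line, which makes the weighting easy to check by hand.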
The excitation parameters that include the pitch periods T, the code indexes Cf, the pitch gains gp, and the fixed-codebook gains gc are input to a parameter encoding unit 917, and the parameter encoding unit 917 generates a bit stream BS formed of codes corresponding to the parameters and outputs it. The pitch gains gp and the fixed-codebook gains gc may be encoded by vector quantization which selects optimum codes for pairs of the pitch gains and the fixed-codebook gains.
FIG. 2A is a view showing an example structure of a bit stream BS when pitch periods T at fractional resolution are used, and FIG. 2B is a view illustrating codes corresponding to the pitch periods T at fractional resolution. FIG. 3 is a view illustrating resolutions for expressing a pitch period T (period resolutions).
When pitch periods T at fractional resolution are used, as shown in FIGS. 2A and 2B, codes corresponding to the integer parts and the fractional parts of the pitch periods T=T1, T2, T3, T4 are generated. In the example shown in FIGS. 2A and 2B, nine bits are assigned to the pitch periods in the first and third subframes, and the values of the pitch periods T1 and T3 in the first and third subframes (differences from the smallest value of the pitch periods) are each encoded without reference to the pitch periods of the other subframes (pitch period parts). Encoding the pitch period of a given subframe without reference to the pitch periods of the other subframes is referred to as independent encoding in each subframe. Generally, it is preferable to express a shorter pitch period T at fractional resolution. In the example shown in FIG. 3, when the integer part of the pitch period T is equal to or larger than the minimum value Tmin and smaller than TA, the pitch period T is expressed at fractional resolution in which the fractional value is expressed with two bits (quadruple fractional resolution); when the integer part of the pitch period T is from TA to TB, the pitch period T is expressed at fractional resolution in which the fractional value is expressed with one bit (double fractional resolution); and, when the integer part of the pitch period T is from TB to the maximum value Tmax, the pitch period T is expressed just as an integral multiple of the interval between sample points n (integer resolution).
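The range-dependent resolution can be sketched as a lookup table in which each representable pitch period is assigned a consecutive code. The boundary values below are hypothetical, chosen only so that the table fills exactly nine bits; the half-open treatment of the boundaries at TA and TB and the use of non-negative fractional offsets k/4 and k/2 are also assumptions.

```python
from fractions import Fraction

# Hypothetical boundaries (in sample points); not taken from the text or FIG. 3.
T_MIN, T_A, T_B, T_MAX = 20, 85, 145, 276


def fractional_bits(t_int):
    """Number of bits used for the fractional value, given the integer part."""
    if T_MIN <= t_int < T_A:
        return 2   # quadruple fractional resolution
    if T_A <= t_int < T_B:
        return 1   # double fractional resolution
    if T_B <= t_int <= T_MAX:
        return 0   # integer resolution
    raise ValueError("integer part outside [T_MIN, T_MAX]")


def build_period_table():
    """List every representable pitch period in increasing order."""
    table = []
    for t_int in range(T_MIN, T_MAX + 1):
        steps = 1 << fractional_bits(t_int)
        for k in range(steps):
            table.append(Fraction(t_int) + Fraction(k, steps))
    return table


PERIOD_TABLE = build_period_table()


def encode_period(T):
    """Code = position of T in the table, i.e. its offset from the smallest period."""
    return PERIOD_TABLE.index(Fraction(T))


def decode_period(code):
    return PERIOD_TABLE[code]
```

With these illustrative boundaries the table holds 65·4 + 60·2 + 132 = 512 entries, so every code fits in nine bits.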
In the second and fourth subframes (FIGS. 2A and 2B), the differences between the integer parts of the pitch periods T2 and T4 in the second and fourth subframes and the integer parts of the pitch periods T1 and T3 in the first and third subframes are separately encoded with four bits (difference integer parts), and the values after the decimal point (fractional parts) of the pitch periods T2 and T4 are encoded separately with two bits (quadruple fractional resolution) irrespective of the values of the difference integer parts. The pitch periods T2 and T4 have been searched in the range in which the differences between their integer parts and the integer parts of the pitch periods T1 and T3 respectively can be encoded with four bits. In other words, the pitch periods T2 and T4 have been searched in a range such that the values of the corresponding integer parts range from the values of the integer parts of the pitch periods T1 and T3 minus 8 to the values of the integer parts of the pitch periods T1 and T3 plus 7, respectively.
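A minimal sketch of this differential encoding is shown below. The mapping of the difference to a 4-bit value by adding 8, the representation of the fractional part as a quarter-step index from 0 to 3, and the packing of both fields into one 6-bit integer are assumptions made for illustration; the function names are hypothetical.

```python
def encode_differential(ref_int, t_int, frac_index):
    """Encode the pitch period of the second (or fourth) subframe.

    ref_int:    integer part of T1 (or T3)
    t_int:      integer part of T2 (or T4)
    frac_index: 2-bit index of the fractional part (0..3, assumed quarter steps)
    Returns a 6-bit code: 4 bits for the difference integer part, 2 for the fraction.
    """
    diff = t_int - ref_int
    if not -8 <= diff <= 7:
        raise ValueError("difference integer part does not fit in four bits")
    if not 0 <= frac_index <= 3:
        raise ValueError("fractional part does not fit in two bits")
    return ((diff + 8) << 2) | frac_index


def decode_differential(code, ref_int):
    """Inverse of encode_differential; returns (t_int, frac_index)."""
    diff = (code >> 2) - 8
    return ref_int + diff, code & 0b11


code = encode_differential(ref_int=50, t_int=47, frac_index=3)
decoded = decode_differential(code, ref_int=50)
```

Under the bit allocations described above, the pitch periods of one frame then take 9 + 6 + 9 + 6 = 30 bits.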
The bit stream BS output from the parameter encoding unit 917 of the encoder 91 (FIG. 1) is input to a parameter decoding unit 927 of a decoder 92. The parameter decoding unit 927 decodes the bit stream BS and outputs the code indexes Cf=Cf1, Cf2, Cf3, Cf4, pitch gains gp′=gp1′, gp2′, gp3′, gp4′, fixed-codebook gains gc′=gc1′, gc2′, gc3′, gc4′, pitch periods T′=T1′, T2′, T3′, T4′, and the linear prediction information LPC info, obtained by decoding.
A fixed codebook 924 outputs signal components c′(n) (n=0, . . . , L−1) identified by the code indexes Cf, and an adaptive codebook 922 outputs adaptive signal components v′(n) (n=0, . . . , L−1) identified by the pitch periods T′. Then, excitation signals u′(n) (n=0, . . . , L−1), which are the sums of the products obtained by multiplying the signal components c′(n) (n=0, . . . , L−1) by the fixed-codebook gains gc′ and the products obtained by multiplying the adaptive signal components v′(n) (n=0, . . . , L−1) by the pitch gains gp′, are added to the adaptive codebook 922. An all-pole synthesis filter 925 identified with the linear prediction information LPC info is applied to the excitation signals u′(n) (n=0, . . . , L−1), and synthesis signals x′(n) (n=0, . . . , L−1) generated as a result are output.
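The decoder-side steps for one subframe can be sketched as follows, assuming an integer pitch period, an already decoded pulse vector c′(n), and the sign convention x′(n)=u′(n)+Σα(m)·x′(n−m). The function name, the argument layout, and the explicit filter memory carried between subframes are hypothetical choices for illustration.

```python
def decode_subframe(adaptive_buf, c, T, gp, gc, alpha, synth_mem):
    """Decode one subframe.

    adaptive_buf: past excitation samples held by the adaptive codebook
    c:            fixed-codebook signal components c'(n) for this subframe
    T, gp, gc:    decoded integer pitch period and gains
    alpha:        linear prediction coefficients of the all-pole filter (assumed sign)
    synth_mem:    last len(alpha) output samples, most recent first
    Returns (x_out, new_adaptive_buf, new_synth_mem).
    """
    # Adaptive signal components v'(n): past excitation delayed by T.
    tmp = list(adaptive_buf)
    v = []
    for _ in range(len(c)):
        s = tmp[len(tmp) - T]
        v.append(s)
        tmp.append(s)            # periodic extension when T < subframe length (assumed)

    # Excitation u'(n) = gp*v'(n) + gc*c'(n), then appended to the adaptive codebook.
    u = [gp * vn + gc * cn for vn, cn in zip(v, c)]
    new_buf = list(adaptive_buf) + u

    # All-pole synthesis filter applied to the excitation.
    P = len(alpha)
    hist = list(synth_mem)
    x_out = []
    for un in u:
        y = un + sum(a * h for a, h in zip(alpha, hist))
        x_out.append(y)
        hist = ([y] + hist)[:P]
    return x_out, new_buf, hist


x_out, buf, mem = decode_subframe(
    adaptive_buf=[0.0, 0.0, 1.0, 0.0], c=[1.0, 0.0, 0.0, 0.0],
    T=2, gp=0.5, gc=1.0, alpha=[0.5], synth_mem=[0.0])
```

Calling the function once per subframe, passing the returned buffer and filter memory into the next call, reproduces the feedback of the excitation into the adaptive codebook 922.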