A coding device is designed to code an audio signal efficiently. In human speech, the fundamental frequency (pitch) of an audio signal sometimes changes. This causes the energy of the audio signal to spread across wider frequency bands. Coding a pitch-changing audio signal with an acoustic signal coding device is therefore inefficient, especially at a low bit rate.
Therefore, conventionally, time warping technology is used to compensate for the effect of pitch change (see Patent Literature (PTL) 1 and Non Patent Literature (NPL) 1, for example).
More specifically, the time warping technology is used to achieve pitch correction (pitch shifting). FIGS. 1A and 1B illustrate an example of the conventional scheme of pitch shifting. Specifically, FIG. 1A shows a spectrum of an audio signal before pitch shifting, and FIG. 1B shows a spectrum of the audio signal after pitch shifting.
As shown in the drawings, the pitches are shifted from 200 Hz in FIG. 1A to 100 Hz in FIG. 1B. In this manner, by shifting the pitches of the next frame to align with the pitches of a previous frame, the pitches are made consistent. In this case, the energy of the audio signal converges as shown in FIGS. 2A to 2C.
FIG. 2A shows a sweep signal before pitch shifting in the conventional pitch shifting of audio signals. FIG. 2B shows a sweep signal after pitch shifting in the conventional pitch shifting of audio signals. As shown in the drawings, the pitches of the audio signal become constant by pitch shifting.
Furthermore, FIG. 2C shows the spectrum before and after pitch shifting in the conventional pitch shifting of audio signals. Here, the graph a in FIG. 2C shows the spectrum before pitch shifting and the graph b in FIG. 2C shows the spectrum after pitch shifting. As shown in FIG. 2C, the energy after pitch shifting is confined to a narrow bandwidth.
Here, pitch shifting is achieved using the re-sampling scheme, for example. In order to maintain a consistent pitch, a ratio of re-sampling (hereinafter referred to as a re-sampling rate) varies according to a pitch change ratio. By applying a pitch tracking algorithm to coding of a frame, a pitch contour of this frame can be obtained.
More specifically, the frame is segmented into small sections for pitch tracking. Adjacent sections may overlap. Examples of pitch tracking algorithms include an algorithm based on auto-correlation (see NPL 2, for example) and a pitch detection scheme based on the frequency domain (see NPL 3, for example).
Each section has a corresponding pitch value. FIGS. 3 and 4 illustrate a conventional calculation scheme of pitch contours of audio signals. FIG. 3 shows that the pitches change depending on time. Furthermore, as shown in FIG. 4, one pitch value is calculated from one section of the audio signal. The pitch contour is the concatenation of the pitch values.
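The section-by-section calculation described above can be sketched as follows. This is a minimal illustration only, assuming a plain (unnormalized) auto-correlation peak search; it is not the algorithm of NPL 2, and the function names, the search range of 60 Hz to 500 Hz, and the section and hop lengths are all assumptions introduced here.

```python
import math

def estimate_pitch(section, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate one pitch value (Hz) for one section.

    Assumption: the pitch period is the lag with the largest plain
    auto-correlation inside the [fmin, fmax] search range.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(section) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(section[n] * section[n + lag]
                    for n in range(len(section) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def pitch_contour(frame, sample_rate, section_len, hop):
    """Concatenate one pitch value per section.

    Sections overlap whenever hop < section_len.
    """
    starts = range(0, len(frame) - section_len + 1, hop)
    return [estimate_pitch(frame[s:s + section_len], sample_rate)
            for s in starts]

# Usage: a steady 200 Hz tone sampled at 8 kHz, with 50 % section overlap.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
contour = pitch_contour(tone, 8000, section_len=400, hop=200)
```

For a signal whose pitch changes over time, the values in `contour` would differ from section to section, which is the situation time warping is meant to correct.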
In pitch shifting, the re-sampling rate is proportional to the pitch change ratio. Furthermore, information indicating the pitch change ratio is extracted from the pitch contour. The cent and the half tone are often used to measure this pitch change ratio. FIG. 5 shows a measurement of the cent and the half tone. The cent (c in FIG. 5) is calculated from the pitch ratio (pitch change ratio) of adjacent pitches as shown below.
$$\text{cent} = 1200 \times \log_2 \frac{\text{pitch}(i+1)}{\text{pitch}(i)} \qquad \text{[Math 1]}$$
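As a small worked example of the cent calculation, the sketch below applies the formula to pairs of adjacent pitch values. The function names are introduced here for illustration only. By the usual convention, one half tone equals 100 cents and one octave equals 1200 cents.

```python
import math

def cents(pitch_prev, pitch_next):
    """Cent difference between adjacent pitches: 1200 * log2(pitch(i+1) / pitch(i))."""
    return 1200.0 * math.log2(pitch_next / pitch_prev)

def contour_to_cents(contour):
    """Cent difference for every pair of adjacent pitch values in a contour."""
    return [cents(a, b) for a, b in zip(contour, contour[1:])]

# Usage: a pitch that drops from 200 Hz to 100 Hz falls by one octave.
octave_down = cents(200.0, 100.0)            # -1200.0
half_tone_up = cents(440.0, 440.0 * 2 ** (1 / 12))  # approximately 100
```

A positive value means the next pitch is higher than the previous one, and a negative value means it is lower.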
According to the pitch change ratio, re-sampling is applied to the audio signal. Pitches of other sections are shifted to a reference pitch in order to obtain a consistent pitch. For example, if a pitch of the next section is higher than a pitch of the previous section, the re-sampling rate is set to a lower rate in proportion to the cent difference between the two pitches. Furthermore, if the pitch of the next section is lower than the pitch of the previous section, the re-sampling rate is set to a higher rate.
Consider a record player whose reproduction speed can be adjusted. When audio with a higher tone is played back at a lower reproduction speed, the tone is shifted to a lower frequency. Re-sampling the signal in proportion to the pitch change ratio follows the same idea.
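The re-sampling of one section toward a reference pitch can be sketched as below. This is an illustration under stated assumptions, not the conventional scheme itself: linear interpolation is used for simplicity, and the function name and the read step `reference_pitch / pitch` are conventions chosen here. With this convention, a section whose pitch is higher than the reference gets a step below 1 (a lower rate), and a section whose pitch is lower gets a step above 1 (a higher rate).

```python
import math

def resample_section(section, pitch, reference_pitch):
    """Shift a section from `pitch` to `reference_pitch` by re-sampling.

    Assumption: the section is read at a fractional step of
    reference_pitch / pitch, with linear interpolation between samples.
    """
    step = reference_pitch / pitch
    last = len(section) - 1
    n_out = int(last / step) + 1
    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        frac = pos - i
        nxt = section[min(i + 1, last)]
        out.append((1.0 - frac) * section[i] + frac * nxt)
    return out

# Usage: shift a 200 Hz section (8 kHz sampling) down to a 100 Hz reference.
section = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
shifted = resample_section(section, pitch=200.0, reference_pitch=100.0)
```

As with the slowed-down record player, the shifted section is longer than the original: lowering the pitch by an octave roughly doubles the number of samples.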
FIGS. 6 and 7 illustrate a coding device and a decoding device applied with the time warping scheme. As shown in FIG. 6, the coding device performs transform coding after performing time warping on an input signal, using pitch ratio information. The pitch ratio information is needed in the decoding device which performs reverse time warping shown in FIG. 7.
Therefore, the pitch ratio has to be coded by the coding device. In the prior art, a fixed table corresponding to small pitch ratios is used to code the pitch ratio information, and efforts are made to improve the coded sound quality through time warping processing under the condition that only a limited number of bits is available for coding the pitch ratio.
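To make the fixed-table idea concrete, the sketch below codes a pitch ratio as the index of the nearest table entry. The table values, its size of 3 bits, and the function names are hypothetical and chosen only for illustration; the source does not give the actual table used in the prior art.

```python
import math

# Hypothetical 3-bit table covering only small pitch ratios around 1.0.
RATIO_TABLE = [0.97, 0.98, 0.99, 1.00, 1.01, 1.02, 1.03, 1.04]

def encode_ratio(ratio, table=RATIO_TABLE):
    """Return the index of the table entry closest to `ratio`."""
    return min(range(len(table)), key=lambda i: abs(table[i] - ratio))

def decode_ratio(index, table=RATIO_TABLE):
    """Recover the quantized pitch ratio from its table index."""
    return table[index]

def bits_per_ratio(table=RATIO_TABLE):
    """Number of bits needed to send one table index."""
    return math.ceil(math.log2(len(table)))

# Usage: a small ratio is represented closely, a large one is clipped.
small = decode_ratio(encode_ratio(1.012))  # 1.01
large = decode_ratio(encode_ratio(1.5))    # 1.04, the largest entry
```

With such a table, a pitch ratio outside the covered range can only be mapped to the nearest end of the table, so the decoding device would apply a different ratio from the one measured by the coding device.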