The audio speed conversion technology is technology for changing only the duration time of audio data while maintaining the fundamental frequency (pitch) thereof, and is implemented into a video/audio playback device for improving the audio quality of the audio data during trick playback. The following is a description of a conventional speed conversion.
According to the conventional speed conversion, audio data is divided into a plurality of cycles, and each cycle is further divided into segments each having a length of 12 milliseconds. Assuming here that each cycle is divided into five segments A, B, C, D and E, the following are performed in one of the cycles: (a) obtaining all possible combinations of the five segments; (b) calculating a degree of similarity of each possible combination, the degree of similarity indicating inter-segment similarity; and (c) judging, out of all possible combinations of the five segments, which combination has the highest degree of similarity. If a pair of B and C has the highest degree of similarity of all combinations of A, B, C, D and E, then B and C are overlapped such that B and C are played back simultaneously. B and C can be overlapped by performing the following in listed order: (a) multiplying the segment B, which temporally precedes the segment C, by a window function that gradually decreases with time (hereafter, “decreasing window function”); (b) multiplying the segment C, which is temporally behind the segment B, by a window function that gradually increases with time (hereafter, “increasing window function”); and (c) adding the segments B and C. A result of this overlap is B/C. Accordingly, if the above A, B, C, D and E are output in the form of A, B/C, D and E, then a time length of the cycle would be ⅘ of the original time length thereof. By performing the above-described similarity calculation and overlap in every cycle, a time length of the audio data can be decreased to ⅘ of the original time length thereof.
B and C can also be overlapped by performing the following in listed order: (a) multiplying the segment B, which temporally precedes the segment C, by an increasing window function; (b) multiplying the segment C, which is temporally behind the segment B, by a decreasing window function, and (c) adding the segments B and C. A result of this overlap is C\B. If this C\B is added to the above A, B, C, D and E, and A, B, C, D, and E are output in the form of A, B, C\B, C, D and E, then a time length of the cycle would be increased to 6/5 of the original time length thereof. By performing the above-described similarity calculation and overlap in every cycle, a time length of the audio data can be increased to 6/5 of the original time length thereof.
Known examples of the above-described speed conversion include a method for hearing assistance with a function for controlling the speech speed in audio data (Patent Reference 1), and an audio conversion device that performs the conversion linearly with respect to audio data (Patent Reference 2 or Non-Patent Reference 1).    Patent Reference 1:            Japanese Laid-Open Patent Application No. H05-80796            Patent Reference 2:            Japanese Laid-Open Patent Application No. H04-104200            Non-Patent Reference 1:            Suzuki and Misaki, “An Implementation of a Time-Scale Modification Method on a DSP,” Shingakugiho, SP90-34, 1990.        