1. Field of the Invention
This invention relates to time-scale modification methods and apparatuses that perform time-scale modification (i.e., compression or expansion with respect to time) on digital audio signals in accordance with desired time-scale modification factors, without changing the original pitches and sound qualities.
This application is based on Patent Application No. Hei 11-126356 filed in Japan, the content of which is incorporated herein by reference.
2. Description of the Related Art
Normally, time-scale modification techniques are used to compress and expand digital audio signals with respect to time without changing the original pitches of the digital audio signals. Those techniques are used in a variety of fields, such as so-called “scale adjustment”, in which an overall recording time for recording digital audio signals is adjusted to a prescribed time, and “tempo modification” used by Karaoke apparatuses, for example. A cut-and-splice method is known as one of the time-scale modification techniques and is disclosed in the paper entitled “Time-Scale Modification Algorithm for Speech by Use of Pointer Interval Control Overlap and Add (PICOLA) and Its Evaluation”, written by Morita and Itakura, on pp. 149-150 of monographs 1-4-14 issued for the autumn meeting of Japan Acoustics Engineering Society in October 1986.
In the method disclosed in the Morita and Itakura paper, two wave segments that are adjacent to each other in the original audio signal waves and that have the highest waveform correlation with each other are extracted and subjected to duplicate addition (i.e., overlap-add) to produce a mixed wave. The overall time of the audio signals is then shortened by substituting the mixed wave for the two wave segments.
FIGS. 5A-5F and FIGS. 6A-6F show waveforms, which are used to explain concrete operations of time-scale modification processing being effected on original audio signals. Specifically, FIGS. 5A-5F show concrete operations of time-scale compression, while FIGS. 6A-6F show concrete operations of time-scale expansion.
FIGS. 5A and 6A show original waveforms corresponding to original audio data on a prescribed time scale. Similarity detection is performed on adjacent wave segments on the time scale to extract a basic period Lp. Concretely speaking, a minimal value Lmin is set as the initial value of the wave segment length, and similarity is detected between two adjacent wave segments each having the length Lmin. The similarity detection is repeated while the length is gradually increased from Lmin, and it stops when the length reaches a maximal value Lmax. All lengths in this range are thus examined with respect to similarity, and the length that provides the best similarity is selected as the basic period Lp, which is shown in FIGS. 5B and 6B. For the time-scale modification, two adjacent wave segments (i.e., waves A and B), each corresponding to the basic period Lp, are extracted and are respectively multiplied by window functions, as shown in FIGS. 5C and 6C. In the case of the time-scale compression shown in FIG. 5C, the wave A is multiplied by a window having a level-decreasing slope to produce the wave of FIG. 5D, while the wave B is multiplied by a window having a level-increasing slope to produce the wave of FIG. 5E. The waves of FIGS. 5D and 5E are added together to produce a mixed wave, which replaces the two waves A and B in FIG. 5F. In the case of the time-scale expansion shown in FIG. 6C, the wave A is multiplied by a window having a level-increasing slope to produce the wave of FIG. 6D, while the wave B is multiplied by a window having a level-decreasing slope to produce the wave of FIG. 6E. The waves of FIGS. 6D and 6E are added together to produce a mixed wave, which is inserted between the waves A and B in FIG. 6F.
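By way of illustration only, the search for the basic period Lp described above may be sketched as follows. The function name `find_basic_period`, the use of a mean squared difference as the similarity measure (smaller meaning more similar), the step of one sample between candidate lengths, and the test signal are all assumptions made for this sketch; the text above does not specify them.

```python
import math

def find_basic_period(samples, start, l_min, l_max):
    """Return the length L in [l_min, l_max] for which the adjacent segments
    samples[start:start+L] and samples[start+L:start+2L] are most similar.

    Assumption: similarity is measured by the mean squared difference
    between the two segments; the smallest value is the best similarity.
    """
    best_len, best_err = None, float("inf")
    for length in range(l_min, l_max + 1):
        if start + 2 * length > len(samples):
            break  # not enough data left for two full segments
        err = 0.0
        for i in range(length):  # every sample of the segment is used
            d = samples[start + i] - samples[start + length + i]
            err += d * d
        err /= length  # normalize so that different lengths are comparable
        if err < best_err:
            best_len, best_err = length, err
    return best_len

# Hypothetical test signal: a sine wave that repeats every 100 samples.
signal = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
period = find_basic_period(signal, start=0, l_min=80, l_max=150)
print(period)
```

Because every candidate length requires a pass over all samples of the segment, the cost of this search grows quickly as the segments become longer.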
The aforementioned time-scale modification technique suffers from the problem that a great amount of processing is required for the similarity evaluation (i.e., similarity detection and examination) that extracts the basic period from the original audio data. In the conventional similarity evaluation, similarity calculations are repeated for each pair of wave segments every time the length is increased by a prescribed value within the range between Lmin and Lmax, and the calculations are performed on all samples contained in each wave segment being examined. Consequently, as the sampling frequency becomes higher, the amount of processing required for the similarity evaluation increases greatly.
It is expected that the basic frequency (i.e., the frequency corresponding to the basic period) ranges from 50 Hz to 200 Hz. In other words, the maximal length of the wave segment is given by the basic frequency of 50 Hz, and the minimal length is given by the basic frequency of 200 Hz. The inventor of this invention has evaluated the similarity calculations that are needed at each of several prescribed sampling frequencies. Table 1 shows total numbers of arithmetic operations (e.g., multiplication and addition) required for the similarity calculations with respect to three sampling frequencies, i.e., 16 kHz, 32 kHz and 48 kHz.
Table 1 shows that increasing the sampling frequency brings a great increase in the number of arithmetic operations required for the similarity calculations. That is, the amount of processing for the similarity evaluation increases remarkably in response to an increase of the sampling frequency.
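By way of illustration only, the relation between the 50 Hz to 200 Hz range and the segment lengths Lmin and Lmax may be sketched as follows. The function names and the cost model (one multiply-add per sample of a segment, with candidate lengths stepped by one sample) are assumptions made for this sketch; the counts it produces come from that assumed model and are not the figures of Table 1.

```python
def segment_length_bounds(sampling_hz, f_low_hz=50, f_high_hz=200):
    """Return (l_min, l_max) in samples for the given sampling frequency.

    The longest segment corresponds to one cycle at f_low_hz and the
    shortest to one cycle at f_high_hz.
    """
    l_max = sampling_hz // f_low_hz
    l_min = sampling_hz // f_high_hz
    return l_min, l_max

def multiply_add_count(l_min, l_max, step=1):
    """Assumed cost model: one multiply-add per sample for each candidate
    length examined between l_min and l_max."""
    return sum(range(l_min, l_max + 1, step))

for fs in (16000, 32000, 48000):
    l_min, l_max = segment_length_bounds(fs)
    print(fs, l_min, l_max, multiply_add_count(l_min, l_max))
```

Under this assumed model, both the number of candidate lengths and the number of samples per candidate grow in proportion to the sampling frequency, so the count grows roughly with its square.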
It is an object of the invention to provide a time-scale modification method or apparatus that performs time-scale modification on audio signals with a reduced amount of processing particularly related to similarity evaluation for evaluating similarities between adjacent wave segments.
A time-scale modification method or apparatus of this invention performs time-scale modification (i.e., compression or expansion with respect to time) on original audio signals having waves. Adjacent wave segments of various lengths are cut from the waves of the original audio signals. Herein, a certain number of samples are thinned out from each of the adjacent wave segments to reduce the amount of data for each of the adjacent wave segments. Calculations are performed on the reduced data to produce, for each of the various lengths in turn, a similarity between the adjacent wave segments. The similarities are evaluated, and the length that provides the best similarity among the various lengths is determined as a basic period. The waves of the original audio signals are then cut into two waves, each having the length of the basic period. Time-scale modification is effected on the two waves to produce a mixed wave. Using the mixed wave, it is possible to provide output signals that correspond to the result of the time-scale modification effected on the original audio signals in accordance with a designated time-scale modification factor, without causing pitch variations.
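By way of illustration only, a similarity evaluation performed on thinned-out data may be sketched as follows. The function name `find_basic_period_thinned`, the parameter `keep_every`, the mean squared difference used as the similarity measure, and the choice to keep stepping the candidate lengths by one original sample are assumptions made for this sketch and are not taken from the text above.

```python
import math

def find_basic_period_thinned(samples, start, l_min, l_max, keep_every=2):
    """Search for the basic period using only every keep_every-th sample.

    keep_every=2 thins out one sample per two; keep_every=3 thins out two
    samples per three. Assumption: similarity is the mean squared
    difference over the kept samples (smaller is better).
    """
    best_len, best_err = None, float("inf")
    for length in range(l_min, l_max + 1):
        if start + 2 * length > len(samples):
            break  # not enough data left for two full segments
        err, count = 0.0, 0
        for i in range(0, length, keep_every):  # thinned-out samples are skipped
            d = samples[start + i] - samples[start + length + i]
            err += d * d
            count += 1
        err /= count  # normalize by the number of samples actually compared
        if err < best_err:
            best_len, best_err = length, err
    return best_len

# Hypothetical test signal: a sine wave that repeats every 100 samples.
signal = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
period_half = find_basic_period_thinned(signal, 0, 80, 150, keep_every=2)
period_third = find_basic_period_thinned(signal, 0, 80, 150, keep_every=3)
print(period_half, period_third)
```

In this sketch the inner loop performs about 1/keep_every of the multiply-add operations of a full-rate search, while the signal that is later cut and mixed remains untouched.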
In the case of compression, the two waves are subjected to windowed multiplication and addition to produce a mixed wave, which substitutes for the two waves, so that the original audio signals are compressed by the basic period. In the case of expansion, the two waves are subjected to windowed multiplication and addition to produce a mixed wave, which is inserted between the two waves, so that the original audio signals are expanded by the basic period.
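By way of illustration only, the windowed multiplication and addition for compression and expansion may be sketched as follows. The function names and the linear ramp used as the window are assumptions made for this sketch; the text above does not fix the shape of the window function.

```python
def cross_fade(fade_out_wave, fade_in_wave):
    """Mix two equal-length waves: the first is weighted by a decreasing
    linear ramp, the second by the complementary increasing ramp."""
    n = len(fade_out_wave)
    mixed = []
    for i in range(n):
        w = i / n  # increasing weight, starting at 0
        mixed.append((1.0 - w) * fade_out_wave[i] + w * fade_in_wave[i])
    return mixed

def compress_once(samples, start, period):
    """Replace waves A and B (each `period` samples long) by one mixed wave,
    shortening the signal by `period` samples."""
    a = samples[start:start + period]
    b = samples[start + period:start + 2 * period]
    mixed = cross_fade(a, b)  # A fades out while B fades in
    return samples[:start] + mixed + samples[start + 2 * period:]

def expand_once(samples, start, period):
    """Insert a mixed wave between waves A and B, lengthening the signal
    by `period` samples."""
    a = samples[start:start + period]
    b = samples[start + period:start + 2 * period]
    mixed = cross_fade(b, a)  # B fades out while A fades in
    return samples[:start + period] + mixed + samples[start + period:]

# Hypothetical test signal: a sawtooth that repeats every 10 samples.
signal = [float(n % 10) for n in range(50)]
shorter = compress_once(signal, start=10, period=10)
longer = expand_once(signal, start=10, period=10)
print(len(signal), len(shorter), len(longer))
```

With these weights the mixed wave begins like the wave that precedes it in the output and ends like the wave that follows it, so the splice points stay continuous when the two waves are similar.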
Because the data of the wave segments are reduced only for the calculations of the similarities, while the time-scale modification itself is effected on the entire data of the original audio signals, it is possible to reduce the overall amount of processing without causing deterioration in the sound quality of the sounds reproduced by way of the time-scale modification. Incidentally, the data are reduced, for example, by thinning out one sample out of every two samples of the original audio signals, or by thinning out two samples out of every three samples of the original audio signals.