1. Field of the Invention
The present invention relates to a signal-synthesizing method, and more particularly, to a multiple-step adaptive method for time scaling.
2. Description of the Prior Art
Owing to dramatic progress in electronic technologies, AV players such as Karaoke machines can provide an increasing number of advanced functions, such as audio clean-up, dynamic repositioning of enhanced audio and music (DREAM), and time scaling. Time scaling (also called time stretching, time compression/expansion, or time correction) elongates or shortens an audio signal while keeping its pitch approximately unchanged. In short, time scaling adjusts only the tempo of an audio signal.
In general, an AV player performs time scaling with one of the following three methods: Phase Vocoder, Minimum Perceived Loss Time Expansion/Compression (MPEX), and Time Domain Harmonic Scaling (TDHS). Phase Vocoder transforms an audio signal into a complex Fourier representation with the Short-Time Fourier Transform (STFT), and then transforms that representation back into a time-scaled version of the original audio signal using interpolation techniques and the inverse STFT (iSTFT). MPEX, a method researched and developed by Prosoniq, simulates characteristics of human hearing in a manner similar to an artificial neural network: it records the audio signals received over a predetermined period and tries to “learn” them, so as to either elongate or shorten them. TDHS is one of the most popular time-scaling methods. TDHS first establishes an autocorrelogram of a first audio signal, the autocorrelogram consisting of a plurality of magnitudes. It then delays the first audio signal by a maximum index, namely the index corresponding to the maximum magnitude (the largest of all the magnitudes in the autocorrelogram), to form a second audio signal. Lastly, it synchronizes and overlap-adds (SOLA) the first audio signal and the second audio signal to form a third audio signal that is longer than the first audio signal.
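The TDHS steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the prior-art implementation: the names `autocorrelogram`, `find_max_index`, `tdhs_stretch`, `min_lag` and `max_lag` are hypothetical; the autocorrelogram is a plain, unnormalized autocorrelation of a single monophonic frame; the peak search is restricted to a lag range that excludes lag 0 (where the autocorrelation is always largest); and the overlap-add uses a simple linear crossfade.

```python
import math


def autocorrelogram(x, max_lag):
    """Unnormalized autocorrelation r[tau] = sum_i x[i] * x[i + tau], for tau = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau)) for tau in range(max_lag + 1)]


def find_max_index(r, min_lag):
    """Return the lag (index) of the largest magnitude in r[min_lag:].

    Assumption: lag 0 is skipped because r[0], the frame energy, is always the peak.
    """
    return max(range(min_lag, len(r)), key=lambda tau: r[tau])


def tdhs_stretch(x, min_lag, max_lag):
    """Hypothetical TDHS-style elongation of one frame by one period tau_max."""
    n = len(x)
    if not 0 < min_lag <= max_lag < n:
        raise ValueError("need 0 < min_lag <= max_lag < len(x)")
    tau = find_max_index(autocorrelogram(x, max_lag), min_lag)
    overlap = n - tau                      # samples where x and its delayed copy overlap
    out = []
    for i in range(n + tau):
        if i < tau:                        # delayed copy has not started: first signal only
            out.append(x[i])
        elif i < n:                        # overlap region: linear crossfade (assumption)
            w = (i - tau) / overlap        # weight rises from 0 toward 1
            out.append((1.0 - w) * x[i] + w * x[i - tau])
        else:                              # first signal has ended: delayed copy only
            out.append(x[i - tau])
    return out, tau


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * i / 20) for i in range(400)]  # period of 20 samples
    stretched, tau_max = tdhs_stretch(frame, min_lag=10, max_lag=40)
    print(tau_max, len(stretched))  # prints 20 420
```

Because the delay equals the detected period, the first signal and its delayed copy are in phase throughout the overlap region, so the crossfade does not introduce a discontinuity. For the 400-sample sine frame in the demo, the stretched output is 420 samples long and keeps the original 20-sample period, i.e. the tempo changes while the pitch does not.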
Please refer to FIG. 1, which shows an autocorrelogram 10 for TDHS according to the prior art, the autocorrelogram 10 consisting of a plurality of magnitudes. In general, apart from a maximum magnitude 12 and the magnitudes near it, the remaining magnitudes in the autocorrelogram 10 have small values. In addition, any two neighboring magnitudes of the autocorrelogram 10 differ only slightly. For example, if a first magnitude 14 is far smaller than the maximum magnitude 12, a second magnitude 16 neighboring the first magnitude 14 is also far smaller than the maximum magnitude 12. Conversely, if a third magnitude 18 differs only slightly from the maximum magnitude 12, a fourth magnitude 20 neighboring the third magnitude 18 is probably very close to the maximum magnitude 12, and accordingly a fourth index τ4 (corresponding to the third magnitude 18 or the fourth magnitude 20, as shown in FIG. 1) is also probably very close to a maximum index τmax corresponding to the maximum magnitude 12.
In a computer system, the autocorrelogram 10 is usually established by a digital signal processing (DSP) chip designed to handle complex mathematical calculations such as convolution and the fast Fourier transform (FFT). However, determining the maximum magnitude 12 and the corresponding maximum index τmax by establishing the entire autocorrelogram 10 with a DSP chip is tedious and sometimes unnecessary, since most of the magnitudes away from the maximum magnitude 12 are small.
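As an illustration of the FFT route a DSP chip may take, the sketch below computes an autocorrelogram through the Wiener-Khinchin relation (the inverse FFT of the power spectrum) and, for comparison, directly from its definition. It is a pure-Python sketch with hypothetical function names (`fft`, `autocorrelogram_fft`, `autocorrelogram_direct`); it assumes real-valued input and zero-pads to a power of two at least twice the frame length, so that the circular correlation produced by the FFT equals the ordinary (linear) one.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the inverse transform is returned unscaled (caller divides by len).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def autocorrelogram_fft(x, max_lag):
    """Magnitudes r[0..max_lag] via IFFT(|FFT(x)|^2), with zero-padding to avoid wrap-around."""
    n = len(x)
    size = 1
    while size < 2 * n:
        size *= 2
    spectrum = fft([complex(v) for v in x] + [0j] * (size - n))
    power = [complex(abs(c) ** 2) for c in spectrum]      # |X(k)|^2
    r = fft(power, invert=True)
    return [r[tau].real / size for tau in range(max_lag + 1)]


def autocorrelogram_direct(x, max_lag):
    """Same magnitudes computed straight from the definition, O(N * max_lag)."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau)) for tau in range(max_lag + 1)]
```

Both functions return the complete list of magnitudes for lags 0 through `max_lag`; the FFT version costs O(N log N) operations, while the direct version costs O(N · max_lag). In either case the whole autocorrelogram is produced before the maximum magnitude can be picked out of it.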