1. Field of the Invention
The present invention generally relates to audio time scale modification algorithms.
2. Background
In the area of digital video and digital audio technologies, it is often desirable to be able to speed up or slow down the playback of an encoded audio signal without substantially changing the pitch or timbre of the audio signal. One particular application of such time scale modification (TSM) of audio signals is high-quality playback of stored video programs from a personal video recorder (PVR) at a speed that is faster than the normal playback rate. For example, in order to save some viewing time, it may be desired to play back a stored video program at a speed that is 20% faster than the normal playback rate. In this case, the audio signal needs to be played back at 1.2× speed while still maintaining high signal quality. In another example, a viewer may want to hear synchronized audio while playing back a recorded sports video program in a slow-motion mode. In yet another example, a telephone answering machine user may want to play back a recorded telephone message at a slower-than-normal speed in order to better understand the message. In each of these examples, the TSM algorithm may need to be of sufficiently low complexity that it can be implemented in a system having limited processing resources.
One of the most popular types of audio TSM algorithms is Synchronized Overlap-Add, or SOLA. See S. Roucos and A. M. Wilgus, “High Quality Time-Scale Modification for Speech”, Proceedings of 1985 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 493-496 (March 1985), which is incorporated by reference in its entirety herein. However, if this original SOLA algorithm is implemented “as is” for even just a single 44.1 kHz mono audio channel, the computational complexity can easily reach 100 to 200 million instructions per second (MIPS) on a ZSP400 digital signal processing (DSP) core (a product of LSI Logic Corporation of Milpitas, Calif.). Thus, this approach will not work for a similar DSP core that has a processing speed of approximately 100 MHz. Many variations of SOLA have been proposed in the literature, and some are of reduced complexity. However, most of them are still too complex for an application scenario in which a DSP core having a processing speed of approximately 100 MHz has to perform both audio decoding and audio TSM. U.S. patent application Ser. No. 11/583,715 to Chen, entitled “Audio Time Scale Modification Using Decimation-Based Synchronized Overlap-Add Algorithm,” addresses this complexity issue and describes a decimation-based approach that reduces the computational complexity of the original SOLA algorithm by approximately two orders of magnitude.
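For orientation only, the general overlap-add idea behind SOLA-type algorithms can be illustrated with the following minimal sketch. It is a simplified, SOLA-style illustration and is not the exact procedure of Roucos and Wilgus or of the referenced application. The function name sola_sketch, the parameters frame, overlap, and search, and their default values are assumptions chosen for illustration. In this sketch, the input is read at a hop that is scaled by the playback speed, each new segment is shifted within a small search range so that it best matches the tail of the output already produced, and the two are cross-faded over the overlap region.

```python
import math


def sola_sketch(x, speed, frame=400, overlap=100, search=50):
    """Simplified SOLA-style time scale modification (illustrative only).

    x      : list of float samples (mono)
    speed  : playback speed factor (>1 plays faster, <1 plays slower)
    frame, overlap, search : hypothetical sizes in samples
    """
    hop_out = frame - overlap              # new output samples per frame
    hop_in = int(round(hop_out * speed))   # input samples advanced per frame
    out = list(x[:frame])                  # first frame is copied unchanged
    pos = hop_in
    while pos + search + frame <= len(x):
        tail = out[-overlap:]              # end of the output produced so far
        best_k, best_score = 0, -math.inf
        # Search for the input offset whose start best matches the tail,
        # using a cross-correlation normalized by the candidate's energy.
        for k in range(search + 1):
            seg = x[pos + k:pos + k + overlap]
            num = sum(a * b for a, b in zip(tail, seg))
            den = math.sqrt(sum(b * b for b in seg)) or 1.0
            score = num / den
            if score > best_score:
                best_k, best_score = k, score
        seg = x[pos + best_k:pos + best_k + frame]
        # Linear cross-fade over the overlap region, then append the rest.
        for i in range(overlap):
            w = i / overlap
            out[-overlap + i] = (1.0 - w) * tail[i] + w * seg[i]
        out.extend(seg[overlap:])
        pos += hop_in
    return out
```

Calling sola_sketch(x, 1.2) on a list of samples x returns a shorter list intended for playback at the original sample rate. The inner correlation search runs for every frame and every candidate offset, which hints at why a direct implementation at a 44.1 kHz sampling rate can be computationally expensive.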
Most of the TSM algorithms in the literature, including the original SOLA algorithm and the decimation-based SOLA algorithms described in U.S. patent application Ser. No. 11/583,715, were developed with a constant playback speed in mind. If the playback speed is changed “on the fly,” the output audio signal may need to be muted while the TSM algorithm is reconfigured for the new playback speed. However, in some applications, it may be desirable to be able to change the playback speed continuously on the fly, for example, by turning a speed dial or pressing a speed-change button while the audio signal is being played back. Muting the audio signal during each such playback speed change would cause too many audible gaps in the audio signal. On the other hand, if the output audio signal is not muted, but the TSM algorithm is not designed to handle dynamic playback speed changes, then the output audio signal may have many audible glitches, clicks, or pops.
What is needed, therefore, is a time scale modification algorithm that is capable of changing its playback speed dynamically without introducing additional audible distortion to the played-back audio signal. In addition, as described above, it is desirable for such a TSM algorithm to achieve a very low level of computational complexity.