The present application relates generally to processing audio signals. More particularly, the present invention relates to energy-based, nonuniform time-scale compression of audio signals.
The purpose of time-scale modification of an audio signal is to change the playback rate of the audio signal while preserving the original audio characteristics, such as pitch perception and frequency distribution. The modified signal is perceived as being faster (time-scale compression) or slower (time-scale expansion) with respect to the original audio.
Applications for time-scale modification include telephone voicemail systems and answering machines, where message playback can be sped up or slowed down depending on user preference. More recently, multimedia search and retrieval on local sources or over networks such as the internet have provided applications for time-scale modification of audio and video signals. The technique is also useful for streaming media delivery of multimedia materials. Deployment of time-scale modification systems and methods can dramatically improve the efficiency of retrieval of audio and speech material in large-scale databases.
Many techniques have been developed in the past for time-scale modification. In general, time-scale modification techniques can be grouped as linear and non-linear algorithms. In a linear algorithm, time compression or expansion is applied consistently across the entire audio stream with a given speed-up or slow-down rate.
The most basic example is by playing the audio at a lower sampling rate than that at which it was recorded, such as by dropping alternate samples. This results, however, in an increase in pitch, creating less intelligible and enjoyable audio.
Another basic technique involves discarding portions of short, fixed-length audio segments and abutting the retained segments. However, discarding segments and abutting the remnants produces discontinuities at the interval boundaries and produces audible clicks and other audio distortion. To improve the quality of the output signal, a windowing function or smoothing filter can be applied at the junctions of the abutted segments. One such technique is called overlap and add (OLA). Another is synchronized overlap and add (SOLA). Another is waveform-similarity overlap and add (WSOLA). The OLA-type algorithms provide benefits of simplicity and efficiency. Important design considerations in algorithm design and implementation include the processor resources required for signal processing the audio signal and data storage capacity.
In non-linear time compression, the content of the audio stream is analyzed and compression rates may vary from one point in time to another. In some examples, redundancies such as pauses or elongated vowels are compressed more aggressively.
In a typical WSOLA algorithm, fixed-length segments are extracted from the input signal near the time instants n=0, Tx, 2Tx, . . . , with Tx>0 a parameter of the algorithm. The best segments found near these time instants are overlapped and added to form the output signal. The process is shown in FIG. 2. Note that the input signal is processed at uniformly separated intervals. The time-scale ratio is defined byρ=Ty/Tx  (1)The time scale ratio ρ is less than one for time-scale compression and greater than one for time-scale expansion.
Current time scale modification algorithms do not provide adequate results in low-rate time-scale compression, for instance at ρ<0.5. Intelligibility of the resulting audio is too poor for commercial use. Accordingly, there is a need for an improved time-scale compression method and apparatus for audio signals.