In the context of audio signals, the term “compression” can have two different meanings. “Temporal compression” refers to an increase in the speed at which a recorded audio signal is reproduced, thereby reducing the amount of time required to play back the signal, relative to the original recording. “Data compression” refers to a reduction in the number of bits that are used to represent an audio signal in a digital format. The present invention is concerned with both types of compression of an audio signal, as well as temporal expansion to slow down the reproduction rate.
There are a variety of techniques that are employed to effect the temporal compression and expansion of audio, so that it can be played back over periods of time which are less than, or greater than, the period over which it was recorded. Each technique has its associated advantages and limitations, which make it more or less suitable for a given application. One of the earliest examples of temporal compression is the “fast playback” approach. In this approach, a recorded audio signal is reproduced at a higher rate by speeding up an analog waveform, e.g., transporting a magnetic tape at a faster speed during playback than the recording speed. The digital equivalent of this approach is accomplished by low-pass filtering the waveform, sub-sampling the result, and then playing back the new samples at the original sampling rate. Conversely, by reducing the speed of playback, the audio waveform is expanded. In the digital context, this result can be accomplished by up-sampling the waveform, low-pass filtering it, and playing it back at the original sampling rate. This approach has the advantage of being extremely simple to implement. However, it has the associated disadvantage of shifting the pitch of the reproduced sound. For instance, as the playback rate is increased, the pitch shifts to a higher frequency, giving speech a “squeaky” characteristic.
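The digital form of fast playback described above can be illustrated with a minimal Python sketch. It is not taken from any cited implementation: the moving-average filter is only a crude stand-in for a proper low-pass filter, and the names `moving_average`, `fast_playback`, `estimate_pitch` and the 8000 Hz rate are assumptions chosen for illustration.

```python
import math

def moving_average(samples, width):
    """Crude low-pass filter (an assumption): mean of the last `width` samples."""
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= width:
            acc -= samples[i - width]      # drop the sample leaving the window
        out.append(acc / min(i + 1, width))
    return out

def fast_playback(samples, factor):
    """Low-pass filter, then keep every `factor`-th sample (integer factor >= 1)."""
    return moving_average(samples, factor)[::factor]

def estimate_pitch(samples, rate):
    """Rough pitch estimate in Hz: upward zero crossings per second of playback."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)

RATE = 8000  # assumed sampling rate in Hz
# One second of a 100 Hz tone; the phase offset avoids samples that are exactly zero.
tone = [math.sin(2 * math.pi * 100 * n / RATE + 0.5) for n in range(RATE)]
fast = fast_playback(tone, 2)

print(len(tone), len(fast))
print(estimate_pitch(tone, RATE), estimate_pitch(fast, RATE))
```

When both signals are played back at the same sampling rate, the sub-sampled tone lasts half as long and its estimated pitch is roughly doubled, which is the pitch shift noted above.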
Another approach to the temporal compression of audio is known as “snippet omission”. This technique is described in detail, for example, in a paper published by Gade & Mills entitled “Listening Rate and Comprehension as a Function of Preference for and Exposure to Time-Altered Speech,” Perceptual and Motor Skills, volume 68, pages 531–538 (1989). In the analog domain, this technique is performed with the use of electromechanical tape players having moving magnetic read heads. The players alternately reproduce and skip short sections, or snippets, of a magnetic tape. In the digital domain, the same result is accomplished by alternately maintaining and discarding short groups of samples. To provide temporal expansion using this approach, each section of the tape, or group of digital samples, is reproduced more than once. The snippet omission approach has an advantage over the fast playback approach, in that it does not shift the pitch of the original input signal. However, it does result in the removal of energy from the signal, and offsets some of the signal energy in the frequency domain according to the lengths of the omitted snippets, resulting in an artifact that is perceived as a discernible buzzing sound during playback. This artifact is due to the modulation of the input signal by the square wave of the snippet removal signal.
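In the digital domain, snippet omission and its expansion counterpart amount to simple list manipulation. The following Python sketch is illustrative only; the function names and the `keep`, `skip`, `snippet` and `repeats` parameters are hypothetical and do not come from the cited paper.

```python
def snippet_omission(samples, keep, skip):
    """Alternately keep `keep` samples and discard the next `skip` samples."""
    out = []
    for start in range(0, len(samples), keep + skip):
        out.extend(samples[start:start + keep])
    return out

def snippet_repetition(samples, snippet, repeats):
    """Temporal expansion: reproduce each group of `snippet` samples `repeats` times."""
    out = []
    for start in range(0, len(samples), snippet):
        group = samples[start:start + snippet]
        for _ in range(repeats):
            out.extend(group)
    return out

print(snippet_omission(list(range(10)), keep=3, skip=2))
print(snippet_repetition(list(range(4)), snippet=2, repeats=2))
```

The abrupt joins between retained groups are the hard splices that give rise to the buzzing artifact described above; nothing in this sketch smooths them.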
More recently, an approach known as Synchronous Overlap-Add (SOLA) has been developed, which overcomes the undesirable effects associated with each of the two earlier approaches. In essence, SOLA constitutes an improvement on the snippet omission approach, by linking the duration of the segments that are played or skipped to the pitch period of the audio, and by replacing the simple splicing of snippets with cross-fading, i.e., adjacent groups of samples are overlapped. Detailed information regarding the SOLA approach can be found in the paper by Roucos & Wilgus entitled “High Quality Time-Scale Modification for Speech,” IEEE International Conference on Acoustics, Speech and Signal Processing, Tampa, Fla., volume 2, pages 493–496 (1985). The SOLA approach does not result in pitch shifting, and reduces the audible artifacts associated with snippet omission. However, it is more computationally expensive, since it requires analysis of local audio characteristics to determine the appropriate amount of overlap for the samples.
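The overlap search and cross-fade can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the algorithm as published: the similarity measure (normalized cross-correlation), the linear cross-fade, and the `frame`, `overlap` and `search` parameters with their default values are all choices made here for the example.

```python
import math

def sola(samples, speed, frame=400, overlap=200, search=50):
    """Time-scale `samples`: speed > 1 shortens the signal, speed < 1 lengthens it."""
    if not (2 * search < overlap < frame - 2 * search):
        raise ValueError("need 2*search < overlap < frame - 2*search")
    hop_out = frame - overlap                  # nominal spacing of frames in the output
    hop_in = max(1, round(hop_out * speed))    # spacing of frames taken from the input
    out = list(samples[:frame])
    m = 1
    while m * hop_in + frame <= len(samples):
        seg = samples[m * hop_in : m * hop_in + frame]
        nominal = m * hop_out
        best_start, best_score = nominal, -math.inf
        # Try each placement near the nominal position and score the overlap.
        for k in range(-search, search + 1):
            start = nominal + k
            a = out[start:]                    # tail of the output built so far
            b = seg[:len(a)]                   # head of the incoming frame
            num = sum(x * y for x, y in zip(a, b))
            den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_start, best_score = start, score
        # Linear cross-fade over the overlapping region, then append the remainder.
        n = len(out) - best_start
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[best_start + i] = (1 - w) * out[best_start + i] + w * seg[i]
        out.extend(seg[n:])
        m += 1
    return out

RATE = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 100 * n / RATE + 0.5) for n in range(RATE)]
shorter = sola(tone, speed=2.0)
print(len(tone), len(shorter))
```

The inner loop over candidate placements is the analysis of local audio characteristics mentioned above, and it accounts for most of the computation. Because each frame is placed where it best matches the existing output, the spliced segments stay aligned with the pitch period instead of being cut at arbitrary points.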
Digital audio files are now being used in a large number of different applications, and are being distributed through a variety of different channels. To reduce the storage and transmission bandwidth requirements for these files, it is quite common to perform data compression on them. For example, one popular form of compression is based upon the MPEG audio standard. Some applications which are designed to handle audio files compressed according to this standard may include dedicated decompression hardware for playback of the audio. One example of such an application is a personal video recorder, which enables a viewer to digitally record a broadcast television program or other streaming audio-video (AV) presentation, for time-shifting or fast-forward purposes. The main components of such a system are illustrated in FIG. 1. Referring thereto, when an incoming AV signal is to be recorded for later viewing, it is fed to a compressor 2, which digitizes the signal if it is not already in a digital format, and compresses it according to any suitable compression technique, such as MPEG. Alternatively, in a digital transmission system, the incoming signal may already be in a compressed format.
The compressed AV signal is stored as a digital file on a magnetic hard disk or other suitable storage medium 4, under the control of a microprocessor 6. Subsequently, when the viewer enters a command to resume viewing of the presentation, the file is retrieved from the storage medium 4 by the microprocessor 6, and provided to a decompressor 8. In the decompressor, the file is decompressed to restore the original AV signal, which is supplied to a television receiver for playback of the presentation. Since the compression and decompression functions are performed by dedicated components, the microprocessor itself can be a relatively low-cost device. By minimizing costs in this manner, the entire system can be readily incorporated into a set-top box or other similar type of consumer device.
One of the features of the personal video recorder is that it permits the viewer to pause the display of the presentation, and then fast-forward through portions that were recorded during the pause. However, in applications such as this, temporal modification of the audio playback to maintain concurrency with the fast-forwarded video is extremely difficult. More particularly, the conventional approach to the modification of compressed audio is to decompress the file to reconstruct the original audio waveform, temporally modify the decompressed audio, and then recompress the result. However, the main processor 6 may not have the capability, in terms of either processing cycles or bandwidth, to be able to perform all of these functions. Similarly, the decompressor 8 would have to be significantly altered to be able to handle temporal modification as well as data decompression. Consequently, temporal modification of the playback is simply not feasible in many devices which are designed to handle data-compressed audio files.
It is an objective of the present invention to provide for the modification of a data-compressed audio waveform so that it can be played back at speeds that are faster or slower than the rate at which it was recorded, without having to modify the decompression board, and without requiring that the audio waveform be completely decompressed within the main processor of a device.