The present invention relates in general to a system and method for temporal modification of audio and/or video signals, to increase or reduce the playback rate of an audio and/or a video file being streamed in a compressed format in a client-server environment.
Recent years have witnessed major advances in computer systems technologies, along with major breakthroughs in hardware, such as in the design of microprocessors, memory, and data transfer buses. These advances, together with fierce competition in the free marketplace, have reduced the price of computing and made it affordable to the masses. Electronic computing is no longer reserved for large companies maintaining bank customers' accounts or performing theoretical or engineering research.
Computing power is no longer centralized in a mainframe, and clients (or clients' processors) are no longer the dumb terminals they used to be, whose only function was to submit jobs to be processed at the central host. Nowadays, the personal computer, or PC, is a capable and increasingly affordable machine that can act as a client, a host, or both.
From universities to corporations, many businesses are now making digital media content, such as training courses or seminars, available online. With so much content available, it is increasingly desirable to be able to skim and browse the digital content quickly and accurately.
Digital content in multimedia files can be compressed in a variety of available compression formats. In addition, several implementations have been proposed to speed up the playback rate of digital audio. However, there is no adequate method for speeding up compressed audio files that are being streamed.
Two models have been proposed. One model is to precompute and store the stream at several speedup ratios. This model requires the server to maintain multiple time-compressed versions of the same digital file, one for each speedup factor, for example 1.0, 1.25, 1.5, and 2.0. The client then chooses the desired version at playback time. This method requires no real-time computation, does not require significant computational power on the client, and sends only the standard bit rate over the communication line. However, it provides only a fixed, predetermined set of speedup ratios from which to choose, and its principal disadvantage is the additional storage required to keep a separate version of the stream for each speedup ratio.
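The storage cost of the precomputed model can be estimated with a short calculation. The sketch below assumes, as the paragraph states, that every version is streamed at the standard bit rate, so a version sped up by a factor r lasts 1/r as long as the original and occupies roughly 1/r of its size. The function names and the nearest-ratio selection rule are illustrative assumptions, not part of any described system.

```python
# Example speedup ratios from the text; the 1.0 entry is the original stream.
PRECOMPUTED_RATIOS = (1.0, 1.25, 1.5, 2.0)


def storage_multiple(ratios=PRECOMPUTED_RATIOS):
    """Total server storage as a multiple of the original stream's size.

    Assumes each version is encoded at the same (standard) bit rate, so a
    version sped up by r has duration T / r and size proportional to 1 / r.
    """
    return sum(1.0 / r for r in ratios)


def pick_version(requested, ratios=PRECOMPUTED_RATIOS):
    """Hypothetical client rule: use the stored ratio closest to the request."""
    return min(ratios, key=lambda r: abs(r - requested))


if __name__ == "__main__":
    print(round(storage_multiple(), 3))  # 2.967
    print(pick_version(1.8))  # 2.0
```

With these four ratios the server stores roughly three times the original data, and a request for 1.8x can only be served at 2.0x, which illustrates both drawbacks named above.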
Another implementation relies on real-time audio speedup computed by the client. In this model, the server accepts speedup factor requests from the client. The server then streams the data n times faster than the original playback rate, n being the speedup factor, and the client's processor (also referred to herein as “client”) computes the audio speedup on the fly. Unlike the previous model, this implementation does not require additional storage on the server. However, it overloads both the client's machine and the network because it requires:
A) the client's machine to decode the data at this faster rate and to compute the modified sped-up version for playback; and
B) additional network bandwidth, which degrades overall network performance, since the server must send more data, faster, over an already overloaded communications network.
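The two loads listed above grow linearly with n, whereas the precomputed model keeps them at their normal values. A rough sketch, assuming a constant-bit-rate stream and counting decoding work as output samples per wall-clock second (the function name, the `model` labels, and the example numbers are illustrative):

```python
def streaming_load(bit_rate_kbps, speedup, sample_rate_hz, model):
    """Return (network kbps, decoded samples per wall-clock second).

    model == "precomputed": the server sends a stored time-compressed
        version, so the network carries the standard bit rate and the
        client decodes at the normal sample rate.
    model == "client": the server sends the original stream `speedup`
        times faster, so network rate and decoding work both scale by
        `speedup` (the client's speedup computation is not counted here).
    """
    if model == "precomputed":
        return bit_rate_kbps, sample_rate_hz
    if model == "client":
        return bit_rate_kbps * speedup, sample_rate_hz * speedup
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    print(streaming_load(64, 2.0, 44100, "client"))       # (128.0, 88200.0)
    print(streaming_load(64, 2.0, 44100, "precomputed"))  # (64, 44100)
```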
Another conventional system performs many of these calculations, but for time scale modification (TSM) of recompressed files rather than streaming media. Yet another conventional system describes the advantages of, and the need for, TSM in streaming media, using streams precomputed at various speedup ratios and networked file systems to pass the streams from the server to the client.
When the viewer later requests a stored compressed file, it is retrieved and decompressed for playback. A variety of techniques may be employed to compress and expand an audio signal in time so that it can be played back over a period different from the period over which it was recorded. One of the earliest examples of audio speedup is the “fast playback” approach, in which a recorded audio signal is reproduced at a higher rate by speeding up an analog waveform, e.g., by transporting a magnetic tape faster during playback than during recording.
However, this approach shifts the pitch of the reproduced sound: every frequency component is scaled by the speedup factor, so as the playback rate increases, the pitch rises, giving speech a “squeaky” quality.
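The pitch shift is easy to reproduce digitally. The sketch below mimics a faster tape transport by reading the stored samples `speed` times faster (linear interpolation via NumPy's `np.interp`) and playing them back at the original sample rate; a 200 Hz test tone comes out at 400 Hz when `speed` is 2. The function names, sample rate, and test tone are illustrative choices.

```python
import numpy as np


def fast_playback(x, speed):
    """Digital analogue of running a tape faster than it was recorded.

    Reads the input `speed` times faster (linear interpolation between
    samples) and returns samples meant for playback at the ORIGINAL rate,
    so duration shrinks by `speed` and every frequency is multiplied by it.
    """
    x = np.asarray(x, dtype=float)
    n_out = int(len(x) / speed)
    read_pos = np.arange(n_out) * speed  # fractional positions in the input
    return np.interp(read_pos, np.arange(len(x)), x)


def dominant_freq(y, fs):
    """Frequency (Hz) of the largest FFT magnitude peak."""
    spectrum = np.abs(np.fft.rfft(y))
    return np.argmax(spectrum) * fs / len(y)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 200 * t)  # one second at 200 Hz
    y = fast_playback(tone, 2.0)
    print(len(y), dominant_freq(y, fs))  # 8000 400.0
```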
Another approach, known as “snippet omission,” alternately retains and discards short groups of samples and abuts the retained samples. Snippet omission has an advantage over fast playback in that it does not shift the pitch of the original input signal. However, it removes energy from the signal and shifts some of the signal energy in the frequency domain according to the lengths of the omitted snippets, producing an artifact that is perceived as a discernible buzzing sound during playback.
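A minimal NumPy sketch of snippet omission for speedups of 1 or more: it keeps `keep` samples, discards the next `keep * (speed - 1)`, and butts the kept snippets together with no cross-fade. The snippet length (10 ms at an assumed 16 kHz sample rate) is an illustrative choice. The hard splices recur every `keep` output samples, which is one way to see where a periodic buzzing artifact can come from.

```python
import numpy as np


def snippet_omission(x, speed, keep=160):
    """Speed up x by `speed` (>= 1) by alternately keeping and discarding
    short groups of samples and abutting the kept groups.

    `keep` (160 samples, i.e. 10 ms at an assumed 16 kHz) is illustrative.
    Pitch is preserved because retained samples are played at their
    original rate, but each splice is an abrupt, un-faded join.
    """
    if speed < 1:
        raise ValueError("this sketch only handles speedup (speed >= 1)")
    x = np.asarray(x, dtype=float)
    period = int(round(keep * speed))  # keep + discard, in input samples
    pieces = [x[start:start + keep] for start in range(0, len(x), period)]
    return np.concatenate(pieces) if pieces else x[:0]
```

On a 200 Hz tone whose period divides the snippet length, the output stays at 200 Hz while its length halves at `speed=2.0`; on real speech the joins fall at arbitrary phases, which is where the splice artifacts become audible.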
More recently, an approach known as Synchronous Overlap-Add (SOLA) has been developed that overcomes the undesirable effects of both earlier approaches. In essence, SOLA improves on snippet omission by linking the duration of the segments that are played or skipped to the pitch period of the audio, and by replacing the simple splicing of snippets with cross-fading, i.e., adjacent groups of samples are overlapped and blended. The SOLA approach does not shift the pitch, and it reduces the audible artifacts associated with snippet omission.
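The overlap-add idea can be sketched in a few dozen lines of NumPy. The version below is a simplified relative of SOLA: frames are read from the input every `synthesis_hop * speed` samples and written to the output every `synthesis_hop` samples, and before each frame is cross-faded into the output, a search over small offsets picks the one whose waveform best matches the existing output (maximum normalized cross-correlation), which tends to align splices with the pitch period. Classic SOLA shifts the write position in the output; this sketch shifts the read position in the input instead, as the closely related WSOLA method does, to keep the bookkeeping simple. Frame length, hop, and search range are assumed values, not parameters from the text.

```python
import numpy as np


def sola(x, speed, frame=1024, synthesis_hop=256, search=128):
    """Time-scale x by `speed` (> 1 is faster) without shifting pitch.

    Assumed, illustrative parameters: `frame` samples per frame,
    `synthesis_hop` output samples added per frame, and `search` candidate
    offsets (should exceed the longest expected pitch period).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    x = np.asarray(x, dtype=float)
    analysis_hop = max(1, int(round(synthesis_hop * speed)))  # input step
    overlap = frame - synthesis_hop  # samples cross-faded per splice
    fade = np.linspace(0.0, 1.0, overlap)  # linear cross-fade ramp

    out = x[:frame].copy()
    pos = analysis_hop  # nominal start of the next input frame
    while pos + frame + search <= len(x):
        tail = out[-overlap:]  # end of the output to blend into
        # Pick the offset k whose input segment best matches the tail.
        best_k, best_score = 0, -np.inf
        for k in range(search):
            seg = x[pos + k:pos + k + overlap]
            denom = np.sqrt(np.dot(seg, seg) * np.dot(tail, tail)) + 1e-12
            score = np.dot(seg, tail) / denom
            if score > best_score:
                best_k, best_score = k, score
        chunk = x[pos + best_k:pos + best_k + frame]
        # Cross-fade the overlapping region, then append the remainder.
        out[-overlap:] = (1.0 - fade) * tail + fade * chunk[:overlap]
        out = np.concatenate([out, chunk[overlap:]])
        pos += analysis_hop
    return out
```

On a 200 Hz test tone at 16 kHz, `sola(tone, 2.0)` returns roughly half as many samples while the dominant frequency stays near 200 Hz, in contrast with the fast-playback behavior described earlier.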
Digital audio files are now used in a large number of applications and distributed through a variety of channels. To reduce the storage and transmission bandwidth requirements for these files, it is quite common to compress the data. A common form of compression, for example, is based on the MPEG audio standard. Some applications designed to handle audio files compressed according to this standard may include dedicated decompression hardware for playback.
In a conventional compression system, when an incoming audio/video (AV) signal is recorded for later viewing, it is fed to a compressor or encoder that digitizes the signal if it is not already digital and compresses it according to any suitable compression technique, such as MPEG. The compressed AV signal is stored as a file or a stream.
One conventional system precomputes several time-compressed streams and lets the user switch between them. It considers the tradeoff between the two naive approaches: storing multiple streams on the server versus streaming everything and computing everything on the client. It also examines Time Scale Modification (TSM) usage patterns in a user study. The present invention overcomes this tradeoff by being more flexible, saving both bit rate and computational effort, and requiring much less storage.
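Switching between precomputed streams requires mapping the current playback position from one stream's timeline to another's. Assuming each stream is a uniform time compression of the original (so that a position in the stream multiplied by its speedup ratio gives the position in the original media), the mapping is a single rescaling. The function below is a hypothetical illustration, not the cited system's code.

```python
def switch_position(pos_s, current_ratio, new_ratio):
    """Map a playback position (seconds into the current stream) to the
    equivalent position in another precomputed stream.

    Assumes uniform time compression: original time = stream time * ratio.
    """
    if current_ratio <= 0 or new_ratio <= 0:
        raise ValueError("speedup ratios must be positive")
    original_time = pos_s * current_ratio  # position in the original media
    return original_time / new_ratio


if __name__ == "__main__":
    # 60 s into the 1.5x stream is 90 s of original content,
    # which is 45 s into the 2.0x stream.
    print(switch_position(60.0, 1.5, 2.0))  # 45.0
```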
There is therefore an unfulfilled need for a system and method that implement audio speedup while reducing the computational load on the client, reducing bandwidth overload on the network, and reducing storage and computation needs by performing several computations simultaneously. It would be desirable to precompute as much of the signal as possible, so that when the user requests an audio speed change or a portion of a panoramic image, all but the final audio or image processing work has already been done and the new media can be presented to the user with minimal network and computational load.