It is desirable to be able to vary the apparent display rate of a display generated from audio data, video data, or related audio and video data. The apparent display rate is the rate of change of the display as perceived by an observer, as opposed to the rate at which data is processed to generate the display. For example, it may be desirable to increase the apparent display rate so that a quick overview of the content of the data can be obtained, or because an observer wishes to listen to or view the display at a faster-than-normal rate at which the content of the display can still be adequately digested. Alternatively, it may be desirable to slow the apparent display rate so that the display can be scrutinized more carefully, or because the content of the display can be better digested at a slower rate.
Both audio and video data can be represented in either analog or digital form. The method used to manipulate audio and/or video data to accomplish variation in the apparent display rate of a display generated from that data depends upon the form in which the data is represented. However, conventional devices enable data in one form to be easily converted to the other form (i.e., analog to digital or digital to analog), thus affording wide latitude in the use of methods to accomplish display rate variation, regardless of the form in which the data originally exists.
The apparent display rate of an audio display or a video display can be increased by deleting specified data from, or decreased by adding specified data to (e.g., by repeating certain data), the corresponding set of digital audio data or digital video data that represents the content of the display. Previously, such variation of the apparent display rate of either an audio display or a video display has been accomplished using one of a variety of techniques. For example, the apparent display rate of an audio display represented by a set of digital audio data has been varied by using the synchronized overlap add (SOLA) method (discussed in more detail below) to modify an original set of digital audio data, producing a modified set of digital audio data from which the audio display is generated.
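As a rough illustration of how an overlap-add technique of this general kind can shorten or lengthen digital audio data, the following Python sketch follows the commonly described form of SOLA: overlapping frames are taken from the original data at one spacing, laid down at a different spacing, aligned by a small correlation search, and cross-faded. It is a simplified sketch, not the specific procedure of any particular system; the function name `sola` and the parameters `frame_len`, `analysis_hop`, and `max_shift` (with their default values) are assumptions chosen for illustration.

```python
import math


def sola(samples, rate, frame_len=200, analysis_hop=50, max_shift=20):
    """Time-scale a list of audio samples by `rate` (>1 speeds up, <1 slows down).

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Frames are read every `analysis_hop` samples and written every
    # `synthesis_hop` samples, which changes the overall duration.
    synthesis_hop = max(1, round(analysis_hop / rate))
    if synthesis_hop + 2 * max_shift >= frame_len:
        raise ValueError("frames would not overlap; use a longer frame_len")

    out = list(samples[:frame_len])
    m = 1
    while m * analysis_hop + frame_len <= len(samples):
        begin = m * analysis_hop
        frame = samples[begin:begin + frame_len]
        nominal = m * synthesis_hop

        # Search around the nominal position for the offset at which the new
        # frame best matches the output so far (normalized cross-correlation).
        best_start, best_score = nominal, -math.inf
        for k in range(-max_shift, max_shift + 1):
            start = nominal + k
            if start < 0:
                continue
            n = min(len(out) - start, frame_len)
            dot = sum(out[start + i] * frame[i] for i in range(n))
            e_out = sum(out[start + i] ** 2 for i in range(n))
            e_new = sum(frame[i] ** 2 for i in range(n))
            denom = math.sqrt(e_out * e_new)
            score = dot / denom if denom > 0 else 0.0
            if score > best_score:
                best_start, best_score = start, score

        # Linearly cross-fade the overlapping region, then append the rest.
        n = min(len(out) - best_start, frame_len)
        for i in range(n):
            w = i / n
            out[best_start + i] = (1 - w) * out[best_start + i] + w * frame[i]
        out.extend(frame[n:])
        m += 1
    return out
```

Calling `sola(samples, 2.0)` yields roughly half as many samples as the input, and `sola(samples, 0.5)` yields roughly twice as many. Because whole frames are overlapped rather than individual samples being dropped or repeated, the local waveform shape is largely retained.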
Often, a set of audio data is related to a particular set of video data and the two are used together to generate an audiovisual display, such as occurs, for example, in television broadcasts, motion pictures or computer multimedia displays. When the apparent display rate of an audiovisual display is varied, the audio display and video display must be synchronized to maintain temporal correspondence between the content of the audio and video displays. (Alternatively, the audio display can be eliminated altogether, thus obviating the need to maintain synchronization; however, the content of the audio display is lost.)
Previously, the apparent display rate of an audiovisual display has been varied by deleting or repeating video data (e.g., video frames) in a uniform manner, as appropriate, and deleting or repeating audio data in a uniform manner that corresponds to the treatment of the video data. For example, if the apparent display rate of the video display is sped up to 2 times the original display rate by eliminating every other video frame, then the audio display is likewise sped up by eliminating every other audio sample or every other set of a predetermined number of audio samples. While this approach is effective in maintaining synchronization, it can cause distortion in the audio and video displays, particularly at relatively high or low apparent display rates. In particular, the audio display can be distorted so that, as the apparent display rate increases, human voices increasingly manifest a “chipmunk effect,” and, as the apparent display rate decreases, human voices begin to sound as though the speaker is in a stupor. Such distortion of the display is a consequence of the fact that the elimination of audio data from the original set of audio data is done mechanically, without consideration of the content of the audio data being eliminated or retained.
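The uniform approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any actual system: the function names `uniform_indices` and `vary_rate_uniform` and the parameter `block_size` (the "predetermined number of audio samples") are hypothetical. The same index pattern is applied to video frames and to blocks of audio samples, which is what keeps the two synchronized, and the selection ignores content entirely.

```python
def uniform_indices(count, rate):
    """Indices of the items kept (or repeated) when `count` items are
    displayed at `rate` times the original rate. Hypothetical helper."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    out_len = round(count / rate)
    # rate > 1 skips items; rate < 1 repeats them.
    return [min(count - 1, int(i * rate)) for i in range(out_len)]


def vary_rate_uniform(frames, samples, rate, block_size):
    """Uniformly delete or repeat video frames and audio sample blocks."""
    kept_frames = [frames[i] for i in uniform_indices(len(frames), rate)]

    # Group the audio into blocks of a predetermined number of samples.
    blocks = [samples[i:i + block_size]
              for i in range(0, len(samples), block_size)]
    kept_samples = []
    for i in uniform_indices(len(blocks), rate):
        kept_samples.extend(blocks[i])
    return kept_frames, kept_samples


frames, audio = vary_rate_uniform(["f0", "f1", "f2", "f3"],
                                  list(range(8)), rate=2, block_size=2)
# frames == ["f0", "f2"]; audio == [0, 1, 4, 5]
```

With `rate=0.5`, each frame and each audio block is emitted twice instead. In both directions the choice of what to drop or repeat depends only on position, not on what the data contains.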
A better way of varying the apparent display rate of an audiovisual display is desirable. In particular, an approach that “intelligently” modifies the audio and/or video data used to generate the display, based upon an evaluation of the content of the audio data and/or video data, is desirable, since such an approach can reduce or eliminate distortion of the display and, in particular, of the audio display. Good synchronization between the audio and video displays should also be maintained. Additionally, the capability of varying the apparent display rate over a wide range of magnitudes is desirable. Further, the variation of the apparent display rate can preferably be accomplished automatically, in a manner that produces an apparent display rate that closely tracks a specified target display rate or rates.