As the performance of processors, acceleration hardware and storage devices continues to improve, playback speeds beyond normal (i.e., 1.0) are possible. However, buffering and stream alignment issues limit the degree of interactivity between an application and user-perceived changes in the actual playback speed. It is possible to solve this problem in a monolithic and closed environment, where all the components of the solution are intertwined. It is a much harder problem, however, in an open and componentized solution. Several attempts have been made to solve the problem, although each has fundamental flaws and oversights that have limited its usefulness.
The playback rate (also referred to as the playback speed) determines the amount of time that each frame of data is displayed. Fast playback rates typically display frames for shorter periods of time than slower playback rates. Fast playback rates have high bandwidth requirements that can exceed the capabilities of most processors, storage retrieval systems and hardware. Fast playback rates are therefore usually approximated using so-called “scan modes” that selectively present only a (small) portion of a data stream by discarding some of the data of the stream. This is somewhat analogous to a rapidly progressing slide show.
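The relationship between playback rate and frame display time, and the idea of a scan mode, can be illustrated with a minimal sketch. The function names and the "keep one frame in every N" policy are assumptions made for illustration only; actual scan modes may select frames differently (for example, by keeping only key frames).

```python
def frame_display_duration(nominal_duration_s: float, rate: float) -> float:
    """Return how long each frame is displayed at the given playback rate.

    Assumes a positive (forward) rate; 1.0 is normal speed.
    """
    if rate <= 0:
        raise ValueError("this sketch only handles positive rates")
    return nominal_duration_s / rate


def scan_mode_select(frames, keep_every: int):
    """Approximate fast playback by presenting one frame in every `keep_every`.

    The discarded frames are simply never presented, like a fast slide show.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return frames[::keep_every]
```

For instance, a frame authored for 40 ms is displayed for roughly 20 ms at a rate of 2.0, whereas a scan mode might instead keep the normal display time and present only every fourth frame.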
Many video applications, such as those that execute on computers or in connection with interactive television sets, are composed of a user interface that controls a source (or source filter). The source (or source filter) is part of a data processing pipeline that processes the data so that the data can be ultimately rendered for a user. The source reads media files and typically passes the data samples or buffers (which are usually compressed using, e.g., MPEG) to some type of decoder for processing. The decoder decompresses the data and passes it to some type of renderer that is configured to and capable of rendering the data for the user.
The renderer typically uses an internal (or external) clock, and various timing information that is included with the data samples themselves, to present or render the samples at the correct time. When the renderer begins processing, an initial rendering clock time can be passed to the source and decoder. The source can then begin to produce samples with timestamps that start at some point after the initial renderer time. The timestamps are used by the renderer to schedule and render the various data samples based on their authored time of presentation. Small delays between pipeline and/or processing components (such as filters) can occur since samples are buffered between each stage in the data processing pipeline. The (graph or) pipeline latency is the cumulative propagation delay of a sample from the source (filter) to the time that it is presented or rendered.
It has been and continues to be a goal of developers to enable systems to smoothly play back data, such as video content, at different playback rates (both in the forward and reverse directions). The nature of data processing pipelines and various data formats, however, continues to present challenges to developers.
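The timestamp scheduling and pipeline latency described above can be sketched as follows. This is a simplified model under stated assumptions: the `Sample` type, the function names, and the choice to offset the first timestamp by exactly the pipeline latency are all hypothetical, and per-stage delays are treated as fixed numbers.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # scheduled presentation time on the renderer clock (s)
    payload: str


def pipeline_latency(stage_delays):
    """Cumulative propagation delay from the source to the renderer."""
    return sum(stage_delays)


def stamp_samples(authored_times, initial_clock, latency):
    """Source side: produce samples whose timestamps start after the
    initial renderer clock time (here, offset by the pipeline latency)."""
    start = initial_clock + latency
    return [Sample(start + t, f"frame{i}") for i, t in enumerate(authored_times)]


def due_samples(samples, clock_now):
    """Renderer side: samples whose timestamps have been reached."""
    return [s for s in samples if s.timestamp <= clock_now]
```

With stage delays of 10 ms, 20 ms and 5 ms, the latency is about 35 ms, so a sample authored at time zero would be scheduled about 35 ms after the initial renderer clock time.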
Consider, for example, some different data formats—the MPEG-2, DVD and HDTV formats.
MPEG-2
The MPEG-2 format is referred to as a “forward decoding” format. An example representation of an MPEG-2 format is shown in FIG. 1 generally at 10. Each video sequence is composed of a series of Groups of Pictures (or “GOPs”). A GOP is composed of a sequence of pictures or frames. Frames can be encoded as one of three types: intra-frames (I-frames), forward predicted frames (P-frames), and bi-directional predicted frames (B-frames).
An I-frame or “key frame” (such as I-frame 12) is encoded as a single image, with no reference to any past or future frames. The encoding scheme used is similar to JPEG compression. A P-frame (such as P-frame 18) is encoded relative to the past reference frame. P-frames can also be considered as “delta frames” in that they contain changes over their reference frame. A reference frame is a P- or I-frame. The past reference frame is the closest preceding reference frame. A B-frame (or bi-directional frame, such as frames 14 and 16) is encoded relative to the past reference frame, the future reference frame, or both frames. The future reference frame is the closest following reference frame (I or P). B-frames are a function of only the adjacent reference frames.
The GOP structure is intended to assist random access into a sequence. A GOP is typically an independently decodable unit that can be of any size as long as it begins with an I-frame.
One problem associated with the MPEG-2 format pertains to being able to play back the data in reverse. Playing the data forward is typically not a problem because the format itself is forward decoding, meaning that one must typically decode the I-frame first and then move on to the other frames in the GOP. Playing back the data in reverse, however, is more challenging because one cannot backward-decode the GOP.
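To make the difficulty concrete, the following sketch shows one conceivable way to obtain reverse-order output from a forward-decoding format: each GOP is decoded forward in its entirety, and the decoded frames are then presented in reverse. This is an illustrative assumption, not a description of any particular system. Frames are modeled as hypothetical `(type, label)` tuples in display order, and the actual reference-frame decoding is elided.

```python
def decode_gop_forward(gop):
    """Decode a GOP front to back and return the decoded frame labels.

    A GOP must begin with an I-frame, since P- and B-frames depend on
    reference frames that have to be decoded first.
    """
    if not gop or gop[0][0] != "I":
        raise ValueError("a GOP must begin with an I-frame")
    decoded = []
    for _frame_type, label in gop:
        # Real decoding would apply deltas against reference frames here.
        decoded.append(label)
    return decoded


def reverse_playback(gops):
    """Visit GOPs last to first; decode each forward, then emit it reversed."""
    output = []
    for gop in reversed(gops):
        output.extend(reversed(decode_gop_forward(gop)))
    return output
```

Note the cost of this approach: every frame of a GOP must be decoded and buffered before the first frame of that GOP can be presented in reverse.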
DVD
Normally, when images are recorded on a disk, such as a DVD, the video is actually broken into small units covering a pre-determined time period (typically ½-second units or video object basic units (“VOBUs”)). The advantage of this format is that when one plays the video, one can progress through the video units one by one. If one wants to jump to an arbitrary piece of video, one can simply jump to the video unit of interest and the audio and video will be synchronized. The location at which all streams are synchronized is referred to as a “clean point”. Accordingly, when the video and audio units are compressed, they are compressed in a unit that is to be rendered at the exact same time; that is, there is no skew between the audio and video.
All references to I-frames when discussed within the MPEG-2 context can be extended to keyframes in other data formats. The term I-frame is synonymous with a keyframe when discussed outside of the MPEG-2 context.
HDTV: ATSC (Advanced Television Systems Committee) and DVB (Digital Video Broadcasting, the European Format)
High Definition Television or HDTV uses the MPEG-2 format as well. Here, however, video blocks and audio blocks are aligned with a bit of a skew. In this case, one cannot simply fast forward or jump to a certain point in the stream because, while there may be a video sample at that point, the associated audio sample begins at another location in the stream. Additionally, the audio sample can only be decoded forward as a block. This means that one has to back up within the stream and look for the associated audio sample. Depending on the particular format, one may not really know where the beginning of the corresponding audio block or sample is located. Thus, one has to keep looking back in the stream for some point before both the video and audio samples of interest.
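The effect of the audio/video skew on seeking can be illustrated with a short sketch. It assumes, purely for illustration, that an index of block start positions is available as hypothetical `(stream_offset, presentation_time)` pairs; as noted above, some formats provide no such knowledge, in which case the stream would have to be searched backward instead.

```python
def latest_block_at_or_before(blocks, target_time):
    """Return the block with the latest presentation time <= target_time."""
    candidates = [b for b in blocks if b[1] <= target_time]
    if not candidates:
        raise ValueError("no block starts at or before the target time")
    return max(candidates, key=lambda b: b[1])


def seek_start_offset(video_blocks, audio_blocks, target_time):
    """Find a stream offset that precedes both the video block and the
    audio block needed to present `target_time`."""
    video = latest_block_at_or_before(video_blocks, target_time)
    audio = latest_block_at_or_before(audio_blocks, target_time)
    # Because of the skew, the audio block may begin earlier in the stream
    # than the video block, so reading must start at the smaller offset.
    return min(video[0], audio[0])
```

In this model, a video block may sit at offset 2000 while the audio block covering the same instant begins at offset 1600, so reading has to start at 1600 rather than at the video block itself.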
With these different types of formats come challenges when one attempts to enable different playback rates and directions for an open and componentized solution.
Consider now FIG. 2 which illustrates an exemplary system 200 that can render data from a DVD. System 200 includes an application 202 that communicates with a source component 204 that reads data off of a DVD 206. The data that is read off of the DVD includes audio and video data that has been encoded and multiplexed together. As the source reads the data off of the DVD, it applies timestamps to the data packets which are then used to synchronize and schedule the packets for rendering. The packets are then provided to a demultiplexer (or “demux”) 208 which splits the packets into different constituent portions—audio, video and, if present, subpicture packets. The packets are then provided by the demultiplexer to an associated decoder such as video decoder 210 (for decoding video packets), audio decoder 212 (for decoding audio packets) and subpicture decoder 214 (for decoding subpicture packets). Each one of the packets has associated timing information, which defines when the packet is supposed to be rendered. The various decoders then decompress their associated packets and send the individual data samples or packets (including the packets' timestamps) to the appropriate renderers—such as video renderer 216 and audio renderer 218.
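The role of the demultiplexer in this arrangement can be sketched as follows. The packet layout, a hypothetical `(kind, timestamp, payload)` tuple, is an assumption made for illustration; it is not the actual DVD packet format.

```python
from collections import defaultdict


def demultiplex(packets):
    """Split a multiplexed packet sequence into per-stream lists.

    Each packet keeps its timestamp so that the downstream decoder and
    renderer can still schedule it for presentation.
    """
    streams = defaultdict(list)
    for kind, timestamp, payload in packets:
        streams[kind].append((timestamp, payload))
    return dict(streams)
```

Each resulting list would then be handed to the decoder for that stream type (video, audio or subpicture), with the timestamps carried along unchanged.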
System 200 also typically includes a global clock 220 that is used by the various renderers to ascertain when to render certain data samples whose timestamps coincide with a time indicated by the global clock.
Assume now that a user indicates, via application 202, that he/she wishes to have the data samples rendered at a different, perhaps faster rate.
A past approach for regulating a forward rate change is to manipulate the global clock 220. That is, if one wishes to play data twice as fast as the normal rate, then by manipulating the speed of the global clock, the desired rate change can be implemented. The problem with this approach is that the audio renderer can experience problems associated with frequency shifts and distorted audio output—which degrades the user's experience. Additionally, when the video renderer attempts to comply with the clock change, the video renderer can get behind in its processing which results in the renderer dropping samples to attempt to catch up. The overall result of this is a frequency shift on the audio, and a tug-and-pull on the video. The subpicture component, which can produce data that gets sent to the video renderer, can also have problems associated with the global clock change thus causing, for example, the subpicture to be rendered at an inappropriate time or in connection with inappropriate video. Thus, the quality of the output can be significantly degraded.
Another approach that attempts to deal with a forward rate change is to have source 204 notify demultiplexer 208, which, in turn, notifies video decoder 210 to make the appropriate rate change. The decoder 210 can then perform scaling operations on the samples' timestamps to make the video play at a different rate. The problem with this approach is that there is no guarantee that the video decoder 210, audio decoder 212 and subpicture decoder 214 will process the samples using the same techniques and algorithms, which is particularly true if the different decoders come from different vendors. Hence, the rate change can be effected at slightly different speeds which, in turn, can cause the video and audio to start to drift. Even worse, the subpicture can become unsynchronized, which can cause it to appear at the wrong time.
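The drift described above can be illustrated numerically. The sketch below assumes a simple linear timestamp-scaling rule and hypothetical effective rates for the two decoders; it is not the algorithm of any actual decoder.

```python
def scale_timestamp(ts, change_at, rate):
    """Map an authored timestamp to a presentation time, given that a rate
    change takes effect at authored time `change_at`."""
    if ts <= change_at:
        return ts
    return change_at + (ts - change_at) / rate


def drift(ts, change_at, video_rate, audio_rate):
    """Audio presentation time minus video presentation time for the same
    authored timestamp, when each decoder applies its own effective rate."""
    audio_time = scale_timestamp(ts, change_at, audio_rate)
    video_time = scale_timestamp(ts, change_at, video_rate)
    return audio_time - video_time
```

For example, if a 2.0 rate is requested at authored time 10 s but the audio decoder effectively achieves 1.9, then content authored at 48 s is presented at about 29 s by the video path and about 30 s by the audio path, a drift of roughly one second that keeps growing as playback continues.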
Additionally, these two approaches were only really employed in the context of forward-played video and not backward-played video. Using these past approaches, there really was (and is) no way to tell the video renderer to play the video backwards. The video renderer typically has no control over or knowledge about how the video is read off of the disk.
Accordingly, this invention arose out of concerns associated with providing improved methods and systems for processing renderable digital data.