1. The Field of the Invention
The present invention is directed to synchronizing audio and video data and, more particularly, to synchronizing streamed audio and video data during seek operations.
2. Related Technology
A multimedia sample often includes discrete audio and video data packets that are assigned timestamps corresponding to a desired presentation of the multimedia. This is useful for synchronizing the audio and video content of the multimedia so that it can be rendered in the desired manner and sequence.
One method for synchronizing audio and video data is practiced with the use of cleanpoints. A cleanpoint refers to a point, within multiplexed multimedia content, from which a clean or synchronous playback of the multimedia can start. For example, a cleanpoint indicates a point at which the corresponding streams of data (e.g., audio and video) are synchronized. Typically, cleanpoints have synchronized Presentation TimeStamps (PTSs) that are assigned to the audio and video content, indicating when the audio and video content should be rendered to achieve the desired presentation.
Cleanpoints are extensively used when recording multimedia content on a fixed medium, such as a DVD, to enable the viewer to seek around in the multimedia presentation without disrupting the syncing of the audio and video content. In a Digital Video Disk (DVD), for example, cleanpoints are established between groups of Video OBject Units (VOBUs) that are contained within the multimedia sample. Typically, the cleanpoints are established at the beginning of each Video OBject (VOB) that includes one or more VOBUs. Corresponding audio and video PTSs are typically contained within individual VOBs, such that a viewer may seek to any VOB boundary, from which synchronized playback of the audio and video may be obtained.
Despite the utility of cleanpoints, they are often not used with streaming data because streaming data is typically delivered on a ready-to-be-rendered basis. Instead, streaming data is typically synchronized to timestamps that are assigned to the individual data packets of the multimedia stream, which enables the audio and video data to be rendered at the appropriate time. This works well when the multimedia is played as it is received or when it is started from the beginning of the presentation, because the audio and video data can be rendered according to their corresponding timestamps, based upon a beginning PTS of zero. Thereafter, it can be determined when to play all subsequently received audio and video data packets based on their corresponding timestamps with respect to the beginning PTS of zero.
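By way of illustration only, the timestamp-driven scheduling described above can be sketched as follows. The packet names, the timestamp units, and the function name render_schedule are assumptions made for this example and do not appear in any particular streaming system; the sketch simply offsets each packet's timestamp from the moment playback began at a PTS of zero.

```python
def render_schedule(packets, playback_start=0.0):
    """Map each packet name to the clock time at which it should be rendered.

    packets: iterable of (name, timestamp) pairs, where each timestamp is
    relative to a beginning PTS of zero. Names and units are hypothetical.
    playback_start: clock time at which the packet with PTS zero is rendered.
    """
    return {name: playback_start + ts for name, ts in packets}


# Hypothetical stream played from its beginning (PTS of zero).
stream = [("video_0", 0), ("audio_0", 0), ("video_1", 1), ("audio_1", 1)]
schedule = render_schedule(stream, playback_start=100.0)
```

Because both streams are measured from the same beginning PTS of zero, audio and video packets that share a timestamp are scheduled for the same clock time.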
The absence of cleanpoints can be a problem, however, when the streamed data is recorded and seeked around. The audio and video content is sometimes delivered in an asynchronous manner to accommodate the manner in which the data is decompressed, such that the timestamps assigned to the audio and video data cannot by themselves be used to synchronize the multimedia content, as described below.
Decompression of audio data is typically a short and relatively simple process. In contrast, video decompression is a relatively complex process in which certain portions of the video data are often decompressed, cached and then later used to decompress other remaining portions of the video data. Accordingly, audio data may be delivered almost immediately before the time at which it is to be rendered within the media stream, whereas it is often necessary to deliver the corresponding video data prior to the delivery of the audio data. This prior delivery of the video data accommodates the relatively complex and time-consuming procedure of decompressing the video data, and helps to ensure that the video data will be properly decompressed in time to be rendered with the corresponding audio data.
Even though the audio and video packets of a streamed multimedia sample may be timestamped, it may still be difficult to present the audio and video content in sync when the data is being seeked, as mentioned above, and as described in more detail with respect to FIG. 1.
FIG. 1 illustrates a multimedia sample 100 that includes video packets (V1, V2, V3, V4, V5 and V6) and audio packets (A1, A2 and A3). Although not shown, the multimedia sample 100 may also include other video packets and audio packets that are received prior to and subsequent to the data packets that are shown.
The illustrated video data packets and audio data packets correspond with audio and video media streams of the multimedia sample. Accordingly, some of the video data packets and audio data packets are assigned timestamps (t=n, where n is the assigned timestamp), corresponding with a desired presentation of the multimedia sample 100 in which the video and audio media streams are played in synchronization. For example, video data packet V2 is assigned a timestamp t=5, audio data packet A1 is assigned a timestamp t=4, video data packet V4 is assigned a timestamp t=6, audio data packet A2 is assigned a timestamp t=5, video data packet V5 is assigned a timestamp t=7, and audio data packet A3 is assigned a timestamp t=6. When the multimedia sample 100 is played in synchronization, the audio data packet A2 will be rendered at the same time as video data packet V2, because both are assigned the timestamp t=5.
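The timestamp assignments recited above can be tabulated in a short sketch, offered for illustration only. The dictionary holds only the timestamps stated in the text for FIG. 1; packets V1, V3 and V6 are omitted because no timestamps are given for them, and the function name rendered_together is an assumption of this example.

```python
from collections import defaultdict

# Timestamps stated in the text for FIG. 1. V1, V3 and V6 are left out
# because the text does not give timestamps for them.
timestamps = {"V2": 5, "A1": 4, "V4": 6, "A2": 5, "V5": 7, "A3": 6}


def rendered_together(ts_by_packet):
    """Group packet names by timestamp, in ascending timestamp order."""
    groups = defaultdict(list)
    for name, ts in ts_by_packet.items():
        groups[ts].append(name)
    return {ts: sorted(names) for ts, names in sorted(groups.items())}


groups = rendered_together(timestamps)
```

Under these assignments, A2 and V2 fall in the same group (t=5), as do A3 and V4 (t=6), while A1 (t=4) precedes both video packets.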
The problem with synchronizing the multimedia sample 100 becomes evident when a viewer seeks to a location within the sample 100, such as location 110. For example, when the reader is seeked to starting point 110 to commence playback of the multimedia sample 100, it is unclear where or when the playback of the multimedia content should actually begin to enable synchronous playback. In particular, it is unclear whether the playback should commence at video data packet V2 or at some other location.
Renderers typically prefer to receive a zero-based PTS numbering scheme for the multimedia data that is to be rendered. However, when the playback of the multimedia content commences at a starting point that is not a cleanpoint, it becomes unclear how to create a zero-based PTS scheme that will synchronize the media streams of the multimedia sample 100. For example, in the present case, it is unclear which data packet(s) should be assigned a PTS of zero and how the remaining timestamps should be renumbered to satisfy the appropriate rendering devices with a zero-based PTS numbering scheme.
If a PTS of zero is merely assigned to the first encountered video and/or audio data packet(s), then the multimedia presentation may be played out of sync. For instance, if video data packet V2 and audio data packet A1 are assigned PTSs of zero, then they will be played at substantially the same time upon being transmitted to the appropriate rendering devices, even though audio data packet A1 has a timestamp (t=4) that is less than the timestamp assigned to video data packet V2 (t=5). Accordingly, merely assigning a PTS of zero to each of the first encountered audio and video data packets can result in the multimedia stream being played out of sync.
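The resulting skew can be shown with a brief sketch, offered for illustration only. It assumes, as in the example above, that V2 and A1 are the first video and audio data packets encountered after the seek, and it uses only the timestamps stated for FIG. 1; the function name rebase_to_first is hypothetical.

```python
# Packets encountered after the seek, with the timestamps given for FIG. 1.
# Assumption: V2 and A1 are the first packets seen in their streams.
video = [("V2", 5), ("V4", 6), ("V5", 7)]
audio = [("A1", 4), ("A2", 5), ("A3", 6)]


def rebase_to_first(stream):
    """Naively renumber a stream so its first encountered packet has PTS 0."""
    first = stream[0][1]
    return {name: ts - first for name, ts in stream}


# Each stream is rebased independently, which is the naive scheme at issue.
naive = {**rebase_to_first(video), **rebase_to_first(audio)}

# A2 and V2 share the original timestamp t=5 and should render together,
# but the naive renumbering separates them by one timestamp unit.
skew = naive["A2"] - naive["V2"]
```

Here V2 and A1 both receive a PTS of zero, so A2 is renumbered to 1 while V2 remains at 0, and the audio trails the video it belongs with by one timestamp unit for the remainder of playback.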
When the multimedia stream includes cleanpoints, the cleanpoints may be used to synchronize the various streams of the multimedia content. However, as described above, some streaming multimedia samples do not include cleanpoints, thereby making it difficult to synchronize the various media streams of the multimedia. Furthermore, even when the multimedia sample includes cleanpoints, it can still be difficult to synchronize the streams of the multimedia when the playback commences at a starting point that is not a cleanpoint.
Accordingly, there currently exists a need in the art for improved methods and systems for synchronizing streaming multimedia.