The present invention pertains to a method and apparatus for synchronizing data streams with audio/video streams, and more particularly, to a method and apparatus for synchronizing a presentation of audio and/or video with the execution of data control information.
Referring to FIG. 1, a known system for rendering audio and video streams is shown. A central processing unit (CPU) 1 executes application code typically stored in a memory 9. CPU 1 is coupled to a host bus 3 which is coupled to memory 9 via a first bridge circuit 5 (also referred to as a host bridge circuit or a north bridge circuit). The first bridge circuit 5 is, in turn, coupled to a first bus 7, such as a bus operating according to the Peripheral Component Interconnect specification (PCI Special Interest Group, P.O. Box 14070, Portland, Oreg. 97214). A second bus 11, such as an expansion bus, is coupled to the PCI bus 7 via a second bridge circuit 10 (also referred to as a south bridge circuit). A modulator/demodulator (modem) 13 is coupled to the expansion bus 11 and is adapted to receive data from a transmission medium 110 (e.g., so-called plain old telephone service (POTS) lines or the Internet system). In FIG. 1, a client 100 is adapted to receive data from a server 120 via transmission medium 110. In current applications, this data can be audio and/or video (A/V) data that is transferred using the so-called Real-time Transport Protocol (RTP). Under the RTP protocol, data such as A/V data is transferred from transmission medium 110 to the client as packets, which are processed by an A/V subsystem 15 coupled to the PCI bus 7.
An example of such an A/V subsystem 15 is shown in FIG. 2. Referring to FIG. 2, the incoming RTP data packets are received at a packet preparation module/payload handler 18. The RTP protocol is defined by the Internet Engineering Task Force (IETF) and provides an end-to-end network transport function suitable for applications transmitting real-time data over multicast or unicast network services. Payload handler 18 analyzes each incoming RTP data packet by reading RTP header information and "stripping" this data off of the packets. An exemplary RTP data packet 50 is shown in FIG. 3. Each RTP header includes a variety of information, such as a Payload Type field that identifies the type of information contained in the RTP packet (e.g., a specific type of video or audio data). A Marker Bit (M) can be provided to identify whether the RTP packet 50 contains the end of a current frame of video data. The RTP header 51 also includes a Timestamp field that is used to synchronize audio and video data appearing as a data payload 52 in the RTP data packet 50.
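The header handling described above can be illustrated with a minimal sketch, assuming the 12-byte fixed header layout of the IETF RTP specification (version, padding, extension, and CSRC count in the first byte; Marker Bit and Payload Type in the second; then sequence number, Timestamp, and SSRC). The names `RtpHeader` and `strip_rtp_header` are hypothetical and do not appear in the specification; padding removal is omitted for brevity.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the fixed RTP header used in this sketch (hypothetical type)."""
    version: int
    marker: bool          # Marker Bit (M), e.g. end of a video frame
    payload_type: int     # Payload Type field (7 bits)
    sequence_number: int
    timestamp: int        # Timestamp field used for A/V synchronization
    ssrc: int


def strip_rtp_header(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Read the RTP header and return it together with the bare data payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F
    offset = 12 + 4 * csrc_count              # skip optional CSRC identifiers
    if byte0 & 0x10:                          # extension bit set
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        _profile, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words           # skip the header extension
    if offset > len(packet):
        raise ValueError("header fields run past the end of the packet")
    header = RtpHeader(
        version=byte0 >> 6,
        marker=bool(byte1 & 0x80),
        payload_type=byte1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )
    return header, packet[offset:]
```

In this sketch the returned payload corresponds to data payload 52, and the returned header carries the Payload Type, Marker Bit, and Timestamp values that the payload handler would consult.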
The payload handler 18 determines whether the data payload 52 contains audio or video data and forwards it to the appropriate data packet handler (e.g., video data packet handler 20 or audio data packet handler 22). Video data packet handler 20 and audio data packet handler 22 can be coupled to the payload handler 18 directly or can be coupled indirectly (e.g., through bus 7). Therefore, video data packet handler 20 receives a stream of video data packets and audio data packet handler 22 receives a stream of audio data packets. Payload handler 18 controls synchronization of the audio and video data packet streams via the Timestamp field appearing in RTP header 51. Accordingly, audio data that is synchronized to a specific frame of video data is sent at approximately the same time as that frame to the respective data packet handlers 20, 22.
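The routing and timestamp-based release described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of payload handler 18: the class name `PayloadDispatcher`, its methods, and the payload type sets are hypothetical, the audio and video timestamps are assumed to be directly comparable, and timestamp wraparound is ignored.

```python
import heapq
from typing import Callable, List, Tuple

# Example payload type numbers (assumed for illustration only).
VIDEO_TYPES = {31, 34}
AUDIO_TYPES = {0}

Handler = Callable[[int, bytes], None]


class PayloadDispatcher:
    """Holds stripped payloads and releases them in timestamp order."""

    def __init__(self, video_handler: Handler, audio_handler: Handler) -> None:
        self._video = video_handler
        self._audio = audio_handler
        self._pending: List[Tuple[int, int, int, bytes]] = []
        self._arrival = 0   # tie-breaker that preserves arrival order

    def submit(self, timestamp: int, payload_type: int, payload: bytes) -> None:
        """Queue a payload keyed by its RTP timestamp."""
        heapq.heappush(
            self._pending, (timestamp, self._arrival, payload_type, payload)
        )
        self._arrival += 1

    def release_until(self, timestamp: int) -> None:
        """Forward every queued payload stamped at or before `timestamp`."""
        while self._pending and self._pending[0][0] <= timestamp:
            ts, _, payload_type, payload = heapq.heappop(self._pending)
            if payload_type in VIDEO_TYPES:
                self._video(ts, payload)
            elif payload_type in AUDIO_TYPES:
                self._audio(ts, payload)
            # Payloads of unrecognized type are dropped in this sketch.
```

Because audio and video payloads bearing the same timestamp leave the queue in the same call to `release_until`, they reach their respective handlers at approximately the same time.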
Video data packet handler 20 analyzes video packets that can have a format according to any of a variety of compression algorithms. Typical compression algorithms include any of a variety of block transform based video compression algorithms such as H.261 (International Telecommunication Union, Telecommunications Standardization Sector (ITU-T), March 1993), H.263 (ITU-T, Dec. 5, 1995), JPEG ("Joint Photographic Experts Group") (International Organization for Standardization/International Electrotechnical Commission ("ISO/IEC") 10918-1), and MPEG-I and MPEG-II ("Moving Picture Experts Group") (ISO/IEC 11172-2 and 13818-2). The video data packet handler 20 includes a coder/decoder (codec) where the decoder portion of the codec is responsible for converting the compressed video data from the video packet into raw, uncompressed video data for video rendering device 23, which transfers this data to an output device such as display 24. The audio data packet handler 22 works in a similar manner in that received data packets are converted into digital audio data and passed to an audio rendering device 25 that converts the digital data to analog data for output at a speaker 26.
The A/V subsystem 15 can also be used as an input device (e.g., in a video phone application). A camera 28 is coupled to a video capture component 29, which supplies frames of uncompressed data to video data packet handler 20 at a rate of approximately 30 frames per second. As stated above, the video data packet handler 20 includes a codec, and the coder portion of the codec is used to compress the video frame data according to a video compression algorithm. The video data packets generated by the video data packet handler 20 are passed to the packet preparation module/payload handler 18, which creates RTP packets for transport to the end-user (e.g., over transmission medium 110 in FIG. 1). Similarly, a microphone 27 can be provided which supplies input analog audio data to audio rendering device 25. Audio rendering device 25 converts the analog audio signals into digital signals for the audio data packet handler 22, which creates audio data packets for payload handler 18. Payload handler 18 uses the audio data packets to create RTP packets for transport to the end-user.
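The packet creation step on the sending side can be sketched in the same spirit. The example below assumes the 12-byte fixed RTP header with no CSRC list or extension, and a 90 kHz video timestamp clock, so that a capture rate of 30 frames per second advances the timestamp by 3000 ticks per frame. The function name `build_rtp_packet` and its parameters are hypothetical.

```python
import struct

VIDEO_CLOCK_HZ = 90_000   # assumed RTP clock rate for video
FRAME_RATE = 30           # approximate capture rate stated in the text


def build_rtp_packet(payload: bytes, payload_type: int, sequence_number: int,
                     frame_index: int, ssrc: int, end_of_frame: bool) -> bytes:
    """Prefix a compressed payload with a minimal fixed RTP header."""
    # Timestamp advances by VIDEO_CLOCK_HZ / FRAME_RATE ticks per frame.
    timestamp = (frame_index * VIDEO_CLOCK_HZ // FRAME_RATE) & 0xFFFFFFFF
    byte0 = 2 << 6                                   # version 2, no padding/extension/CSRC
    byte1 = (0x80 if end_of_frame else 0) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence_number & 0xFFFF, timestamp,
                         ssrc & 0xFFFFFFFF)
    return header + payload
```

Setting `end_of_frame` on the last packet of each frame mirrors the use of the Marker Bit described for RTP data packet 50.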
The A/V subsystem 15 controls the synchronization of the presentation of audio and video data. However, data control information other than traditional audio and video data is sent outside of the RTP protocol (if it is sent at all) and is not synchronized with the A/V data streams. It would be desirable to synchronize the presentation of data objects pursuant to data control information for a more complete and accurate presentation of information to an end-user.
According to an embodiment of the present invention, a payload handler is provided adapted to receive a plurality of data packets, each of the data packets including a data payload including either video data or data control information, a first header including a timestamp field, and a data control header identifying a type of data contained in the data payload. A data control filter coupled to the payload handler can receive the data control header and data payload for each of the data packets. The data control filter passes data payloads including data control information to a data handler and data payloads including video data to a video data packet handler. The data control information includes a command in an action identifier field which is executed by the data handler. The payload handler synchronizes a transfer of the video data and data control information to the data control filter based on information in the timestamp field of the data packet header.
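The behavior of the data control filter can be illustrated with a short sketch. The layout of the data control header and of the action identifier field is not given in this passage, so the example assumes a numeric type value per packet and a one-byte action identifier at the start of a data control payload. All names (`DataControlPacket`, `DataHandler`, `data_control_filter`, `TYPE_VIDEO`, `TYPE_DATA_CONTROL`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Assumed values of the type carried in the data control header.
TYPE_VIDEO = 0
TYPE_DATA_CONTROL = 1


@dataclass
class DataControlPacket:
    timestamp: int   # from the timestamp field of the first header
    data_type: int   # from the data control header
    payload: bytes   # video data or data control information


class DataHandler:
    """Executes the command named by the action identifier field."""

    def __init__(self) -> None:
        self._actions: Dict[int, Callable[[int, bytes], None]] = {}

    def register(self, action_id: int,
                 action: Callable[[int, bytes], None]) -> None:
        self._actions[action_id] = action

    def handle(self, timestamp: int, payload: bytes) -> bool:
        """Run the registered command; return False if none applies."""
        if not payload:
            return False
        action = self._actions.get(payload[0])   # assumed 1-byte action identifier
        if action is None:
            return False
        action(timestamp, payload[1:])
        return True


def data_control_filter(packets: Iterable[DataControlPacket],
                        video_handler: Callable[[int, bytes], None],
                        data_handler: DataHandler) -> None:
    """Pass each payload, in timestamp order, to the matching handler."""
    for packet in sorted(packets, key=lambda p: p.timestamp):
        if packet.data_type == TYPE_DATA_CONTROL:
            data_handler.handle(packet.timestamp, packet.payload)
        elif packet.data_type == TYPE_VIDEO:
            video_handler(packet.timestamp, packet.payload)
```

Sorting by timestamp stands in for the synchronization performed by the payload handler, so that a command and the video data bearing the same timestamp are delivered together.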