a. Field of the Invention
The present invention concerns a method and apparatus for combining audio, video, and private application(s) data for communication and, in particular, concerns a method and apparatus for inserting private application(s) data, such as text and graphics overlays and gaming data, for example, onto an MPEG (Moving Picture Experts Group) or MPEG-2 transport stream including packets carrying MPEG encoded video and audio data. In particular, the present invention concerns a method and apparatus for inserting private application(s) data onto an MPEG or MPEG-2 transport stream such that:
(i) the private application(s) data is synchronized with packets of encoded video data on the MPEG or MPEG-2 transport stream; and/or
(ii) the format of packets defining the transport stream can be dynamically adjusted based on the bandwidth required to communicate the private application(s) data.

(a) encoding the audio data in accordance with an audio compression algorithm to produce encoded audio data;
(b) encoding the video data in accordance with a video compression algorithm to produce encoded video data;
(c) packetizing the encoded audio data to produce packets of encoded audio data, each of the packets of encoded audio data having a header portion and a payload portion, each header portion of the audio packets having a fixed area for accommodating private application data;
(d) packetizing the encoded video data to produce packets of encoded video data, each of the packets of encoded video data having a header portion and a payload portion, each header portion of the video packets having a fixed area for accommodating private application data;
(e) inserting the private application data into the fixed areas of the headers of the packets of encoded audio data and packets of encoded video data to produce stuffed audio and video packets;
(f) multiplexing the stuffed audio and video packets to form a packet stream; and
(g) transmitting the packet stream from the source location to the destination location.

(a) extracting private application data from the stuffed audio and video packets of the transmitted packet stream;
(b) demultiplexing the transmitted packet stream to separate the packets of encoded audio data from the packets of encoded video data;
(c) decoding the payload portion of the packets of encoded audio data to form audio data; and
(d) decoding the payload portion of the packets of encoded video data to form video data.
The extracted private application data may then be provided to a private application process.

(a) encoding the audio data in accordance with an audio compression algorithm to produce encoded audio data;
(b) encoding the video data in accordance with a video compression algorithm to produce encoded video data;
(c) packetizing the encoded audio data to produce packets of encoded audio data, each of the packets of encoded audio data having a header which includes a first packet identification number;
(d) packetizing the encoded video data to produce packets of encoded video data, each of the packets of encoded video data having a header which includes a second packet identification number;
(e) packetizing the private application data to produce packets of private application data, each of the packets of private application data having a header which includes a third packet identification number;
(f) multiplexing the packets of private application data, encoded audio data, and encoded video data to produce a packet stream; and
(g) transmitting the packet stream from the source location to the destination location.

(h) demultiplexing the transmitted packet stream to separate the packets of encoded audio data, packets of encoded video data, and packets of private application data;
(i) decoding the payload data of the packets of encoded audio data to form audio data; and
(j) decoding the payload data of the packets of encoded video data to form video data.

The private application data located in the payload of the packets of private application data may then be provided to a private application process.
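The first approach above, in which private application data rides in a fixed area of each audio and video packet header, can be sketched in Python. This is a minimal, hypothetical illustration rather than the claimed implementation: it assumes the fixed area is the transport_private_data field of the transport packet's adaptation field (described below with reference to FIG. 3), and the names build_stuffed_packet, extract_private, payload_of, and the 8-byte PRIVATE_AREA_SIZE are illustrative choices. Reserving the same amount of adaptation-field space in every stuffed packet keeps each packet's payload capacity predictable; unused space is filled with 0xFF stuffing bytes.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PRIVATE_AREA_SIZE = 8  # hypothetical fixed capacity of the private-data area, in bytes
# Largest elementary-stream payload that still leaves room for the fixed area:
# 4-byte header, adaptation_field_length byte, flags byte,
# transport_private_data_length byte, and the area itself.
MAX_STUFFED_PAYLOAD = TS_PACKET_SIZE - 4 - 1 - 2 - PRIVATE_AREA_SIZE


def build_stuffed_packet(pid: int, cc: int, payload: bytes, private: bytes) -> bytes:
    """Build a 188-byte audio or video packet whose adaptation field carries
    up to PRIVATE_AREA_SIZE bytes of private application data."""
    if len(private) > PRIVATE_AREA_SIZE:
        raise ValueError("private data exceeds the fixed area")
    if len(payload) > MAX_STUFFED_PAYLOAD:
        raise ValueError("payload too large for a stuffed packet")
    # adaptation_field_control: '11' = adaptation field + payload, '10' = adaptation field only
    afc = 0x30 if payload else 0x20
    af_len = TS_PACKET_SIZE - 4 - 1 - len(payload)  # bytes after the length byte
    af = bytes([0x02, len(private)]) + private  # flags: transport_private_data_flag only
    af += b"\xff" * (af_len - len(af))  # stuffing bytes fill the remainder
    header = bytes([
        SYNC_BYTE,
        (pid >> 8) & 0x1F,  # error, start, and priority bits left at 0 for brevity
        pid & 0xFF,
        afc | (cc & 0x0F),
    ])
    return header + bytes([af_len]) + af + payload


def extract_private(packet: bytes) -> bytes:
    """Return the transport_private_data bytes of a packet (empty if none)."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a transport stream packet")
    if not packet[3] & 0x20 or packet[4] == 0:
        return b""  # no adaptation field, or an empty one
    flags = packet[5]
    if not flags & 0x02:
        return b""  # transport_private_data_flag not set
    pos = 6
    pos += 6 if flags & 0x10 else 0  # skip PCR
    pos += 6 if flags & 0x08 else 0  # skip OPCR
    pos += 1 if flags & 0x04 else 0  # skip splice_countdown
    length = packet[pos]
    return bytes(packet[pos + 1:pos + 1 + length])


def payload_of(packet: bytes) -> bytes:
    """Return the payload bytes that follow any adaptation field."""
    start = 4 + (1 + packet[4] if packet[3] & 0x20 else 0)
    return bytes(packet[start:]) if packet[3] & 0x10 else b""
```

A receiver following this sketch would call extract_private on each stuffed packet and hand the result to the private application process, while payload_of recovers the elementary stream bytes for the audio or video decoder.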
b. Related Art
The International Organisation for Standardisation (or the Organisation Internationale De Normalisation) (hereinafter referred to as "the ISO/IEC") has produced a standard for the coding of moving pictures and associated audio. This standard is set forth in four documents. The document ISO/IEC 13818-1 (systems) specifies the system coding. It defines a multiplexed structure for combining audio and video data and a means of representing the timing information needed to replay synchronized sequences in real time. The document ISO/IEC 13818-2 (video) specifies the coded representation of video data and the decoding process required to reconstruct pictures. The document ISO/IEC 13818-3 (audio) specifies the coded representation of audio data and the decoding process required to reconstruct the audio data. Lastly, the document ISO/IEC 13818-4 (conformance) specifies procedures for determining the characteristics of coded bitstreams and for testing compliance with the requirements set forth in the ISO/IEC documents 13818-1, 13818-2, and 13818-3. These four documents, hereinafter referred to, collectively, as "the MPEG-2 standard", are incorporated herein by reference.
A bit stream, multiplexed in accordance with the MPEG-2 standard, is either a "transport stream" or a "program stream". Both program and transport streams are constructed from "packetized elementary stream" (or PES) packets and packets containing other necessary information. A "packetized elementary stream" (or PES) packet is a data structure used to carry "elementary stream data". An "elementary stream" is a generic term for one of (a) coded video, (b) coded audio, or (c) other coded bit streams carried in a sequence of PES packets with one and only one stream ID. Both program and transport streams support multiplexing of video and audio compressed streams from one program with a common time base.
FIG. 7 illustrates the packetizing of compressed video data 706 of a video sequence 702 into a stream of PES packets 708, and then, into a stream of transport stream packets 712. Specifically, a video sequence 702 includes various headers 704 and associated compressed video data 706. The video sequence 702 is parsed into variable length segments, each having an associated PES packet header 710 to form a PES packet stream 708. The PES packet stream 708 is then parsed into segments, each of which is provided with a transport stream header 714 to form a transport stream 712. Each transport stream packet of the transport stream 712 is 188 bytes in length.
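The segmentation of a PES packet stream into 188-byte transport packets can be sketched as follows. This Python sketch is a simplification that assumes one PES packet is packetized at a time under a single PID; the function name packetize_pes is hypothetical. It sets the payload_unit_start_indicator on the first packet and, when the last segment is shorter than 184 bytes, pads that packet with an adaptation field of 0xFF stuffing bytes instead of padding the payload.

```python
TS_PACKET_SIZE = 188
MAX_PAYLOAD = TS_PACKET_SIZE - 4  # 184 bytes after the 4-byte transport header


def packetize_pes(pes: bytes, pid: int, start_cc: int = 0) -> list[bytes]:
    """Split one PES packet into 188-byte transport stream packets."""
    packets = []
    cc = start_cc & 0x0F
    for pos in range(0, len(pes), MAX_PAYLOAD):
        chunk = pes[pos:pos + MAX_PAYLOAD]
        pusi = 0x40 if pos == 0 else 0x00  # payload_unit_start_indicator on first packet
        if len(chunk) == MAX_PAYLOAD:
            afc, af = 0x10, b""  # adaptation_field_control '01': payload only
        else:
            # Final, short packet: fill the gap with an adaptation field of stuffing.
            afc = 0x30  # adaptation_field_control '11': adaptation field + payload
            af_len = MAX_PAYLOAD - 1 - len(chunk)  # bytes after the length byte
            if af_len == 0:
                af = bytes([0])  # a lone length byte of 0 absorbs the single spare byte
            else:
                af = bytes([af_len, 0x00]) + b"\xff" * (af_len - 1)  # flags byte + stuffing
        header = bytes([0x47, pusi | ((pid >> 8) & 0x1F), pid & 0xFF, afc | cc])
        packets.append(header + af + chunk)
        cc = (cc + 1) & 0x0F  # continuity_counter advances for each packet with payload
    return packets
```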
Transport streams permit one or more programs with one or more independent time bases to be combined into a single stream. Transport streams are useful in instances where data storage and/or transport means are lossy or noisy. The rate of transport streams and their constituent packetized elementary streams (PESs) may be fixed or variable. This rate is defined by the values and locations of program clock reference (or PCR) fields within the transport stream.
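As a worked example of how PCR values and locations define the rate: the MPEG-2 systems specification treats the rate between two consecutive PCRs as constant, so rate = (bytes between the two PCR fields) × 8 × 27,000,000 / (PCR2 − PCR1), with PCR values counted in ticks of the 27 MHz system clock. The helper below is a hypothetical sketch that takes the byte offsets of the two PCR fields and ignores PCR wraparound.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # MPEG-2 system clock frequency


def transport_rate_bps(offset_1: int, pcr_1: int, offset_2: int, pcr_2: int) -> float:
    """Estimate the transport rate (bits/s) between two PCR samples.

    offset_*: byte positions of the PCR fields in the stream (hypothetical inputs)
    pcr_*:    PCR values in 27 MHz ticks; wraparound is not handled here
    """
    delta_ticks = pcr_2 - pcr_1
    if delta_ticks <= 0:
        raise ValueError("second PCR must be later than the first")
    delta_bits = (offset_2 - offset_1) * 8
    return delta_bits * SYSTEM_CLOCK_HZ / delta_ticks
```

For example, two PCRs 1,880 bytes (ten packets) apart whose values differ by 27,000 ticks (1 ms) imply a rate of 15.04 Mbit/s.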
Although the syntax of the transport stream is described in the MPEG-2 standard, the fields of the transport stream pertaining to the present invention will be described below with reference to FIG. 3 for the reader's convenience. As shown in FIG. 3, a transport packet stream 300 includes one or more 188 byte packets, each of the packets having a header 302 and an associated payload 304. Each header 302 includes an eight (8) bit synch byte field 306, a one (1) bit transport error indicator field 308, a one (1) bit payload unit start indicator field 310, a one (1) bit transport priority field 312, a thirteen (13) bit packet identifier (or PID) field 314, a two (2) bit transport scrambling control field 316, a two (2) bit adaptation field control field 318, a four (4) bit continuity counter field 320, and an adaptation field 322. Each of these fields is described in the MPEG-2 standard. However, for the reader's convenience, the fields particularly relevant to the present invention are described below.
First, the synch byte 306 has a value of "01000111" (0x47) and identifies the start of a 188 byte packet. The PID field 314 indicates the type of data stored in the payload 304 of the 188 byte packet. Certain PID values are reserved, while others are available for assignment. For example, PID values 0x0010 through 0x1FFE may be assigned as program map PIDs. The program map provides mappings between program numbers and the elementary streams that comprise them. The program map table is the complete collection of all program definitions for a transport stream. The program map table is transmitted in packets whose PID values are privately selected (i.e., not specified by the ISO/IEC).
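The header layout of FIG. 3 can be decoded with a few shifts and masks. The following Python sketch is illustrative only (the TsHeader class and the function names are hypothetical); it checks for the 0x47 sync byte and unpacks the remaining 24 bits of the four-byte header into the fields listed above.

```python
from dataclasses import dataclass


@dataclass
class TsHeader:
    transport_error: bool
    payload_unit_start: bool
    transport_priority: bool
    pid: int  # 13 bits
    scrambling_control: int  # 2 bits
    adaptation_field_control: int  # 2 bits
    continuity_counter: int  # 4 bits


def parse_ts_header(packet: bytes) -> TsHeader:
    """Unpack the 4-byte header of a 188-byte transport stream packet."""
    if len(packet) != 188:
        raise ValueError("transport packets are 188 bytes")
    if packet[0] != 0x47:  # synch byte '01000111'
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


def may_be_program_map_pid(pid: int) -> bool:
    """True if pid lies in the range 0x0010..0x1FFE usable for program map PIDs."""
    return 0x0010 <= pid <= 0x1FFE
```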
As shown in FIG. 3, the adaptation field 322 includes an eight (8) bit adaptation field length field 324, a one (1) bit discontinuity indicator field 326, a one (1) bit random access indicator field 328, a one (1) bit elementary stream priority indicator field 330, a five (5) bit flag field 332, optional fields 334 and stuffing bytes 336.
As is further shown in FIG. 3, the optional fields 334 include a 42 bit program clock reference (or PCR) field 338, a 42 bit original program clock reference (or OPCR) field 340, an eight (8) bit splice countdown field 342, an eight (8) bit transport private data length field 344, a transport private data field 346, an eight (8) bit adaptation field extension length field 348, a three (3) bit flag field 350, and optional fields 352. Each of these fields is described in the MPEG-2 standard. However, for the reader's convenience, the fields particularly relevant to the present invention are described below.
First, the 42 bit program clock reference (or PCR) field 338 and the 42 bit original program clock reference (or OPCR) field 340 are time stamps in the transport stream from which timing of a downstream decoder is derived. The eight (8) bit transport private data length field 344 describes the length (in bytes) of the adjacent transport private data field 346. The contents of the transport private data field 346 are privately determined (i.e., not specified by the ISO/IEC).
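A sketch of walking the adaptation field's optional fields in the order of FIG. 3 follows. It assumes the standard MPEG-2 bit layout for the clock fields (a 33-bit base in 90 kHz units, six reserved bits, and a 9-bit extension in 27 MHz units, so that PCR = base × 300 + extension). The function names are hypothetical, and the adaptation field extension that follows the transport private data is not parsed.

```python
def encode_clock(base: int, ext: int) -> bytes:
    """Pack a 33-bit base and 9-bit extension into the 6-byte PCR/OPCR layout."""
    if not (0 <= base < 2**33 and 0 <= ext < 300):
        raise ValueError("base must fit in 33 bits and ext must be below 300")
    value = (base << 15) | (0x3F << 9) | ext  # the 6 reserved bits are set to 1
    return value.to_bytes(6, "big")


def read_clock(raw: bytes) -> int:
    """Decode a 6-byte PCR/OPCR field into 27 MHz ticks."""
    base = (raw[0] << 25) | (raw[1] << 17) | (raw[2] << 9) | (raw[3] << 1) | (raw[4] >> 7)
    ext = ((raw[4] & 0x01) << 8) | raw[5]
    return base * 300 + ext  # base counts 90 kHz ticks, ext counts 27 MHz ticks


def parse_adaptation_field(packet: bytes) -> dict:
    """Return PCR, OPCR, splice countdown, and transport private data if present."""
    if not packet[3] & 0x20 or packet[4] == 0:
        return {}  # no adaptation field, or a zero-length one
    end = 5 + packet[4]  # first byte after the adaptation field
    flags = packet[5]
    fields = {
        "discontinuity": bool(flags & 0x80),
        "random_access": bool(flags & 0x40),
    }
    pos = 6
    if flags & 0x10:  # PCR_flag
        fields["pcr"] = read_clock(packet[pos:pos + 6])
        pos += 6
    if flags & 0x08:  # OPCR_flag
        fields["opcr"] = read_clock(packet[pos:pos + 6])
        pos += 6
    if flags & 0x04:  # splicing_point_flag: splice_countdown is a signed byte
        sc = packet[pos]
        fields["splice_countdown"] = sc - 256 if sc > 127 else sc
        pos += 1
    if flags & 0x02:  # transport_private_data_flag
        n = packet[pos]
        fields["transport_private_data"] = bytes(packet[pos + 1:pos + 1 + n])
        pos += 1 + n
    if pos > end:
        raise ValueError("optional fields overrun the adaptation field length")
    return fields
```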
As is also shown in FIG. 3, the optional fields 352 include a one (1) bit legal time window valid flag field 354, a fifteen (15) bit legal time window offset field 356, two (2) undefined bits, a 22 bit piecewise rate field 358, a four (4) bit splice type field 366, and a 33 bit decoding time stamp next access unit field 362. A description of these fields is not necessary for understanding the present invention.
The payloads 304 of one or more transport stream packets may carry "packetized elementary stream" (or PES) packets 800. To reiterate, a "packetized elementary stream" (or PES) packet 800 is a data structure used to carry "elementary stream data" and an "elementary stream" is a generic term for one of (a) coded video, (b) coded audio, or (c) other coded bit streams carried in a sequence of PES packets with one and only one stream ID.
FIG. 8 is a diagram which illustrates the syntax of a PES packet 800. As FIG. 8 shows, a PES packet 800 includes the PES packet header 710, comprising a 24 bit start code prefix field 802, an eight (8) bit stream identifier field 804, a sixteen (16) bit PES packet length field 806, and an optional PES header 808, followed by the payload or data section 706. Each of these fields is described in the MPEG-2 standard.
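The fixed part of the PES packet header 710 can be checked and read as below. This is a minimal Python sketch with hypothetical function names; it reads only the start code prefix, stream identifier, and PES packet length, and does not parse the optional PES header 808. The stream_id ranges used for classification (0xC0 to 0xDF for audio, 0xE0 to 0xEF for video, 0xBD and 0xBF for private streams) are the common MPEG-2 assignments.

```python
def parse_pes_header(pes: bytes) -> tuple[int, int]:
    """Return (stream_id, PES_packet_length) from the start of a PES packet."""
    if len(pes) < 6:
        raise ValueError("too short for a PES header")
    if pes[:3] != b"\x00\x00\x01":  # 24-bit packet_start_code_prefix
        raise ValueError("missing start code prefix 0x000001")
    stream_id = pes[3]
    # Bytes following this field; 0 means unbounded (allowed for video in a transport stream).
    pes_packet_length = (pes[4] << 8) | pes[5]
    return stream_id, pes_packet_length


def stream_kind(stream_id: int) -> str:
    """Rough classification of common stream_id values."""
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"
    if 0xE0 <= stream_id <= 0xEF:
        return "video"
    if stream_id == 0xBD:
        return "private_stream_1"
    if stream_id == 0xBF:
        return "private_stream_2"
    return "other"
```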
The MPEG-2 standard focuses on the encoding and transport of video and audio data. In general, the MPEG-2 standard uses compression algorithms such that video and audio data may be more efficiently stored and communicated. FIG. 4 is a block schematic showing the steps of encoding, communicating (from location 440 to location 450), and decoding video and audio data in accordance with the MPEG-2 standard.
As shown in FIG. 4, at a first location 440, video data is provided to a video encoder 402 which encodes the video data in accordance with the MPEG-2 standard (specified in the document ISO/IEC 13818-2 (video), which is incorporated herein by reference). The video encoder 402 provides encoded video 404 to a packetizer 406 which packetizes the encoded video 404. The packetized encoded video 408 provided by the packetizer 406 is then provided to a first input of at least one of a program stream multiplexer 410 and a transport stream multiplexer 412. For the purposes of understanding the present invention, it can be assumed that program streams are not generated.
Similarly, at the first location 440, audio data is provided to an audio encoder 414 which encodes the audio data in accordance with the MPEG-2 standard (specified in the document ISO/IEC 13818-3 (audio), which is incorporated herein by reference). The audio encoder 414 provides encoded audio 416 to a packetizer 418 which packetizes the encoded audio 416. The packetized encoded audio 420 provided by the packetizer 418 is then provided to a second input of at least one of the program stream multiplexer 410 and the transport stream multiplexer 412.
The transport stream multiplexer 412 multiplexes the encoded audio and video packets and transmits the resulting multiplexed stream to a second location 450 via communications link 422. At the second location 450, on a remote end of the communications link 422, a transport stream demultiplexer 424 receives the multiplexed transport stream. Based on the packet identification (or PID) number 314 of a particular packet, the transport stream demultiplexer 424 separates the encoded audio and video packets and provides the video packets to a video decoder 430 via link 428 and the audio packets to an audio decoder 434 via link 432. The transport stream demultiplexer 424 also provides timing information to a clock control unit 426. The clock control unit 426 provides timing outputs to both the video decoder 430 and the audio decoder 434 based on the timing information provided by the transport stream demultiplexer 424 (e.g., based on the values of the PCR fields 338). The video decoder 430 provides video data which corresponds to the video data originally provided to the video encoder 402. Similarly, the audio decoder 434 provides audio data which corresponds to the audio data originally provided to the audio encoder 414.
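The PID-based routing performed by the transport stream demultiplexer 424 can be sketched as follows. The PID-to-destination table is hypothetical (in practice PID assignments are learned from the program map), and timing extraction for the clock control unit 426 is omitted. Giving private application data its own PID, alongside the audio and video PIDs, shows how the same routing could also separate packets of private application data.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188


def demultiplex(stream: bytes, routes: dict[int, str]) -> dict[str, list[bytes]]:
    """Split a run of 188-byte transport packets into per-destination lists by PID.

    Packets whose PID is not in `routes` (e.g. null packets, PID 0x1FFF) are dropped.
    """
    if len(stream) % TS_PACKET_SIZE:
        raise ValueError("stream length is not a multiple of 188 bytes")
    out = defaultdict(list)
    for off in range(0, len(stream), TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:
            raise ValueError(f"lost sync at byte {off}")
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        dest = routes.get(pid)
        if dest is not None:
            out[dest].append(pkt)
    return dict(out)


# Hypothetical PID assignments for one program.
ROUTES = {0x100: "video", 0x101: "audio", 0x102: "private"}
```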
In some instances, communicating private application data from the first location 440, at the near end of link 422, to the second location 450, at the far end of the link 422, is desired. This private application data may, for example, include graphics and text screens which are to overlay the video data, gaming data associated with the video data, or pricing and ordering information related to the video data. Some private application data will not need to be synchronized with the video data. For example, private application data corresponding to real-time stock ticker information including financial news and analysis can be overlaid over a television program without synchronization. On the other hand, some private application data must be synchronized with the video and audio data. For example, private application data corresponding to pricing and ordering information for a sequence of products to be overlaid over a home shopping program displaying a sequence of various products must be synchronized with the video of the products such that the pricing and ordering information corresponds to the appropriate product being displayed.
Known MPEG systems do not specify how to communicate such private application data from a first location 440 to a second location 450. Thus, a need exists for a method and device for communicating such private application data. Such a method or device should use the existing communication link 422, if possible, so that such data can be transmitted with the packetized MPEG video and audio data. Moreover, such a method or device should permit the private application data to be synchronized with the decoded MPEG video and audio data. Furthermore, such a method or device should conserve bandwidth and prevent buffers storing MPEG packets from underflowing (in which case the same frame of video may be played more than once) or overflowing (in which case packets are lost). Finally, such a method and device should permit the communication of more than one type of private application data.