In the past few decades, advances in the related fields of video compression and video transmission systems have led to the widespread availability of digital video programs transmitted over a variety of communication systems and networks. More recently, new technologies have been developed that allow television programs to be transmitted as multicast digital bitstreams of multiplexed video and audio signals, delivered to users or client subscribers over packet-switched networks.
Digital television signals are typically transmitted over packet networks as MPEG-2 data streams. Each MPEG-2 single program transport stream normally carries the data for a single television program channel. Each transport stream consists of a set of sub-streams, commonly known as elementary streams, which contain packets of audio, video, or other data encapsulated in the MPEG-2 stream. Each elementary stream has a Packet Identifier (PID) that uniquely identifies it within the larger transport stream.
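Because every elementary stream is tagged with its own PID, separating a transport stream into its elementary streams amounts to bucketing fixed-size packets by PID. The sketch below assumes an already-aligned byte buffer of whole 188-byte packets; the function name `demux_by_pid` is hypothetical, and a real demultiplexer would also resynchronize on lost sync bytes and reassemble PES packets from the payloads.

```python
from collections import defaultdict
from typing import Dict, List

TS_PACKET_SIZE = 188  # every MPEG-2 transport packet is 188 bytes
SYNC_BYTE = 0x47      # first byte of every transport packet


def demux_by_pid(ts: bytes) -> Dict[int, List[bytes]]:
    """Split an aligned transport stream into per-PID lists of raw packets (hypothetical helper)."""
    if len(ts) % TS_PACKET_SIZE:
        raise ValueError("stream length is not a multiple of 188 bytes")
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for off in range(0, len(ts), TS_PACKET_SIZE):
        pkt = ts[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {off}")
        # 13-bit PID: low 5 bits of header byte 1, then all 8 bits of header byte 2
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        streams[pid].append(pkt)
    return dict(streams)
```

Packets keep their arrival order within each PID bucket, which is what a downstream decoder for that elementary stream needs.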
The standard format of an MPEG-2 transport packet is shown in FIG. 5. As can be seen, each transport packet is 188 bytes long and begins with a 4-byte header containing fields for packet synchronization (the sync byte, 0x47) and identification (a 13-bit PID). An optional adaptation field, which follows the header, carries synchronization and timing information for the decoding and presentation process. The adaptation field may also provide various indicators for random access points of compressed bitstreams and for local program insertion. By way of example, FIG. 6 shows an expanded view of various optional data fields that may be included in the adaptation field. The payload may carry any multimedia data, including compressed audio and video streams.
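The header layout and the adaptation-field fields mentioned above can be made concrete with a small parser. This is a minimal sketch of the ISO/IEC 13818-1 transport packet layout, not the method of this application: it decodes the header, reads the random_access_indicator and the program clock reference (PCR) when present, and locates the payload. The names `TsPacket` and `parse_ts_packet` are hypothetical, and only the first adaptation-field flag byte and the PCR are decoded; the other optional fields shown in FIG. 6 are skipped.

```python
from dataclasses import dataclass
from typing import Optional

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass
class TsPacket:
    pid: int
    payload_unit_start: bool
    adaptation_field_control: int  # 0b01 payload only, 0b10 adaptation only, 0b11 both
    continuity_counter: int
    random_access: bool            # adaptation field random_access_indicator
    pcr: Optional[int]             # program clock reference in 27 MHz ticks, if present
    payload: bytes


def parse_ts_packet(pkt: bytes) -> TsPacket:
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte packet starting with sync byte 0x47")
    pusi = bool(pkt[1] & 0x40)                 # payload_unit_start_indicator
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]      # 13-bit PID
    afc = (pkt[3] >> 4) & 0x03                 # adaptation_field_control
    cc = pkt[3] & 0x0F                         # continuity_counter
    offset, random_access, pcr = 4, False, None
    if afc & 0b10:                             # adaptation field present
        af_len = pkt[4]                        # adaptation_field_length (bytes after this one)
        if af_len > 0:
            flags = pkt[5]
            random_access = bool(flags & 0x40)
            if flags & 0x10 and af_len >= 7:   # PCR_flag: 6 PCR bytes follow the flag byte
                b = pkt[6:12]
                # 33-bit base, 6 reserved bits, 9-bit extension
                base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
                ext = ((b[4] & 0x01) << 8) | b[5]
                pcr = base * 300 + ext
        offset = 5 + af_len
    payload = pkt[offset:] if afc & 0b01 else b""
    return TsPacket(pid, pusi, afc, cc, random_access, pcr, payload)
```

For example, a packet whose adaptation field sets the random-access and PCR flags, with a PCR base of 1 and extension of 0, yields `pcr == 300` (one 90 kHz tick expressed in 27 MHz units) and a 176-byte payload.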
A multiplexer is typically utilized at the studio source or “head-end” of the system to combine the elementary streams into the overall transport stream. During the multiplexing process, additional data, known as service information, is encoded within the transport stream. This service information is contained in a set of database tables that describe the structure of the transport stream. Service information tables commonly found in a DVB transport stream include: the Program Map Table (PMT), which lists the PIDs of the elementary streams associated with a particular program and tells the client receiver which stream carries the MPEG program clock reference (PCR) for the service; the Network Information Table (NIT), which uniquely identifies the network transmitting the transport stream and describes some of the physical properties of the network (e.g., channel frequencies); the Program Association Table (PAT), which contains a complete list of all programs in the transport stream, along with the PID of the PMT for each program and the PID of the NIT for the transport stream; and the Conditional Access Table (CAT), which specifies the conditional access or scrambling systems in use in the transport stream and includes information on how to decode them. Collectively, the PMT, NIT, PAT, and CAT are referred to as Program Specific Information (PSI). The PSI data enables the client receiver to configure itself automatically to demultiplex and decode the various program streams. A device and method for de-multiplexing a transport stream, suitable for fast processing and transmission of transport stream packets of a TV signal, is disclosed in U.S. Pat. No. 6,269,107.
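The PAT is the receiver's entry point into the PSI: it always arrives on PID 0 and maps each program number to the PID of that program's PMT, with program number 0 reserved for the NIT PID. The sketch below parses a single PAT section following the ISO/IEC 13818-1 section layout. The function names are hypothetical, a PAT spread over several sections is not reassembled, and the CRC_32 is deliberately not verified, which a production receiver would need to do.

```python
from typing import Dict, Optional, Tuple


def parse_pat(section: bytes) -> Tuple[Dict[int, int], Optional[int]]:
    """Parse one PAT section into ({program_number: PMT PID}, NIT PID or None).

    Assumes `section` starts at table_id. CRC_32 is NOT checked in this sketch."""
    if len(section) < 12 or section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    if 3 + section_length > len(section):
        raise ValueError("truncated PAT section")
    end = 3 + section_length - 4         # program loop stops before the 4-byte CRC_32
    programs: Dict[int, int] = {}
    nit_pid: Optional[int] = None
    # Loop starts at byte 8: 3-byte header + transport_stream_id (2) + version byte + 2 section numbers
    for i in range(8, end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number == 0:
            nit_pid = pid                # program_number 0 carries the network (NIT) PID
        else:
            programs[program_number] = pid
    return programs, nit_pid


def pat_from_payload(payload: bytes) -> Tuple[Dict[int, int], Optional[int]]:
    """When payload_unit_start is set, a 1-byte pointer_field precedes the section."""
    return parse_pat(payload[1 + payload[0]:])
```

Once the PMT PID for the desired program is known, the receiver filters that PID to obtain the PMT and, from it, the elementary-stream PIDs and the PCR PID.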
One requirement of streaming media is that it be played out at a constant rate. Consequently, when streaming media is transmitted over packet networks, strict timing requirements must be met to achieve high-quality media play out. Packet networks, however, typically transmit data asynchronously, so individual packets experience varying transit delays. This variation in packet arrival times is known as network jitter.
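One well-known way to quantify network jitter, borrowed here from RTP rather than taken from this application, is the interarrival jitter estimator of RFC 3550. For each pair of consecutive packets it measures how much the one-way transit time changed, then smooths the absolute change with a gain of 1/16. The sketch below works in seconds instead of RTP timestamp units; the function name is hypothetical.

```python
from typing import Sequence


def interarrival_jitter(send_times: Sequence[float], recv_times: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550, in the same units as the timestamps.

    D = (R_i - R_{i-1}) - (S_i - S_{i-1}) is the change in transit time between
    consecutive packets; a constant sender/receiver clock offset cancels out.
    The estimate is smoothed as J += (|D| - J) / 16."""
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter
```

A stream delayed by a constant amount has zero jitter, whereas a single packet arriving 0.2 s late relative to its predecessor raises the estimate to about 0.2 / 16 = 0.0125 s.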
Referring to FIG. 1, the problem of network jitter has been addressed for unicast streaming media in the prior art through the use of a de-jitter buffer 17 at the client or receiver 18. A server or proxy 12 transmits data to the receiver 18 through an Internet Protocol (IP) distribution network 13, an edge router or switch 14, and a “last mile” network 15 (e.g., DSL). (In the context of the present application, the terms “router” and “switch” are used synonymously and interchangeably.) The server receives data from a source and transmits it to the IP distribution network 13. The de-jitter buffer 17 at receiver 18 first fills up to its fixed size and then starts playing out. While playing out, the buffer 17 is emptied at the same average rate at which it is filled. Variations in arrival rate therefore affect only the fullness of the buffer, not the play out, which overcomes network jitter. The larger the buffer, the more jitter it can absorb. However, the de-jitter buffer introduces delay at the receiver end, which is a drawback for streaming applications such as audio. Moreover, the solution of FIG. 1 does not address startup delays for multicast streaming or video-specific aspects.
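The fill-then-play-out behavior of de-jitter buffer 17 can be sketched as a queue that releases nothing until a start threshold is reached and then releases exactly one packet per playout tick. This is a minimal sketch: the class and function names are hypothetical, packets are modeled as sequence numbers, the maximum-capacity (overflow) case is omitted, and on underrun the buffer simply refills to its threshold before resuming.

```python
from collections import deque
from typing import Deque, List, Optional


class DeJitterBuffer:
    """Hold packets until `depth` are queued, then release one packet per playout tick."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.queue: Deque[int] = deque()
        self.playing = False

    def push(self, packet: int) -> None:
        """Called whenever a packet arrives from the network (irregular timing)."""
        self.queue.append(packet)
        if not self.playing and len(self.queue) >= self.depth:
            self.playing = True

    def tick(self) -> Optional[int]:
        """Called at the constant media rate; returns the next packet, or None while (re)filling."""
        if not self.playing:
            return None
        if not self.queue:
            self.playing = False  # underrun: jitter exceeded the buffer; refill before resuming
            return None
        return self.queue.popleft()


def simulate(arrivals_per_tick: List[int], depth: int) -> List[Optional[int]]:
    """Deliver numbered packets in bursts, calling tick() once per playout interval."""
    buf = DeJitterBuffer(depth)
    seq, played = 0, []
    for count in arrivals_per_tick:
        for _ in range(count):
            buf.push(seq)
            seq += 1
        played.append(buf.tick())
    return played
```

With the bursty arrival pattern `[1, 1, 1, 0, 2, 1, 0, 2, 1, 1]`, a depth of 3 yields `[None, None, 0, 1, 2, 3, 4, 5, 6, 7]`: two ticks of startup delay, then gap-free play out. A depth of 1 yields `[0, 1, 2, None, 3, 4, 5, 6, 7, 8]`: no startup delay, but a playout gap when the burst arrives late. This illustrates the trade-off described above between jitter tolerance and added delay.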
IP multicasting is defined as the transmission of an IP datagram (i.e., a data packet formatted according to the Internet Protocol) to a “host group”, which is a set of zero or more hosts identified by a single IP destination address. A multicast datagram is delivered to all members of its destination host group with the same best-effort reliability as a unicast IP datagram. The Internet Group Management Protocol (IGMP) is used between IP hosts and their immediate neighbor multicast agents to support the creation of transient groups, the addition and deletion of members of a group, and the periodic confirmation of group membership. Multicast data streams are typically sent using the User Datagram Protocol (UDP), which is implemented in the transport layer and provides a connectionless datagram service for the delivery of packets.
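On a typical host, an application does not build IGMP messages itself. Instead, it asks the operating system to join a group on a UDP socket, and the IP stack sends the IGMP membership report on the application's behalf. The sketch below uses the standard socket options `IP_ADD_MEMBERSHIP` and `IP_DROP_MEMBERSHIP`. The helper names, the group address, and the port are hypothetical, and the exact IGMP messages emitted depend on the IGMP version the host runs.

```python
import ipaddress
import socket


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: 4-byte multicast group address, then 4-byte local interface address."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast (class D) address")
    return socket.inet_aton(group) + socket.inet_aton(iface)


def join_group(group: str, port: int, iface: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and join `group`; the IP stack then reports membership via IGMP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group, iface))
    return sock


def leave_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Drop membership; with IGMPv2 or later the stack may send a Leave Group message."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group, iface))
```

Usage would look like `sock = join_group("239.1.1.1", 5000)` followed by `sock.recvfrom(2048)` in a loop. Packing seven 188-byte transport packets (1316 bytes) into each UDP datagram is a common convention, but it is not mandated.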
In video streaming applications, in addition to de-jitter buffer delays, there are delays associated with acquiring the PSI data, decoder buffer delays (at the receiver end), and the delay associated with acquiring an Intra-frame of video data. This last delay arises because MPEG video streams comprise different types of frames, not all of which contain all of the data needed to display a picture at a given time. For instance, Intra-frames, or I-frames, are the only type of frame that is not coded with reference to any other frame; P-frames are coded predictively from a previous I-frame or P-frame; and B-frames are coded bidirectionally from preceding and/or following I-frames and P-frames. In order to be properly decoded, a B-frame associated with a group of pictures (GOP) may need to reference the I-frame of the next GOP. (It should be understood that a GOP is an optional structure of an elementary stream. Also, in the context of the present application, the term “I-frame” is intended to refer broadly to an Intra-frame and its equivalents, e.g., an IDR frame in the case of H.264.)
FIG. 2 shows a GOP consisting of 15 frames, in which the first frame, an I-frame, is fully coded, while the following frames are predicted. In practice, the receiver's de-jitter and decoder buffers often end up discarding large amounts of data until the receiver has obtained all of the information necessary to acquire, decode, and display a video program.
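The frame dependencies described above can be traced with a small model. The sketch assumes an MPEG-2-style open GOP written in display order, using the common 15-frame pattern `IBBPBBPBBPBBPBB`. The actual pattern in FIG. 2 is not specified here, and the function name is hypothetical. An I-frame references nothing, a P-frame references the previous anchor (I or P), and a B-frame references the previous and next anchors. An index equal to the GOP length denotes the I-frame that opens the following GOP.

```python
from typing import List


def reference_frames(gop: str, i: int) -> List[int]:
    """Display-order indices of the anchor frames that frame i is predicted from.

    Simplified open-GOP model (assumption): one reference in each direction for B-frames,
    and index len(gop) stands for the I-frame of the NEXT GOP."""
    if not gop or gop[0] != "I":
        raise ValueError("this sketch assumes the GOP begins with an I-frame")
    kind = gop[i]
    if kind == "I":
        return []
    prev_anchor = max(j for j in range(i) if gop[j] in "IP")
    if kind == "P":
        return [prev_anchor]
    if kind != "B":
        raise ValueError(f"unknown frame type {kind!r}")
    next_anchor = next((j for j in range(i + 1, len(gop)) if gop[j] in "IP"), len(gop))
    return [prev_anchor, next_anchor]
```

For `gop = "IBBPBBPBBPBBPBB"`, `reference_frames(gop, 3)` returns `[0]`, whereas `reference_frames(gop, 14)` returns `[12, 15]`: the last two B-frames cannot be decoded until the next GOP's I-frame arrives. In a closed GOP those trailing B-frames would instead be restricted to backward prediction within the GOP.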
All of the aforementioned delay factors can add up to a significant startup delay when a user tunes to a live multicast television stream transmitted over a packet network. For example, when a viewer changes channels, the client receiver sends an IGMP LEAVE request to stop receiving the current channel, followed by an IGMP JOIN request to start receiving the new channel. When data for the new channel starts arriving, the client receiver typically discards it until a new I-frame is received. At that point, the de-jitter buffer starts filling, and once it is sufficiently full, its data is transferred to the receiver's decoder. Together, these real-time startup delays considerably degrade the viewer's experience during channel changes.
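The way these delays accumulate can be estimated with back-of-the-envelope arithmetic. The sketch below is a deliberately simplified model whose parameters are hypothetical. Frames arrive at a constant frame rate in the order given, with B-frame reordering ignored. Everything before the first I-frame is discarded, and play out begins once a fixed number of frames, starting with that I-frame, are buffered. PSI acquisition is an optional constant, and decoder-buffer delay is ignored.

```python
from statistics import mean


def startup_delay(gop: str, tune_in: int, fps: float, dejitter_frames: int,
                  psi_wait: float = 0.0) -> float:
    """Seconds from the IGMP JOIN until the first frame is released to the decoder.

    Simplified model (assumptions): the stream repeats `gop`, the viewer joins at position
    `tune_in`, frame k after the join finishes arriving at (k + 1) / fps, frames before the
    first I-frame are discarded, and play out starts once `dejitter_frames` frames are buffered."""
    stream = gop[tune_in:] + gop   # rest of the current GOP, then the whole next GOP
    discarded = stream.index("I")  # frames thrown away while waiting for an I-frame
    return psi_wait + (discarded + dejitter_frames) / fps


GOP = "IBBPBBPBBPBBPBB"  # hypothetical 15-frame pattern
best = startup_delay(GOP, 0, fps=30, dejitter_frames=15)
worst = startup_delay(GOP, 1, fps=30, dejitter_frames=15)
average = mean(startup_delay(GOP, t, fps=30, dejitter_frames=15) for t in range(len(GOP)))
```

Under these assumed values (a 15-frame GOP at 30 frames/s and a 15-frame de-jitter buffer), the startup delay ranges from 0.5 s when the join happens to land on an I-frame to about 0.97 s when it lands just after one, and averages about 0.73 s over all tune-in points. PSI acquisition and decoder-buffer delay would add to these figures.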
Thus, there remains an unsatisfied need for a solution to the problem of startup delay for multicast streaming of live video programs.
By way of further background, U.S. Pat. No. 6,771,657 teaches a method and apparatus by which MPEG-2 digital television programs may be extracted from a transport stream. U.S. Pat. No. 6,718,553 discloses a system and method for delivery of digital broadcast television programming from a centralized aggregation head-end to subscribers in multiple markets using an interconnected terrestrial fiber optic network. U.S. Pat. No. 6,505,169 teaches a method for adaptive insertion of programs in streaming multimedia content. Additionally, a method for splicing data packets into a pre-existing data stream that complies with the MPEG transmission standard is disclosed in U.S. Pat. No. 5,917,830. Finally, U.S. Pat. No. 6,044,081 teaches a hybrid communications system and multimedia system that allows private network signaling to be routed over a packet network.