Digital television offers viewers high-quality video entertainment with features such as TV programming, pay-per-view (PPV), video-on-demand (VoD), games, and Internet access, collectively referred to as ‘multimedia entertainment content’, or simply ‘content’. The use of communication networks for distributing content continues to gain popularity. This is fuelled by the decreasing cost of equipment and of bandwidth to the home, and by the emergence of interactive personalized services.
Because multimedia files tend to be large, the content is currently packaged in information streams that are transmitted to the user over a broadband communication network. Each individual image in a sequence of images on film or video is referred to as a frame. Adjacent frames often contain pixels (picture elements) that are very similar or identical, such as green grass or blue sky. Compression and motion compensation protocols are typically used to remove this redundancy between adjacent frames and so make better use of transmission bandwidth. Of these protocols, MPEG is the most widespread today. The video and audio specifications for compression/decompression (encoding/decoding) protocols define the syntax and semantics of encoded streams. These are needed to communicate compressed digital content, and to store and play such video on media in a standard format.
To compress (encode) a stream carrying multimedia entertainment content, the discrete samples in the stream are transformed into a bit-stream of tokens. This bit-stream is much smaller than the initial stream, because essentially only the data that changes from frame to frame is captured, rather than all of the information in the initial stream. The signal is then broken into conveniently sized data blocks, and header information is added to each block. The header identifies the start of the packet and must include time-stamps, because packetizing disrupts the time axis.
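In MPEG2, one such time-stamp is the 33-bit presentation time stamp (PTS) carried in the PES header. It is counted in 90 kHz clock ticks and packed, together with marker bits, into five bytes. The sketch below covers only the PTS-only case and omits DTS handling and the rest of the PES header. The helper names (`seconds_to_pts`, `encode_pts`, `decode_pts`) are hypothetical.

```python
# Sketch: packing a 33-bit MPEG2 PES presentation time stamp (PTS), counted
# in 90 kHz ticks, into the 5-byte field used when only a PTS is present.
# Each bit group is followed by a marker bit set to '1'.
# Helper names are hypothetical; the DTS and other header fields are omitted.

PTS_CLOCK_HZ = 90_000
PTS_MASK = (1 << 33) - 1  # the PTS wraps around after 2**33 ticks


def seconds_to_pts(seconds: float) -> int:
    """Convert a presentation time in seconds to 90 kHz ticks (mod 2**33)."""
    return round(seconds * PTS_CLOCK_HZ) & PTS_MASK


def encode_pts(pts: int) -> bytes:
    """Encode a PTS as the 5-byte PES header field ('0010' prefix = PTS only)."""
    pts &= PTS_MASK
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 0x01,  # '0010', PTS[32..30], marker
        (pts >> 22) & 0xFF,                         # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 0x01,         # PTS[21..15], marker
        (pts >> 7) & 0xFF,                          # PTS[14..7]
        ((pts & 0x7F) << 1) | 0x01,                 # PTS[6..0], marker
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33-bit PTS from the 5-byte field, dropping marker bits."""
    return (((field[0] >> 1) & 0x07) << 30
            | field[1] << 22
            | (field[2] >> 1) << 15
            | field[3] << 7
            | field[4] >> 1)


pts = seconds_to_pts(1.5)  # 1.5 s -> 135000 ticks
print(decode_pts(encode_pts(pts)))  # 135000
```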
The multimedia encoding/decoding format tells the decoder how to reconstruct the compacted stream into data resembling the original, un-transformed stream, so that it can be heard and viewed in its normal form. However, if the decoder (receiver) is simply switched to a new channel without being reset, it will display noise. The receiver must therefore delay processing video packets from the new channel until it receives a certain pointer marking the start of a data block. This pointer is also referred to as key data or a milestone.
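The rule of waiting for the milestone can be sketched as a filter. It discards everything received before the first milestone and passes everything from that point on. The function name and the `is_milestone` predicate are hypothetical. A real receiver would detect the milestone by parsing the stream, for example by locating the start of an MPEG2 I-frame.

```python
from itertools import dropwhile
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def gate_on_milestone(units: Iterable[T],
                      is_milestone: Callable[[T], bool]) -> Iterator[T]:
    """Discard units until the first milestone, then pass every unit through.

    Units that arrive before the milestone cannot be decoded cleanly after a
    channel change, so they are dropped instead of being shown as noise.
    (Hypothetical helper for illustration.)
    """
    return dropwhile(lambda unit: not is_milestone(unit), units)


# Usage: frame types received after joining a channel mid-GOP.
received = ["B", "P", "I", "B", "P"]
print(list(gate_on_milestone(received, lambda t: t == "I")))  # ['I', 'B', 'P']
```

The function uses `itertools.dropwhile` because the milestone test is only needed until the first match. After that, every unit is passed through unchanged, including later milestones.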
It is to be noted that MPEG (Moving Picture Experts Group), and specifically the MPEG2 transport stream, is used within this document to describe and illustrate the concepts underlying the invention. However, the invention is applicable to any multimedia stream format that incorporates milestones that can be identified within the stream and used to synchronize stream startup.
An MPEG transport stream used for transmission and digital broadcasting carries one or more programs. Each program is made up of video and audio packetized elementary streams (PES) that share an independent timebase for clock recovery and audio/video synchronization. The transport stream also carries program specific information (PSI) and conditional access information, which enables selective access to each program and its elements. It may also carry data services associated with the programs. The stream is formed of short, fixed-size (188-byte) packets, each carrying a packet identifier (PID). All packets in the same elementary stream have the same PID, so the decoder can select the elementary streams it wants and reject the remainder.
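The PID-based selection described above can be sketched minimally. The code below decodes the 4-byte header of a 188-byte transport stream packet, which contains the sync byte 0x47, the 13-bit PID and the continuity counter. It then keeps only the packets on the PIDs the decoder has chosen. The class and function names are hypothetical. Adaptation fields and payloads are not parsed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass(frozen=True)
class TsHeader:
    """Fields of the 4-byte transport stream packet header (hypothetical type)."""
    payload_unit_start: bool
    pid: int
    adaptation_field_control: int
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the header at the start of one aligned 188-byte TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not an aligned 188-byte TS packet")
    return TsHeader(
        payload_unit_start=bool(packet[1] & 0x40),
        pid=((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit packet identifier
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


def filter_pids(packets: Iterable[bytes], wanted: Set[int]) -> Iterator[bytes]:
    """Keep only packets whose PID the decoder has selected; drop the rest."""
    for packet in packets:
        if parse_ts_header(packet).pid in wanted:
            yield packet
```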
The program specific information keeps track of the different programs in an MPEG transport stream and of the elementary streams in each program. PSI includes a Program Association Table (PAT), Program Map Tables (PMT) and a Conditional Access Table (CAT). The decoder uses the PAT to determine which programs exist in the respective transport stream. The PAT points to a number of PMTs (one per program). Each PMT in turn points to the video, audio, and data content of the respective program carried by the stream. The CAT is used when the stream is scrambled. A PID of 0 indicates that the packet carries the PAT. A stream may also contain NULL packets, which carry no data but are needed to maintain a constant bit rate with a variable payload. NULL packets always have a PID of 8191 (0x1FFF, all 13 bits set to 1).
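The sketch below illustrates how the decoder follows the PAT to the PMTs. It extracts the map from program_number to PMT PID out of a single PAT section. It assumes that the pointer_field has already been stripped from the packet payload and that the section fits in one packet. It does not verify the trailing CRC_32. The function and constant names are hypothetical.

```python
from typing import Dict

PAT_PID = 0x0000   # PID reserved for the Program Association Table
CAT_PID = 0x0001   # PID reserved for the Conditional Access Table
NULL_PID = 0x1FFF  # 8191: stuffing packets that carry no useful data


def parse_pat_section(section: bytes) -> Dict[int, int]:
    """Map program_number -> PMT PID from one PAT section.

    Assumes the pointer_field is already removed; the CRC_32 is NOT checked
    in this sketch (hypothetical helper for illustration).
    """
    if section[0] != 0x00:
        raise ValueError("table_id is not 0x00 (PAT)")
    # section_length counts the bytes that follow it, including the CRC_32.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4          # stop before the 4-byte CRC_32
    programs: Dict[int, int] = {}
    # Program loop starts after table_id, length, transport_stream_id,
    # version/current_next, section_number and last_section_number (8 bytes).
    for i in range(8, end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number != 0:           # program 0 gives the network PID
            programs[program_number] = pid
    return programs
```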
The most popular MPEG protocols used today are MPEG1, described in ISO/IEC 11172, and MPEG2, described in ISO/IEC 13818. In MPEG2 video compression, each picture is first compressed on its own (intra-frame compression). Sequentially presented pictures are then compressed together (inter-frame compression). In inter-frame compression, the compressed frame includes only the differences between a frame and the frames it depends on. As a result, decoding a frame depends on the decoding of previously displayed frames and, in some cases, of subsequently displayed frames. Decoding problems must be kept to a minimum, especially errors that propagate from the erroneous decoding of one frame to the frames that depend on it. For this reason, only a relatively small group of pictures (GOP) is compressed together (e.g. 9 pictures).
The pictures of each GOP are encoded together, independently of the frames of any preceding GOP. They can therefore be decoded independently, and errors cannot propagate from one group to the next. The first frame in a GOP is an I-frame (intra-frame). This is an independently compressed picture whose decoding can be performed without reference to any other frame. The more I-frames a stream contains, the better the video quality. However, I-frames contain the largest number of bits and therefore take up the most space on the storage medium.
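A simplified model shows why errors stop at GOP boundaries. It assumes closed GOPs and MPEG2-style references, in which I-frames and P-frames serve as references but B-frames do not. An error in a reference frame can then affect every following frame in decode order up to the next I-frame. An error in a B-frame stays local. The function name and the one-letter frame-type string are hypothetical conventions for this sketch, not part of any standard API.

```python
from typing import List


def frames_affected_by_error(decode_order: str, bad: int) -> List[int]:
    """Indices (in decode order) whose decoding may be corrupted by frame `bad`.

    Simplified model assuming closed GOPs:
    - 'I' and 'P' frames are references, so an error in one can reach every
      following frame up to, but not including, the next 'I' frame;
    - 'B' frames are never referenced, so their errors stay local.
    (Hypothetical helper for illustration.)
    """
    if decode_order[bad] == "B":
        return [bad]
    next_i = decode_order.find("I", bad + 1)
    end = len(decode_order) if next_i == -1 else next_i
    return list(range(bad, end))


# Two closed GOPs of 7 frames each, in decode order.
stream = "IPBBPBB" + "IPBBPBB"
print(frames_affected_by_error(stream, 1))  # [1, 2, 3, 4, 5, 6]
```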
In general, a client (receiver, decoder, set-top box, or player) can select for viewing one of a plurality of channels. These channels are broadcast from a head-end or streamed from a server holding pre-stored content files. A channel change is performed in response to a request from a client to the server. In response, the server provides the client with the new address from which to receive the new channel. The receiver leaves the currently viewed channel and joins the new one. Channel change time in IP-based audio/video transport systems introduces significant delays into the consumer's TV viewing/surfing experience. Many factors adversely affect channel change speed. These include key press propagation (from the channel selector to the server), IGMP leave/join latency, packet buffering and propagation, PAT/PMT latency, I-frame latency, and frame decode and presentation times, to name a few.
Currently, a subscriber terminal joins a channel at a random point in the data stream. It must then wait for the key data structures (milestones) it needs to display fully synchronized audio and video. For an MPEG2 stream, the I-frame is one of these key data structures, and the PAT and PMT are others. A clean channel change requires decoding to start on an I-frame (full frame). I-frames are sent only once or twice per second, and even less frequently in content encoded at lower bit rates. This introduces a latency ranging from several hundred milliseconds to a couple of seconds. Because this delay is significant, it has been an issue to date for DVB and ATSC, the European and North American digital broadcast standards respectively. Channel change times below one second remain difficult to achieve with current technology, and attempts to reduce this server-side delay are only now emerging. The present invention is directed to reducing the delays introduced by I-frame latency.
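The I-frame latency can be estimated with simple arithmetic. Assume one I-frame per GOP, a constant frame rate f, a GOP length of N frames, and a join instant uniformly distributed over the I-frame period T = N/f. Under these assumptions, the wait is T/2 on average and T in the worst case. The function below is a hypothetical helper that computes both values.

```python
from typing import Tuple


def iframe_wait_seconds(gop_length_frames: int,
                        frame_rate_fps: float) -> Tuple[float, float]:
    """Return (mean, worst-case) wait for the next I-frame after a random join.

    Assumes one I-frame per GOP, a constant frame rate, and a join instant
    uniformly distributed over the I-frame period (hypothetical helper).
    """
    if gop_length_frames <= 0 or frame_rate_fps <= 0:
        raise ValueError("GOP length and frame rate must be positive")
    period = gop_length_frames / frame_rate_fps  # seconds between I-frames
    return period / 2, period


print(iframe_wait_seconds(50, 25))  # (1.0, 2.0)
```

For example, a 50-frame GOP at 25 frames per second gives one I-frame every 2 s. The mean wait is then 1 s and the worst case 2 s, before counting any of the other channel change delays listed earlier.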
For example, it has been proposed to connect a server at the edge of a broadband network in order to provide clients in a certain geographical area with broadcast multimedia streams. The server is a stand-alone server that receives streaming multimedia content from a content source in the broadband network. For each multimedia content stream, the server includes a buffer that manages and buffers the multicast packets of the received stream. When the server receives a request for a channel change, it instructs the sender of the currently streamed channel to stop sending that channel to the client. It then instructs the sender of the newly selected channel to start by bursting data from the respective buffer to the client as fast as possible. At some point, the system switches the subscriber terminal (receiver) over from the unicast stream (the burst) to the general multicast stream of the requested channel.
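One model for estimating when the switch from the unicast burst to the multicast stream can happen is as follows. Assume the burst starts D seconds of content behind the live edge and is sent at k times the channel's nominal bit rate, with k greater than 1. The gap then shrinks by k - 1 seconds of content per second of wall-clock time, so catch-up takes D/(k - 1) seconds. This model and the helper name are assumptions for illustration. The source does not specify burst rates or buffer depths.

```python
def catch_up_seconds(buffered_seconds: float, burst_factor: float) -> float:
    """Wall-clock time for a burst to reach the live edge of the multicast stream.

    Assumes the burst starts `buffered_seconds` of content behind live and is
    sent at `burst_factor` times the nominal bit rate; the gap closes by
    (burst_factor - 1) seconds of content per second (hypothetical model).
    """
    if burst_factor <= 1:
        raise ValueError("a burst must be faster than real time to catch up")
    return buffered_seconds / (burst_factor - 1)


print(catch_up_seconds(2.0, 1.5))  # 4.0
```

For example, 2 s of buffered content sent at 1.5 times the nominal rate takes 4 s to catch up. Throughout that time, the access link must carry 1.5 times the channel's bit rate. This extra load is the source of the network planning concern for HDTV content.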
With this arrangement, the server must “talk” directly with the clients for a range of tasks. These include requesting and terminating delivery of data, requesting a change of channel, negotiating missed blocks of data, exchanging status reports and heartbeats, and managing the unicast/multicast transition. The messaging may, for example, use the Real-time Transport Protocol (RTP), which identifies each packet individually. With RTP, the server tells the client which packet is current, and the client requests data until it catches up with the current time. At that point, it switches from the burst stream to the steady stream. The milestone information needed to start playout is deliberately sent infrequently to reduce bandwidth. As a result, time and bandwidth are wasted while the decoder waits to find the milestone information in the incoming stream.
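RTP (RFC 3550) gives each packet a 16-bit sequence number, which lets a client tell which packets it has missed and must request. Below is a minimal sketch of that gap detection, with wrap-around handled by modulo arithmetic. The function name and the half-range heuristic for late or duplicate packets are assumptions, not part of the RTP specification.

```python
from typing import List

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap to 0


def missing_sequence_numbers(prev_seq: int, seq: int) -> List[int]:
    """Return the sequence numbers skipped between two received RTP packets.

    Differences are taken modulo 2**16, so a wrap from 65535 to 0 is not
    reported as a loss. As a simple heuristic (an assumption of this sketch),
    a forward jump of half the sequence space or more is treated as a late
    or duplicate packet, and nothing is reported.
    """
    gap = (seq - prev_seq) % SEQ_MOD
    if gap == 0 or gap >= SEQ_MOD // 2:
        return []
    return [(prev_seq + k) % SEQ_MOD for k in range(1, gap)]


print(missing_sequence_numbers(65534, 1))  # [65535, 0]
```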
Another disadvantage of this approach is that the client must be aware of the server and cannot change channels if the server is not accessible. Also, in the steady state (when a client is viewing a certain channel), the client still uses messaging to request and receive missing packets. As a result, the client has no autonomy if the connection with the server is lost for any reason. This technique also requires very careful network planning, so that the network can handle the data bursts generated whenever a terminal performs a channel change. This can be a serious problem, particularly for HDTV (high definition TV) content, and especially with more than one terminal in the same house.
There is a need for a solution that significantly reduces channel change delays (channel zapping time).