Digital television and interactive media applications such as on-demand video services are becoming increasingly popular throughout the world. Due to the enormous amounts of digital data involved, efficient data compression schemes have been developed and standardised. The Moving Picture Experts Group (MPEG), for example, has defined several media compression standards including MPEG-1 and MPEG-2.
Media compression, also referred to as coding herein, aims at removing redundant information included in a sequence of pictures that form a particular media stream. Generally, an encoder at a transmitter site removes the redundancy prior to transmission, and a decoder at a receiver site re-inserts the redundancy prior to play out. The encoder exploits the fact that the individual pixel values of a sequence of digital pictures are not independent, but are correlated with their neighbours both within the same picture (spatial redundancy) and across a picture sequence (temporal redundancy). Temporal redundancy permits a prediction of the next picture from a previous “reference” picture.
The MPEG standards define various different coding modes for translating individual pictures into coded frames exploiting temporal redundancy for data compression purposes. The different coding modes give rise to different frame types.
“Intra” frames (I-frames) are coded independently, i.e. without any reference to other frames. Moderate compression is achieved by reducing spatial redundancy, but temporal redundancy is not exploited. I-frames are typically inserted periodically to provide access points in a frame stream where decoding can begin.
“Predictive” frames (P-frames) are dependent frames that can use a previous I- or P-frame for motion compensation, and that can be used themselves as a reference for further prediction. By reducing both spatial and temporal redundancy, P-frames offer increased compression compared to I-frames.
Finally, the MPEG standards define “bidirectionally-predictive” frames (B-frames) that can use the previous and next I- or P-frames for motion compensation.
Compression is highest for B-frames and lowest for I-frames. However, in contrast to I-frames, P-frames and B-frames cannot be decoded independently. That is, decoding P-frames and B-frames requires supplemental information with regard to temporal redundancy. This supplemental information is typically included in a neighbouring frame, and eventually an I-frame is required as a starting point for the decoding operation.
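The decoding dependencies described above can be illustrated with a minimal sketch. This is not taken from any MPEG specification; the display-order GOP string and the simplified rule that all dependent frames become decodable once a first I-frame has been received are assumptions made purely for illustration:

```python
def decodable_frames(gop, start):
    """Return indices of frames in gop[start:] that a client tuning in
    at position `start` of an ongoing stream can decode.

    Simplified model: I-frames are always decodable; P- and B-frames
    are decodable only once an independently decodable anchor
    (an I-frame) has been received after tune-in.
    """
    decoded = []
    have_anchor = False
    for i in range(start, len(gop)):
        frame_type = gop[i]
        if frame_type == 'I':
            have_anchor = True
            decoded.append(i)
        elif frame_type in ('P', 'B') and have_anchor:
            # Simplification: once an anchor exists, assume any further
            # reference frames a P-/B-frame needs will also arrive.
            decoded.append(i)
    return decoded

# Tuning in just after the I-frame at index 0: nothing is decodable
# until the next I-frame (index 9) arrives.
print(decodable_frames("IBBPBBPBBIBBP", 1))  # → [9, 10, 11, 12]
```

The sketch shows why the average interval between two independent frames directly determines how long a newly tuned-in client must wait before decoding can start.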
Once coded, the individual frames constituting a media stream can be delivered either via a point-to-point (PTP) transmission or via a point-to-multipoint (PTM) transmission. Existing mobile TV solutions deliver video streams over PTP unicast bearers. With the Multimedia Broadcast Multicast Service (MBMS), Digital Video Broadcast-Handheld (DVB-H) and similar technologies, it will soon become possible to also deliver media streams over PTM bearers (i.e., multicast or broadcast bearers).
The advantage of unicast delivery is that network resources are only allocated as long as there are users requesting a particular media stream. The amount of consumed network resources is determined by the number of concurrent users, not by the number of different media streams (also called media channels). In the broadcast case, on the other hand, the amount of consumed resources depends on the number of media channels, but is independent of the number of users listening to those channels. Accordingly, a broadcast service can only deliver a limited number of media channels. Multicast transmission behaves in many respects similarly to broadcast transmission.
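The resource trade-off above can be sketched as a toy cost model. The functions and the unit bearer cost are assumptions for illustration only, not part of any delivery standard:

```python
def unicast_load(num_users, bearer_cost=1.0):
    """Unicast: each active user is served over a dedicated bearer."""
    return num_users * bearer_cost

def broadcast_load(num_channels, bearer_cost=1.0):
    """Broadcast: one bearer per channel, regardless of audience size."""
    return num_channels * bearer_cost

# With few viewers, unicast is cheaper; once viewers outnumber the
# offered channels, broadcast consumes fewer network resources.
print(unicast_load(5), broadcast_load(20))      # → 5.0 20.0
print(unicast_load(1000), broadcast_load(20))   # → 1000.0 20.0
```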
There is typically a certain latency when switching to an active channel (i.e. to an on-going stream of media frames) that is delivered by a server device over a multicast or broadcast bearer. The latency exists between the point in time when channel switching is requested by a user and the point in time when the new channel is played out by a client device operated by the user. Specifically, it takes approximately one second for the client device to tune into the new channel. The client device will then have to wait for a reference frame that can be decoded independently (e.g. an I-frame). The average time until an independent frame is received depends on the average interval between two independent frames. Conventional techniques use intervals of about 2 to 3 seconds (although longer intervals would be preferred since longer intervals improve coding efficiency). Once an independent frame has been received, the client device buffers the independent frame and all subsequent frames for about two seconds before decoding and play out can start. The total latency in this example amounts to approximately 3 to 6 seconds and varies depending on the time the client device has to wait for an independent frame. As this latency is undesirable, attempts have been made to reduce the waiting time between tuning into a new channel and channel play out.
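The latency figures above can be reproduced with a short calculation. The function name and the decomposition into tune-in, I-frame wait, and buffering delay follow the description in the text; the specific default values are the illustrative figures given there (about 1 second tune-in, a 2-to-3-second I-frame interval, about 2 seconds of buffering):

```python
def channel_switch_latency(tune_in=1.0, iframe_interval=3.0, buffering=2.0):
    """Return (best, average, worst) channel-switch latency in seconds.

    best:  an independent frame arrives immediately after tune-in
    worst: the client just missed an independent frame and must wait
           a full I-frame interval
    """
    best = tune_in + 0.0 + buffering
    average = tune_in + iframe_interval / 2 + buffering
    worst = tune_in + iframe_interval + buffering
    return best, average, worst

best, average, worst = channel_switch_latency(iframe_interval=3.0)
print(best, worst)  # → 3.0 6.0, matching the 3-to-6-second range above
```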
US 2005/0081244 A1 describes a PTM scenario in which multiple client devices can tune into various media channels provided over a multicast transmission by a server device. In response to a channel change request received from an individual client device, the server device automatically sends a previous independent frame for the requested channel to the client device in a unicast message. Once the independent frame is received, it is immediately decoded and displayed by the client device. The display is initially static (for up to 2 seconds) until the client device has received the first “regular” independent frame, and the following dependent frames, via the regular multicast transmission. Apparently, the static display, as short as it may be, is undesirable.
Accordingly, there is a need for a technique that permits an improved play out of media frames that are received via a PTM transmission.