Temporal and spatial redundancy can be exploited using predictions to make a compact representation of video and other types of media and multimedia possible. For instance, pixel prediction is an important part of video coding standards such as H.261, H.263, MPEG-4 and H.264. H.264 utilizes three pixel prediction methods, namely intra, inter and bi-prediction. Intra prediction provides a spatial prediction of the current pixel block from previously decoded pixels of the current frame. Inter prediction gives a temporal prediction of the current pixel block using a corresponding but displaced pixel block in a previously decoded frame. Bi-directional prediction gives a weighted average of two inter predictions. Thus, intra frames do not depend on any previous frame in the video stream, whereas inter frames, including inter frames with bi-directional prediction, use motion compensation from one or more other reference frames in the video stream.
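As an illustration only, the weighted average used in bi-directional prediction can be sketched as follows. The function name `bi_predict`, the equal weighting and the integer rounding are assumptions made for this example and are not taken from the text above; the two inputs stand for two already motion-compensated inter predictions of the same pixel block.

```python
def bi_predict(pred0, pred1):
    """Average two inter predictions of the same pixel block.

    pred0, pred1: lists of rows of integer sample values (hypothetical
    inputs, assumed to have identical dimensions).
    Equal weights with round-half-up are an assumption of this sketch.
    """
    if len(pred0) != len(pred1):
        raise ValueError("predictions must have the same number of rows")
    block = []
    for row0, row1 in zip(pred0, pred1):
        if len(row0) != len(row1):
            raise ValueError("predictions must have the same row length")
        # (a + b + 1) >> 1 is the integer average of a and b, rounded up.
        block.append([(a + b + 1) >> 1 for a, b in zip(row0, row1)])
    return block


if __name__ == "__main__":
    print(bi_predict([[100, 102]], [[104, 103]]))
```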
User terminals with media players can only start decoding and rendering the media data at intra frames. To enable tuning in to the video stream without excessively long delays, intra frames are typically sent periodically. However, intra frames are generally larger in terms of the number of bits as compared to inter frames, thereby significantly contributing to the overhead in the video stream.
Media frames and their frame-carrying data packets of a video stream are typically grouped together in the streams. For instance, in the case of systematic Forward Error Correction (FEC), the frame-carrying data packets are grouped together into different FEC blocks and sent along with repair information. Such a FEC block should begin, in decoding order, with an intra frame so that errors do not propagate between FEC blocks. This also avoids longer tune-in times, i.e. the FEC tune-in and intra frame tune-in should be aligned.
FIG. 1 illustrates such a division of frames 16, 22-26, 32-34 into FEC blocks 10, 20, 30. In the drawing I2 22, I3 32 denote the initial intra frame of the FEC block 20, 30 number 2 and 3, respectively. Pij 16, 24, 26, 34 denotes inter frame number j of FEC block 10, 20, 30 number i. Each media frame 22-26 of a FEC block 20 has a respective timestamp 40 defining the rendering or play-out time schedule for the media in the FEC block 20.
Currently, the average tune-in time for a traditionally encoded sequence is 1.5× the FEC block size. Firstly, one must wait a whole FEC block to be able to perform FEC decoding. With a single intra frame per FEC block one also needs to wait, on average, an additional half FEC block to get the intra frame. This is because tuning in after the start of a FEC block prevents, due to the temporal predictive nature of the inter frames, the decoder and media player from starting to decode and render the media data until the next intra frame, which arrives with a following FEC block.
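The 1.5× figure can be checked with a small calculation. The sketch below assumes that the tune-in instant is uniformly distributed over a FEC block and that all FEC blocks have the same duration; the function names and the sampling approach are choices made for this example only.

```python
def tune_in_delay(offset, block_len):
    """Delay from tune-in until play-out can start.

    offset: time elapsed since the start of the current FEC block.
    block_len: duration of one FEC block, in the same unit.
    """
    if not 0 <= offset < block_len:
        raise ValueError("offset must lie within the FEC block")
    if offset == 0:
        # Tuning in exactly at the block start: only the buffering remains.
        return block_len
    wait_for_next_block = block_len - offset  # current block is unusable
    buffering = block_len                     # whole next block is buffered
    return wait_for_next_block + buffering


def average_tune_in_delay(block_len, steps=1000):
    """Average delay over evenly spaced tune-in offsets (midpoint sampling)."""
    offsets = [(k + 0.5) * block_len / steps for k in range(steps)]
    return sum(tune_in_delay(o, block_len) for o in offsets) / steps


if __name__ == "__main__":
    block_len = 8.0  # e.g. eight frame intervals per FEC block
    print(average_tune_in_delay(block_len) / block_len)
```

Under these assumptions the ratio comes out at 1.5: half a block, on average, spent waiting for the next block to begin, plus one whole block of buffering.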
In the upper part of FIG. 1, a user terminal tunes in to the stream 1 between inter frames P23 and P24. This means that the user terminal will only receive the inter frames P24 to P27 of the current FEC block 20. The user terminal therefore has to await the reception of the intra frame I3 32 of the next FEC block 30 before the media play-out can be initiated, which is exemplified in the lower portion of FIG. 1. The user terminal also needs to wait for a whole FEC block before decoding of packets can be performed. This is because data packets later on in the FEC block are used to repair earlier packets and one does not want to pause each time a repairable error occurs. The figure represents this by indicating tune-in and indicating the play-out occurring following buffering of a whole FEC block. The received inter frames P24 to P27 will, though, be unusable for the user terminal as it has not correctly received the prior frames I2, P21 to P23 of the FEC block 20.
With current implementation techniques it is possible to interleave data packets. In such a case, the initial intra frame 12, 22 of the FEC block 10, 20 can be put towards the end of the FEC block 10, 20, which is illustrated in FIG. 2. Comparing FIG. 2 with FIG. 1, the order of media frames 12-16, 22-26, 32-36 inside the respective FEC blocks 10, 20, 30 has been interchanged.
Tuning-in at the same time point in the transmission order as in FIG. 1, the user terminal will now receive the inter frames P21 to P23 and the intra frame I2 of the current FEC block 20. Thus, part of the FEC block 20 is decodable.
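The difference between the two transmission orders can be illustrated with a short sketch. It assumes eight frames per FEC block, that each inter frame depends on the frame immediately preceding it in decoding order, and that the interleaved order of FIG. 2 is simply the reverse of the decoding order. The exact order is not stated here, so the reversal is a hypothetical choice that reproduces the frames named above. FEC repair is not modeled.

```python
# Decoding order of FEC block 20, using the frame labels of FIG. 1.
DECODING_ORDER = ["I2", "P21", "P22", "P23", "P24", "P25", "P26", "P27"]


def decodable_frames(transmission_order, tune_in_index):
    """Return the frames of the block that can be decoded after tune-in.

    tune_in_index: number of frames already transmitted (and missed)
    when the user terminal tunes in.
    """
    received = set(transmission_order[tune_in_index:])
    decodable = []
    for frame in DECODING_ORDER:
        if frame not in received:
            break  # every later frame depends on this missing one
        decodable.append(frame)
    return decodable


fig1_order = list(DECODING_ORDER)            # intra frame sent first
fig2_order = list(reversed(DECODING_ORDER))  # assumed interleaving

if __name__ == "__main__":
    # Tune-in between P23 and P24 in FIG. 1, i.e. after four frames.
    print(decodable_frames(fig1_order, 4))  # []
    print(decodable_frames(fig2_order, 4))  # ['I2', 'P21', 'P22', 'P23']
```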
This interleaving, however, moves part of the tune-in delay a small distance into the sequence. Thus, frames I2, P21, P22, P23 are played out as if the tune-in was at the beginning of the FEC block 20, which is illustrated in the lower part of FIG. 2. However, as the next four frames P24 to P27 of the FEC block 20 are not received, the frame P23 will be displayed until, in this example, the 9th frame I3 32 from the next FEC block 30 can be played out. As a consequence, the media data of the frame P23 is displayed for a very long period of time, which becomes visually unattractive to the user. This problem is further aggravated when tuning in towards the end of a FEC block, implying that the delay in the continuation of the rendering can be very large, up to the length of a FEC block.
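The resulting freeze can be made concrete with a minimal sketch of the play-out schedule. It assumes one display slot per frame, eight frames per FEC block, and that the last decodable frame is simply repeated in every slot whose frame was missed; the function name `playout_schedule` is hypothetical.

```python
def playout_schedule(decoding_order, frames_missed):
    """Frame shown in each display slot of a partly received FEC block.

    frames_missed: number of frames at the end of the decoding order
    that were transmitted before tune-in and are therefore missing.
    """
    n = len(decoding_order)
    if not 0 <= frames_missed < n:
        raise ValueError("at least the intra frame must have been received")
    available = decoding_order[: n - frames_missed]
    # Once the available frames run out, the last one stays on screen.
    return [available[min(slot, len(available) - 1)] for slot in range(n)]


if __name__ == "__main__":
    block = ["I2", "P21", "P22", "P23", "P24", "P25", "P26", "P27"]
    schedule = playout_schedule(block, 4)
    print(schedule)
    print(schedule.count("P23"))  # slots during which P23 is on screen
```

In this model, missing four frames keeps P23 on screen for five slots, until I3 arrives as the 9th frame. Missing seven frames leaves only I2, which is then shown for the whole block.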