As described in the following, a preferred embodiment of the present invention uses an H.264 data partition codec to encode video data into substreams. However, it will be appreciated that other standards can be used to generate substreams when encoding a video. In principle, any coding standard can be used. A brief overview of H.264 data partition coding is now given.
A video codec is designed to compress and decompress digital video in order to reduce the bandwidth required to transmit the video and the space required to store it. H.264 is a video codec standard which can be used in various forms. H.264 can be used to generate a single layer of encoded video in which the video data is in one stream. Alternatively, H.264 Scalable Video Coding (SVC) has recently been developed to offer layered encoding of data. Compared with the single-layer H.264 codec, the layered codec requires a 10% increase in bandwidth for the same fidelity, as measured by the peak signal-to-noise ratio (PSNR). A further form is H.264 data partitioned video, which is part of the Extended profile of the H.264 standard. H.264 data partitioned video takes no extra storage space compared to the non-partitioned case, as data partitioning is simply a reorganisation of the encoded data and is not a product of the compression process itself. The present example uses an H.264 data partition codec to encode data into substreams.
To encode a video sequence using H.264 data partitioning, the sequence is first split into a number of raw frames which are arranged together as a group of pictures (GOP). The encoded GOP is called a coded video sequence in the terminology of H.264. Each frame is then compressed, using H.264, into one or more slices. A slice is a spatially distinct region of a picture that is encoded separately from any other region in the same picture. During the encoding process three different encoded slice types are produced, termed intra (I-slice), predicted (or inter) (P-slice) and bipredicted (B-slice).
In the present example, only one slice per frame is used. Thus, an I-frame is a frame consisting of a single I-slice, a P-frame is a frame consisting of a single P-slice, and a B-frame is a frame consisting of a single B-slice. However, in principle more than one slice per frame can be used, in which case an I-frame is a frame consisting of only I-slices, a P-frame is a frame consisting of only I-slices and P-slices, and a B-frame is a frame consisting of only I-slices, P-slices and B-slices.
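By way of illustration only, the relationship between slice types and frame types may be sketched as follows. The function name `frame_type` and the single-letter labels are assumptions made for this sketch and do not form part of H.264. Because a frame containing only I-slices also satisfies the broader P-frame and B-frame definitions, the sketch assumes that the most restrictive label is the one intended.

```python
from typing import Iterable


def frame_type(slice_types: Iterable[str]) -> str:
    """Return 'I', 'P' or 'B' for a frame, given the types of its slices.

    Hypothetical helper: assigns the most restrictive label that the
    slice types allow (assumption of this sketch).
    """
    kinds = set(slice_types)
    if not kinds or not kinds <= {"I", "P", "B"}:
        raise ValueError("expected one or more slices of type I, P or B")
    if "B" in kinds:
        return "B"  # any B-slice makes the frame a B-frame
    if "P" in kinds:
        return "P"  # only I-slices and P-slices
    return "I"      # only I-slices


if __name__ == "__main__":
    print(frame_type(["I"]))            # single slice per frame, as in the present example
    print(frame_type(["I", "P", "I"]))  # multi-slice frame
```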
Intra coded frames (or slices) are pictures coded without reference to any pictures except themselves. This means that an intra coded frame requires only the successful reception of its own packets in order for the entire frame to be reconstructed. Instantaneous Decoding Refresh (IDR) pictures, also called key frames, contain only intra coded slices. Every GOP starts with an IDR picture.
With the predicted and bipredicted frames (or slices), repetition is removed from the video stream through the use of prediction. These frames therefore require the prior decoding of some other frame or frames in order to be decoded. A predicted frame holds only the changes in the image from the previous frame, thereby saving space. Bipredicted frames use two frames as sources for prediction, e.g. by using differences between the current frame and both the preceding and following frames to specify its content, which further increases the coding efficiency. In both cases, for a frame to be fully reconstructable, both its own packets and those making up the frames used for prediction need to be received by the client.
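The reconstruction condition described above may be sketched as follows, purely as an illustration. The function `decodable_frames`, the frame labels and the dictionary layout are hypothetical and chosen for this sketch; a frame is treated as decodable only if its own packets were received and every frame it is predicted from is itself decodable.

```python
from typing import Dict, List, Set


def decodable_frames(references: Dict[str, List[str]], received: Set[str]) -> Set[str]:
    """Return the frames that can be fully reconstructed.

    references: maps each frame label to the frames it is predicted from
                (an empty list for an intra coded frame).
    received:   labels of the frames whose own packets all arrived.
    """
    memo: Dict[str, bool] = {}

    def can_decode(frame: str) -> bool:
        if frame not in memo:
            memo[frame] = False  # guards against accidental cycles in the input
            memo[frame] = frame in received and all(
                can_decode(ref) for ref in references.get(frame, [])
            )
        return memo[frame]

    return {frame for frame in references if can_decode(frame)}


if __name__ == "__main__":
    # Hypothetical three-frame GOP: B1 is predicted from both I0 and P2.
    gop = {"I0": [], "B1": ["I0", "P2"], "P2": ["I0"]}
    # P2 is lost, so B1 cannot be reconstructed even though it arrived.
    print(sorted(decodable_frames(gop, {"I0", "B1"})))
```

In this sketch the loss of the P-frame also makes the B-frame unusable, whereas the loss of the B-frame would affect no other frame.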
When H.264 video is encoded without data partitioning, each slice of a frame is encoded into one Network Abstraction Layer (NAL) unit. When H.264 encodes frames with data partitioning enabled, up to three NAL units per slice are produced. The three NAL units are named partition A, B and C respectively. Partition A contains the most important elements of the slice, including the slice header, macroblock types, quantisation parameters, prediction modes and the motion vectors. Partitions B and C contain the residual information for the intra and inter coded macroblocks, respectively. If partition A is lost, then partitions B and C must be discarded. However, if partition A is received, then the quality of the displayed frame will be improved when partition B or C is received as well.
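The handling of partitions at a receiver may be sketched as follows, as a minimal illustration rather than a definitive implementation. The sketch relies on the H.264 NAL unit header, in which the low five bits of the first byte carry `nal_unit_type`, with values 2, 3 and 4 denoting slice data partitions A, B and C. The names `Partition`, `nal_unit_type` and `usable_partitions` are assumptions made for this sketch.

```python
from enum import IntEnum
from typing import Set


class Partition(IntEnum):
    """nal_unit_type values of the three slice data partitions in H.264."""
    A = 2
    B = 3
    C = 4


def nal_unit_type(first_byte: int) -> int:
    """Extract nal_unit_type, the low five bits of the NAL unit header byte."""
    return first_byte & 0x1F


def usable_partitions(received: Set[Partition]) -> Set[Partition]:
    """Return the partitions of one slice that the decoder can use.

    Hypothetical helper: without partition A, partitions B and C are
    discarded; with partition A, whichever of B and C arrived is kept.
    """
    if Partition.A not in received:
        return set()
    return set(received)


if __name__ == "__main__":
    # 0x62 has nal_ref_idc = 3 and nal_unit_type = 2, i.e. partition A.
    print(Partition(nal_unit_type(0x62)).name)
    # Partition A lost: B and C are of no use on their own.
    print(usable_partitions({Partition.B, Partition.C}))
```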