The transport protocol used in such a network may for example be the Real-time Transport Protocol (RTP), well known to a person skilled in the art.
An unreliable network is considered, i.e. one that guarantees neither stability of the transmission conditions, nor the reliable transmission of packets. In other words, loss of packets may occur.
The present invention seeks to protect data against errors, including the loss of packets. It lies within the non-limiting scope where the data are video data that comply with the H.264/AVC standard or SVC (Scalable Video Coding) standard. Nevertheless, the adaptation of the invention to other video coding standards such as MPEG-4 part 2, H.263 and other scalable video coding standards presents no difficulties.
A few basic notions concerning the H.264 and SVC standards now follow.
The SVC standard constitutes an extension of the H.264 standard in that it introduces scalable coding or scalability properties.
The H.264 standard constitutes the state of the art in terms of video compression. It enables compression efficiency to be considerably increased compared to MPEG-2, MPEG-4 part 2 and H.263. In terms of technology, the H.264 standard is based on a traditional hybrid predictive coding pattern using a combination of spatial transformation and motion compensation/estimation, this general pattern being optimized to obtain better compression efficiency.
Several coding modes are available for coding a macroblock of pixels (henceforth referred to as MB), which include:
        INTRA or I: INTRA coding is a spatial coding. The MB is independent of any other information coming from other images.
        INTER or P: INTER MBs are the result of a temporal prediction on the basis of a previously coded image. This type of MB is coded in the form of a motion vector and residual data.
        Bidirectional or Bidir or B: B MBs use the same principle as P MBs in that they also result from a temporal prediction. However, for B MBs, two reference regions are extracted from a previous image and from a subsequent image respectively, which are used for the prediction. This type of MB is therefore coded in the form of two motion vectors and residual data.
        P_SKIP: for P_SKIP MBs, no other data is coded in the binary stream except the MB type. The final motion vector of a P_SKIP MB is deduced from the surrounding coded MBs.
        B_SKIP, B_direct_16×16 and B_direct_8×8: no motion information is transmitted with such MBs. The information is deduced from the surrounding MBs which were coded previously or from the motion vector of the MB located in the same position in the next reference image. Furthermore, no residual data are coded with B_SKIP MBs.
During the coding process, an MB mode selection mechanism is implemented. When the INTRA mode is selected for an MB, INTRA prediction consists in predicting this MB in the pixel domain by using the pixels located along the outer boundary of this MB. A DCT is then applied to the difference between the prediction and the original, and the transformed difference is coded.
Such predictive coding is also applied to the motion vector. In fact, a motion vector in a temporally-predicted MB is coded in a predictive manner by using motion vectors of surrounding MBs. Consequently, the motion vector itself is not coded; it is replaced by mvd_l0 and mvd_l1, which represent the difference between a vector component to be used and its prediction.
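The differential coding of motion vectors described above can be illustrated by the following minimal sketch. It assumes, for illustration only, that the predictor is the component-wise median of three neighboring motion vectors; the text above only states that the motion vectors of surrounding MBs are used. The function names are hypothetical.

```python
def median_predictor(left, above, above_right):
    """Component-wise median of three neighboring motion vectors.

    Assumption of this sketch: exactly three neighbors are available.
    """
    xs = sorted(v[0] for v in (left, above, above_right))
    ys = sorted(v[1] for v in (left, above, above_right))
    return (xs[1], ys[1])


def encode_mvd(mv, predictor):
    """Return the motion vector difference that would be written to the stream."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def decode_mv(mvd, predictor):
    """Rebuild the motion vector from the difference and the same predictor."""
    return (mvd[0] + predictor[0], mvd[1] + predictor[1])


predictor = median_predictor((4, -2), (6, 0), (5, 3))  # (5, 0)
mvd = encode_mvd((7, 1), predictor)                    # (2, 1)
assert decode_mv(mvd, predictor) == (7, 1)
```

Because the decoder rebuilds the predictor from MBs it has already decoded, an error in a neighboring MB propagates to every vector predicted from it, which is one reason why protection against packet loss matters.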
Each MB is associated with a parameter called Coded_block_pattern specifying which of the six 8×8 blocks (luminance and chrominance) may contain non-null transformation coefficient levels.
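By way of illustration, such a parameter can be pictured as a bit mask with one bit per 8×8 block. The sketch below follows the simplified description given above; the bit ordering and the block names are hypothetical, and the exact syntax defined by the H.264 standard (in which the chrominance part is coded as a small value rather than as one bit per block) is not reproduced.

```python
# Hypothetical ordering: four luminance blocks followed by two chrominance blocks.
BLOCK_NAMES = ("Y0", "Y1", "Y2", "Y3", "Cb", "Cr")


def make_pattern(coded_blocks):
    """Build a 6-bit mask from the names of the blocks that carry coefficients."""
    pattern = 0
    for name in coded_blocks:
        pattern |= 1 << BLOCK_NAMES.index(name)
    return pattern


def blocks_with_coefficients(pattern):
    """List the blocks whose bit is set, i.e. that may hold non-null levels."""
    return [name for bit, name in enumerate(BLOCK_NAMES) if pattern & (1 << bit)]


pattern = make_pattern(["Y0", "Y2", "Cr"])
assert blocks_with_coefficients(pattern) == ["Y0", "Y2", "Cr"]
```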
The SVC standard has added to H.264 possibilities of adaptation in the form of scalable coding or scalability properties. Three scalability axes have been defined in SVC: spatial, temporal and quality.
Temporal scalability allows the temporal resolution of a sequence to be modified by suppressing certain images, such suppression taking dependencies between images into account.
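The suppression of images along the temporal axis can be sketched as follows. The sketch assumes a dyadic hierarchical structure with a GOP of eight images, in which the temporal level of an image is derived from its index and an image only depends on images of a lower level; neither the GOP size nor this structure is imposed by the text above, and the function names are hypothetical.

```python
def temporal_level(index, gop_size=8):
    """Temporal level of an image in a dyadic hierarchy.

    Assumption of this sketch: gop_size is a power of two.
    """
    level = 0
    step = gop_size
    while index % step != 0:
        step //= 2
        level += 1
    return level


def drop_above_level(indices, max_level, gop_size=8):
    """Keep only the images whose temporal level does not exceed max_level."""
    return [i for i in indices if temporal_level(i, gop_size) <= max_level]


# Removing the highest level halves the frame rate.
assert drop_above_level(range(9), 2) == [0, 2, 4, 6, 8]
# Keeping only levels 0 and 1 divides it by four.
assert drop_above_level(range(9), 1) == [0, 4, 8]
```

Since the images that are removed are never used as references by the images that remain, the reduced sequence stays decodable.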
Spatial scalability consists in inserting several resolutions in a video stream, the lowest resolution being used to predict the highest resolutions. A particular feature of SVC is that it allows any ratio of resolutions between two successive spatial resolutions whereas a ratio of only 2 was allowed by previous scalable codecs.
Quality scalability, also known by the name of SNR scalability, takes the form of Coarse Grain Scalability (CGS), Medium Grain Scalability (MGS) and Fine Grain Scalability (FGS).
CGS SNR coding is achieved by using the same concepts as those of spatial scalability. The only difference is that for CGS scalability, the upsampling operations of the inter-layer prediction are omitted.
FGS results in a binary stream that can be truncated at any point, without preventing the decoding process. This characteristic is of particular interest for adapting the binary stream of the video in a precise manner.
MGS scalability has been defined as intermediate between CGS and FGS. It offers finer decoding points in the binary stream than CGS, but does not allow truncation at any point like FGS. Many coding and network experts believe that MGS offers sufficient granularity for practical network conditions.
Scalability is based on an Inter-Layer Prediction (ILP). Several coding modes have been specially designed for inter-layer prediction in SVC:
        IntraBL or I_BL: this mode allows an MB to be predicted in an enhancement layer depending on the MB located in the same position in the lower layer. The MB of the lower layer is interpolated in order to re-scale it to the resolution of the enhancement layer. The difference between the MB to be coded and the interpolated MB located in the same position is then coded.
        Prediction of the motion vectors: in this mode, the MB of the enhancement layer is deemed to have a motion close to that of the MB of the lower layer. In this case, at the very most a slight variation of the motion vector of the MB of the lower layer is coded in the enhancement layer.
        Residual prediction: in this mode, an MB in an enhancement layer that has a motion close to the MB located in the same position in the lower layer is deemed also to have similar residual data. Consequently, the difference between the residual data is coded.
The MBs of an enhancement layer using data from a lower layer for their coding are identified by a flag called base_mode_flag. If it has a value 1, this flag indicates that the MB prediction mode as well as the corresponding motion data are deduced from the base layer. A second flag called residual_prediction_flag indicates that the residual data of the MB in question are predicted by using the data of the lower layer.
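The principle of the IntraBL mode described above can be illustrated by the following minimal sketch, in which a block of the lower layer is re-scaled and subtracted from the block of the enhancement layer. Nearest-neighbor interpolation with a ratio of 2 is assumed purely for simplicity; the interpolation filters actually defined by SVC are not reproduced, and the function names are hypothetical.

```python
def upsample_nearest(block, factor=2):
    """Re-scale a block by repeating each sample (simplifying assumption)."""
    rows, cols = len(block), len(block[0])
    return [[block[r // factor][c // factor] for c in range(cols * factor)]
            for r in range(rows * factor)]


def intra_bl_residual(enhancement, base, factor=2):
    """Difference between the enhancement block and the interpolated base block."""
    prediction = upsample_nearest(base, factor)
    return [[e - p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(enhancement, prediction)]


def intra_bl_reconstruct(residual, base, factor=2):
    """Decoder side: add the residual back to the interpolated base block."""
    prediction = upsample_nearest(base, factor)
    return [[d + p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(residual, prediction)]


base = [[10, 20], [30, 40]]
enhancement = [[11, 10, 21, 20],
               [10, 9, 20, 19],
               [31, 30, 41, 40],
               [30, 29, 40, 39]]
residual = intra_bl_residual(enhancement, base)
assert intra_bl_reconstruct(residual, base) == enhancement
```

The residual contains only small values when the two layers are well correlated, which is what makes this mode efficient; it also means that the enhancement layer cannot be reconstructed correctly if the lower-layer data are lost.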
It will be noted that with H.264 and SVC, the transmission of a video over a network is facilitated by the notion of the Network Abstraction Layer (NAL). A NAL is a sort of container that provides in its header a brief description of the data transported on the network.
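As an illustration of the description carried by such a header, the sketch below splits the first byte of an H.264 NAL unit into its three fields (forbidden_zero_bit on 1 bit, nal_ref_idc on 2 bits, nal_unit_type on 5 bits). The function name is hypothetical, and the additional header bytes that SVC appends for its own NAL unit types are deliberately not parsed.

```python
def parse_nal_header(first_byte):
    """Split the one-byte H.264 NAL unit header into its fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,   # 0 means not used as a reference
        "nal_unit_type": first_byte & 0x1F,        # kind of payload carried
    }


header = parse_nal_header(0x65)
assert header == {"forbidden_zero_bit": 0, "nal_ref_idc": 3, "nal_unit_type": 5}
```

A network node can thus judge the importance of a packet from its header alone, without decoding the video payload.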
A great deal of research has been carried out on error control in the field of video transmission on unreliable networks.
One solution, called INTRA Refresh, consists in coding the important MBs in INTRA mode. Given that INTRA MBs are not subject to error propagation, this allows the quality of the video in difficult conditions to be improved.
However, the main drawback of this solution is linked to the cost of coding the INTRA MB. In fact, the gain in robustness involves an increase in the video bandwidth. Furthermore, this solution is not really adapted to pre-coded videos.
Another solution lies in limiting the number of reference images. For example, in a Group of Pictures (GOP), only one image is used as a reference for temporal prediction for all of the other images in the GOP. Better protection is therefore provided for this image than for the others.
Nevertheless, since only one image is used as a reference in a GOP for temporal prediction, the compression performance of the coder decreases, because it benefits less from temporal correlations.
The partitioning of data is another notion that allows better protection of important data. An example of using data partitioning is proposed in patent document U.S. Pat. No. 7,010,037.
In that document, a scalable coder or a transcoder uses data partitioning to create two scalable layers from a raw or pre-coded video. It uses the conventional functionality of data partitioning as defined in MPEG-4 Part 2, which consists in distributing the data over two binary streams: the motion vectors, the low-frequency DCT coefficients (DC) and the most important AC coefficients on the one hand, and the remaining AC coefficients on the other hand.
The idea is to optimize the partitioning of the coefficients for each block without increasing the cost due to coding the partitioning point.
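The general principle of splitting the coefficients of a block at a partitioning point can be pictured with the minimal sketch below. It assumes that the coefficients are given in zig-zag order with the DC coefficient first, and that a lost second partition is replaced by zeros; the function names are hypothetical, and the optimization of the partitioning point described in the cited document is not reproduced.

```python
def partition_block(coefficients, partitioning_point):
    """Split zig-zag ordered coefficients into a base part and a remainder."""
    return coefficients[:partitioning_point], coefficients[partitioning_point:]


def merge_block(base_part, remainder, block_length):
    """Rebuild the block; a missing remainder is replaced by zeros."""
    if remainder is None:
        remainder = [0] * (block_length - len(base_part))
    return base_part + remainder


coefficients = [52, -7, 3, 0, 1, 0, 0, -1]
base_part, remainder = partition_block(coefficients, 3)

# Both partitions received: exact reconstruction.
assert merge_block(base_part, remainder, 8) == coefficients
# Second partition lost: only the most important coefficients survive.
assert merge_block(base_part, None, 8) == [52, -7, 3, 0, 0, 0, 0, 0]
```

The decoder needs to know the partitioning point in order to merge the two parts, which is why the way this point is signaled or derived matters.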
The process described in U.S. Pat. No. 7,010,037 has various drawbacks. In particular, it requires the use of a non-standard decoder, because additional information is required by the decoder to identify the way in which the coefficients have been partitioned. Furthermore, as partitioning depends on the bandwidth available, new partitioning must be defined if network conditions change.