In the field of video transmission over a packet network, the network is unreliable: it guarantees neither stable conditions nor reliable delivery of packets, i.e. packet losses may occur.
The video data to be transmitted is assumed to be pre-encoded as a scalable video stream according to the coding format of a video coding standard. In that context, the video data is represented by a set of layers, and adapting the video data to the transmission constraints consists in selecting a subset of layers to be transmitted. This selection process, typically managed by a decision engine at the server side, is improved to take better account of variations in network conditions.
SVC is a new video standard extending H.264 with scalability features. H.264 constitutes the state of the art in video compression. This standard, developed by the JVT (“Joint Video Team”), significantly enhances compression efficiency compared to MPEG-2, MPEG-4 Part 2 and H.263. In terms of technology, H.264 is still based on the traditional hybrid scheme combining a spatial transform with motion estimation/compensation. However, this general scheme has been optimized to obtain better compression efficiency. Like H.264, SVC processes data by macroblocks, which may be gathered into slices that are encoded separately.
SVC adds adaptation capabilities to H.264 in the form of scalability features. Three scalability axes are defined in SVC: spatial, temporal and quality scalability. Temporal scalability allows the temporal resolution of a sequence to be modified by removing some frames, the removal taking the frame dependencies into account. Spatial scalability consists in inserting several resolutions in a video stream, the lowest resolution being used for the prediction of the higher resolutions. Quality scalability, also known as SNR scalability, takes the form of Coarse Grain Scalability (CGS), Medium Grain Scalability (MGS) and Fine Grain Scalability (FGS).
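By way of illustration only, temporal scalability can be sketched as follows. The sketch assumes a dyadic hierarchical prediction structure with a group of pictures of 8 frames, in which a frame only references frames of a lower temporal level, so that dropping every frame above a chosen level never removes a reference that a kept frame needs. This structure and the names temporal_id and extract_temporal_substream are assumptions made for the example; they are not taken from the standard.

```python
def temporal_id(frame_index: int, gop_size: int = 8) -> int:
    """Temporal level of a frame in an assumed dyadic hierarchy.

    gop_size is assumed to be a power of two. Level 0 holds the key
    frames; each higher level doubles the frame rate.
    """
    levels = gop_size.bit_length() - 1  # log2(gop_size) for powers of two
    pos = frame_index % gop_size
    if pos == 0:
        return 0
    # Number of trailing zero bits of pos: more zeros means a lower level.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def extract_temporal_substream(num_frames: int, max_temporal_id: int,
                               gop_size: int = 8) -> list[int]:
    """Indices of the frames kept when levels above max_temporal_id are dropped."""
    return [i for i in range(num_frames)
            if temporal_id(i, gop_size) <= max_temporal_id]
```

With 17 frames, keeping levels 0 and 1 retains every fourth frame, which divides the frame rate by four while leaving all remaining frames decodable under the assumed dependency structure.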
Much prior work has focused on scalable video transmission over an unreliable channel. When the video is pre-encoded and stored on the server side, one key point is the selection of the most appropriate layers based on the network conditions. Generally, this process is performed by a decision engine and consists in a rate-distortion optimization under a rate constraint. The decision engine can potentially select several sets of layers. Some of these sets are optimal in terms of quality under the given rate constraint but are sensitive to losses. Others are less sensitive to losses but do not provide the optimal quality.
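The trade-off described above can be sketched as follows, as a simplified illustration rather than the decision engine of any particular system. Each candidate set of layers is assumed to carry a rate, a quality when fully received, and a hypothetical degraded quality when a loss affects it. The two-outcome loss model, the field names and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LayerSet:
    name: str
    rate_kbps: float        # bitrate needed to transmit the set
    quality_db: float       # quality when every packet arrives (e.g. PSNR)
    quality_lost_db: float  # assumed quality when a loss hits the set


def expected_quality(s: LayerSet, loss_prob: float) -> float:
    """Expected quality under an assumed two-outcome loss model."""
    return (1.0 - loss_prob) * s.quality_db + loss_prob * s.quality_lost_db


def select_layer_set(candidates: Sequence[LayerSet],
                     rate_budget_kbps: float,
                     loss_prob: float = 0.0) -> Optional[LayerSet]:
    """Pick the feasible set with the best expected quality, or None."""
    # Rate constraint: discard sets that exceed the available bitrate.
    feasible = [s for s in candidates if s.rate_kbps <= rate_budget_kbps]
    if not feasible:
        return None
    return max(feasible, key=lambda s: expected_quality(s, loss_prob))
```

With loss_prob set to 0 the function reduces to choosing the highest-quality set that fits the rate budget. As loss_prob grows, a set that is less sensitive to losses can overtake the set that is optimal in the loss-free case.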