Current video coders (MPEG, H.264, etc.) use a block-wise representation of the video sequence. Each image is split up into macroblocks, each macroblock is itself split up into blocks, and each block or macroblock is coded by intra-image or inter-image prediction. Thus, I images are coded by spatial prediction (intra prediction), while P and B images are coded by temporal prediction, with the aid of motion compensation, with respect to other I, P or B images that have been coded and decoded. Moreover, for each block, a residual block is coded, corresponding to the original block minus a prediction. The coefficients of this residual block are quantized after an optional transformation, and then coded by an entropy coder.
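By way of illustration only, the residual and quantization steps described above may be sketched as follows. This is a minimal sketch, not the processing of any particular standard: the optional transformation is skipped, the quantizer is a plain uniform one, and the function names and the step value are assumptions introduced for the example.

```python
def residual_block(original, prediction):
    """Residual = original block minus its prediction, sample by sample."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def quantize(block, step):
    """Uniform quantization (assumed here): map each value to the nearest level."""
    return [[round(v / step) for v in row] for row in block]


def dequantize(levels, step):
    """Inverse of the uniform quantizer: scale each level back by the step."""
    return [[level * step for level in row] for row in levels]


def reconstruct(prediction, dequantized_residual):
    """Decoder side: prediction plus the dequantized residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, dequantized_residual)]


# Hypothetical 2x2 block and its prediction.
original = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
step = 3  # assumed quantization step

residual = residual_block(original, prediction)
levels = quantize(residual, step)  # these levels would go to the entropy coder
decoded = reconstruct(prediction, dequantize(levels, step))
```

In this sketch the decoded block differs from the original by at most half a quantization step per sample, which is the loss introduced by quantization.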
Of interest here are the data substreams arising from the entropy coder and more particularly the substreams which may be decoded in parallel.
Document US2009168868A1, entitled “Systems and apparatuses for performing cabac parallel encoding and decoding”, describes a scheme allowing a type of parallelization of the coding and/or of the decoding in a coder of “CABAC” type. According to one embodiment, an image is split up into slices and the “CABAC” coder is initialized at the start of each slice, so as to optimize the initialization of the probabilities. This embodiment allows the slices to be encoded into independent substreams, and therefore allows these substreams to be decoded in parallel by several decoders. However, as there is no prediction between the slices, this splitting technique is not efficient in terms of compression.
Also known is the document RFC 3984 from the IETF (Internet Engineering Task Force) by S. Wenger et al., entitled “RTP Payload Format for H.264 Video”, which establishes a distinction between a coding layer (VCL: Video Coding Layer) and an abstraction layer for the transport of the encoded data on a network (NAL: Network Abstraction Layer). The NAL encapsulates the data encoded by the VCL of the H.264 coder in units called NALUs (Network Abstraction Layer Units), adapted to the transport of the coded data on the network and to the multiplexing of these data.
This document teaches that an encoded video sequence is represented by a sequence of NAL Units. The NALUs may be NALUs for data (VCL) or for signaling (SEI, SPS, PPS, etc.). When the images are split up into slices, the data substream arising from the encoding of a slice or of a part of a slice is encapsulated in a NALU.
According to one embodiment, several NALUs are aggregated in the payload of a transport packet of RTP type. In a particular mode of aggregation, information items representative of the sizes of the aggregated NALUs are inserted into the RTP packet obtained.
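The aggregation mode with size information described above may be sketched as follows. This is a simplified illustration under stated assumptions: each NALU is preceded by a 16-bit size field in network byte order, and the RTP header and any aggregation header are omitted. The function names are hypothetical and the NALU contents are placeholder bytes.

```python
import struct

SIZE_FIELD = struct.Struct("!H")  # assumed 16-bit size field, network byte order


def aggregate_nalus(nalus):
    """Concatenate NALUs into one payload, each preceded by its size."""
    parts = []
    for nalu in nalus:
        if len(nalu) > 0xFFFF:
            raise ValueError("NALU too large for a 16-bit size field")
        parts.append(SIZE_FIELD.pack(len(nalu)))
        parts.append(nalu)
    return b"".join(parts)


def split_nalus(payload):
    """Recover the individual NALUs from an aggregated payload."""
    nalus = []
    offset = 0
    while offset < len(payload):
        if offset + SIZE_FIELD.size > len(payload):
            raise ValueError("truncated size field")
        (size,) = SIZE_FIELD.unpack_from(payload, offset)
        offset += SIZE_FIELD.size
        if offset + size > len(payload):
            raise ValueError("truncated NALU")
        nalus.append(payload[offset:offset + size])
        offset += size
    return nalus


# Placeholder substreams standing in for two encoded slices.
substreams = [b"\x65abc", b"\x41de"]
payload = aggregate_nalus(substreams)
recovered = split_nalus(payload)
```

Because the sizes are carried in the payload, a receiver can locate the start of every substream without decoding any of them, which is what makes it possible to hand each one to a different processor.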
According to another embodiment, a slice may be split and inserted into different NALUs, each being transported in a distinct RTP packet.
The insertion of the size of the substreams concatenated in one and the same packet is necessary to allow a decoder to access the various substreams received and to have them decoded by different processors. However, the additional signaling induced generates a cost overhead in terms of bitrate, proportional to the number of substreams.
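The proportionality mentioned above can be made concrete with a small calculation. The sketch below assumes one fixed-width size field per substream (2 bytes by default, an assumed value); the function name is hypothetical.

```python
def signaling_overhead_bytes(num_substreams, size_field_bytes=2):
    """Bytes spent on size fields when one field is sent per substream."""
    if num_substreams < 0 or size_field_bytes < 0:
        raise ValueError("arguments must be non-negative")
    return num_substreams * size_field_bytes


# Doubling the number of substreams doubles the signaling cost.
overhead_4 = signaling_overhead_bytes(4)
overhead_8 = signaling_overhead_bytes(8)
```

The cost is paid in every packet that carries aggregated substreams, so it grows with the degree of parallelism offered to the decoder.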
Moreover, the concatenation of the data substreams requires the processors of the decoder to wait until they have received several substreams before being able to begin the decoding. Yet certain applications seek to begin decoding a video stream as soon as a part of the stream is available at the decoder. In this case, the benefit of using processors working in parallel is lost: because the decoder receives the substreams one after another, a first processor begins to decode the first substream as soon as the latter has been received in part, but a second processor, in order to begin decoding a second substream in parallel, must wait until the first substream has been received completely. The parallelization of the processors is then not optimal, since the lag between the starts of the respective decoding processes is too long.
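The lag described above can be illustrated with a toy timing model. The sketch assumes that the substreams arrive strictly one after another over a link of constant rate, and that a processor can start as soon as the first byte of its substream arrives; the function name, sizes and rate are hypothetical.

```python
def earliest_start_times(substream_sizes, bytes_per_second):
    """Earliest time (seconds) at which each substream's decoding can start.

    Substream k starts arriving only once substreams 0..k-1 have been
    received in full, so its processor is idle until then.
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    starts = []
    received = 0
    for size in substream_sizes:
        starts.append(received / bytes_per_second)
        received += size
    return starts


# Three substreams of 1000 bytes each over a 1000 bytes/s link (assumed values).
starts = earliest_start_times([1000, 1000, 1000], 1000)
```

In this model the start of each processor is delayed by the full transmission time of all preceding substreams, so the processors are staggered rather than working on the stream simultaneously from the outset.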
There therefore exists a need for a new technique for encapsulating substreams in a data stream, which allows the parallel decoding of the substreams by a plurality of decoders while limiting the cost overhead in terms of bitrate due to the signaling.
There furthermore exists a need for a new technique for encapsulating substreams in a data stream which makes it possible to rapidly commence the parallel decoding of the coded data relating to the various substreams.