The HEVC standard specification defines a set of coding tools that aim at high coding efficiency, i.e., the best compromise between quality and size of the encoded video sequence. Frames of a video are generally segmented into several spatial portions that are then embedded in network transport containers called NAL (Network Abstraction Layer) units. One advantage of the segmentation is that it favours parallel processing of the partitioned data. Several methods can be used for segmenting frames, for embedding encoded partitioned data into separate NAL units and for performing parallel processing. One effect of parallel processing is that the encoded NAL units corresponding to one frame may be generated in an order different from the normative order imposed by the HEVC specification.
Classically, it is up to the device embedding the encoder to rearrange the encoded NAL units of the bitstream into the normative order. Nevertheless, this rearranging stage is unnecessary if the decoder supports the decoding of NAL units that are not received in the normative order (“out of order NAL units”), in particular for parallel decoding of the frame, where the NAL unit order may be less constrained. Since rearranging the NAL units introduces additional latency, it is worth considering out of order bitstreams to minimize the encoding (and also decoding) latency.
FIG. 1 shows the image coding structure used according to HEVC. The original video sequence is a succession of digital images represented by one or more matrices, the coefficients of which represent pixels.
Image 201 is divided into non-overlapping Coding Tree Units (CTUs) 202, generally blocks of size 64 pixels×64 pixels. Each CTU may in turn be iteratively divided into smaller variable size Coding Units (CUs) 203 using a quadtree decomposition. Coding units are the elementary coding elements and comprise two sub units called “Prediction Unit” (PU) and “Transform Unit” (TU), of maximum size equal to the CU's size. A Prediction Unit corresponds to the partition of the CU for prediction of pixel values. Each CU can be further partitioned into a maximum of 2 symmetric rectangular Partition Units or into asymmetric partitions. Transform Units represent the elementary units that are spatially transformed with DCT. A CU can be partitioned into TUs based on a quadtree representation.
The HEVC standard provides different types of image segmentation mechanisms: slice segments and tiles.
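The quadtree decomposition of a CTU into CUs can be illustrated with a minimal sketch. The `should_split` callback and the minimum CU size of 8 pixels are assumptions made for the example; an actual encoder would take the split decision from its own rate-distortion analysis.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Recursively split a square block into coding units via a quadtree.

    should_split(x, y, size) is a hypothetical encoder decision callback.
    Returns the leaf coding units as a list of (x, y, size) tuples.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(split_ctu(x + dx, y + dy, half, should_split, min_size))
    return cus


def decide(x, y, size):
    # Illustrative decision: split the 64x64 CTU, then only its top-left quadrant.
    return size == 64 or (size == 32 and x == 0 and y == 0)


cus = split_ctu(0, 0, 64, decide)
for cu in cus:
    print(cu)
```

In this example the CTU ends up with four 16×16 CUs in its top-left quadrant and three 32×32 CUs elsewhere; together the leaves cover the CTU exactly once.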
The image 201 is divided into slice segments 208. A slice segment is a part of the image or the entire image. Each slice segment contains an integer number of Coding Tree Units (CTUs).
According to HEVC, a slice segment (208) can be either an independent or a dependent slice segment. Each slice segment is embedded into one NAL unit. The value of one flag specified in the slice segment header determines whether the slice segment is independent or dependent. The difference between the two types of slice segments is that the header of an independent slice segment defines all the parameters necessary to decode the encoded CUs of the slice segment. Dependent slice segments have a reduced header, and the first preceding independent slice segment is needed to infer the parameters not available in the header of the dependent slice segment.
A set of one independent slice segment and the consecutive dependent slice segments (if any) represents a slice in HEVC. Two neighbouring coding units that belong to the same slice can be predicted from each other. On the contrary, if the CUs are not in the same slice, all prediction mechanisms are broken by the slice boundary. Consequently, one coding unit can use data of another CU that is coded in another slice segment if the two slice segments belong to the same slice.
For instance, the frame 206 has been divided into three slice segments. The first two slice segments form one slice and the last slice segment forms another slice. Slice segments #1 and #3 are independent slice segments and slice segment #2 is a dependent slice segment. Coding units of slice segment #3 are coded independently from any of the CUs in slice segment #2 since they are separated by a slice boundary 207. In order to decode the data of dependent slice segment #2, some information in independent slice segment #1 must be retrieved to infer the encoding parameters of dependent slice segment #2. In addition, information can be predicted from CUs of slice segment #1 in order to better compress the coding units of slice segment #2.
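The grouping of slice segments into slices can be sketched as follows. The `SliceSegment` class and its `dependent` field are hypothetical names used only for illustration; they stand in for the flag carried in the slice segment header.

```python
from dataclasses import dataclass


@dataclass
class SliceSegment:
    index: int
    dependent: bool  # stands in for the flag of the slice segment header


def group_into_slices(segments):
    """Group slice segments, given in order, into slices.

    A slice starts at each independent slice segment and includes the
    dependent slice segments that follow it.
    """
    slices = []
    for seg in segments:
        if not seg.dependent:
            slices.append([seg])
        elif slices:
            slices[-1].append(seg)
        else:
            raise ValueError("dependent slice segment without a preceding independent one")
    return slices


# Frame 206 of the example: segments #1 and #3 independent, #2 dependent.
frame_206 = [SliceSegment(1, False), SliceSegment(2, True), SliceSegment(3, False)]
slices = group_into_slices(frame_206)
print([[seg.index for seg in sl] for sl in slices])
```

With the three slice segments of frame 206, the sketch yields two slices: one made of segments #1 and #2, and one made of segment #3 alone.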
According to HEVC, frames may be partitioned into tiles in order to split the video frames into independently coded rectangular areas, as illustrated by frame 204. Each tile contains an integer number of CTUs. Inside the tiles, CTUs are scanned in raster scan order. Similarly to slice boundaries, tiles break all prediction mechanisms at their boundaries. HEVC tiles make parallel encoding and decoding of the frames possible. According to HEVC, tiles are defined in the Picture Parameter Set (PPS) NAL unit, which is used to initialize the decoding process. The PPS NAL unit includes syntax elements that specify the number of tile rows and the number of tile columns in the picture and their associated sizes. The tile locations (offsets in bytes) in one slice segment are identified with syntax elements available at the end of the slice segment header.
Tiles and slice segments may be jointly used but with some restrictions. One or both of the following conditions must be satisfied:
1. All CTUs of one slice (or slice segment) belong to the same tile.
2. All CTUs of one tile belong to the same slice (or slice segment).
Thus, one slice (or slice segment) may contain several entire tiles or be only a subpart of a single tile. Also, a tile may contain several entire slices (or slice segments) or be only a subpart of a single slice (or slice segment).
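The two conditions above can be checked with a short sketch. The per-CTU lists `ctu_tile` and `ctu_slice` are a hypothetical representation chosen for the example, and the check reflects one reading of the restriction: for every slice and tile that share a CTU, either the slice lies within that tile or the tile lies within that slice.

```python
from collections import defaultdict


def tiles_and_slices_compatible(ctu_tile, ctu_slice):
    """ctu_tile[i] and ctu_slice[i] give the tile id and slice id of CTU i."""
    tiles_of_slice = defaultdict(set)
    slices_of_tile = defaultdict(set)
    for t, s in zip(ctu_tile, ctu_slice):
        tiles_of_slice[s].add(t)
        slices_of_tile[t].add(s)
    for t, s in set(zip(ctu_tile, ctu_slice)):
        slice_in_one_tile = len(tiles_of_slice[s]) == 1  # first condition
        tile_in_one_slice = len(slices_of_tile[t]) == 1  # second condition
        if not (slice_in_one_tile or tile_in_one_slice):
            return False
    return True


# One slice containing two entire tiles: allowed.
print(tiles_and_slices_compatible([0, 0, 1, 1], [0, 0, 0, 0]))
# One tile containing two entire slices: allowed.
print(tiles_and_slices_compatible([0, 0, 0, 0], [0, 0, 1, 1]))
# Slice 1 straddles the boundary between tiles 0 and 1: not allowed.
print(tiles_and_slices_compatible([0, 0, 1, 1], [0, 1, 1, 2]))
```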
HEVC also provides a wavefront parallel processing tool, which consists in processing the CTU lines in parallel with a delay of two CTUs between consecutive lines. For example, the first CTU line of frame 209 is encoded in a dedicated thread. Once the second CTU of this line is processed, another thread starts processing the first CTU of the second line. To do so, the encoding context of the top right CTU (i.e. the second CTU of the first CTU line) is used. All CTUs processed in one thread form a wavefront substream and may be embedded in one dependent slice segment NAL unit. On the decoder side, the dependent slice segments are also processed in parallel, with a delay of two CTUs between consecutive wavefront substreams.
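The two-CTU delay between consecutive CTU lines can be made concrete with a small scheduling sketch. It assumes, for simplicity, one thread per CTU line and an identical processing time of one time slot per CTU; neither assumption comes from the standard, and the function name is illustrative.

```python
def wavefront_schedule(rows, cols):
    """Return start[r][c], the earliest time slot in which CTU (r, c) can start.

    A CTU waits for its left neighbour (same thread) and for the top right
    CTU of the line above (or the last CTU of that line at the right edge).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = []
            if c > 0:
                ready.append(start[r][c - 1] + 1)
            if r > 0:
                top_right = min(c + 1, cols - 1)
                ready.append(start[r - 1][top_right] + 1)
            start[r][c] = max(ready, default=0)
    return start


schedule = wavefront_schedule(3, 4)
for line in schedule:
    print(line)
```

Under these assumptions each line starts two slots after the line above it, so a picture of 3 lines of 4 CTUs completes in 8 slots instead of the 12 slots needed by a single thread.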
H.264/AVC provides a feature for processing NAL units of access units that are not in the normative order. This feature is associated with the Arbitrary Slice Order (ASO) coding tool, which makes it possible to process the slices of one frame in any order. ASO is available only in specific profiles of the standard (Baseline and Extended). Depending on the value of the “profile-idc” SDP parameter (which specifies the H.264/AVC profile of the video stream), any NAL unit order may be employed in one access unit.
The decoder reordering capabilities are determined by the supported profile. H.264/AVC provides only slice NAL units, which are close to independent slice segments.
The RTP payload format for SVC (RFC 6190) specifies the encapsulation process for a scalable video stream that may be sent, for instance, over several parallel RTP sessions. Each layer of the video sequence is transmitted in one RTP session. The RFC introduces a packetization mode referred to as non-interleaved timestamp (NI-T) mode according to which:
1. In each RTP session the NAL units are transmitted in the decoding order as specified in the H.264/AVC specification (normative order).
2. Reordering between RTP sessions is based on the timestamps of the RTP packets. NAL units of the different RTP sessions within the same access unit share the same RTP timestamp. The packetizer adds empty NAL units to the RTP sessions that carry no NAL unit for a specific RTP timestamp defined in another RTP session.
3. The reordering of the different RTP sessions is based on the session dependency order, starting from the lowest RTP session.
4. The NAL units of one access unit are then ordered in the decoding order based on the NAL unit types of the different NAL units.
RFC 6190 specifies an FMTP parameter (“sprop-no-NAL-reordering-required”) that makes it possible to skip the last reordering stage #4.
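The reordering stages listed above can be sketched as follows. This is a simplified illustration, not the procedure of RFC 6190: the dictionary keys, the `session_order` list and the `nal_type_rank` table are hypothetical, empty NAL units are not modelled, and RTP timestamp wrap-around is ignored. Passing `nal_type_rank=None` corresponds to skipping stage #4.

```python
from collections import defaultdict


def reorder_ni_t(packets, session_order, nal_type_rank=None):
    """Rebuild access units from NAL units received over several RTP sessions.

    packets: dicts with 'session', 'timestamp', 'seq' and 'nal_type' keys.
    session_order: session ids in dependency order, lowest session first.
    nal_type_rank: optional map from NAL unit type to decoding-order rank.
    """
    by_timestamp = defaultdict(list)
    for p in packets:
        by_timestamp[p["timestamp"]].append(p)  # stage 2: same timestamp, same access unit
    rank = {s: i for i, s in enumerate(session_order)}
    access_units = []
    for ts in sorted(by_timestamp):
        # Stages 1 and 3: keep per-session order, lowest session first.
        au = sorted(by_timestamp[ts], key=lambda p: (rank[p["session"]], p["seq"]))
        if nal_type_rank is not None:
            # Stage 4: stable sort on the NAL unit type.
            au = sorted(au, key=lambda p: nal_type_rank[p["nal_type"]])
        access_units.append(au)
    return access_units


received = [
    {"session": "enh", "timestamp": 3000, "seq": 0, "nal_type": "slice"},
    {"session": "base", "timestamp": 3000, "seq": 1, "nal_type": "slice"},
    {"session": "base", "timestamp": 0, "seq": 0, "nal_type": "slice"},
    {"session": "base", "timestamp": 3000, "seq": 0, "nal_type": "slice"},
]
access_units = reorder_ni_t(received, session_order=["base", "enh"])
for au in access_units:
    print([(p["session"], p["seq"]) for p in au])
```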
When several slices are used for one coded picture in an access unit, each NAL unit corresponding to a slice is transmitted in the decoding order within the RTP session. No reordering is required for these NAL units. The “sprop-no-NAL-reordering-required” parameter makes it possible to avoid some reordering. However, it gives no information concerning the reordering process for NAL units corresponding to subparts of a coded picture.
When several video coding layer (VCL) NAL units are generated for each picture, which is typically the case when partitioning tools are used for parallel processing, the partitions of the picture (i.e. slice segments, which contain slice data, tile data or wavefront substream data) are encoded at different processing times depending on the picture's content.
For instance, a tile containing very complex or high frequency textured pixels consumes more processing time than a tile composed of low frequency textured pixels. As a consequence, the VCL NAL units composing the picture may be generated in an order different from the normative order provided in the HEVC specification. To generate a conformant bitstream, these NAL units should be reordered before being delivered to the decoder. Non-VCL NAL units may be available for each access unit. In what follows, only the order of VCL NAL units is considered.
This reordering increases the latency of the streaming and requires processing resources on the sender device embedding the encoder or on the receiver device embedding the decoder.
Thus, there is a need to enable the delivery of VCL NAL units to a decoder in an order different from the normative order imposed by the standard.
The present invention lies within this context.
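The latency cost of the reordering can be illustrated with a small sketch. The completion times are made-up values, and the model assumes that all partitions of a picture are encoded in parallel and that a NAL unit can be sent as soon as it is allowed to leave the encoder.

```python
def emission_times(finish_times):
    """finish_times[i]: time at which partition i, in normative order, is encoded.

    Returns (in_order, out_of_order): the time at which each NAL unit can be
    sent when the normative order is enforced, and when it is not.
    """
    out_of_order = list(finish_times)  # sent as soon as it is ready
    in_order = []
    latest = 0
    for t in finish_times:
        latest = max(latest, t)  # must wait for every preceding NAL unit
        in_order.append(latest)
    return in_order, out_of_order


# Four tiles; the first one is complex and takes the longest to encode.
finish_times = [9, 3, 5, 2]
in_order, out_of_order = emission_times(finish_times)
generation_order = sorted(range(len(finish_times)), key=lambda i: finish_times[i])
print(in_order, out_of_order, generation_order)
```

In this example the last tile is generated first, but with the normative order enforced no NAL unit can leave before the first tile is complete, so the three other NAL units wait in a buffer.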