Video applications are continuously moving towards higher resolution. A large quantity of video material is distributed in digital form over broadcast channels, digital networks and packaged media, with a continuous evolution towards higher quality and resolution (e.g. a higher number of pixels per frame, higher frame rates, higher bit-depths or extended color gamuts). This evolution puts pressure on distribution networks that already face difficulties in bringing HDTV resolution and high data rates economically to the end user, and any further increase in data rates will only add to that pressure.
To handle this challenge, ITU-T and ISO/MPEG decided to launch in January 2010 a new video coding standard project, named High Efficiency Video Coding (HEVC).
The HEVC codec design is similar to that of previous so-called block-based hybrid transform codecs such as H.263, H.264, MPEG-1, MPEG-2, MPEG-4 or SVC. Video compression algorithms, such as those standardized by the standardization bodies ITU, ISO and SMPTE, exploit the spatial and temporal redundancies of the images in order to generate bit streams of reduced size compared with the original video sequences. Such compression techniques render the transmission and/or storage of the video sequences more effective.
An original video sequence to be encoded or decoded generally comprises a succession of digital images as illustrated in FIG. 1.
FIG. 1 shows the coding structure used in HEVC. According to HEVC and one of its predecessors, the original video sequence 101 is a succession of digital images ("images i"). As known per se, a digital image is represented by one or more matrices whose coefficients represent pixels.
The images 102 are divided into slices 103. A slice is a part of the image or the entire image. In HEVC these slices are divided into non-overlapping Largest Coding Units (LCUs), also called Coding Tree Blocks (CTBs) 104, generally blocks of size 64×64 pixels. Each CTB may in turn be iteratively divided into smaller variable-size Coding Units (CUs) 105 using a quadtree decomposition. Coding units are the elementary coding elements and comprise two kinds of sub-units, Prediction Units (PUs) and Transform Units (TUs), of maximum size equal to the CU's size. A Prediction Unit corresponds to the partitioning of the CU for the prediction of pixel values; each CU can be further partitioned into a maximum of 4 Prediction Units 106. Transform Units are the elementary units to which a spatial transform (for instance the Discrete Cosine Transform, also known as DCT) is applied; a CU can be partitioned into TUs based on a quadtree representation 107.
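The quadtree decomposition described above can be sketched as follows. This is an illustrative Python sketch, not the normative HEVC process; the `split_decision` callback is a hypothetical stand-in for the encoder's rate-distortion test.

```python
def decompose_ctb(x, y, size, min_size, split_decision):
    """Recursively split the block at (x, y) into four quadrants while
    split_decision approves and the minimum CU size is not reached."""
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(decompose_ctb(x + dx, y + dy, half,
                                         min_size, split_decision))
        return cus
    # Leaf: this block becomes one CU, described by its position and size.
    return [(x, y, size)]

# Example: starting from a 64x64 CTB, split every block larger than 32 pixels.
cus = decompose_ctb(0, 0, 64, 8, lambda x, y, s: s > 32)
```

With this toy decision, the 64×64 CTB yields four 32×32 CUs; a real encoder would drive `split_decision` from the rate-distortion criterion mentioned below.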
Each slice is embedded in one Network Abstraction Layer (NAL) unit. In addition, the coding parameters of the video sequence are stored in dedicated NAL units called parameter sets. In HEVC, two kinds of parameter set NAL units are employed: first, the Sequence Parameter Set (SPS) NAL unit, which gathers all parameters that are unchanged during the whole video sequence; and second, Picture Parameter Set (PPS) NAL units, which code the different values that may change from one frame to another. HEVC also includes Adaptation Parameter Sets (APS), which contain parameters that may change from one slice to another.
Each image may be made up of one or more image components, also called color components or channels. Each color component is a two-dimensional array of sample values, each entry of which represents the intensity of that component: a measure of luma brightness and of chroma deviations from neutral grayscale towards blue or red (YUV), or a measure of red, green or blue light intensity (RGB). A YUV model generally defines a color space in terms of one luma (Y) component, representing the brightness, and two chrominance (U, V) or chroma components, representing the color. A 4:2:0 YUV image, for example, is made up of one luma component plus two chroma components having a quarter of the spatial resolution (half the width and half the height) of the luma component.
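As a minimal illustration of the 4:2:0 subsampling just described (array shapes only; the frame dimensions chosen here are arbitrary):

```python
import numpy as np

# Illustrative 4:2:0 frame layout: the luma plane is full resolution,
# each chroma plane is half the width and half the height.
width, height = 64, 48
y = np.zeros((height, width), dtype=np.uint8)            # luma plane
u = np.zeros((height // 2, width // 2), dtype=np.uint8)  # chroma plane (Cb)
v = np.zeros((height // 2, width // 2), dtype=np.uint8)  # chroma plane (Cr)

# Each chroma plane therefore holds a quarter of the luma sample count.
```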
The coding and decoding devices comprise several means able to carry out a coding/decoding step, as respectively illustrated in FIGS. 2 and 3.
FIG. 2 shows a diagram of a classical HEVC video encoder 20 that can be considered as a superset of one of its predecessors (H.264/AVC).
Each frame of the original video sequence 101 is first divided into a grid of coding units (CU) by the module 201. This module controls also the definition of slices.
The subdivision of the LCU into CUs and the partitioning of the CU into TUs and PUs are determined according to a rate distortion criterion. Each PU of the CU being processed is predicted spatially by an “Intra” predictor 217, or temporally by an “Inter” predictor 218. Each predictor is a block of pixels issued from the same image or another image, from which a difference block (or “residual”) is derived. Thanks to the identification of the predictor block and the coding of the residual, it is possible to reduce the quantity of information actually to be encoded.
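The residual derivation described above reduces to an element-wise difference between the current block and its predictor; a minimal sketch with made-up sample values:

```python
import numpy as np

# Toy 2x2 blocks; in HEVC the blocks are PU-sized and the predictor
# comes from intra or inter prediction.
current = np.array([[52, 55], [61, 59]], dtype=np.int16)
predictor = np.array([[50, 54], [60, 60]], dtype=np.int16)

# Only this difference block (plus the predictor identification) is coded.
residual = current - predictor

# The decoder recovers the block by adding the residual back; lossless
# here, lossy once the residual has gone through quantization.
reconstructed = predictor + residual
```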
The encoded frames are of two types: temporally predicted frames (predicted either from one reference frame, called P-frames, or from two reference frames, called B-frames) and non-temporally predicted frames (called Intra frames or I-frames). In I-frames, only Intra prediction is considered for coding the CUs/PUs. In P-frames and B-frames, both Intra and Inter prediction are considered for coding the CUs/PUs.
In the “Intra” prediction processing module 217, the current block is predicted by means of an “Intra” predictor, a block of pixels constructed from the information already encoded in the current image. The module 202 determines the prediction mode used to predict pixels from the pixels of neighboring PUs. In HEVC, up to 35 intra prediction modes are considered. A residual block is obtained by computing the difference between the current block and the intra-predicted block. An intra-predicted block is therefore represented by a prediction mode together with a residual. The intra prediction mode is coded by a module 203.
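As an illustration of intra prediction, the following sketch implements only the simple DC mode, one of the 35 HEVC modes, in a simplified form; the exact HEVC derivation of the reference samples (padding, filtering) is omitted:

```python
import numpy as np

def dc_predict(top, left):
    """DC intra mode sketch: fill the block with the mean of the
    already-decoded neighboring samples above and to the left."""
    dc = int(round((top.sum() + left.sum()) / (len(top) + len(left))))
    return np.full((len(left), len(top)), dc, dtype=np.int32)

# Hypothetical decoded neighbors of a 4x4 PU.
top = np.array([100, 102, 104, 106])
left = np.array([98, 100, 102, 104])
pred = dc_predict(top, left)
```

The residual then coded is the difference between the actual 4×4 block and `pred`, as described above.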
With regard to the second processing module 218, used for “Inter” coding, two prediction types are possible. Mono-prediction (P-type) consists in predicting the block from one reference block of one reference picture. Bi-prediction (B-type) consists in predicting the block from two reference blocks of one or two reference pictures. An estimation of the motion between the current PU and the reference images 215 is made by a module 204. One of its goals is to identify, in one or several of these reference images 215, one (P-type) or two (B-type) blocks of pixels to use as predictors of the current block.
The reference block is identified in the reference frame by a motion vector relating the PU in the current frame to its reference block (or prediction block). The next stage of the inter prediction process, implemented by a module 205, consists in computing the difference between the prediction block and the current block. This difference block is the residual of the inter-predicted block. At the end of the inter prediction process, the current PU is composed of one motion vector and a residual.
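The motion estimation performed by module 204 can be illustrated by an exhaustive block-matching search minimizing the sum of absolute differences (SAD). This full search is an assumption made for clarity; real encoders use faster search strategies and sub-pixel refinement:

```python
import numpy as np

def motion_search(cur, ref, bx, by, bs, rng):
    """Full search over [-rng, rng]^2: return the motion vector (dx, dy)
    minimizing the SAD between the current block and the reference area."""
    block = cur[by:by + bs, bx:bx + bs].astype(np.int32)
    best_mv, best_sad = (0, 0), None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y0, x0 = by + dy, bx + dx
            if 0 <= y0 <= ref.shape[0] - bs and 0 <= x0 <= ref.shape[1] - bs:
                cand = ref[y0:y0 + bs, x0:x0 + bs].astype(np.int32)
                sad = int(np.abs(block - cand).sum())
                if best_sad is None or sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad

# Toy frames: a bright 4x4 patch shifted one pixel to the right in ref.
ref = np.zeros((16, 16), dtype=np.uint8); ref[4:8, 5:9] = 10
cur = np.zeros((16, 16), dtype=np.uint8); cur[4:8, 4:8] = 10
mv, sad = motion_search(cur, ref, 4, 4, 4, 2)
```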
Finally, current PU's motion vector is coded by a module 206. These two types of coding (inter or intra) thus supply several texture residuals (the difference between the current block and the predictor block), which are compared in a module 216 for selecting the best coding mode.
The residual obtained at the end of the inter or intra prediction process is then transformed by a transform module 207. The transform applies to a Transform Unit (TU) included in a CU. A TU can be further split into smaller TUs using a so-called Residual QuadTree (RQT) decomposition realized by the module 206. In HEVC, generally 2 or 3 levels of decomposition are used, and the authorized transform sizes are 32×32, 16×16, 8×8 and 4×4. The transform basis is derived from the discrete cosine transform (DCT).
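The transform applied by module 207 can be illustrated with a floating-point separable 2-D DCT-II on a 4×4 block. Note that HEVC actually specifies integer approximations of the DCT, so this is only a sketch of the principle:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    m = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n))
                   for j in range(n)] for i in range(n)])
    m[0, :] *= 1.0 / np.sqrt(n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def dct2(block):
    """Separable forward 2-D DCT: transform rows, then columns."""
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

def idct2(coeffs):
    """Inverse 2-D DCT (the basis is orthonormal, so the inverse is m.T)."""
    m = dct_matrix(coeffs.shape[0])
    return m.T @ coeffs @ m

residual = np.arange(16, dtype=float).reshape(4, 4)
coeffs = dct2(residual)
```

Because the basis is orthonormal, `idct2(dct2(x))` recovers `x` exactly; in the real codec, losses are introduced only by the quantization step that follows.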
The transformed residual coefficients are then quantized by a quantization module 208. The coefficients of the quantized transformed residual are then coded by an entropy coding module 209 and inserted in the compressed bitstream 210. Coding syntax elements are also coded with the help of the module 209. This processing module uses spatial dependencies between syntax elements to increase the coding efficiency.
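The quantization performed by module 208 can be illustrated by a plain uniform quantizer. The actual HEVC quantizer is more elaborate (QP-dependent scaling, rounding offsets), so the step value here is an arbitrary assumption:

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform quantization sketch: divide by the step and round."""
    return np.round(coeffs / step).astype(np.int64)

def dequantize(levels, step):
    """Approximate reconstruction; the rounding error is the coding loss."""
    return levels * step

coeffs = np.array([100.0, -37.0, 12.0, -3.0])
levels = quantize(coeffs, 8)     # the small integers sent to entropy coding
recon = dequantize(levels, 8)    # what the decoding loop recovers
```

The reconstruction error per coefficient is bounded by half the quantization step, which is why larger steps give smaller bitstreams but more distortion.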
In order to calculate the “Intra” predictors or to make an estimation of the motion for the “Inter” predictors, the encoder performs a decoding of the blocks already encoded by modules of a so-called “decoding” loop (211, 212, 213, 214, 215). This decoding loop makes it possible to reconstruct the blocks and images from the quantized transformed residuals.
Thus the quantized transformed residual is dequantized by a dequantization module 211, which applies the inverse of the quantization performed by the module 208. An inverse transform module 212 then reconstructs the residual block by applying the inverse of the transform performed by the module 207.
If the residual comes from the “Intra” coding module 217, the “Intra” predictor that was used is added to this residual in order to recover a reconstructed block corresponding to the original block modified by the losses resulting from lossy operations, here the quantization.
If the residual instead comes from the “Inter” coding module 218, the blocks pointed to by the current motion vectors (these blocks belong to the reference images 215 referred to by the current image indices) are merged and then added to this decoded residual. In this way the original block is recovered, modified by the losses resulting from the quantization operations.
A final loop filter 219 is applied to the reconstructed signal in order to reduce the effects created by heavy quantization of the residuals and to improve the signal quality. In the current HEVC standard, 3 types of loop filters are used: a deblocking filter 213, a sample adaptive offset (SAO) filter 220 and an adaptive loop filter (ALF) 214.
The filtered images, also called reconstructed images, are then stored as reference images 215 in order to allow the subsequent “Inter” predictions taking place during the compression of the following images of the current video sequence.
A corresponding decoder 30 is represented in the FIG. 3. More precisely, the FIG. 3 shows a block diagram of a video decoder 30 of HEVC type. The decoder 30 receives as an input a bitstream 210 corresponding to a video sequence 101 compressed by an encoder of the HEVC type, like the one in FIG. 2.
During the decoding process, the bitstream 210 is first of all parsed with the help of an entropy decoding module 301. This processing module 301 uses the previously entropy-decoded elements to decode the encoded data. It decodes in particular the parameter sets of the video sequence, in order to initialize the decoder 30, and also decodes the LCUs of each video frame. Each NAL unit corresponding to a slice is then decoded.
The partition of the LCU is parsed and the CU, PU and TU subdivisions are identified. The decoder 30 successively processes each CU using the intra 307 and inter 306 processing modules, the inverse quantization and inverse transform modules, and finally the loop filter 219 (which has the same structure as the loop filter in the encoder 20).
The “Inter” or “Intra” prediction mode for the current block is parsed from the bitstream 210 with the help of the parsing process module 301. Depending on the prediction mode, either the intra prediction processing module 307 or the inter prediction processing module 306 is employed. If the prediction mode of the current block is of “Intra” type, the prediction mode is extracted from the bitstream and decoded with the help of the neighboring blocks' prediction modes during stage 304 of the intra prediction processing module 307. The intra-predicted block is then computed by the module 303 from the decoded prediction mode and the already decoded pixels at the boundaries of the current PU. The residual associated with the current block is recovered from the bitstream 210 and then entropy decoded.
If the prediction mode of the current block indicates that this block is of “Inter” type, the motion information is extracted from the bitstream 210 and decoded by the module 304. This motion information is used in the reverse motion compensation module 305 in order to determine the “Inter” predictor block contained in the reference images 215 of the decoder 30. In a similar manner to the encoder, these reference images 215 are composed of images that precede the image currently being decoded and that are reconstructed from the bitstream (and therefore decoded previously).
In order to decode the residual block that has been transmitted in the bitstream, the parsing module 301 is also able to extract the residual coefficients from the bitstream 210. The modules 211 and 212 are respectively able to perform the inverse quantization and an inverse transform to obtain a residual block. This residual block is added to the predicted block, obtained at output of the intra or inter processing modules 306 and 307.
At the end of the decoding of all the blocks of the current image, the loop filter 219 is used to eliminate the block effects and improve the signal quality in order to obtain the reference images 215. As at the encoder, this processing module employs the deblocking filter 213, then the SAO filter 220 and finally the ALF 214 (not shown within the loop filter 219 in FIG. 3).
The images thus decoded by the decoder 30 constitute the output video signal 308 of the decoder, which can then be displayed and used.
More specifically, one embodiment of the invention relates to a specific coding mode, named “Transform Skip”, specified in the HEVC standard. In HEVC, there is an option that allows skipping the transform step (realized by the module 207 in FIG. 2) and the inverse transform step (realized by the module 212 in FIG. 3). The Transform Skip mode was proposed to the HEVC standardization group in documents JCTVC-H0361 and JCTVC-I0408, and was adopted in HM7 (JCTVC-I1003) in May 2012.
This mode involves skipping the transform process, which is replaced by a scaling process in order to keep a signal range similar to that obtained when the transform applies. The skip can be allowed for certain color component blocks, for example intra blocks of the luma or chroma components.
More precisely, in the current design (named “HM7”), only luma or chroma blocks resulting from intra prediction and having a 4×4 size are allowed to use the Transform Skip mode. However, the Transform Skip mode is not limited to this design.
To enable this mode for such blocks, first a high-level flag, placed in the SPS, is used to enable or disable the Transform Skip mode for the images of the sequence. In addition, another flag is inserted in the syntax of the prediction residual signal decoding of 4×4 blocks, to signal whether or not the Transform Skip mode applies to the block. For the mode to apply to a given 4×4 block, both flags must be set to true.
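The two-level signalling just described amounts to the following check; the function and argument names are illustrative, not the exact HEVC syntax element names:

```python
def transform_skip_applies(sps_ts_enabled, block_ts_flag, width, height):
    """Transform Skip applies only if the SPS-level flag enables it, the
    block-level flag is set, and (in the HM7 design) the block is 4x4."""
    return sps_ts_enabled and block_ts_flag and width == 4 and height == 4
```

Clearing the SPS-level flag thus disables the mode for the whole sequence, regardless of what any block-level flag would say.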
FIG. 4a depicts in more detail a part of an HEVC decoder to explain operations 401 when the decoder is in Normal mode (also called first mode), that is, when the Transform Skip mode (also called second mode) does not apply to a given block. The decoded coefficients coming from the entropy decoding module 301 are processed by the inverse quantization module 211 and then by the inverse transform module 212. The resulting signal, which corresponds to the prediction residual samples, is added to the intra prediction signal coming from the intra prediction module 303. The resulting signal 308 corresponds to the reconstructed samples, which are then processed by the loop filter 219.
For comparison, FIG. 4b depicts a part of an HEVC decoder to explain operations 402 when Transform Skip is enabled for the considered 4×4 blocks. The coefficients are decoded by the entropy decoder 301. A flag called “ts_flag”, signalling whether the block uses the Normal mode or the Transform Skip mode, is decoded by the decoding module 403. Depending on the value of this flag, checked by the module 404, either the Normal mode or the Transform Skip mode applies. In the Normal mode, the decoded coefficients are processed by the inverse quantization module 211 and by the inverse transform module 212. In the Transform Skip mode, the decoded coefficients are processed by the inverse quantization module 211 and by the inverse scaling module 405. The resulting signal, which corresponds to the prediction residual samples, is then added to the intra prediction signal coming from the intra prediction module 303. The resulting signal 308 corresponds to the reconstructed samples, which are then processed by the loop filter 219.
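The branch controlled by module 404 can be sketched as a simple dispatch; the `dequantize`, `inverse_transform` and `inverse_scale` callbacks are hypothetical stand-ins for modules 211, 212 and 405:

```python
def reconstruct_residual(levels, ts_flag, dequantize,
                         inverse_transform, inverse_scale):
    """Decoder-side dispatch: inverse quantization first, then either the
    inverse transform (Normal mode) or the inverse scaling (Transform
    Skip mode), depending on ts_flag."""
    coeffs = dequantize(levels)
    return inverse_scale(coeffs) if ts_flag else inverse_transform(coeffs)

# Usage with toy stand-ins that tag which path was taken.
normal = reconstruct_residual(4, False, lambda c: c * 8,
                              lambda c: ("T", c), lambda c: ("S", c))
skipped = reconstruct_residual(4, True, lambda c: c * 8,
                               lambda c: ("T", c), lambda c: ("S", c))
```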
FIG. 5 depicts operations 501 in an HEVC encoder for the Normal mode only, that is, when the Transform Skip mode is disabled. In the Normal mode, the intra prediction residual, resulting from the difference between the signal from the original images 101 and the signal delivered by the intra prediction module 217 and the intra/inter selection module 216, is transformed by the transform module 207 and quantized by the quantization module 208, and the resulting quantized coefficients are sent to the entropy coding module 209, which delivers the output bitstream 210. They are also inverse quantized by the inverse quantization module 211 and inverse transformed by the inverse transform module 212 to reconstruct the decoded residual signal, which is added to the intra prediction signal to generate the reconstructed signal. This reconstructed signal is then processed by the loop filtering 219. For inter prediction, reconstructed pictures are then stored 215 and used for the motion prediction 218.
FIG. 6 depicts operations 601 in an HEVC encoder when both the Normal mode and the Transform Skip mode are checked for a 4×4 luma or chroma block of the original images 101. In addition to the processing of the Normal mode (transform, quantization, inverse quantization, inverse transform, described above), the processing of the Transform Skip mode is performed using the following modules. The intra prediction residual, resulting from the difference between the signal from the original images 101 and the signal delivered by the intra prediction module 217 and the intra/inter selection module 216, is scaled by a scaling module 602 and quantized by the quantization module 208, and the resulting quantized coefficients are sent to the entropy coding module 209. They are also inverse quantized by the inverse quantization module 211 and inverse scaled by the inverse scaling module 603 to reconstruct the residual signal, which is added to the intra prediction signal to generate the reconstructed signal. A decision module 604 chooses between the Normal mode and the Transform Skip mode, typically based on a rate-distortion criterion comparing the rate-distortion costs of both modes and selecting the one with the lowest cost. A flag named ts_flag, indicating whether the Normal mode or the Transform Skip mode applies, is encoded by the encoding module 605 into the output bitstream 210. The reconstructed signal is then processed by the loop filtering 219. For inter prediction, reconstructed pictures are then stored 215 and used for the motion prediction 218.
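The decision taken by module 604 can be sketched as a comparison of Lagrangian rate-distortion costs J = D + λR; this cost model is a common assumption in encoders, with λ the Lagrange multiplier, and the numeric values below are made up for illustration:

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

def choose_ts_flag(d_normal, r_normal, d_ts, r_ts, lam):
    """Return (ts_flag, best_cost): ts_flag is True when the Transform
    Skip mode has the lower rate-distortion cost."""
    j_normal = rd_cost(d_normal, r_normal, lam)
    j_ts = rd_cost(d_ts, r_ts, lam)
    return j_ts < j_normal, min(j_normal, j_ts)

# Transform Skip wins here: slightly higher rate but lower distortion.
ts_flag, cost = choose_ts_flag(100, 10, 90, 12, 1.0)
```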