A common format of a digital video signal is a sequence of images (or pictures). An image is usually a rectangular area formed by a plurality of pixels, and a digital video signal is a video image sequence, sometimes called a video sequence or simply a sequence, formed by dozens to tens of thousands of frames of images. Coding the digital video signal means coding each image (or picture).
In the latest international High Efficiency Video Coding (HEVC) standard, when an image is coded, the image is divided into a plurality of sub-images called “Coding Units (CUs)” with M×M pixels, and the sub-images are coded one by one by taking a CU as the basic coding unit. M is usually 8, 16, 32 or 64. Therefore, coding a video image sequence means sequentially coding each CU. Similarly, during decoding, each CU is sequentially decoded to finally reconstruct the whole video sequence.
In order to adapt to differences in the image content and characteristics of each part of an image, and to code each part in a targeted and most effective manner, the sizes of the CUs in an image may differ, some being 8×8, some being 64×64 and the like. In order to seamlessly splice CUs of different sizes, the image is usually first divided into “Largest Coding Units (LCUs)” of exactly the same size with N×N pixels, and each LCU is then further divided, in a tree structure, into multiple CUs whose sizes may differ. Therefore, the LCU is also called a Coding Tree Unit (CTU). For example, the image is first divided into LCUs of exactly the same size with 64×64 pixels (N=64). A certain LCU includes three CUs with 32×32 pixels and four CUs with 16×16 pixels, and these 7 CUs in the tree structure form one CTU; another LCU includes two CUs with 32×32 pixels, three CUs with 16×16 pixels and twenty CUs with 8×8 pixels, and these 25 CUs in the tree structure form another CTU. Coding an image means sequentially coding the CUs one by one.
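The two example partitions described above can be checked with a short sketch. It assumes that each block is split into four equal quadrants, as in a quad-tree; the helper names `split` and `covers_lcu` and the choice of which quadrants are split are hypothetical and are not taken from any standard or reference software.

```python
# Illustrative sketch only: the function names and the choice of which
# quadrants to split are assumptions made for this example.

def split(x, y, size):
    """Split a square block at (x, y) into its four equally sized quadrants."""
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]

def covers_lcu(cus, n):
    """Return True if the CUs (x, y, size) cover every pixel of an n x n LCU exactly once."""
    covered = [[0] * n for _ in range(n)]
    for x, y, size in cus:
        for row in range(y, y + size):
            for col in range(x, x + size):
                covered[row][col] += 1
    return all(count == 1 for line in covered for count in line)

quadrants = split(0, 0, 64)  # four 32x32 blocks of a 64x64 LCU

# First example: three 32x32 CUs and four 16x16 CUs (7 CUs in total).
first_ctu = quadrants[:3] + split(*quadrants[3])

# Second example: two 32x32 CUs, three 16x16 CUs and twenty 8x8 CUs (25 CUs in total).
second_ctu = quadrants[:2]
sub_blocks = split(*quadrants[2]) + split(*quadrants[3])  # eight 16x16 blocks
second_ctu += sub_blocks[:3]           # three blocks stay at 16x16
for block in sub_blocks[3:]:           # the other five are split into 8x8 CUs
    second_ctu += split(*block)

print(len(first_ctu), covers_lcu(first_ctu, 64))
print(len(second_ctu), covers_lcu(second_ctu, 64))
```

In both cases the CU areas sum to 64×64 = 4096 pixels, which is why CUs of different sizes fit together without gaps or overlaps inside one LCU.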
A colour pixel includes three components. The two most common pixel colour formats are a Green, Blue and Red (GBR) colour format (including a green component, a blue component and a red component) and a YUV colour format, also called a YCbCr colour format (including a luma component and two chroma components). Therefore, when a CU is coded, the CU may be divided into three component planes (a G plane, a B plane and an R plane, or a Y plane, a U plane and a V plane), and the three component planes are coded separately; alternatively, the three components of each pixel may be bundled into a triple, and the whole CU formed by these triples is coded. The former arrangement of pixels and components is called a planar format of an image (and of the CUs of the image), and the latter arrangement is called a packed format of the image (and of the CUs of the image).
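The difference between the two arrangements can be illustrated with a minimal sketch. The function names `to_planar` and `to_packed` and the sample values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: function names and sample values are assumptions.

def to_planar(packed):
    """Convert a list of (c0, c1, c2) triples into three component planes."""
    return [list(plane) for plane in zip(*packed)]

def to_packed(planes):
    """Convert three component planes back into a list of (c0, c1, c2) triples."""
    return list(zip(*planes))

# Four pixels in packed format: each triple holds the (Y, U, V) components of one pixel.
packed_cu = [(16, 128, 128), (235, 128, 128), (81, 90, 240), (145, 54, 34)]

# Planar format: one list per component (Y plane, U plane, V plane).
planar_cu = to_planar(packed_cu)
print(planar_cu)
```

Both arrangements hold the same samples; only the order in which they are stored and processed differs, so converting to the planar format and back yields the original packed data.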
The YUV colour format may be further subdivided into a plurality of sub-formats according to whether down-sampling is performed on the chroma components: a YUV4:4:4 pixel colour format under which a pixel includes a Y component, a U component and a V component; a YUV4:2:2 pixel colour format under which two horizontally adjacent pixels include two Y components, a U component and a V component; and a YUV4:2:0 pixel colour format under which four adjacent pixels arranged in a 2×2 spatial layout include four Y components, a U component and a V component. A component is usually represented by a number of 8 to 16 bits. The YUV4:2:2 pixel colour format and the YUV4:2:0 pixel colour format are both obtained by performing chroma component down-sampling on the YUV4:4:4 pixel colour format. A pixel component is also called a pixel sample, or simply a sample.
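The effect of chroma down-sampling on the number of samples can be sketched as follows. The function name `sample_counts` is hypothetical, and the sketch assumes that the block width and height are even.

```python
# Illustrative sketch only: the function name is an assumption, and the block
# dimensions are assumed to be even so that the chroma planes divide evenly.

# Horizontal and vertical chroma down-sampling factors for each sub-format.
CHROMA_FACTORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def sample_counts(width, height, chroma_format):
    """Return the number of (Y, U, V) samples in a width x height block."""
    horizontal, vertical = CHROMA_FACTORS[chroma_format]
    luma = width * height
    chroma = (width // horizontal) * (height // vertical)
    return luma, chroma, chroma

for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    y, u, v = sample_counts(64, 64, fmt)
    print(fmt, (y, u, v), (y + u + v) / (64 * 64), "samples per pixel")
```

For a 64×64 block this gives 3, 2 and 1.5 samples per pixel for YUV4:4:4, YUV4:2:2 and YUV4:2:0 respectively.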
An image coded by taking only pixels in the same frame as reference pixels is called an I image, and an image coded by taking pixels of another frame as reference pixels is called a non-I image.
With the development and popularization of new-generation cloud computing and information processing modes and platforms, typified by the remote desktop, interconnection among multiple computers, between a computer host and other digital equipment such as a smart television, a smart phone and a tablet personal computer, and among various digital equipment has been realized and is increasingly becoming a mainstream trend. Therefore, there is at present an urgent need for real-time screen transmission from a server (cloud) to a user. Since a large volume of screen video data needs to be transmitted, effective and high-quality data compression of computer screen images is indispensable.
Achieving ultrahigh-efficiency compression of computer screen images by fully utilizing their characteristics is a main aim of the latest international HEVC standard.
An outstanding characteristic of a computer screen image is that one image may usually contain two types of image content with different properties. One type is continuous-tone content, which is usually content shot by a camera, such as streaming media content and digital content, and the other type is discontinuous-tone content, which is usually content generated by a computer, such as a menu, an icon and text.
For the continuous-tone content, even a large distortion in a reconstructed image obtained after lossy coding and decoding may still be perceptually invisible or tolerable to a viewer. For the discontinuous-tone content, however, even a slight distortion in a reconstructed image obtained after lossy coding and decoding may be perceptually visible and intolerable to the viewer.
In a related technology for coding and decoding images and video, the whole image has relatively uniform image quality and distortion degree. In order to ensure high reconstruction quality and low distortion of the discontinuous-tone content, the continuous-tone content must also be kept at high reconstruction quality and low distortion, so that many bits are consumed, which may cause a high bit rate of the video bitstream obtained by coding. To reduce the bit rate of the video bitstream, the reconstruction quality of the continuous-tone content has to be reduced, but the reconstruction quality of the discontinuous-tone content is then also greatly reduced, which is intolerable to a viewer.
Therefore, it is necessary to seek a new coding and decoding tool capable of adaptively coding the continuous-tone content and the discontinuous-tone content according to different reconstruction qualities and distortion degrees. That is, the continuous-tone content in the image is allowed to have a greater distortion, while the discontinuous-tone content in the same image is allowed to have only a slight distortion.
In a currently common image compression technology, a coding process mainly includes the steps of predictive coding, matching coding, transform coding, quantization coding, post-processing for eliminating negative coding effects (for example, blocking artefacts and ringing artefacts) and the like. Dozens of coding modes and a plurality of coding parameters may usually be adopted for predictive coding. Dozens of coding modes and a plurality of coding parameters may also be adopted for matching coding. Multiple modes and a plurality of coding parameters may also be adopted for transform coding. Dozens of Quantization Parameters (QPs) may usually be adopted for quantization coding. The magnitude of the QP largely determines the quality of the image: a large QP generates a low-quality reconstructed image and a small QP generates a high-quality reconstructed image. On the other hand, a large QP generates a low-bit-rate video bitstream and a small QP generates a high-bit-rate video bitstream. Optimal coding means that, given a target bit rate and a QP for each current CU (a separate QP may also be given for each of the three components Y, U and V or R, G and B, 3 QPs in total), a group consisting of a predictive coding mode and parameter (or a matching coding mode and parameter), a transform coding mode and parameter and other related coding modes and parameters is searched for and selected from all possible predictive coding modes and parameters, matching coding modes and parameters, transform coding modes and parameters and other related coding modes and parameters, such that the generated bit rate is lower than the given target bit rate and the reconstructed image has minimum distortion. This group of coding modes and parameters is called an optimal coding mode and parameter group. In the last stage of the coding process, the selected optimal coding mode and parameter group, the given QP and the residual data are entropy coded and written into the video bitstream of the current CU.
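The selection of an optimal coding mode and parameter group described above can be sketched as a constrained search. This is a minimal illustration under stated assumptions, not an encoder implementation: the `Candidate` type, the function `select_optimal`, the mode names and all bit counts and distortion values are hypothetical, and the bits and distortion of each candidate are assumed to have already been measured by trial coding of the current CU at the given QP.

```python
# Illustrative sketch only: the type, function, mode names and numbers are
# assumptions; a real encoder obtains bits and distortion by trial coding.
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    name: str          # label of a coding mode and parameter group
    bits: int          # bits generated when the CU is coded with this group
    distortion: float  # distortion of the resulting reconstructed CU

def select_optimal(candidates: Sequence[Candidate],
                   target_bits: int) -> Optional[Candidate]:
    """Pick the candidate with minimum distortion whose bits are below the target."""
    feasible = [c for c in candidates if c.bits < target_bits]
    if not feasible:
        return None  # no candidate meets the target bit rate
    return min(feasible, key=lambda c: c.distortion)

candidates = [
    Candidate("predictive mode A", bits=900, distortion=12.0),
    Candidate("predictive mode B", bits=1400, distortion=5.5),
    Candidate("matching mode C", bits=700, distortion=8.0),
]
best = select_optimal(candidates, target_bits=1000)
print(best)
```

With a target of 1000 bits, the lowest-distortion candidate (mode B) is excluded because it exceeds the target, so the search returns mode C, the least distorted of the remaining candidates.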
In order to lower the bit rate, the QP is differentially coded, that is, what is written into the bitstream of the current CU is not the QP itself but the difference between the QP of the current CU and the QP of the previous CU. In the currently common image compression technology, the QP changes only slightly from one CU to the next, and in many places does not change at all. Therefore, in many places no QP difference is written into the bitstream of the current CU.
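Differential coding of the QP can be sketched as follows. The function names and the `initial_qp` argument (the value used as the predictor for the first CU) are assumptions made for this example, and the QP values are made up for illustration.

```python
# Illustrative sketch only: function names, the initial_qp predictor and the
# QP values are assumptions made for this example.

def encode_qp_deltas(qps, initial_qp):
    """Return the difference between each CU's QP and the previous CU's QP."""
    deltas = []
    previous = initial_qp
    for qp in qps:
        deltas.append(qp - previous)
        previous = qp
    return deltas

def decode_qp_deltas(deltas, initial_qp):
    """Recover the QP of each CU by accumulating the differences."""
    qps = []
    previous = initial_qp
    for delta in deltas:
        previous += delta
        qps.append(previous)
    return qps

cu_qps = [30, 30, 30, 32, 32, 31]
deltas = encode_qp_deltas(cu_qps, initial_qp=30)
print(deltas)
print(decode_qp_deltas(deltas, initial_qp=30) == cu_qps)
```

Because the QP rarely changes between consecutive CUs, most of the differences are zero, which is why in many places nothing needs to be written for the QP.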
In the currently common image compression technology, the decoding process of a CU is implemented by: reading the selected coding mode and parameter group, the given QP and the residual data from the video bitstream by entropy decoding; calculating partially reconstructed images (also called reconstructed images) at different degrees according to this information; and performing post-processing for eliminating negative coding effects (such as blocking artefacts and ringing artefacts) to finally obtain a completely restored image.
In the related technology, no effective solution has yet been proposed for adaptively coding the continuous-tone content and the discontinuous-tone content according to different reconstruction qualities and distortion degrees.