In the related art, as televisions and displays advance to resolutions of super-high definition (4K) and extra super-high definition (8K), and as a new generation of cloud computing and information processing modes and platforms, typified by the remote desktop, is developed and popularized, a video image data compression requirement also arises for higher-resolution composite images that include both images shot by a camera and computer screen images. A data compression technology for video images with an ultrahigh compression ratio and extremely high quality becomes indispensable.
Performing ultrahigh-efficiency compression on video images by fully utilizing the characteristics of 4K/8K images and computer screen images is also a main objective of the latest international video compression standard under formulation, High Efficiency Video Coding (HEVC), and of a plurality of other international, national and industrial standards.
The natural form of a digital video signal is a sequence of images. A frame of image is usually a rectangular region formed by a plurality of pixels, and a digital video signal is a video image sequence (also called a video sequence, or simply a sequence) formed by dozens to hundreds of thousands of frames of images. Coding the digital video signal means coding each frame of image. At any time, the frame of image being coded is called the current coding image. Similarly, decoding a video bitstream (called a bitstream for short, also written bit stream) obtained by compressing the digital video signal means decoding the bitstream of each frame of image. At any time, the frame of image being decoded is called the current decoding image. The current coding image and the current decoding image are collectively called the current image.
In almost all international standards for video image coding, such as Moving Picture Experts Group (MPEG-1/2/4), H.264/Advanced Video Coding (AVC) and HEVC, when a frame of image is coded (and correspondingly decoded), the frame of image is divided into a plurality of sub-images of M×M pixels, called coding blocks (decoding blocks from the point of view of decoding; collectively, coding and decoding blocks) or “Coding Units (CUs)”. The blocks of the image are coded one by one, taking a CU as the basic coding unit. M is usually 4, 8, 16, 32 or 64. Therefore, coding a video image sequence means sequentially coding each CU of each frame of image. At any time, the CU being coded is called the current coding CU. Similarly, decoding a bitstream of a video image sequence means sequentially decoding each CU of each frame of image to finally reconstruct the whole video image sequence. At any time, the CU being decoded is called the current decoding CU. The current coding CU and the current decoding CU are collectively called the current CU.
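The block-by-block traversal described above can be sketched as follows. This is a minimal illustration only, not the procedure of any particular standard: the function name `split_into_blocks`, the raster scanning order and the clipping of blocks at the right and bottom image edges are all assumptions made for the example.

```python
def split_into_blocks(width, height, m):
    """Yield (x, y, w, h) for each block of nominal size m x m.

    Assumptions of this sketch: blocks are visited in raster order
    (left to right, top to bottom), and blocks that would cross the
    right or bottom image edge are clipped to the image.
    """
    for y in range(0, height, m):
        for x in range(0, width, m):
            yield (x, y, min(m, width - x), min(m, height - y))


# A 64x32 image divided into 16x16 blocks gives 4 columns and 2 rows.
blocks = list(split_into_blocks(64, 32, 16))
print(len(blocks))   # 8
print(blocks[0])     # (0, 0, 16, 16)
```

Coding a frame then amounts to looping over the yielded blocks and coding each one in turn; the block being processed at a given iteration plays the role of the current CU.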
In order to adapt to differences in image content and properties between the parts of a frame of image, and to code each part in a targeted and most effective manner, the CUs in an image may differ in size, some being 8×8, some being 64×64 and the like. In order to seamlessly splice CUs of different sizes, a frame of image is usually first divided into “Largest Coding Units (LCUs)” of exactly the same size, each having N×N pixels. Each LCU is then further divided into multiple tree-structured CUs whose sizes may not be the same. Therefore, the LCUs are also called “Coding Tree Units (CTUs)”. For example, a frame of image is first divided into LCUs of exactly the same size with 64×64 pixels (N=64). One LCU is formed by three CUs with 32×32 pixels and four CUs with 16×16 pixels, and these seven tree-structured CUs form a CTU. Another LCU is formed by two CUs with 32×32 pixels, three CUs with 16×16 pixels and twenty CUs with 8×8 pixels, and these 25 tree-structured CUs form another CTU. Coding a frame of image means sequentially coding each CU in each CTU. In the international standard HEVC, LCU and CTU are synonyms.
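The two example CTUs above can be checked with a small quadtree sketch. The representation is an assumption made for illustration: a leaf (`LEAF`) stands for one CU, a split node is a list of four children, and children are ordered top-left, top-right, bottom-left, bottom-right. The helper names `leaf_cus` and `size_histogram` are hypothetical.

```python
from collections import Counter

LEAF = None  # a leaf of the tree is one CU (representation assumed here)


def leaf_cus(node, x, y, size):
    """Return (x, y, size) for every CU under a quadtree node."""
    if node is LEAF:
        return [(x, y, size)]
    half = size // 2
    # Assumed child order: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    cus = []
    for child, (dx, dy) in zip(node, offsets):
        cus.extend(leaf_cus(child, x + dx, y + dy, half))
    return cus


def size_histogram(cus):
    """Map CU size to the number of CUs of that size, sorted by size."""
    return dict(sorted(Counter(size for _, _, size in cus).items()))


four_leaves = [LEAF, LEAF, LEAF, LEAF]

# First example: three 32x32 CUs and four 16x16 CUs.
ctu_a = [LEAF, LEAF, LEAF, four_leaves]

# Second example: two 32x32, three 16x16 and twenty 8x8 CUs.
ctu_b = [
    LEAF,
    LEAF,
    [LEAF, LEAF, LEAF, four_leaves],                       # 3 x 16x16, 4 x 8x8
    [four_leaves, four_leaves, four_leaves, four_leaves],  # 16 x 8x8
]

for ctu in (ctu_a, ctu_b):
    cus = leaf_cus(ctu, 0, 0, 64)
    area = sum(size * size for _, _, size in cus)
    print(len(cus), size_histogram(cus), area == 64 * 64)
# 7 {16: 4, 32: 3} True
# 25 {8: 20, 16: 3, 32: 2} True
```

In both cases the CU areas sum to 64×64 = 4096 pixels, which confirms that the listed CUs tile the LCU exactly.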
A CU is further divided into a plurality of sub-regions. The sub-regions include, but are not limited to, a Prediction Unit (PU), a Transform Unit (TU) and an Asymmetric Motion Partitioning (AMP) region.
A colour pixel usually consists of three components. The two most common pixel colour formats are the Green, Blue and Red (GBR) colour format, consisting of a green component, a blue component and a red component, and the YUV colour format, consisting of a luma component and two chroma components. The colour formats collectively called YUV actually include multiple colour formats, such as the YCbCr colour format. Therefore, when a CU is coded, the CU may be divided into three component planes (a G plane, a B plane and an R plane, or a Y plane, a U plane and a V plane), and the three component planes are coded respectively. Alternatively, the three components of each pixel may be bundled and combined into a triple, and the whole CU formed by these triples is coded. The former arrangement of pixels and components is called the planar format of an image (and its CUs), and the latter arrangement is called the packed format of the image (and its CUs). The GBR colour format and the YUV colour format of a pixel are both three-component representation formats of the pixel.
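The difference between the packed format and the planar format can be illustrated with a short sketch. The function names `packed_to_planar` and `planar_to_packed` are hypothetical, and a CU is modelled simply as a flat list of pixels; the sample values are arbitrary.

```python
def packed_to_planar(packed):
    """Turn a list of (c0, c1, c2) triples into three component planes."""
    return [list(plane) for plane in zip(*packed)]


def planar_to_packed(planes):
    """Turn three equally long component planes back into triples."""
    return list(zip(*planes))


# Two pixels in packed format, e.g. (G, B, R) or (Y, U, V) triples.
packed = [(1, 2, 3), (4, 5, 6)]

planes = packed_to_planar(packed)
print(planes)                    # [[1, 4], [2, 5], [3, 6]]
print(planar_to_packed(planes))  # [(1, 2, 3), (4, 5, 6)]
```

Both arrangements hold the same samples; they differ only in whether the samples of one component are stored together (planar) or the components of one pixel are stored together (packed).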
Besides the three-component representation format of a pixel, another common representation format of the pixel in the conventional art is the palette index representation format. In the palette index representation format, the numerical value of a pixel is represented by an index into a palette. The palette space stores the numerical values, or approximate numerical values, of the three components of the pixels to be represented, and an address in the palette is called the index of the pixel stored at that address. An index may represent one component of a pixel, or all three components of a pixel. There may be one or more palettes. When there are multiple palettes, a complete index is actually formed by two parts, i.e. a palette number and an index into the palette with that palette number. The index representation format of a pixel represents the pixel with an index. In the conventional art, the index representation format of a pixel is also called an indexed colour or pseudo colour representation format of the pixel, and a pixel so represented is usually directly called an indexed pixel, a pseudo pixel, a pixel index or an index. An index is also sometimes called an exponent. Representing a pixel in the index representation format is also called indexing or exponentiating.
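A minimal sketch of the palette index representation with a single palette follows. It assumes exact (not approximate) palette entries and assigns indices in order of first appearance; both choices, and the function name `build_palette`, are illustrative assumptions rather than part of the described art.

```python
def build_palette(pixels):
    """Return (palette, indices) for a list of three-component pixels.

    Assumptions of this sketch: one palette, exact colour values, and
    indices assigned in order of first appearance.
    """
    palette = []   # palette[i] is the triple stored at address (index) i
    lookup = {}    # triple -> index, to avoid rescanning the palette
    indices = []
    for p in pixels:
        if p not in lookup:
            lookup[p] = len(palette)
            palette.append(p)
        indices.append(lookup[p])
    return palette, indices


pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 0, 0)]
palette, indices = build_palette(pixels)
print(palette)   # [(255, 0, 0), (0, 0, 255)]
print(indices)   # [0, 1, 0, 0]

# Each index stands for all three components of its pixel.
restored = [palette[i] for i in indices]
print(restored == pixels)   # True
```

With multiple palettes, each entry of `indices` would instead be a pair of a palette number and an index within that palette, as the paragraph above describes.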
Other common pixel representation formats in the conventional art include a CMYK representation format and a grayscale representation format.
The YUV colour format is further subdivided into a plurality of sub-formats according to whether down-sampling is performed on the chroma components: a YUV4:4:4 pixel colour format, under which one pixel is formed by a Y component, a U component and a V component; a YUV4:2:2 pixel colour format, under which two horizontally adjacent pixels are formed by two Y components, a U component and a V component; and a YUV4:2:0 pixel colour format, under which four adjacent pixels arranged in 2×2 spatial positions are formed by four Y components, a U component and a V component. A component is usually represented by a number of 8 to 16 bits. The YUV4:2:2 pixel colour format and the YUV4:2:0 pixel colour format are both obtained by down-sampling the chroma components of the YUV4:4:4 pixel colour format. A pixel component is also called a pixel sample, or simply a sample.
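The effect of the three sub-formats on the amount of data can be worked out with a small sketch. The function name `sample_count` is hypothetical, and the sketch assumes that the image width and height are even so that the pixel groups divide evenly.

```python
def sample_count(width, height, fmt):
    """Total number of Y, U and V samples in a width x height image.

    Assumes even width and height. The divisor is the number of pixels
    that share one U sample and one V sample in the given sub-format.
    """
    pixels_per_chroma_pair = {"4:4:4": 1, "4:2:2": 2, "4:2:0": 4}[fmt]
    luma = width * height
    return luma + 2 * (luma // pixels_per_chroma_pair)


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, sample_count(4, 4, fmt))
# 4:4:4 48
# 4:2:2 32
# 4:2:0 24
```

For the same image, YUV4:2:2 therefore carries two thirds, and YUV4:2:0 one half, of the samples carried by YUV4:4:4.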
The most basic element during coding or decoding may be a pixel, a pixel component, or a pixel index (i.e. an indexed pixel). A pixel, pixel component or indexed pixel adopted as the most basic element for coding or decoding is collectively called a pixel sample, and is sometimes also collectively called a pixel value or simply a sample.
In the application document of the present disclosure, “pixel sample”, “pixel value”, “sample”, “indexed pixel” and “pixel index” are synonyms, and the context makes clear whether a “pixel”, a “pixel component” or an “indexed pixel” is represented, or whether any one of the three is represented. If this is not clear from the context, the term represents any one of the three.
In the application document of the present disclosure, a coding block or a decoding block (collectively called a coding and decoding block) is a region formed by a plurality of pixels. The shape of the coding and decoding block may be a rectangle, a square, a parallelogram, a trapezoid, a polygon, a circle, an ellipse, a string or any other shape. The rectangle also includes a rectangle whose width or height is one pixel value and which therefore degenerates into a line (i.e. a line segment or a line shape). In a frame of image, the coding and decoding blocks may each have a different shape and size. In the frame of image, some or all of the coding and decoding blocks may have mutually overlapping parts, or none of the coding and decoding blocks may overlap. A coding and decoding block is formed by one of “pixels”, “components of the pixels” and “indexed pixels”, or by a mixture of all three or of any two of the three. From the point of view of video image coding or decoding, a coding and decoding block refers to a region which is coded or decoded in a frame of image, including, but not limited to, at least one of: an LCU, a CTU, a CU, a sub-region of the CU, a PU, a TU, a string of pixels and a group of pixels.
Video image compression technology in the related art includes a prediction manner (including, but not limited to, intraframe prediction and interframe prediction) and a copying manner (including, but not limited to, block copying, index copying, micro-block copying, strip copying, string copying, rectangular copying and point copying). Here, “copying” refers to copying an optimal matched pixel found by a coder. Therefore, from the point of view of the coder, the copying manner is also called a matching manner (including, but not limited to, block matching, index matching, micro-block matching, strip matching, string matching, rectangular matching and point matching).
An important characteristic of the prediction manner and the copying manner is that reconstructed pixel samples in an image region outside the current coding and decoding block (called the current block for short) are copied. These reconstructed pixel samples (including at least one of completely reconstructed pixel samples and pixel samples which are partially reconstructed to different extents) are called the predicted values (also called reference values) of the current coding or decoding pixel samples (called the current pixel samples for short). The predicted values are assigned to the current pixel samples as the reconstructed pixel samples of the current pixel samples.
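The copying step described above can be sketched for the simplest case of block copying. This is an illustration under stated assumptions, not the method of any particular coder: the frame is a list of rows of single-component samples, the reference block is located by a hypothetical displacement `(dx, dy)` relative to the current block, and the function name `copy_block` is invented for the example.

```python
def copy_block(recon, cur_x, cur_y, size, dx, dy):
    """Assign reference samples to the current block as its reconstruction.

    recon is a 2-D list of samples that already holds reconstructed
    values outside the current block. The reference block sits at
    (cur_x + dx, cur_y + dy); in this sketch it must lie inside the
    frame and must not overlap the current block.
    """
    ref_x, ref_y = cur_x + dx, cur_y + dy
    height, width = len(recon), len(recon[0])
    if ref_x < 0 or ref_y < 0 or ref_x + size > width or ref_y + size > height:
        raise ValueError("reference block lies outside the frame")
    if abs(dx) < size and abs(dy) < size:
        raise ValueError("reference block overlaps the current block")
    for j in range(size):
        for i in range(size):
            # The predicted value becomes the reconstructed sample.
            recon[cur_y + j][cur_x + i] = recon[ref_y + j][ref_x + i]


# Left two columns are already reconstructed; zeros are not yet decoded.
frame = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [5, 6, 0, 0],
    [7, 8, 0, 0],
]
copy_block(frame, cur_x=2, cur_y=0, size=2, dx=-2, dy=0)
print(frame[0], frame[1])   # [1, 2, 1, 2] [3, 4, 3, 4]
```

The sketch only shows the decoder-side assignment; how a coder searches for the best matching reference block is outside its scope.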
In the prediction manner and the copying manner in the related art, the predicted values are obtained from reconstructed pixel samples in an image region outside the current block and have no direct relation with the pixel samples of the current block. As a result, the longer the distances between the positions of these reconstructed pixels and the current block, the weaker the mutual correlation and the lower the compression efficiency.
For this problem in the related art, namely that the mutual correlation becomes weaker and the compression efficiency lower as the distances between the current block and the positions of the reconstructed pixels used for obtaining the predicted values become longer, no effective solution has been provided yet.