With the development and popularization of a new generation of cloud computing and information processing modes and platforms, of which remote desktops are a typical form, interconnections among multiple computers, between computer hosts and other digital devices such as smart TVs, smartphones and tablet PCs, and among various types of digital devices have become a reality and are gradually becoming a mainstream trend. This makes real-time screen transfer from the server side (cloud) to the client side an urgent requirement. The amount of screen video data to be transmitted is large. Taking as an example a tablet with a 24-bit true color screen image, a resolution of 2048×1536 pixels and a refresh rate of 60 frames per second, data must be transmitted at a rate of 2048×1536×60×24 bits per second, i.e., 4320 megabits per second (with one megabit counted as 2^20 bits). It is not possible to transmit so much data in real time under current network conditions, so effective data compression for computer screen images is essential.
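The arithmetic in the example above can be reproduced with a short calculation. The following is only an illustrative sketch: the function name `raw_bitrate_bits` is hypothetical, and one megabit is assumed to be 2^20 bits, which is the convention under which the stated figure of 4320 is obtained.

```python
def raw_bitrate_bits(width, height, fps, bits_per_pixel):
    """Uncompressed bit rate of a screen video stream, in bits per second."""
    return width * height * fps * bits_per_pixel

# Values taken from the tablet example in the text.
bits_per_second = raw_bitrate_bits(2048, 1536, 60, 24)

# Assumption: one "megabit" is 2**20 bits.
megabits_per_second = bits_per_second / 2**20

print(bits_per_second)      # 4529848320
print(megabits_per_second)  # 4320.0
```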
Compressing computer screen images with ultra-high efficiency by taking full advantage of their features is also a primary objective of the latest international video compression standard under development, HEVC (High Efficiency Video Coding), and of a number of other international, domestic and industrial standards.
A natural form of the digital video signal of screen images (or pictures) is a sequence of images (or pictures). An image is usually a rectangular region consisting of a number of pixels. If a digital video signal has 50 frames per second, a 30-minute digital video signal is a video image sequence consisting of 30×60×50 = 90000 frames, sometimes referred to as a video sequence or simply a sequence. Coding the digital video signal means coding its images one by one. At any moment, the image being coded is called the current coding image. Similarly, decoding the compressed video bitstream (also referred to as a bitstream) of the digital video signal means decoding the bitstream of its images one by one. At any moment, the image being decoded is called the current decoding image. The current coding image and the current decoding image are collectively referred to as the current image.
In almost all international standards for video coding, such as MPEG-1/2/4, H.264/AVC, and HEVC, when an image is coded, the image is partitioned into sub-images, each a block of M×M pixels, called “coding units (CUs)”. The sub-images are coded block by block using the CU as the basic coding unit. Commonly used values of M are 4, 8, 16, 32 and 64. Therefore, coding a video image sequence means coding each of the coding units of each image, i.e., the CUs are coded one by one. At any moment, the CU being coded is called the current coding CU. Similarly, decoding the bitstream of a video image sequence means decoding each of the coding units of each image, i.e., the CUs are decoded one by one until the entire video image sequence is reconstructed. At any moment, the CU being decoded is called the current decoding CU. The current coding CU and the current decoding CU are collectively referred to as the current CU.
In order to accommodate differences among the contents and natures of various portions of an image for efficient coding, the CUs in the image can have different sizes: some are 8×8, some are 64×64, and so on. In order to enable seamless splicing of CUs of different sizes, an image is usually first partitioned into “Largest Coding Units (LCUs)” of the same size with N×N pixels, and then each of the LCUs is further partitioned into multiple CUs whose sizes are not necessarily the same. For example, an image is first partitioned into LCUs of the same size with 64×64 pixels (N=64). One of the LCUs consists of three CUs with 32×32 pixels and four CUs with 16×16 pixels, while another LCU consists of two CUs with 32×32 pixels, three CUs with 16×16 pixels and 20 CUs with 8×8 pixels. Since the CUs in an LCU are organized in a tree structure, the LCU is also called a Coding Tree Unit (CTU). In the HEVC international standard, LCU and CTU are synonyms.
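The tree-structured partitioning of an LCU can be sketched as a recursive quadtree split. This is a minimal illustration, not the partitioning procedure of any particular standard: the function `partition_lcu`, its parameters and the split rule `example_rule` are hypothetical, and a real encoder would decide whether to split based on the image content. The rule chosen here reproduces the first LCU of the example above (three 32×32 CUs and four 16×16 CUs).

```python
def partition_lcu(x, y, size, should_split, min_size=8):
    """Recursively split a square region into CUs.

    Returns a list of (x, y, size) tuples, one per CU.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(partition_lcu(x + dx, y + dy, half, should_split, min_size))
        return cus
    return [(x, y, size)]

def example_rule(x, y, size):
    """Hypothetical rule: split the 64x64 LCU, then only its top-left 32x32 quadrant."""
    return size == 64 or (size == 32 and (x, y) == (0, 0))

cus = partition_lcu(0, 0, 64, example_rule)
sizes = sorted(size for _, _, size in cus)
print(sizes)  # [16, 16, 16, 16, 32, 32, 32]
```

Because every split replaces one square by four squares of half the side length, the CUs always tile the LCU exactly, whatever split rule is used.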
The CUs can also be further partitioned into a certain number of sub-regions. The sub-regions include prediction units (PUs), transform units (TUs), or asymmetric partitions (AMPs).
A coding block or a decoding block is a region, on which coding or decoding is performed, in an image.
A color pixel is usually composed of three components. The two most commonly used pixel representation formats, i.e., pixel color formats, are the GBR color format, composed of a green component, a blue component and a red component, and the YUV color format, composed of a luminance (luma) component and two chroma components. The color format collectively known as YUV actually includes a variety of color formats, such as the YCbCr color format. Therefore, when a CU is coded, the CU can be divided into three component planes (G plane, B plane, R plane or Y plane, U plane, V plane), and the three component planes can be coded separately; alternatively, the three components of a pixel can be bound together into a 3-tuple, and the whole CU composed of 3-tuples is coded. The former arrangement of pixels and their components is called the planar format of the image (and its CUs), and the latter arrangement of pixels and their components is called the packed format of the image (and its CUs). Both the GBR color format and the YUV color format are 3-component representation formats of the pixels.
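The difference between the two arrangements can be illustrated with a small sketch. The sample values and the function names `packed_to_planar` and `planar_to_packed` are hypothetical and serve only to show how the same pixels are laid out in each format.

```python
# Four pixels of a small block, each a (G, B, R) 3-tuple: the packed arrangement.
packed = [(10, 20, 30), (11, 21, 31), (12, 22, 32), (13, 23, 33)]

def packed_to_planar(pixels):
    """Split a list of 3-tuples into three component planes."""
    g_plane, b_plane, r_plane = (list(plane) for plane in zip(*pixels))
    return g_plane, b_plane, r_plane

def planar_to_packed(g_plane, b_plane, r_plane):
    """Bind the three components of each pixel back into a 3-tuple."""
    return list(zip(g_plane, b_plane, r_plane))

planes = packed_to_planar(packed)
print(planes)  # ([10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33])
```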
In addition to the 3-component representation format of the pixels, another common representation format of the pixels is the palette index representation format. In the palette index representation format, the value of a pixel can also be represented by an index into a palette. The values or approximate values of the three components of the pixels to be represented are stored in a palette space, and the address of a palette entry is called the index of the pixel stored at that address. An index can represent one component or three components of the pixel. There may be one or more palettes. In the case of multiple palettes, a complete index is actually composed of two parts: a palette number and an index into the palette with that number. The index representation format of a pixel expresses the pixel using its index. The index representation format of the pixel is also referred to as the indexed color or pseudo color representation format of the pixel, or is often directly referred to as an indexed pixel, a pseudo pixel, a pixel index or an index. The index is sometimes referred to as the index number. Expressing a pixel in its index representation format is also known as indexing or indexation.
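A minimal sketch of indexation with a single palette follows. It assumes that every distinct pixel value receives its own palette entry, so the representation is exact; storing approximate values, as the text also allows, would require a quantization step that is not shown. The function name `build_palette_and_indexes` and the sample colors are hypothetical.

```python
def build_palette_and_indexes(pixels):
    """Map each distinct pixel value to a palette entry.

    Returns (palette, indexes), where indexes[i] is the palette address of pixels[i].
    """
    palette = []
    position = {}  # pixel value -> address in the palette
    indexes = []
    for pixel in pixels:
        if pixel not in position:
            position[pixel] = len(palette)
            palette.append(pixel)
        indexes.append(position[pixel])
    return palette, indexes

# Hypothetical 3-component pixel values.
color_a, color_b, color_c = (255, 255, 255), (0, 0, 0), (0, 255, 0)
pixels = [color_a, color_a, color_b, color_a, color_c, color_b]

palette, indexes = build_palette_and_indexes(pixels)
print(indexes)  # [0, 0, 1, 0, 2, 1]

# The pixels are recovered by looking up each index in the palette.
restored = [palette[i] for i in indexes]
```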
Other pixel representation formats commonly used in the related techniques include the CMYK representation format and the grayscale representation format.
The YUV color format can be subdivided into several sub-formats according to whether the chroma components are down-sampled: in the YUV4:4:4 pixel color format, 1 pixel is composed of 1 Y component, 1 U component and 1 V component; in the YUV4:2:2 pixel color format, 2 pixels adjacent in the left-to-right direction are composed of 2 Y components, 1 U component and 1 V component; and in the YUV4:2:0 pixel color format, 4 pixels adjacent in the left-to-right and up-and-down directions and arranged in 2×2 spatial positions are composed of 4 Y components, 1 U component and 1 V component. A component is usually represented by a number of 8 to 16 bits. Both the YUV4:2:2 pixel color format and the YUV4:2:0 pixel color format are obtained by down-sampling the chroma components of the YUV4:4:4 pixel color format. A pixel component is also called a pixel sample value or simply a sample value.
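The effect of chroma down-sampling on the amount of data can be sketched by counting sample values per image. The table `FORMATS` restates the component counts given above; the function name `samples_per_image` is hypothetical, and the sketch assumes that the image width and height are even so that the pixels divide evenly into groups.

```python
# Format name -> (pixels per group, Y samples, U samples, V samples per group).
FORMATS = {
    "YUV4:4:4": (1, 1, 1, 1),
    "YUV4:2:2": (2, 2, 1, 1),
    "YUV4:2:0": (4, 4, 1, 1),
}

def samples_per_image(width, height, fmt):
    """Total number of sample values in one image (even width and height assumed)."""
    pixels_per_group, y, u, v = FORMATS[fmt]
    groups = (width * height) // pixels_per_group
    return groups * (y + u + v)

for name in FORMATS:
    print(name, samples_per_image(16, 8, name))
# YUV4:4:4 384
# YUV4:2:2 256
# YUV4:2:0 192
```

Relative to YUV4:4:4, the YUV4:2:2 format carries two thirds of the sample values and the YUV4:2:0 format carries half.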
The most basic element in coding or decoding can be a pixel, or a pixel component, or a pixel index (i.e., an index pixel). A pixel or a pixel component or a pixel index (index pixel), which is the most basic element for coding or decoding, is collectively referred to as a pixel sample value, sometimes referred to as a pixel value, or simply a sample value.
A CU is a region consisting of several pixel values. The shape of the CU can be a rectangle, square, parallelogram, trapezoid, polygon, circle, oval or various other shapes. A rectangle also includes a rectangle with a width or height of one pixel value, which degenerates to a line (i.e., a line segment or linear shape). In an image, various CUs may have different shapes and sizes. In an image, some or all of the CUs can have overlapping portions, or none of the CUs may overlap with each other. A CU may be composed of “pixels”, “pixel components” or “index pixels”, or of a combination of all three or of any two of them.
A remarkable feature of computer screen images is that there are usually many similar or even identical pixel patterns within the same image. For example, Chinese or foreign characters appearing often in the computer screen image are composed of several types of basic strokes, and a lot of similar or same strokes can be found within the same image. Common menus, icons, etc., in the computer screen images also have many similar or identical patterns.
Coding modes commonly used in the related image and video compression techniques include:
1) Intra-frame block copy: also known as “intra-frame block matching”, or “intra-frame motion compensation”, or “block copy”, or “block matching”. The basic operation for block copy coding or decoding is to copy a reference block of a predetermined size (e.g., 64×64 or 32×32 or 16×16 or 8×8 or 4×4 or 64×32 or 16×32 or 16×4 or 8×4 or 4×8 pixel samples) from a set of the reconstructed reference pixel samples and assign a value of the reference block to the current block.
2) Intra-frame micro-block copy: also known as “intra-frame micro-block matching”, or “micro-block copy” or “micro-block matching”. In the micro-block copy, blocks (such as 8×8 pixel samples) are partitioned into finer micro-blocks (such as 4×2 pixel samples or 8×2 pixel samples or 2×4 pixel samples or 2×8 pixel samples). The basic operation for micro-block copy coding or decoding is to copy a reference micro-block from a set of the reconstructed reference pixel samples and assign a value of the reference micro-block to the current micro-block.
3) Intra-frame linear line (referred to as line) copy: also known as “intra-frame line matching”, or “line copy”, or “line matching”. The line refers to a micro-block of a height (or width) of 1, such as a micro-block with 4×1 or 8×1 or 1×4 or 1×8 pixel samples. The basic operation for line copy coding or decoding is to copy a reference line from a set of the reconstructed reference pixel samples and assign a value of the reference line to the current line. Obviously, the line copy is a special case of the micro-block copy.
4) Intra-frame string copy: also known as “intra-frame string matching”, or “string copy”, or “string matching”. The string here means that pixel samples in a two-dimensional region of any shape are arranged into a string with the length much larger than the width (for example, a string with a width of 1 pixel sample value and a length of 37 pixel samples, or a string with a width of 2 pixel samples and a length of 111 pixel samples, usually including, but not limited to, a string with a length that is an independent coding or decoding parameter and a width that is a parameter derived from other coding or decoding parameters). The basic operation for string copy coding or decoding is to copy a reference string from a set of the reconstructed reference pixel samples and assign a value of the reference string to the current string. The string copy can be divided into the following sub-types according to a path shape of the string:
4a) One-dimensional horizontal scan string copy
Both the reference string and the current string are one-dimensional pixel sample value strings with the same lengths arranged in an order of horizontally scanning in CTUs or CUs, but two-dimensional regions formed by the two strings respectively do not necessarily have the same two-dimensional shape.
4b) One-dimensional vertical scan string copy
Both the reference string and the current string are one-dimensional pixel sample value strings with the same lengths arranged in an order of vertically scanning in CTUs or CUs, but two-dimensional regions formed by the two strings respectively do not necessarily have the same two-dimensional shape.
4c) Imitating two-dimensional horizontal scanning conformal equal-width string copy (referred to as imitating two-dimensional horizontal string copy)
Both the reference string and the current string with the same lengths are arranged in the identical two-dimensional shape in an order of horizontally scanning, and the width of the formed two-dimensional region is equal to the width of the current coding block or decoding block.
4d) Imitating two-dimensional vertical scanning conformal equal-height string copy (referred to as imitating two-dimensional vertical string copy)
Both the reference string and the current string with the same lengths are arranged in the identical two-dimensional shape in an order of vertically scanning, and the height of the formed two-dimensional region is equal to the height of the current coding block or decoding block.
4e) Two-dimensional horizontal scanning conformal variable-width string copy (referred to as two-dimensional horizontal string copy)
Both the reference string and the current string, which have the same length, are arranged in the identical two-dimensional shape in the order of horizontal scanning, and the width of the formed two-dimensional region is variable: it is not necessarily equal to the width of the current coding block or decoding block, but it is not greater than that width.
4f) Two-dimensional vertical scanning conformal variable-height string copy (referred to as two-dimensional vertical string copy)
Both the reference string and the current string, which have the same length, are arranged in the identical two-dimensional shape in the order of vertical scanning, and the height of the formed two-dimensional region is variable: it is not necessarily equal to the height of the current coding block or decoding block, but it is not greater than that height.
5) Intra-frame rectangle copy (also known as intra-frame rectangle matching or rectangle copy or rectangle matching)
A rectangle here refers to a two-dimensional region of any size characterized by a width and a height. The basic operation for rectangle copy coding or decoding is to copy a reference rectangle from a set of the reconstructed reference pixel samples and assign the value of the reference rectangle to the current rectangle. The reference rectangle and the current rectangle have the same width and height, and thus have the identical two-dimensional rectangular shape. Such a rectangle is also formed by a string of pixel samples whose length is the product of the height and width of the rectangle. That is, the length of the string is exactly a multiple of the width of the two-dimensional region formed by the string (this multiple being the height of the two-dimensional region), and is also exactly a multiple of the height of the two-dimensional region formed by the string (this multiple being the width of the two-dimensional region). Obviously, the rectangle copy is a special case of 4e) or 4f) described above, namely the case where the length of the string is exactly the product of the height and width of the rectangle.
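The basic copy operation shared by the modes listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any standard: the set of reconstructed reference pixel samples is modeled as a plain list (for string copy) or a list of rows (for rectangle copy), the string is taken in scan order over that list rather than within a CTU or CU, and the function names and position parameters are hypothetical.

```python
def copy_rectangle(picture, ref_x, ref_y, cur_x, cur_y, width, height):
    """Rectangle (or block) copy: assign the reference rectangle to the current rectangle."""
    for dy in range(height):
        for dx in range(width):
            picture[cur_y + dy][cur_x + dx] = picture[ref_y + dy][ref_x + dx]

def copy_string(samples, ref_start, cur_start, length):
    """One-dimensional string copy over pixel samples arranged in scan order."""
    for i in range(length):
        samples[cur_start + i] = samples[ref_start + i]

# Rectangle copy: the 2x2 region at (0, 0) is copied to the region at (2, 0).
picture = [[1, 2, 0, 0],
           [3, 4, 0, 0]]
copy_rectangle(picture, ref_x=0, ref_y=0, cur_x=2, cur_y=0, width=2, height=2)
print(picture)  # [[1, 2, 1, 2], [3, 4, 3, 4]]

# String copy: three samples starting at position 1 are copied to position 4.
samples = [5, 6, 7, 8, 0, 0, 0]
copy_string(samples, ref_start=1, cur_start=4, length=3)
print(samples)  # [5, 6, 7, 8, 6, 7, 8]
```

In both cases the coder only needs to signal where the reference lies and how large it is, instead of the sample values themselves.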
Another coding (decoding) mode commonly used in the related image and video compression techniques is the palette coding (decoding) mode. In the palette coding (decoding) mode, a palette is first constructed (acquired), then some or all of the pixels of the current coding block (the current decoding block) are represented by indexes into the palette, and then the indexes are coded (decoded). The ways of coding (decoding) the indexes include run-length coding (decoding) and/or entropy coding (decoding).
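Run-length coding of palette indexes can be sketched as follows. The sketch assumes the simplest form of run-length coding, in which each run of identical consecutive indexes is replaced by an (index, run length) pair; entropy coding of the pairs is not shown. The function names and the sample index sequence are hypothetical.

```python
def run_length_encode(indexes):
    """Replace each run of identical consecutive indexes by an (index, run length) pair."""
    runs = []
    for index in indexes:
        if runs and runs[-1][0] == index:
            runs[-1][1] += 1
        else:
            runs.append([index, 1])
    return [tuple(run) for run in runs]

def run_length_decode(runs):
    """Expand (index, run length) pairs back into the index sequence."""
    indexes = []
    for index, count in runs:
        indexes.extend([index] * count)
    return indexes

indexes = [0, 0, 0, 1, 1, 2, 0, 0]
runs = run_length_encode(indexes)
print(runs)  # [(0, 3), (1, 2), (2, 1), (0, 2)]
```

Screen content with large uniform areas tends to produce long runs, which is why this mode suits regions with few distinct colors.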
Screen images usually contain regions with various different properties: for example, some regions have larger or more regularly shaped similar or identical patterns, while others have very small or irregularly shaped similar or identical patterns. Any given coding mode is applicable only to image regions with a certain type of properties.
It is difficult to find a uniform matching mode for screen images with mixed coded image regions of multiple types of properties. Therefore, a new coding tool is required to be found to fully explore and use characteristics of coded image regions in computer screen images, thereby improving the compression effect.