Multiple video compression methods may be used to compress video data in order to reduce, as far as possible, the bandwidth required for transmitting the video data. The video compression methods include intra-frame compression and inter-frame compression.
Currently, an inter-frame compression method based on motion estimation is often used. Specifically, a process in which a coding end of an image uses the inter-frame compression method to compress and code the image includes: dividing, by the coding end, a to-be-coded image block into several image sub-blocks of a same size; for each image sub-block, searching a reference image for an image block that best matches the current image sub-block and using the image block as a prediction block; subtracting a pixel value of the prediction block from a corresponding pixel value of the current image sub-block to obtain a residual; performing entropy coding on a value obtained after the residual is transformed and quantized; and finally sending, to a decoding end, motion vector information together with a bit stream that is obtained through the entropy coding, where the motion vector information indicates a position difference between the current image sub-block and the prediction block.
After obtaining the bit stream obtained through the entropy coding, the decoding end of the image first performs entropy decoding to obtain the corresponding residual and the corresponding motion vector information; obtains the corresponding matched image block (that is, the prediction block) from the reference image according to the motion vector information; and then adds a value of each pixel point in the matched image block and a value of a corresponding pixel point in the residual to obtain a value of each pixel point in the current image sub-block.
Intra-frame prediction uses information inside a current image to predict an image block and thereby obtain a prediction block. The coding end obtains a corresponding pixel of the prediction block according to a prediction mode, a prediction direction, and pixel values around the image block, and subtracts a pixel of the prediction block from the pixel of the image block to obtain a residual, where the residual is written into a code stream after undergoing transform, quantization, and entropy coding; and the decoding end parses the code stream, obtains a residual block after performing entropy decoding, dequantization, and inverse transform on the code stream, obtains the prediction block according to the prediction mode, the prediction direction, and the pixel values around the image block, and adds a pixel of the residual block and the pixel of the prediction block to obtain a reconstructed image block.
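The inter-frame steps described above (searching for the best-matching block, forming the residual, and reconstructing at the decoding end) can be sketched as follows. This is only an illustrative sketch: the function names, the full-search strategy, and the use of the sum of absolute differences (SAD) as the matching criterion are assumptions that the text does not specify, and transform, quantization, and entropy coding are omitted.

```python
def sad(cur, ref, top, left):
    """Sum of absolute differences between cur and a same-size window of ref."""
    return sum(
        abs(cur[r][c] - ref[top + r][left + c])
        for r in range(len(cur))
        for c in range(len(cur[0]))
    )


def motion_search(cur, ref, pos, search_range):
    """Full search around pos=(row, col); returns (motion_vector, prediction_block).

    Assumes pos itself lies inside ref, so at least one candidate is valid.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = pos[0] + dy, pos[1] + dx
            # Skip candidate windows that fall outside the reference image.
            if top < 0 or left < 0 or top + h > len(ref) or left + w > len(ref[0]):
                continue
            cost = sad(cur, ref, top, left)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    dy, dx = best[1]
    top, left = pos[0] + dy, pos[1] + dx
    pred = [row[left:left + w] for row in ref[top:top + h]]
    return (dy, dx), pred


def residual(cur, pred):
    """Coding end: current sub-block minus prediction block, pixel by pixel."""
    return [[a - b for a, b in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def reconstruct(pred, res):
    """Decoding end: prediction block plus residual, pixel by pixel."""
    return [[a + b for a, b in zip(rp, rr)] for rp, rr in zip(pred, res)]
```

In this sketch the motion vector is the (row, column) offset from the position of the current sub-block to the position of the prediction block in the reference image.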
Concepts of a coding unit, a prediction unit, and a transform unit exist in a current video coding and decoding standard. The coding unit is an image block that is operated on when a coding end performs coding or a decoding end performs decoding. The prediction unit is an image block that has an independent prediction mode in the coding unit. One prediction unit may include multiple prediction blocks, where a prediction block is an image block that is operated on when a prediction operation is performed for a coding unit. The transform unit is an image block that is operated on when a transform operation is performed for a coding unit, and may also be called a transform block. Considering that difference signals inside a prediction block are strongly correlated, large-block transform brings higher energy concentration performance than small-block transform. In a broader sense, one image block may include one or more prediction blocks, and prediction is performed by using a prediction block as a unit at the coding and decoding ends; and meanwhile, one image block includes one or more transform blocks, and transform is performed by using a transform block as a unit at the coding and decoding ends.
In an existing video coding and decoding standard, such as Moving Picture Experts Group (MPEG) or H.264/AVC (Advanced Video Coding), one image block, called a macroblock, a super-macroblock, or the like, is divided into several image sub-blocks. Sizes of these image sub-blocks are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4, and the like. Image sub-blocks of these sizes are used for the preceding motion estimation and motion compensation. The coding end of the image needs to send a code word that identifies a dividing manner of the image block to the decoding end of the image, so that the decoding end of the image learns the dividing manner used at the coding end of the image, and determines a corresponding prediction block according to the dividing manner and motion vector information. In the existing video coding and decoding standard, each of these image sub-blocks is an N×M rectangular block (both N and M are integers greater than 0), and one of N and M is an integer multiple of the other.
Common manners of dividing an image block into image sub-blocks are as follows: a 2N×2N dividing manner, in which an image block includes only one image sub-block, that is, the image block is not divided into smaller image sub-blocks, as shown in FIG. 1a; a 2N×N dividing manner, in which an image block is divided into one upper image sub-block and one lower image sub-block that are of a same size, as shown in FIG. 1b; an N×2N dividing manner, in which an image block is divided into one left image sub-block and one right image sub-block that are of a same size, as shown in FIG. 1c; and an N×N dividing manner, in which an image block is divided into four image sub-blocks of a same size, as shown in FIG. 1d. N is any positive integer.
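The four symmetric dividing manners can be written out as rectangles inside a 2N×2N image block. The following is a minimal sketch; the function name, the mode strings, and the (x, y, width, height) tuple layout with the origin at the upper-left corner are illustrative assumptions, not notation from any standard.

```python
def symmetric_partitions(mode, big_n):
    """Return the sub-blocks of a 2N x 2N block as (x, y, width, height) tuples."""
    side = 2 * big_n
    if mode == "2Nx2N":  # no division: one sub-block covering the whole block
        return [(0, 0, side, side)]
    if mode == "2NxN":   # one upper and one lower sub-block of the same size
        return [(0, 0, side, big_n), (0, big_n, side, big_n)]
    if mode == "Nx2N":   # one left and one right sub-block of the same size
        return [(0, 0, big_n, side), (big_n, 0, big_n, side)]
    if mode == "NxN":    # four sub-blocks of the same size
        return [
            (0, 0, big_n, big_n), (big_n, 0, big_n, big_n),
            (0, big_n, big_n, big_n), (big_n, big_n, big_n, big_n),
        ]
    raise ValueError(f"unknown dividing manner: {mode}")
```

For example, with N = 8 the 2N×N manner yields two 16×8 sub-blocks stacked vertically.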
In addition, an asymmetrical dividing manner may also be applied to an image block, as shown in FIG. 2a to FIG. 2d. In dividing manners shown in FIG. 2a and FIG. 2b, one image block is divided into one upper rectangular image sub-block and one lower rectangular image sub-block of different sizes. In two image sub-blocks obtained through dividing in a 2N×nU dividing manner (where n=0.5N) shown in FIG. 2a, the lengths of two sides of an upper image sub-block are 2N and 0.5N, and the lengths of two sides of a lower image sub-block are 2N and 1.5N. In 2N×nU, U indicates that an image dividing line is shifted upward relative to the perpendicular bisector of the image block. 2N×nU indicates that the image dividing line is shifted upward by n relative to the perpendicular bisector of the image block, where n=x*N and x is greater than or equal to 0 and smaller than or equal to 1. In two image sub-blocks obtained through dividing in a 2N×nD dividing manner (where n=0.5N) shown in FIG. 2b, the lengths of two sides of an upper image sub-block are 2N and 1.5N, and the lengths of two sides of a lower image sub-block are 2N and 0.5N. In 2N×nD, D indicates that an image dividing line is shifted downward relative to the perpendicular bisector of the image block. 2N×nD indicates that the image dividing line is shifted downward by n relative to the perpendicular bisector of the image block, where n=x*N and x is greater than or equal to 0 and smaller than or equal to 1.
In dividing manners shown in FIG. 2c and FIG. 2d, one image block is divided into one left rectangular image sub-block and one right rectangular image sub-block of different sizes. In two image sub-blocks obtained through dividing in an nL×2N dividing manner (where n=0.5N) shown in FIG. 2c, lengths of two sides of a left image sub-block are 0.5N and 2N, and lengths of two sides of a right image sub-block are 1.5N and 2N. In nL×2N, L indicates that an image dividing line is shifted leftward relative to the perpendicular bisector of the image block. nL×2N indicates that the image dividing line is shifted leftward by n relative to the perpendicular bisector of the image block, where n=x*N and x is greater than or equal to 0 and smaller than or equal to 1. In two image sub-blocks obtained through dividing in an nR×2N dividing manner (where n=0.5N) shown in FIG. 2d, lengths of two sides of a left image sub-block are 1.5N and 2N, and lengths of two sides of a right image sub-block are 0.5N and 2N. In nR×2N, R indicates that an image dividing line is shifted rightward relative to the perpendicular bisector of the image block. nR×2N indicates that the image dividing line is shifted rightward by n relative to the perpendicular bisector of the image block, where n=x*N and x is greater than or equal to 0 and smaller than or equal to 1.
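The side lengths given above for the four asymmetrical dividing manners can be checked with a short sketch. It covers only the n=0.5N case shown in FIG. 2a to FIG. 2d; the function name, the mode strings, the (x, y, width, height) layout, and the requirement that N be even (so that 0.5N is an integer number of pixels) are assumptions made for illustration.

```python
def asymmetric_partitions(mode, big_n):
    """Sub-blocks of a 2N x 2N block for n = 0.5N, as (x, y, width, height)."""
    if big_n % 2:
        raise ValueError("N must be even so that 0.5N is an integer")
    side = 2 * big_n
    short = big_n // 2        # 0.5N
    long_ = side - short      # 1.5N
    if mode == "2NxnU":   # dividing line shifted upward: short block on top
        return [(0, 0, side, short), (0, short, side, long_)]
    if mode == "2NxnD":   # dividing line shifted downward: short block at bottom
        return [(0, 0, side, long_), (0, long_, side, short)]
    if mode == "nLx2N":   # dividing line shifted leftward: short block on the left
        return [(0, 0, short, side), (short, 0, long_, side)]
    if mode == "nRx2N":   # dividing line shifted rightward: short block on the right
        return [(0, 0, long_, side), (long_, 0, short, side)]
    raise ValueError(f"unknown dividing manner: {mode}")
```

With N = 8, for instance, 2N×nU gives a 16×4 upper sub-block and a 16×12 lower sub-block.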
The preceding image block dividing manners may also be represented by using prediction block types. 2N×2N, 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, and nR×2N all represent prediction block types corresponding to the image block dividing manners.
Among the preceding image block dividing manners, a dividing manner of dividing an image block or a transform block by a horizontal dividing line into multiple image sub-blocks or transform blocks or prediction blocks that are arranged along a vertical direction is a horizontal dividing manner, and a dividing direction used in this case is a horizontal dividing direction. The 2N×N dividing manner, the 2N×nU dividing manner, and the 2N×nD dividing manner are collectively referred to as horizontal dividing manners. A dividing manner of dividing an image block or a transform block by a vertical dividing line into multiple image sub-blocks or transform blocks or prediction blocks that are arranged along a horizontal direction is a vertical dividing manner, and a dividing direction used in this case is a vertical dividing direction. The N×2N dividing manner, the nL×2N dividing manner, and the nR×2N dividing manner are collectively referred to as vertical dividing manners. A dividing manner of dividing an image block or a transform block by a horizontal dividing line and a vertical dividing line into four image sub-blocks or transform blocks or prediction blocks is a horizontal and vertical dividing manner, and a dividing direction used in this case is a horizontal and vertical dividing direction. The N×N dividing manner is a horizontal and vertical dividing manner.
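The grouping of dividing manners by dividing direction amounts to a simple lookup, sketched below. The dictionary name, the mode strings, and the direction labels are illustrative; the text assigns no direction to the undivided 2N×2N manner, so returning "none" for it is an assumption of this sketch.

```python
DIVIDING_DIRECTION = {
    # Horizontal dividing line; sub-blocks arranged along the vertical direction.
    "2NxN": "horizontal", "2NxnU": "horizontal", "2NxnD": "horizontal",
    # Vertical dividing line; sub-blocks arranged along the horizontal direction.
    "Nx2N": "vertical", "nLx2N": "vertical", "nRx2N": "vertical",
    # One horizontal and one vertical dividing line; four sub-blocks.
    "NxN": "horizontal_and_vertical",
    # Not divided (assumption: no dividing direction applies).
    "2Nx2N": "none",
}


def dividing_direction(mode):
    """Return the dividing direction of a dividing manner (prediction block type)."""
    try:
        return DIVIDING_DIRECTION[mode]
    except KeyError:
        raise ValueError(f"unknown dividing manner: {mode}") from None
```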
In an existing video coding and decoding technology, a transform matrix may be used to remove correlation of a residual of an image block, that is, to remove redundant information of the image block, so as to improve coding efficiency. Generally, two-dimensional transform is used for transform of a data block in an image block. That is, the coding end multiplies residual information of the data block, one N×M transform matrix, and a transpose matrix of the N×M transform matrix to obtain a transform coefficient. The preceding step may be described by using the following formula:
f = T′ × C × T
where C represents residual information of a data block, T and T′ represent a transform matrix and a transpose matrix of the transform matrix, and f represents a transform coefficient matrix obtained after the residual information of the data block is transformed. The transform matrix may be a discrete cosine transform (DCT) matrix, an integer transform matrix, a KL transform (Karhunen-Loève Transform, KLT) matrix, or the like. KLT can better take into account texture information of an image block or of an image block residual, and therefore, using KLT may achieve a better effect.
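As an illustration of the formula f = T′ × C × T, the following sketch applies an orthonormal DCT matrix to a square residual block. It is a minimal example under stated assumptions: a square block is used for simplicity, T is taken to be a floating-point DCT-II matrix whose columns are the basis vectors (the text equally allows an integer transform or a KLT matrix), and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; column k holds the k-th basis vector."""
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):          # sample index (row)
        for k in range(n):      # frequency index (column)
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            t[i][k] = scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return t


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def forward_transform(c, t):
    """Coding end: f = T' x C x T."""
    return matmul(matmul(transpose(t), c), t)
```

For a perfectly flat 4×4 residual block whose samples all equal 5, `forward_transform` yields a coefficient matrix in which only `f[0][0]` (equal to 20 up to rounding error) is significant, which is the energy concentration that the transform is intended to provide.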
Performing the preceding processing on the residual information of the image block is equivalent to transforming the residual information of the image block from a space domain to a frequency domain, and the energy of the transform coefficient matrix f obtained after the processing is concentrated in a low-frequency area. After performing the preceding transform on the residual information of the image block, the coding end performs processing such as quantization and entropy coding on the transform coefficient matrix obtained after the transform, and sends a bit stream obtained through the entropy coding to the decoding end. To enable the decoding end to learn a type and a size of a transform matrix used at the coding end, generally the coding end sends, to the decoding end, indication information that represents the transform matrix used by a current image block.
Subsequently, the decoding end determines, according to the indication information, the transform matrix used at the coding end; decodes, according to a characteristic (such as orthogonality of the transform matrix) of the transform matrix, the bit stream sent by the coding end to obtain the transform coefficient matrix; and multiplies the transform matrix, the transform coefficient matrix, and the transpose matrix of the transform matrix, to restore residual information of a data block that is approximately consistent with that of the coding end. The preceding step may be described by using the following formula:
C = T × f × T′
where C represents residual information of a data block, T and T′ represent a transform matrix and a transpose matrix of the transform matrix, and f represents a transform coefficient matrix obtained by the decoding end.
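The reason the decoding-end formula recovers the residual can be made explicit with a short derivation. It assumes that T is a square orthogonal matrix, so that T × T′ = I (orthogonality is the characteristic mentioned above), and it ignores quantization:

```latex
\begin{aligned}
T \, f \, T' &= T \, (T' \, C \, T) \, T' \\
             &= (T \, T') \, C \, (T \, T') \\
             &= C , \qquad \text{since } T \, T' = I .
\end{aligned}
```

In practice the decoding end holds a dequantized coefficient matrix rather than f itself, which is why the restored residual is only approximately consistent with that of the coding end.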
Because the residual of an image block may follow different distribution patterns, a good transform effect often cannot be achieved by using a transform matrix of one fixed size. Therefore, in the prior art, attempts are made to use transform matrices (also called transform blocks) of different sizes for the residual of the image block. For this reason, for a 2N×2N image block, a transform matrix whose size is 2N×2N may be used, or transform matrices whose sizes are N×N or transform matrices whose sizes are 0.5N×0.5N may be used. To effectively represent how transform matrices of different sizes are used for an image block, a tree identification method may be used.
As shown in FIG. 3, when a transform size used by an image block is identified, a first layer of a code stream has an indicator bit used to identify whether the image block uses a transform matrix whose size is 2N×2N. If the image block uses the transform matrix whose size is 2N×2N (as shown in FIG. 3a), the indicator bit is 0. If the image block does not use 2N×2N transform, the indicator bit is 1, indicating that the transform matrix whose size is 2N×2N needs to be further divided into four transform matrices whose sizes are N×N, and four bits are used in a second-layer structure of the code stream to respectively identify whether to further divide each transform matrix whose size is N×N. When the image block uses a transform structure shown in FIG. 3b, all the four bits are 0, indicating that each transform matrix whose size is N×N is not further divided.
When a transform structure shown in FIG. 3c is used, two of the four bits are 0 and the remaining two bits are 1. The two bits being 0 indicate that a lower left transform matrix and an upper right transform matrix whose sizes are N×N are not further divided. The remaining two bits being 1 indicate that an upper left transform matrix and a lower right transform matrix whose sizes are N×N need to be further divided to obtain transform matrices whose sizes are 0.5N×0.5N. Then, in a third-layer structure of the code stream, four bits are used to indicate whether the four upper left transform matrices whose sizes are 0.5N×0.5N need to be further divided; and four bits are used to indicate whether the four lower right transform matrices whose sizes are 0.5N×0.5N need to be further divided. If the image block uses the transform structure shown in FIG. 3c, all the 4+4 bits are 0, indicating that further dividing is not performed. The layer-by-layer identification in the code stream may effectively and flexibly represent transform sizes used by the image block and the image sub-blocks.
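The layer-by-layer identification can be sketched as a small parser that turns a sequence of indicator bits into a list of transform blocks. The function name, the (x, y, size) tuple layout, the order of the four bits within a layer (upper left, upper right, lower left, lower right), and the `min_size` parameter below which no indicator bit is read are all assumptions of this sketch; the text does not fix them.

```python
def parse_transform_tree(bits, size, min_size):
    """Decode layer-by-layer split flags into (x, y, size) transform blocks.

    bits     -- iterable of 0/1 indicator bits, one layer after another
    size     -- side length of the image block (2N)
    min_size -- blocks of this side length carry no indicator bit (assumption)
    """
    bits = iter(bits)
    layer = [(0, 0, size)]
    leaves = []
    while layer:
        next_layer = []
        for x, y, s in layer:
            # A bit of 1 means "divide further"; a bit of 0 means "use this size".
            split = s > min_size and next(bits) == 1
            if not split:
                leaves.append((x, y, s))
                continue
            half = s // 2
            next_layer += [
                (x, y, half), (x + half, y, half),
                (x, y + half, half), (x + half, y + half, half),
            ]
        layer = next_layer
    return leaves
```

Under these assumptions, with 2N = 16 and `min_size=2`, the bit sequence `[0]` corresponds to FIG. 3a, `[1, 0, 0, 0, 0]` to FIG. 3b, and `[1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]` to FIG. 3c (two 8×8 blocks and eight 4×4 blocks).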
In the preceding method of using layer-by-layer identification in the prior art, a size of a transform matrix is not correlated to a size of a prediction block. As shown in FIG. 4a, when a 2N×2N image block uses asymmetrical dividing (a dividing line is indicated by a heavy solid line shown in the figure), if the current image block uses a transform matrix whose size is 2N×2N, the transform matrix crosses a boundary of a prediction block; if the current image block uses four transform matrices whose sizes are N×N, the transform matrices still cross the boundary of the prediction block; if the lower left and upper right of the current image block use transform matrices whose sizes are N×N and the upper left and lower right of the current image block use transform matrices whose sizes are 0.5N×0.5N, a lower left transform matrix whose size is N×N of the current image block still crosses the boundary of the prediction block.
The prior art has the following disadvantages:
In the prior art, a size of a transform matrix is not correlated to a size of a prediction block, and as a result, the transform matrix crosses a boundary of the prediction block. Abrupt changes may exist in the residual data at the boundary between two prediction blocks. Therefore, if a transform matrix crosses the boundary between the two prediction blocks, a transform effect is weakened; correlation of a residual of an image block cannot be effectively removed; redundant information of the image block cannot be effectively removed; and coding efficiency is lowered.
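Whether a given transform block crosses a prediction block boundary can be tested by counting how many prediction blocks it overlaps. The sketch below is illustrative only: the function name and the (x, y, width, height) rectangle layout are assumptions, and it presumes that the prediction blocks tile the image block without overlapping one another.

```python
def crosses_boundary(transform_block, prediction_blocks):
    """True if the transform block overlaps more than one prediction block."""
    tx, ty, tw, th = transform_block
    overlapped = 0
    for px, py, pw, ph in prediction_blocks:
        # Two rectangles overlap if they intersect on both axes.
        if tx < px + pw and px < tx + tw and ty < py + ph and py < ty + th:
            overlapped += 1
    return overlapped > 1
```

As a worked case, take N = 8 and a 2N×nD division, so the prediction blocks are `(0, 0, 16, 12)` and `(0, 12, 16, 4)` (the choice of 2N×nD is an assumption, since the division actually drawn in FIG. 4a is not reproduced here). The 16×16 transform block `(0, 0, 16, 16)` and the lower left 8×8 transform block `(0, 8, 8, 8)` both cross the boundary, whereas the upper right 8×8 block `(8, 0, 8, 8)` and a 4×4 block such as `(8, 8, 4, 4)` do not.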