Multiple video compression methods may be used to compress video data so as to minimize the bandwidth required for transmitting the video data. The video compression methods include intra-frame compression and inter-frame compression, and an inter-frame compression method based on motion estimation is now widely used. Specifically, a coding end of an image compresses and codes the image by using the inter-frame compression method as follows: the coding end splits a to-be-coded image block into several image sub-blocks of a same size; for each image sub-block, searches a reference image for the image block that best matches the current image sub-block and uses that image block as a prediction block; subtracts the pixel value of each pixel of the prediction block from the pixel value of the corresponding pixel of the current image sub-block to obtain a residual; performs entropy coding on the values obtained after the residual is transformed and quantized; and finally sends the bit stream and the motion vector information that are obtained through the entropy coding to a decoding end, where the motion vector information indicates a position difference between the current image sub-block and the prediction block. After obtaining the bit stream, the decoding end of the image first performs entropy decoding to obtain the corresponding residual and the corresponding motion vector information; obtains the matched image block (that is, the prediction block) from the reference image according to the motion vector information; and then adds the value of each pixel in the matched image block to the value of the corresponding pixel in the residual to obtain the value of each pixel in the current image sub-block.
Intra-frame prediction uses information inside the current image to predict an image block and thereby obtain a prediction block. The coding end obtains each pixel of the prediction block according to a prediction mode, a prediction direction, and pixel values around the image block, and subtracts the pixels of the prediction block from the corresponding pixels of the image block to obtain a residual, where the residual is written into a code stream after undergoing transform, quantization, and entropy coding. The decoding end parses the code stream; obtains a residual block after performing entropy decoding, de-quantization, and inverse transform on the code stream; obtains the prediction block according to the prediction mode, the prediction direction, and the pixel values around the image block; and adds the pixels of the residual block to the corresponding pixels of the prediction block to obtain a reconstructed image block.
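The two prediction paths described above differ only in how the prediction block is formed; the residual and reconstruction arithmetic is the same. The following is a minimal sketch, not a codec implementation. All function names are hypothetical, the matching criterion (sum of absolute differences with a full search) and the DC intra mode are assumptions chosen for illustration, and transform, quantization, and entropy coding are omitted.

```python
def get_block(img, top, left, size):
    """Return the size x size block of img whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks (assumed cost)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def motion_search(cur_block, ref, top, left, size, search_range):
    """Full search: return ((dy, dx), prediction block) with the lowest SAD."""
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference image.
            if 0 <= y <= len(ref) - size and 0 <= x <= len(ref[0]) - size:
                cand = get_block(ref, y, x, size)
                cost = sad(cur_block, cand)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx), cand)
    return best[1], best[2]

def intra_dc_predict(img, top, left, size):
    """Assumed DC mode: fill the block with the mean of the pixels above and to the left.
    A real codec would use reconstructed neighbours; the source image is used here."""
    neighbours = []
    if top > 0:
        neighbours += img[top - 1][left:left + size]
    if left > 0:
        neighbours += [img[y][left - 1] for y in range(top, top + size)]
    dc = sum(neighbours) // len(neighbours) if neighbours else 128
    return [[dc] * size for _ in range(size)]

def residual(cur, pred):
    """Coding end: current block minus prediction block, pixel by pixel."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]

def reconstruct(pred, res):
    """Decoding end: prediction block plus residual, pixel by pixel."""
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, res)]

# Reference image, and a current image that is the reference shifted right by one pixel.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
cur_block = get_block(cur, 2, 2, 4)

# Inter-frame path: the motion vector points one pixel to the left in the reference.
mv, inter_pred = motion_search(cur_block, ref, 2, 2, 4, search_range=2)
inter_res = residual(cur_block, inter_pred)
decoded = reconstruct(get_block(ref, 2 + mv[0], 2 + mv[1], 4), inter_res)

# Intra-frame path: prediction comes from neighbouring pixels of the same image.
intra_pred = intra_dc_predict(cur, 2, 2, 4)
intra_res = residual(cur_block, intra_pred)
```

In this toy case the inter-frame residual is all zeros because the shift is a pure translation, while the intra-frame residual is not; in both cases adding the residual back to the prediction block recovers the current block exactly, since no quantization is applied.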
The concepts of a coding unit, a prediction unit, and a transform unit exist in current video coding and decoding standards. The coding unit is the image block operated on when a coding end performs coding or a decoding end performs decoding. The prediction unit is an image block in the coding unit that has an independent prediction mode. One prediction unit may include multiple prediction blocks, where a prediction block is the image block operated on when a prediction operation is performed for the coding unit. The transform unit, which may also be called a transform block, is the image block operated on when a transform operation is performed for the coding unit. Because different signals inside a prediction block are strongly correlated, large-block transform concentrates energy better than small-block transform. In a broader sense, one image block may include one or more prediction blocks, and the coding and decoding ends perform prediction by using a prediction block as a unit; likewise, one image block includes one or more transform blocks, and the coding and decoding ends perform transform by using a transform block as a unit.
In an existing video coding and decoding process, one image block, which is also called a macroblock, a super-macroblock, or the like, is split into several image sub-blocks. Sizes of these image sub-blocks may be 64×64, 64×32, 32×64, 32×32, 32×16, 16×32, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4, and the like. Pixel prediction, motion estimation, and motion compensation are performed on the image block by using these image sub-blocks as units. Accordingly, the coding end of the image sends information about the splitting manner of the image block to the decoding end of the image in a code stream, so that the decoding end learns the splitting manner used at the coding end and performs a corresponding decoding operation according to the splitting manner. In an existing video coding and decoding standard, each of these image sub-blocks is an N×M rectangular block (both N and M are integers greater than 0), and one of N and M is an integer multiple of the other.
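The splitting described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, N×M is read as width×height, and the sub-blocks are assumed to tile the image block in raster order. The way the splitting manner is actually signalled in the code stream is not modelled.

```python
def is_allowed_shape(n, m):
    """True if N and M are positive integers and one is an integer multiple of the other."""
    return n > 0 and m > 0 and (n % m == 0 or m % n == 0)

def split_block(block_w, block_h, sub_w, sub_h):
    """Return the sub-blocks of an image block as (x, y, width, height) tuples."""
    if not is_allowed_shape(sub_w, sub_h):
        raise ValueError("sub-block sides must be in a multiple relationship")
    if block_w % sub_w or block_h % sub_h:
        raise ValueError("sub-blocks must tile the image block exactly")
    return [(x, y, sub_w, sub_h)
            for y in range(0, block_h, sub_h)
            for x in range(0, block_w, sub_w)]

# Split a 64x64 image block into 32x16 sub-blocks.
sub_blocks = split_block(64, 64, 32, 16)
```

Each tuple gives one unit on which prediction, motion estimation, and motion compensation would then be performed.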
In an existing video coding and decoding technology, a transform matrix may be used to remove redundant information of the image block, so as to improve coding efficiency. Generally, a two-dimensional transform is used to transform a data block in an image block. That is, the coding end multiplies the residual information of the data block by one N×M transform matrix and by the transpose matrix of the N×M transform matrix to obtain a transform coefficient. The preceding step may be described by using the following formula:
f = T′ × C × T
where C represents residual information of a data block, T and T′ represent a transform matrix and a transpose matrix of the transform matrix, and f represents a transform coefficient matrix obtained after the residual information of the data block is transformed. The transform matrix may be a discrete cosine transform (DCT) matrix, an integer transform matrix, a Karhunen-Loève transform (KLT) matrix, or the like. KLT may better take into account texture information of an image block or an image block residual, and therefore, using KLT may achieve a better effect.
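As a small worked instance of the formula f = T′ × C × T, assume a 2×2 orthonormal transform matrix T (the 2-point DCT, which here equals its own transpose) and a flat residual block C. Both matrices are chosen only for illustration and do not come from any standard.

```latex
\[
T = T' = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad
C = \begin{pmatrix} 10 & 10 \\ 10 & 10 \end{pmatrix}
\]
\[
f = T' \times C \times T
  = \frac{1}{2}
    \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
    \begin{pmatrix} 10 & 10 \\ 10 & 10 \end{pmatrix}
    \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
  = \begin{pmatrix} 20 & 0 \\ 0 & 0 \end{pmatrix}
\]
```

The four equal residual values collapse into a single non-zero coefficient in the top-left (lowest-frequency) position, which is the redundancy removal that the transform is meant to provide.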
Performing the preceding processing on the residual information of the image block is equivalent to transforming the residual information of the image block from a space domain to a frequency domain, and the energy of the transform coefficient matrix f obtained after the processing is concentrated in a low-frequency area. After performing the preceding transform on the residual information of the image block, the coding end performs processing such as quantization and entropy coding on the transform coefficient matrix obtained after the transform, and sends a bit stream obtained through the entropy coding to the decoding end. To enable the decoding end to learn the type and the size of the transform matrix used at the coding end, the coding end generally sends, to the decoding end, indication information that represents the transform matrix used by the current image block.
Subsequently, the decoding end determines, according to the indication information, the transform matrix used at the coding end; decodes, according to a characteristic of the transform matrix (such as orthogonality of the transform matrix), the bit stream sent by the coding end to obtain the transform coefficient matrix; and multiplies the transform coefficient matrix by the transform matrix and the transpose matrix of the transform matrix, to restore residual information of a data block that is approximately consistent with that of the coding end. The preceding step may be described by using the following formula:
C = T × f × T′
where C represents residual information of a data block, T and T′ represent a transform matrix and a transpose matrix of the transform matrix, and f represents a transform coefficient matrix obtained by the decoding end.
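The round trip between f = T′ × C × T at the coding end and C = T × f × T′ at the decoding end can be sketched as follows. This is a minimal illustration, assuming an orthonormal DCT-II matrix whose columns are the basis vectors (so that T′ × T is the identity) and a uniform scalar quantizer with a hypothetical step size; all helper names are assumptions, and entropy coding is omitted.

```python
import math

def dct_basis(n):
    """Return an n x n matrix T whose columns are orthonormal DCT-II basis vectors."""
    t = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        for i in range(n):
            t[i][k] = scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return t

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def forward(c, t):
    """Coding end: f = T' x C x T."""
    return matmul(matmul(transpose(t), c), t)

def inverse(f, t):
    """Decoding end: C = T x f x T'."""
    return matmul(matmul(t, f), transpose(t))

def quantize(f, step):
    """Assumed uniform quantizer: map each coefficient to an integer level."""
    return [[round(v / step) for v in row] for row in f]

def dequantize(levels, step):
    return [[lv * step for lv in row] for row in levels]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

T = dct_basis(4)
C = [[12, 10, 8, 6],
     [11, 9, 7, 5],
     [10, 8, 6, 4],
     [9, 7, 5, 3]]

# Without quantization the residual is restored up to floating-point error.
lossless_error = max_abs_diff(C, inverse(forward(C, T), T))

# With quantization the restored residual is only approximately consistent.
step = 2.0
levels = quantize(forward(C, T), step)
lossy_error = max_abs_diff(C, inverse(dequantize(levels, step), T))
```

Because T is orthonormal, the reconstruction error in this sketch cannot exceed n × step / 2 for an n×n block, which is why the decoded residual is described as approximately, rather than exactly, consistent with that of the coding end.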
Because the residual of an image block may follow different distribution patterns, a transform matrix of one specific size often cannot achieve a good transform effect. Therefore, the prior art attempts to use transform matrices (also called transform blocks) of different sizes for the residual of the image block. For example, for a 2N×2N image block, one transform matrix whose size is 2N×2N may be used, or transform matrices whose sizes are N×N or 0.5N×0.5N may be used.
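The candidate sizes named above can be enumerated with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and how a coding end chooses among the candidates is not stated in the text and is not modelled here.

```python
def transform_size_options(block_size):
    """For a 2N x 2N image block, return (transform size, number of transform
    blocks needed to cover the image block) for sizes 2N, N, and 0.5N."""
    if block_size <= 0 or block_size % 4:
        raise ValueError("block size must be a positive multiple of 4")
    n = block_size // 2
    return [(size, (block_size // size) ** 2) for size in (2 * n, n, n // 2)]

# A 16x16 image block (N = 8): one 16x16, four 8x8, or sixteen 4x4 transform blocks.
options = transform_size_options(16)
```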
However, currently only a square transform matrix is used. For striped texture, which occurs frequently, a square transform matrix cannot effectively remove redundant information of an image block. Therefore, coding by using a non-square (rectangular) transform matrix has been introduced. However, the non-square transform matrix increases coding complexity: in order to code position information of a transform block, repeated conversion is required when splitting an image block and a code block, which increases the complexity of the coding process.