With the rapid development of the Internet and the increasingly rich material and spiritual culture of people, there is growing demand on the Internet for video applications, especially for high-definition video. However, a high-definition video contains a very large amount of data; if the high-definition video is to be transmitted over the Internet with finite bandwidth, the first problem to be solved is how to encode the high-definition video.
In all mainstream video encoding solutions, redundant information between pictures in a video sequence is removed by using a picture block-based motion compensated prediction technology. A picture may be considered as a two-dimensional sampling point array, and a picture block, which is an area of the picture, may likewise be considered as a two-dimensional sampling point array. A sampling point may generally also be considered as a pixel, and a pixel is the basic unit of calculation for a digital image. For example, a picture may be recorded as Pic(x,y), where x=0 . . . W−1, y=0 . . . H−1, Pic(x,y) represents the value of the sampling point at coordinate position (x,y) in a coordinate system built along the width direction and the height direction of the two-dimensional sampling point array of the picture, and W and H respectively represent the width and the height of that array. Similarly, a picture block may be recorded as Blk(x,y), where x=0 . . . w−1, y=0 . . . h−1, Blk(x,y) represents the value of the sampling point at coordinate position (x,y) in a coordinate system built along the width direction and the height direction of the two-dimensional sampling point array of the picture block, and w and h respectively represent the width and the height of that array. If, for example, a luminance signal of the picture is sampled, the value of a sampling point is a luminance value.
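The array notation above can be illustrated with a minimal Python sketch. It assumes that a picture is stored as a list of rows, so that Pic(x,y) corresponds to pic[y][x]; the function names sample and get_block are hypothetical and are introduced here only for illustration.

```python
def sample(pic, x, y):
    """Return Pic(x, y): x runs along the width, y along the height."""
    return pic[y][x]


def get_block(pic, x0, y0, w, h):
    """Return the w-by-h picture block whose top-left sample is Pic(x0, y0).

    The result is itself a two-dimensional sampling point array, so that
    Blk(x, y) == Pic(x0 + x, y0 + y) for x = 0 .. w-1 and y = 0 .. h-1.
    """
    return [row[x0:x0 + w] for row in pic[y0:y0 + h]]


# A small luminance picture with W = 4 and H = 3 (values chosen arbitrarily).
W, H = 4, 3
pic = [[10 * y + x for x in range(W)] for y in range(H)]
blk = get_block(pic, 1, 1, 2, 2)
```

In this sketch the block is a copy of an area of the picture, which matches the description of a picture block as an area of the picture with its own local coordinate system.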
When a picture block Blk(x,y) of a picture in a video sequence is encoded, that is, when the picture block Blk(x,y) is used as a target area, a matching block Blk′(x,y) may be searched for in a reference picture and used as a reference area, and a prediction Pred(x,y) of the target area may be generated based on the matching block, so that only a prediction error determined based on the prediction needs to be encoded and transmitted to a decoder end. The position of the matching block in the reference picture is described by a motion vector. The reference picture refers to a neighboring picture that has already been reconstructed when the target picture is encoded or decoded; in this case, the reference area may be a picture block in the reference picture that matches the target area in terms of a content feature. Information indicating the source of the matching block is referred to as motion information, and may include the motion vector and other supplementary information. The motion information needs to be transmitted to the decoder end, so that the decoder end can perform the same motion compensation operation as that performed by the encoder end to obtain the prediction of the picture block. By combining the prediction obtained through motion compensation with the prediction error obtained through decoding, the decoder end can obtain a reconstruction of the picture block, thereby completing the decoding operation on the picture block.
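The encoding and decoding flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular codec: the matching criterion (sum of absolute differences), the exhaustive integer-sample search, and all function names are assumptions introduced here, and transform, quantization, and entropy coding of the prediction error are omitted.

```python
def get_block(pic, x0, y0, w, h):
    """Return the w-by-h block whose top-left sample is at (x0, y0)."""
    return [row[x0:x0 + w] for row in pic[y0:y0 + h]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_search(target, ref_pic, x0, y0, search_range):
    """Exhaustively search ref_pic around (x0, y0) for the best matching block."""
    h, w = len(target), len(target[0])
    H, W = len(ref_pic), len(ref_pic[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            if rx < 0 or ry < 0 or rx + w > W or ry + h > H:
                continue  # candidate block would fall outside the reference picture
            cost = sad(target, get_block(ref_pic, rx, ry, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def encode_block(target, ref_pic, x0, y0, search_range=2):
    """Encoder end: return the motion vector and the prediction error."""
    mv = motion_search(target, ref_pic, x0, y0, search_range)
    h, w = len(target), len(target[0])
    pred = get_block(ref_pic, x0 + mv[0], y0 + mv[1], w, h)
    error = [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target, pred)]
    return mv, error


def decode_block(mv, error, ref_pic, x0, y0):
    """Decoder end: repeat the motion compensation and add the prediction error."""
    h, w = len(error), len(error[0])
    pred = get_block(ref_pic, x0 + mv[0], y0 + mv[1], w, h)
    return [[p + e for p, e in zip(rp, re)] for rp, re in zip(pred, error)]
```

Because the decoder end receives the motion vector and applies it to the same reconstructed reference picture, it obtains the same prediction as the encoder end, and adding the prediction error yields the reconstruction of the picture block.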
A picture block having independent motion information is referred to as a motion partition. The motion partition may be a square block or a rectangular block that is not square. The motion partition is a basic unit of motion compensated prediction. For ease of implementation, in all the mainstream video encoding solutions, a rectangular motion partition is used.
In an existing video encoding and decoding method, an encoder end uses the strong time correlation between a reference picture and a target picture to find, in the reference picture, a reference area that matches a picture block, that is, a target area, in the target picture in terms of a content feature, and determines motion partitions of the target area according to a content feature of the reference area. For example, suppose that the reference area shows two cups placed on the left and right of a desk; when picture partition is performed based on a picture content feature, the reference area may be partitioned into 3 picture blocks in total: the desk, the cup on the left, and the cup on the right, and the 3 picture blocks are used as 3 motion partitions of the target area. After the motion partitions of the target area are determined, the encoder end performs a search for each motion partition of the target area to obtain motion information of each motion partition, encodes the motion information, and transmits the motion information to a decoder end. Moreover, the encoder end performs a search to determine a motion vector of the reference area, and transmits the motion vector to the decoder end, so that the decoder end and the encoder end determine the same reference area and, by using the same method for analyzing the content feature of the reference area, determine the same motion partitions of the target area. After the quantity of motion partitions is determined, the decoder end performs a motion compensation operation on the target area according to the corresponding motion information to obtain as many predictions as there are motion partitions, combines the predictions to obtain a final prediction of the target area, and reconstructs the target area according to the final prediction.
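The partition and combination steps described above can be sketched as follows. The description does not specify how the content feature is analyzed or how the predictions are combined, so both choices here are assumptions made only for illustration: the reference area is labeled by simple intensity thresholds, and the final prediction takes each sample from the prediction of the motion partition to which that sample belongs. The function names are hypothetical.

```python
def partition_by_content(ref_area, thresholds):
    """Label each sample of the reference area with a motion partition index.

    Assumed content analysis: the label is the number of thresholds that the
    sample value reaches, so n thresholds give at most n + 1 partitions.
    """
    return [[sum(v >= t for t in thresholds) for v in row] for row in ref_area]


def combine_predictions(labels, predictions):
    """Build the final prediction of the target area.

    predictions[k] is the prediction of the whole target area obtained by
    motion compensation with the motion information of partition k; each
    sample of the final prediction is taken from the prediction of the
    partition that the sample is labeled with.
    """
    return [
        [predictions[k][y][x] for x, k in enumerate(row)]
        for y, row in enumerate(labels)
    ]


# Toy reference area: dark desk (10), one mid-gray cup (100), one bright cup (200).
ref_area = [[10, 10, 200],
            [10, 100, 200]]
labels = partition_by_content(ref_area, thresholds=[50, 150])

# One prediction per motion partition (constant values, for readability only).
predictions = [[[k + 1] * 3 for _ in range(2)] for k in range(3)]
final_prediction = combine_predictions(labels, predictions)
```

Because the labels are derived only from the reference area, which is available at both ends, the decoder end can repeat the same partition without the partition itself being transmitted; only the motion information of each partition and the motion vector of the reference area need to be sent.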
However, in this method, the encoder end performs picture partition based on the picture content feature without considering a motion characteristic. As a result, the motion information of the motion partitions obtained through the partition describes the target area with low accuracy, and the motion information obtained by the decoder end is correspondingly inaccurate. Video decoding performance is therefore affected, and decoding precision is low.