A video sequence is a series of static images (or pictures), each known as a video frame. In video encoding, each video frame is divided into squares of 16×16 pixels, and each such square is known as a macroblock. The video frame is encoded as one or more slices, each of which includes a specific number of macroblocks. A slice can be an I-slice, a P-slice or a B-slice. An I-slice includes only intra-coded macroblocks, which are predicted from previously encoded macroblocks in the same slice. A P-slice may include both intra-coded and inter-coded macroblocks; inter-coded macroblocks are predicted from macroblocks in previously encoded video frames. B-slices are bi-directionally predicted slices, which include macroblocks predicted from the macroblocks of previously encoded I/P-slices, from I/P-slices that follow in display order, or from the average of the previous and following I/P-slices.
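As a rough illustration of the frame, macroblock and slice hierarchy, the following Python sketch counts the 16×16 macroblocks needed to cover a frame and groups them, in raster order, into slices. Padding the frame dimensions up to a multiple of 16 and using a fixed number of macroblocks per slice are assumptions made for this example; macroblock_grid and split_into_slices are hypothetical names.

```python
MB_SIZE = 16  # macroblock width and height in pixels


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Return (columns, rows) of 16x16 macroblocks covering the frame.

    Assumes dimensions that are not multiples of 16 are padded up to the
    next macroblock boundary.
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows


def split_into_slices(num_mbs: int, mbs_per_slice: int) -> list[range]:
    """Group macroblock indices (raster order) into consecutive slices.

    Assumes a fixed number of macroblocks per slice (hypothetical); the
    last slice may be shorter.
    """
    return [
        range(start, min(start + mbs_per_slice, num_mbs))
        for start in range(0, num_mbs, mbs_per_slice)
    ]


cols, rows = macroblock_grid(1920, 1080)
slices = split_into_slices(cols * rows, 1000)
print(cols, rows, cols * rows, len(slices))  # 120 68 8160 9
```

Because 1080 is not a multiple of 16, the height is padded to 68 macroblock rows, and the last of the nine slices holds only 160 macroblocks.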
Each macroblock in a slice may be divided into partitions of varying sizes. For example, a macroblock may be divided into one partition of 16×16 pixels, two partitions of 16×8 pixels, and so on. Each partition may be further divided into blocks of varying sizes; for example, an 8×8 partition may be divided into two blocks of 8×4 pixels, four blocks of 4×4 pixels, and so on. The possible partitionings of a macroblock are referred to as modes.
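The following sketch enumerates the possible partitionings of one macroblock, assuming the H.264/AVC partition set: 16×16, 16×8, 8×16 and 8×8 macroblock partitions, with each 8×8 partition further divisible into 8×8, 8×4, 4×8 or 4×4 blocks. The function names are hypothetical. It shows how quickly the number of modes grows once each 8×8 partition chooses its sub-partition independently.

```python
from itertools import product

# Assumed H.264/AVC-style partition sizes as (width, height) in pixels.
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_in(region_w: int, region_h: int, part_w: int, part_h: int) -> int:
    """Number of part_w x part_h blocks that tile a region_w x region_h region."""
    return (region_w // part_w) * (region_h // part_h)


def enumerate_modes():
    """Yield every macroblock partitioning as a list of (width, height) blocks."""
    for w, h in MB_PARTITIONS:
        if (w, h) != (8, 8):
            yield [(w, h)] * blocks_in(16, 16, w, h)
        else:
            # Each of the four 8x8 partitions picks its own sub-partition.
            for subs in product(SUB_PARTITIONS, repeat=4):
                mode = []
                for sw, sh in subs:
                    mode.extend([(sw, sh)] * blocks_in(8, 8, sw, sh))
                yield mode


modes = list(enumerate_modes())
print(len(modes))  # 259
```

Under these assumptions, three modes come from the 16×16, 16×8 and 8×16 partitionings, and 4^4 = 256 come from the 8×8 partitioning, for 259 in total.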
Many methods are available for mode decision and reference frame selection. In one method, rate-distortion optimization (RDO) is used. RDO exhaustively evaluates every combination of mode and reference picture by its rate-distortion cost. For each mode, motion estimation is first conducted over multiple reference frames, and the resulting rate-distortion cost is then used to make the mode decision. Because every mode must be motion-estimated against every reference frame, this method requires a significant amount of computational power.
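Exhaustive RDO can be sketched as a double loop over modes and reference frames that minimizes the Lagrangian cost J = D + λ·R, where D is the distortion and R the bit rate. This is a minimal sketch, not the document's method: the evaluate callback (standing in for motion estimation and coding), the λ value and the toy numbers are all hypothetical.

```python
from typing import Callable, Hashable, Sequence


def rdo_mode_decision(
    modes: Sequence[Hashable],
    ref_frames: Sequence[int],
    evaluate: Callable[[Hashable, int], tuple[float, float]],
    lam: float,
) -> tuple[Hashable, int, float, int]:
    """Exhaustive RDO over every (mode, reference frame) pair.

    `evaluate(mode, ref)` is a hypothetical stand-in for motion estimation
    plus coding; it returns (distortion, rate_in_bits).
    Returns (best_mode, best_ref, best_cost, number_of_evaluations).
    """
    if not modes or not ref_frames:
        raise ValueError("need at least one mode and one reference frame")
    best_mode, best_ref, best_cost = None, None, float("inf")
    evaluations = 0
    for mode in modes:
        for ref in ref_frames:
            distortion, rate = evaluate(mode, ref)
            evaluations += 1
            cost = distortion + lam * rate  # J = D + lambda * R
            if cost < best_cost:
                best_mode, best_ref, best_cost = mode, ref, cost
    return best_mode, best_ref, best_cost, evaluations


# Toy evaluator with made-up numbers: distortion grows with the reference
# index, rate grows with the number of partitions.
def toy_evaluate(mode: str, ref: int) -> tuple[float, float]:
    distortion = {"16x16": 900, "16x8": 700, "8x8": 500}[mode] + 50 * ref
    rate = {"16x16": 20, "16x8": 40, "8x8": 90}[mode]
    return distortion, rate


print(rdo_mode_decision(["16x16", "16x8", "8x8"], [0, 1, 2, 3], toy_evaluate, lam=5.0))
# ('16x8', 0, 900.0, 12)
```

The number of evaluations is the number of modes times the number of reference frames, and in a real encoder each evaluation involves a full motion search; this product is the source of the computational cost.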
Other methods perform mode decision and reference frame selection using fast motion estimation. These fast motion estimation methods evaluate only a limited number of search points for each partition type during motion estimation. However, because of the reduced number of search points, these methods can produce poor block matches and an inaccurate selection of the reference picture.
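The trade-off between search points and match quality can be illustrated by comparing an exhaustive (full) search with a greedy small-diamond search, one common fast motion estimation pattern, both using the sum of absolute differences (SAD) as the matching criterion. This is a sketch under simplifying assumptions: integer-pixel motion, frames stored as lists of lists, and a square search window of radius r ≥ 1 that stays inside the frame. The function names and the synthetic test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the motion vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, r):
    """Evaluate every vector in [-r, r]^2; return (best_mv, best_sad, points)."""
    best_mv, best_cost, points = None, None, 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            points += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, points


def small_diamond_search(cur, ref, bx, by, n, r):
    """Greedy small-diamond search starting at (0, 0).

    Moves to the best of the four neighbours while that lowers the SAD,
    staying inside [-r, r]^2. Returns (best_mv, best_sad, points).
    """
    cache = {}  # motion vector -> SAD, so each point is counted once

    def cost(mv):
        if mv not in cache:
            cache[mv] = sad(cur, ref, bx, by, mv[0], mv[1], n)
        return cache[mv]

    center = (0, 0)
    while True:
        cx, cy = center
        neighbours = [
            (cx + dx, cy + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if abs(cx + dx) <= r and abs(cy + dy) <= r
        ]
        best = min(neighbours, key=cost)
        if cost(best) >= cost(center):
            return center, cost(center), len(cache)
        center = best


# Synthetic frames: `cur` is `ref` shifted so the true motion is (3, -2).
W = H = 40
ref = [[x * x + 2 * y * y for x in range(W)] for y in range(H)]
TRUE_MV = (3, -2)
cur = [
    [ref[min(max(y + TRUE_MV[1], 0), H - 1)][min(max(x + TRUE_MV[0], 0), W - 1)]
     for x in range(W)]
    for y in range(H)
]

print(full_search(cur, ref, 20, 20, 8, 4))  # ((3, -2), 0, 81)
print(small_diamond_search(cur, ref, 20, 20, 8, 4))
```

Full search always finds the lowest SAD in the window, at a cost of (2r + 1)² evaluations. The diamond search usually evaluates far fewer points but, being greedy, can stop at a local minimum.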
In another method, the statistical characteristics of motion vectors are used to select the reference video frame from multiple reference video frames. The motion vectors across the multiple reference video frames are correlated, and this correlation is used to select the final motion vector. The process is repeated for every possible partition of the macroblock, so the mode decision can be made only after all the partitions have been evaluated. This method therefore also requires a significant amount of computational power and time.
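One common way to exploit the correlation of motion vectors across reference frames, assumed here for illustration and not necessarily the approach described above, is temporal scaling: a vector measured against the nearest reference is stretched in proportion to temporal distance to predict the vector for a farther reference, similar in spirit to the scaling used by H.264 temporal direct mode. The sketch below predicts one candidate per reference and then selects the final vector with a caller-supplied cost function. All names and the toy cost are hypothetical.

```python
from typing import Callable

MV = tuple[int, int]


def scale_mv(mv: MV, dist_from: int, dist_to: int) -> MV:
    """Scale a motion vector by the ratio of temporal distances.

    Assumes roughly constant-velocity motion, so a vector measured against
    a reference `dist_from` frames away is stretched to `dist_to` frames.
    """
    dx, dy = mv
    return (round(dx * dist_to / dist_from), round(dy * dist_to / dist_from))


def predict_candidates(mv_nearest: MV, distances: list[int]) -> dict[int, MV]:
    """Predicted motion vector for each reference, keyed by temporal distance."""
    nearest = min(distances)
    return {d: scale_mv(mv_nearest, nearest, d) for d in distances}


def select_final(candidates: dict[int, MV],
                 cost: Callable[[int, MV], float]) -> tuple[int, MV]:
    """Pick the (reference distance, motion vector) pair with the lowest cost."""
    return min(candidates.items(), key=lambda item: cost(*item))


cands = predict_candidates((4, -2), [1, 2, 3, 4])
print(cands)  # {1: (4, -2), 2: (8, -4), 3: (12, -6), 4: (16, -8)}


# Toy cost (made up): distance from a "true" vector of (12, -6) plus a
# small penalty that grows with the reference distance.
def toy_cost(d: int, mv: MV) -> float:
    return abs(mv[0] - 12) + abs(mv[1] + 6) + d


print(select_final(cands, toy_cost))  # (3, (12, -6))
```

In a real encoder, this prediction and selection would be repeated for every partition of every mode, which is why such a method remains costly.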
Various methods are also available for early mode decision. These methods aim to find the best mode while terminating the evaluation of inefficient modes early, thereby saving computational power. However, they do not address the problem of selecting the reference picture during motion estimation for each macroblock. Moreover, these methods suit only applications that execute the encoding process sequentially. In most applications, all the modes are evaluated concurrently, so terminating motion estimation early brings no benefit.
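The sketch below contrasts sequential early termination, which stops as soon as a mode's cost falls below a threshold, with concurrent evaluation, in which every mode is costed regardless. The threshold, the mode costs and the function names are made up for illustration and do not describe any particular encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_early_termination(modes, evaluate, threshold):
    """Evaluate modes in order; stop once the best cost drops below `threshold`.

    Returns (best_mode, best_cost, modes_evaluated).
    """
    best_mode, best_cost, evaluated = None, float("inf"), 0
    for mode in modes:
        cost = evaluate(mode)
        evaluated += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if best_cost < threshold:
            break  # the remaining modes are skipped: this is the saving
    return best_mode, best_cost, evaluated


def concurrent_evaluation(modes, evaluate):
    """Evaluate all modes at once; nothing can be skipped.

    Returns (best_mode, best_cost, modes_evaluated).
    """
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, modes))
    best = min(range(len(modes)), key=results.__getitem__)
    return modes[best], results[best], len(modes)


# Made-up costs, listed in the order a sequential encoder would try them.
MODE_COSTS = {"SKIP": 120, "16x16": 80, "16x8": 95, "8x16": 90, "8x8": 70}
order = list(MODE_COSTS)

print(sequential_early_termination(order, MODE_COSTS.__getitem__, 100))  # ('16x16', 80, 2)
print(concurrent_evaluation(order, MODE_COSTS.__getitem__))  # ('8x8', 70, 5)
```

With these toy costs the sequential loop stops after two modes, whereas the concurrent version evaluates all five, so the threshold saves nothing there. The example also shows that an early exit can settle on a mode (16x16) that is not the cheapest (8x8).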
In light of the above, there is a need for a method and system that select reference pictures and make mode decisions efficiently and accurately, while minimizing the computational power and time consumed.