Due to ever increasing video resolutions and rising expectations for high quality video images, there is a high demand for efficient compression of video image data, while the performance of existing video coding standards, such as the H.264 or H.265/HEVC (High Efficiency Video Coding) standards, is limited. These standards use expanded forms of traditional approaches to address the compression/quality problem, but the results are often still insufficient.
Conventional video coding processes use inter-prediction at an encoder to reduce temporal (frame-to-frame) redundancy. This is accomplished by first performing motion estimation to determine where the same or similar image data has moved between a reference frame and the current frame being analyzed. The frames are typically divided into blocks, and the motion is represented by a motion vector that indicates where a block has moved from frame to frame. Motion compensation is then performed, applying the motion vector to construct a prediction block for the current frame to be reconstructed. The difference between the prediction and the real (original or actual) image data of a block is called the residual data, and it is compressed and encoded together with the motion vectors.
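The encode/decode relationship described above can be sketched as follows. This is a minimal illustration, not an implementation of any particular standard: the function names are hypothetical, motion vectors are assumed to be integer-pel, and displaced blocks are assumed to stay inside the reference frame.

```python
import numpy as np

def predict_block(reference, top, left, mv, size=8):
    """Fetch a motion-compensated prediction block from the reference
    frame, displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    return reference[top + dy : top + dy + size,
                     left + dx : left + dx + size]

def encode_block(reference, current, top, left, mv, size=8):
    """Residual = actual block minus motion-compensated prediction.
    The residual and the motion vector are what get encoded."""
    prediction = predict_block(reference, top, left, mv, size)
    actual = current[top : top + size, left : left + size]
    return actual - prediction

def decode_block(reference, residual, top, left, mv, size=8):
    """The decoder reverses the process: prediction + residual
    reconstructs the block."""
    prediction = predict_block(reference, top, left, mv, size)
    return prediction + residual
```

When the motion estimate is exact, the residual is all zeros and compresses very well; the closer the prediction, the smaller the residual the encoder must spend bits on.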
The motion estimation may be performed in a number of ways. One way is to perform a search on a reference frame for one or more blocks that match a block being analyzed on the current frame. The searches, however, can be very computationally expensive. Thus, in order to reduce the number of searches that must be performed, a spatial technique may be applied as well. This includes computing a motion vector for a current block being analyzed by using the motion vectors of other neighbor blocks in the same frame as the current block. This is often some mathematical combination of the motion vectors of adjacent blocks, such as a mean or median motion vector of the blocks above and to the left of the current block. Neighbor blocks near a current block being analyzed may be used because neighbor blocks are likely to correspond to the same moving object with similar motion, and the motion of the object is not likely to change abruptly from one frame to the next.
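As a concrete example of such a mathematical combination, a component-wise median of three neighbor motion vectors (e.g., the blocks to the left, above, and above-right of the current block, as in H.264-style motion vector prediction) can be sketched as below. The function names are illustrative only.

```python
def median3(a, b, c):
    """Median of three scalars: the sum minus the min and the max."""
    return a + b + c - min(a, b, c) - max(a, b, c)

def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Component-wise median of three neighbor motion vectors
    (each given as a (dy, dx) pair), used as the predicted motion
    vector for the current block."""
    return (median3(mv_left[0], mv_above[0], mv_above_right[0]),
            median3(mv_left[1], mv_above[1], mv_above_right[1]))
```

The median is preferred over the mean here because it discards a single outlier neighbor, e.g., a block at an object boundary whose motion differs from the other two.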
Specialty fixed-function hardware and graphics processing unit (GPU) resources are often used to speed up video encoding. Such hardware may have parallel circuits that perform many simultaneous computations, which can be very efficient when the same computation must be performed for thousands of blocks of pixel data in order to encode a video frame, as with motion estimation. This efficiency, however, must be balanced against the use of spatial dependency between different blocks, which is needed to optimize quality for an encoder. That spatial dependency is used to derive predicted motion vector(s), which become skip or merge candidates for a prediction mode, and the starting point for delta motion vectors.
To perform the spatially dependent motion estimation using neighbor block data on the same frame as the current block, the analysis of the current block must wait for the motion vectors to be determined on the neighbor blocks. In other words, motion estimation techniques that heavily rely on spatial dependencies restrict the amount of parallelism, or the number of blocks that can be analyzed at the same time by the fixed-function hardware. When the spatially dependent motion estimation is performed by traditional wavefront techniques, where the system waits to analyze a wave or line of front blocks until after the analysis of a previous line of blocks is complete and their motion vectors established, this can significantly underutilize and slow a hardware system with large parallel computational capacity. For systems that only process one block at a time, such as pure fixed-function encoders, this may not be a problem. For hybrid encoders, which use some software and some hardware, however, the spatial dependency may or may not be an issue depending on the amount (and in turn the capacity) of hardware. As the amount of hardware increases, the spatial dependency can become a greater limiting factor on the number of computations (blocks, or other units) that can run in parallel on the hardware.
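The parallelism limit imposed by a traditional wavefront can be sketched as a scheduling problem. In this illustrative sketch (not any standard's actual scheduler), each block at position (r, c) is assumed to depend on its left (r, c-1), above (r-1, c), and above-right (r-1, c+1) neighbors; assigning the block to wave index 2*r + c guarantees that every dependency sits in an earlier wave, so only the blocks within a single wave may be processed in parallel.

```python
def wavefront_schedule(rows, cols):
    """Group block coordinates into waves such that every block's
    left, above, and above-right neighbors fall in an earlier wave.
    Each returned sub-list is a set of blocks that can run in parallel."""
    waves = {}
    for r in range(rows):
        for c in range(cols):
            # Wave index 2*r + c: left neighbor is at 2*r + c - 1,
            # above at 2*r + c - 2, above-right at 2*r + c - 1.
            waves.setdefault(2 * r + c, []).append((r, c))
    return [waves[w] for w in sorted(waves)]
```

Note that the widest wave contains at most about min(rows, cols/2) blocks, no matter how many parallel execution units the hardware provides; this is the underutilization described above, and it worsens as the hardware's parallel capacity grows.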