Due to ever-increasing video resolutions and rising expectations for high-quality video images, there is a high demand for efficient compression of video image data, yet the performance achievable with existing video coding standards, such as the H.264 or H.265/HEVC (High Efficiency Video Coding) standards, the VPx standards such as VP9, and so forth, remains limited. The aforementioned standards use expanded forms of traditional approaches to address the problem of insufficient compression and quality, but often the results are still insufficient.
Conventional video coding processes use inter-prediction at the encoder to reduce temporal (frame-to-frame) redundancy. This is accomplished by first performing motion estimation to determine where the same or similar image data has moved between a reference frame and the current frame being analyzed. The frames are often divided into blocks, and the motion is represented by a motion vector that indicates where a block has moved from frame to frame. Motion compensation is then performed by applying the motion vector to construct a prediction block for the current frame to be reconstructed. The difference between the predicted image data of a block and the real (original or actual) data is called the residual data, and it is compressed and encoded together with the motion vectors.
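The motion estimation, compensation, and residual steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the method of any particular standard: frames are plain 2D lists of pixel values, the block size and search range are arbitrary, and the matching cost is the sum of absolute differences (SAD), one common choice among several.

```python
# Hypothetical sketch of block-based motion estimation and compensation.
# Frames are 2D lists of pixel values; blocks are n x n squares.

def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block at (rx, ry)
    in the reference frame and the one at (cx, cy) in the current frame."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def motion_estimate(ref, cur, cx, cy, n, search):
    """Full search: return the motion vector (dx, dy) that minimizes SAD
    within +/- search pixels, staying inside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(ref, cur, rx, ry, cx, cy, n)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best

def motion_compensate(ref, cx, cy, dx, dy, n):
    """Build the prediction block by copying the displaced block
    from the reference frame."""
    return [[ref[cy + dy + j][cx + dx + i] for i in range(n)]
            for j in range(n)]

def residual(cur, pred, cx, cy, n):
    """Difference between the actual block and its prediction; this
    residual is what would be transformed, quantized, and entropy coded."""
    return [[cur[cy + j][cx + i] - pred[j][i] for i in range(n)]
            for j in range(n)]
```

For a bright 2x2 patch that moves two pixels down and right between frames, `motion_estimate` recovers the displacement, `motion_compensate` reproduces the patch, and the residual collapses to zeros, which is why a good motion vector makes the residual cheap to encode.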
Motion estimation is conventionally performed as a search of a reference frame for one or more blocks that match a block being analyzed in the current frame. Such brute-force searches, however, can be computationally expensive, causing unnecessary delay in the coding process. Thus, to reduce the number of searches that must be performed, a spatial technique may be applied that computes a motion vector for a current block by using some combination of the previously computed motion vectors of neighboring blocks in the same frame as the current block. Even so, this remains computationally heavy and causes delays in known coding systems.
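One common form of the spatial technique just described is a component-wise median of neighbor motion vectors; the sketch below is illustrative only (H.264 uses a median predictor in a similar spirit, but with additional availability and reference-index rules not shown here). The choice of left, above, and above-right neighbors is an assumption of this example.

```python
# Hypothetical sketch of spatial motion-vector prediction: rather than
# running a full search, predict the current block's motion vector from
# the already-computed vectors of its left, above, and above-right
# neighbor blocks, taking the median of each component.

def median3(a, b, c):
    """Median of three scalar values."""
    return sorted((a, b, c))[1]

def predict_mv(left, above, above_right):
    """Component-wise median of three neighbor motion vectors (dx, dy).
    The predicted vector can seed a small refinement search, or only
    the difference from it need be coded."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))
```

With neighbor vectors (2, 0), (3, 1), and (2, 1), the predictor yields (2, 1), so a subsequent search need only examine a small area around that candidate instead of the whole window.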
Adding to these difficulties, when compressing video for transmission in a three-dimensional video system that records a scene with multiple cameras, such as at an athletic event, the system treats each camera's video sequence separately and computes motion vectors for each camera recording the scene. Such an encoding process multiplies the computations needed to code the motion vectors, causing further delay.