Block-based motion estimation, which aims to remove temporal redundancy between neighboring frames, is an important element of many video coding standards. Traditional block-based motion estimation methods such as the Exhaustive Block Matching Algorithm (EBMA) achieve good matching performance but are computationally expensive. Alternatives to EBMA have been proposed that reduce the number of search points by trading matching optimality for lower computational cost. Although these alternatives exploit shared local spatial characteristics around the target block, they fail to take advantage of the spatio-temporal characteristics of the video data itself. Such characteristics provide useful information that can reduce the computational load of block-matching algorithms in cameras whose scenes contain objects with motion patterns that trend across time (e.g., cameras mounted to monitor highway traffic).
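To make the cost trade-off concrete, the following is a minimal sketch of EBMA next to one well-known reduced-search alternative, the three-step search (chosen here purely for illustration; the text does not say which alternatives it has in mind). It assumes the sum of absolute differences (SAD) as the matching criterion, square n x n blocks, a search range of r pixels, and a motion vector (dx, dy) pointing from the target block's top-left corner to the matching block in the reference frame. The helper names (`sad`, `in_frame`, `ebma`, `three_step_search`) and the synthetic frames are hypothetical. EBMA evaluates up to (2r + 1)^2 candidates, whereas the three-step search evaluates at most 1 + 8 per step.

```python
def sad(target, reference, tx, ty, rx, ry, n):
    """Sum of absolute differences between the n x n target block whose top-left
    corner is (tx, ty) and the n x n reference block whose top-left corner is (rx, ry)."""
    return sum(
        abs(target[ty + i][tx + j] - reference[ry + i][rx + j])
        for i in range(n)
        for j in range(n)
    )


def in_frame(frame, x, y, n):
    """True if an n x n block with top-left corner (x, y) lies inside the frame."""
    return 0 <= x <= len(frame[0]) - n and 0 <= y <= len(frame) - n


def ebma(target, reference, tx, ty, n, r):
    """Exhaustive block matching: evaluate every displacement (dx, dy) with
    |dx|, |dy| <= r whose candidate block lies inside the reference frame.
    Returns (motion_vector, sad, number_of_search_points)."""
    best_mv, best_cost, points = None, float("inf"), 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if not in_frame(reference, tx + dx, ty + dy, n):
                continue
            points += 1
            cost = sad(target, reference, tx, ty, tx + dx, ty + dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, points


def three_step_search(target, reference, tx, ty, n, r):
    """Three-step search: probe the 8 neighbours of the current centre at a coarse
    step, re-centre on the best candidate, halve the step, and stop after step 1.
    Returns (motion_vector, sad, number_of_search_points)."""
    cx, cy = 0, 0
    best_cost = sad(target, reference, tx, ty, tx, ty, n)
    evaluated = {(0, 0)}
    step = max(1, (r + 1) // 2)
    while True:
        best_dx, best_dy = cx, cy
        for sy in (-step, 0, step):
            for sx in (-step, 0, step):
                dx, dy = cx + sx, cy + sy
                if (dx, dy) in evaluated or max(abs(dx), abs(dy)) > r:
                    continue
                if not in_frame(reference, tx + dx, ty + dy, n):
                    continue
                evaluated.add((dx, dy))
                cost = sad(target, reference, tx, ty, tx + dx, ty + dy, n)
                if cost < best_cost:
                    best_dx, best_dy, best_cost = dx, dy, cost
        cx, cy = best_dx, best_dy
        if step == 1:
            break
        step //= 2
    return (cx, cy), best_cost, len(evaluated)


# Hypothetical synthetic 32 x 32 frames: the target block at (12, 12) has an
# exact match in the reference frame displaced by (3, -2).
W = H = 32
reference = [[x * x + 3 * y * y + x for x in range(W)] for y in range(H)]
TRUE_MV = (3, -2)
target = [
    [reference[y + TRUE_MV[1]][x + TRUE_MV[0]]
     if 0 <= x + TRUE_MV[0] < W and 0 <= y + TRUE_MV[1] < H else 0
     for x in range(W)]
    for y in range(H)
]

ebma_mv, ebma_cost, ebma_points = ebma(target, reference, 12, 12, n=8, r=7)
tss_mv, tss_cost, tss_points = three_step_search(target, reference, 12, 12, n=8, r=7)
print("EBMA:", ebma_mv, ebma_cost, ebma_points)
print("TSS: ", tss_mv, tss_cost, tss_points)
```

With r = 7, EBMA examines all 225 candidates and is guaranteed to find the SAD minimum within the window. The three-step search examines at most 25 but can settle on a local minimum, which is the optimality-for-computation trade-off described above.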
Video compression is employed in applications where high-quality video transmission and/or archival is required. For example, a surveillance system typically includes a set of cameras that relay video data to a central processing and archival facility. While the communication network used to transport the video stream between the cameras and the central facility may be built on proprietary technology, traffic management centers have recently started to migrate to Internet Protocol (IP)-compliant networks. In either case, the underlying communication network typically has bandwidth constraints that dictate the use of video compression techniques at the camera end, prior to transmission. In the case of legacy analog cameras, compression is performed by an external encoder attached to the camera, whereas digital or IP cameras typically integrate the encoder within the camera itself. Typical transmission rates over IP networks require the frame rate of multi-megapixel video streams to be limited to fewer than 5 frames per second (fps). The latest video compression standards make it possible to use a camera's full frame rate when transmitting high-definition video over the same network bandwidth. For example, transmission of uncompressed 1080p HD video requires a bandwidth of 1.5 Gbps, while its compressed counterpart requires only 250 Mbps; consequently, compressed video can be transmitted at six times the frame rate of the uncompressed version over the same network infrastructure.
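The bandwidth figures above can be checked with a short calculation. The sketch below assumes 24 bits per pixel (8-bit RGB, no chroma subsampling) and 30 fps for the uncompressed stream; neither value is stated in the text. It then derives the frame-rate multiplier from the two bit rates quoted.

```python
WIDTH, HEIGHT = 1920, 1080  # 1080p frame size
BITS_PER_PIXEL = 24         # assumption: 8-bit RGB, no chroma subsampling
FPS = 30                    # assumption: 30 frames per second

# Raw bit rate = pixels per frame * bits per pixel * frames per second
uncompressed_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
print(f"uncompressed 1080p: {uncompressed_bps / 1e9:.2f} Gbps")

# Bit rates quoted in the text
UNCOMPRESSED_BPS = 1.5e9
COMPRESSED_BPS = 250e6

# At a fixed link capacity, the achievable frame rate scales inversely with
# the bits needed per frame, so the multiplier is the ratio of the bit rates.
ratio = UNCOMPRESSED_BPS / COMPRESSED_BPS
print(f"frame-rate multiplier at equal bandwidth: {ratio:.0f}x")
```

Under these assumptions the raw rate is about 1.49 Gbps, consistent with the 1.5 Gbps quoted, and the 1.5 Gbps / 250 Mbps ratio gives the factor of six.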
Video compression is achieved by exploiting two types of redundancy within the video stream: spatial redundancy among neighboring pixels within a frame, and temporal redundancy between adjacent frames. This modus operandi gives rise to two types of prediction, namely intra-frame and inter-frame prediction, which in turn result in two types of encoded frames: reference and non-reference frames. Reference frames, or “I-frames,” are encoded in a standalone manner (intra-frame) using compression methods similar to those used to compress digital images. Compression of non-reference frames (e.g., P-frames and B-frames) entails inter-frame, or motion-compensated, prediction, in which the target frame is estimated or predicted from previously encoded frames in a process that typically entails three steps: (i) motion estimation, where motion vectors are estimated using previously encoded frames; (ii) residual calculation, where the error between the predicted and target frames is calculated; and (iii) compression, where the error residual and the extracted motion vectors are compressed and stored. During motion estimation, the target frame is segmented into pixel blocks called target blocks, and an estimated or predicted frame is built by stitching together the blocks from previously encoded frames that best match the target blocks. Motion vectors describe the relative displacement between the location of the original blocks in the reference frames and their location in the predicted frame. While motion compensation of P-frames relies only on previous frames, B-frames are typically predicted from both previous and future frames. Throughout the teachings herein, the terms “motion vector” and “compression-type motion vector” are used synonymously.
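Steps (i) and (ii) can be illustrated with a small sketch: given one motion vector per target block, build the predicted frame by copying displaced blocks from a reference frame, then subtract it from the target frame to obtain the residual that step (iii) would compress. The function names, the dictionary layout of the motion vectors, and the edge-clamping of out-of-frame reference coordinates are assumptions made for this example, not details taken from the text.

```python
def clamp(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(v, hi))


def predict_frame(reference, motion_vectors, n):
    """Stitch a predicted frame from n x n blocks of the reference frame.
    motion_vectors maps block coordinates (bx, by) to a displacement (dx, dy);
    reference coordinates falling outside the frame are clamped to the edge."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in motion_vectors.items():
        tx, ty = bx * n, by * n  # top-left corner of the target block
        for i in range(n):
            for j in range(n):
                ry = clamp(ty + dy + i, 0, height - 1)
                rx = clamp(tx + dx + j, 0, width - 1)
                predicted[ty + i][tx + j] = reference[ry][rx]
    return predicted


def residual(target, predicted):
    """Pixel-wise prediction error: target minus predicted."""
    return [[t - p for t, p in zip(trow, prow)]
            for trow, prow in zip(target, predicted)]


# Hypothetical 8 x 8 frames: the target is the reference shifted right by one
# pixel, with the leftmost column replicated.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
target = [[reference[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

good_mvs = {(bx, by): (-1, 0) for bx in range(2) for by in range(2)}
zero_mvs = {(bx, by): (0, 0) for bx in range(2) for by in range(2)}

good_residual = residual(target, predict_frame(reference, good_mvs, 4))
zero_residual = residual(target, predict_frame(reference, zero_mvs, 4))
print(sum(abs(v) for row in good_residual for v in row))
print(sum(abs(v) for row in zero_residual for v in row))
```

With the correct motion vectors the residual is all zeros, so only the vectors themselves need to be coded. Assuming zero motion leaves a nonzero residual, which is why accurate motion estimation matters for compression efficiency.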
There is a need in the art for systems and methods for block-based motion estimation that are computationally efficient and that exploit the dominant spatio-temporal characteristics of the motion patterns captured in the video without sacrificing matching performance relative to exhaustive methods, thereby overcoming the aforementioned deficiencies.