Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Over the last two decades, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M (VC-1) standard. More recently, the HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. Extensions to the HEVC standard (e.g., for scalable video coding/decoding, for coding/decoding of video with higher fidelity in terms of sample bit depth or chroma sampling rate, or for multi-view coding/decoding) are currently under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress individual pictures, and inter-picture compression techniques compress pictures with reference to one or more preceding and/or following pictures (often called reference or anchor pictures).
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In one common technique, an encoder using motion estimation attempts to match a current block of sample values in a current picture with a candidate block of the same size in a search area in another picture, the reference picture. When the encoder finds an exact match, or another match satisfying a closeness criterion, in the search area of the reference picture, the encoder parameterizes the change in position between the current and candidate blocks as motion data (such as a motion vector (“MV”)). An MV is conventionally a two-dimensional value, having a horizontal MV component that indicates left or right spatial displacement and a vertical MV component that indicates up or down spatial displacement.
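The block-matching search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular encoder: it assumes sum of absolute differences (SAD) as the closeness criterion, an exhaustive (full) search over integer displacements within a square window, and pictures stored as lists of rows of sample values. The function names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every integer displacement within +/- search_range and return
    the motion vector (dx, dy) with the lowest SAD, together with that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would extend outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:  # strict '<' keeps the first minimum found
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference picture with distinct sample values; the current picture holds the
# same content displaced so that the true motion for the block is (-3, 1).
ref = [[x + 16 * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][max(x - 3, 0)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=8, by=4, n=4, search_range=4))  # ((-3, 1), 0)
```

A full search evaluates every candidate in the window, which is simple but costly; practical encoders commonly use faster search patterns and other cost measures, which this sketch does not attempt to model.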
An MV can indicate a spatial displacement in terms of an integer number of samples, starting from the co-located position of the current block in a reference picture. For example, for a current block at position (32, 16) in a current picture, the MV (−3, 1) indicates position (29, 17) in the reference picture. Alternatively, an MV can indicate a spatial displacement in terms of a fractional number of samples from that co-located position. For example, for a current block at position (32, 16) in a current picture, the MV (−3.5, 1.25) indicates position (28.5, 17.25) in the reference picture. To determine sample values at fractional offsets in the reference picture, the encoder typically interpolates between sample values at integer-sample positions. Such interpolation is referred to as “sub-pixel interpolation” and can be computationally intensive.
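The position arithmetic and the idea of sub-pixel interpolation can be illustrated with a short sketch. As a simplifying assumption it uses bilinear interpolation (a weighted blend of the four surrounding integer-position samples) with floating-point weights and edge clamping; standards such as H.264 and HEVC instead specify longer interpolation filters with fixed-point arithmetic, so this only shows the principle. The helper names `referenced_position` and `sample_at` are hypothetical.

```python
import math


def referenced_position(block_pos, mv):
    """Position in the reference picture indicated by an MV, measured from the
    co-located position of the current block."""
    return (block_pos[0] + mv[0], block_pos[1] + mv[1])


def sample_at(ref, x, y):
    """Bilinearly interpolate `ref` at a possibly fractional position (x, y).
    Neighbor coordinates are clamped to the picture edges."""
    height, width = len(ref), len(ref[0])

    def px(xx, yy):
        return ref[min(max(yy, 0), height - 1)][min(max(xx, 0), width - 1)]

    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets in [0, 1)
    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


ref = [[x + 64 * y for x in range(64)] for y in range(64)]
print(referenced_position((32, 16), (-3, 1)))       # (29, 17)
print(referenced_position((32, 16), (-3.5, 1.25)))  # (28.5, 17.25)
print(sample_at(ref, 29, 17))       # 1117 (integer position: the stored sample)
print(sample_at(ref, 28.5, 17.25))  # 1132.5 (blend of four neighbors)
```

Even this simple filter needs several multiplications per interpolated sample; longer filters multiply that cost, which is why interpolation can dominate the work of evaluating fractional-position candidates.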
In general, motion compensation is a process of reconstructing pictures from reference picture(s) using the candidate blocks selected during the motion estimation process. During motion compensation, a decoder also performs sub-pixel interpolation as needed to compute sample values at fractional offsets in reference pictures.
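The core of motion compensation, fetching the block that an MV points to, can be sketched as below. This is a minimal sketch under stated assumptions: it handles integer MVs only (fractional MVs would additionally require sub-pixel interpolation), it clamps out-of-picture coordinates to the nearest edge sample as a stand-in for reference-picture padding, and it returns only the prediction, without any further reconstruction steps. The name `motion_compensate` is hypothetical.

```python
def motion_compensate(ref, bx, by, n, mv):
    """Build the n x n prediction for the current block at (bx, by) by fetching
    the reference block displaced by an integer MV (dx, dy). Coordinates that
    fall outside the reference picture are clamped to its nearest edge sample."""
    height, width = len(ref), len(ref[0])
    dx, dy = mv
    return [
        [ref[min(max(by + dy + y, 0), height - 1)][min(max(bx + dx + x, 0), width - 1)]
         for x in range(n)]
        for y in range(n)
    ]


ref = [[x + 16 * y for x in range(16)] for y in range(16)]
print(motion_compensate(ref, 8, 4, 2, (-3, 1)))  # [[85, 86], [101, 102]]
print(motion_compensate(ref, 0, 0, 2, (-1, 0)))  # [[0, 0], [16, 16]]
```

Because the encoder and decoder must form identical predictions, both sides would run the same fetch (and the same interpolation, for fractional MVs) for a given MV.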
Encoders typically spend a large proportion of encoding time on motion estimation, evaluating many candidate blocks in a reference picture to find good matches and thereby improve rate-distortion performance. Further, newer video codecs tend to employ higher-complexity sub-pixel interpolation schemes. When a video codec specifies such a scheme, it can significantly increase the encoding time and computational burden of motion estimation, because interpolated sample values may be needed for each fractional-position candidate the encoder evaluates. This added complexity can be especially burdensome in applications or environments where speed is important, such as real-time encoding for video conferencing or for live events.