Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Over the last two decades, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M (VC-1) standard. More recently, the H.265/HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. Extensions to the H.265/HEVC standard (e.g., for scalable video coding/decoding, for coding/decoding of video with higher fidelity in terms of sample bit depth or chroma sampling rate, for screen capture content, or for multi-view coding/decoding) are currently under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress individual pictures, and inter-picture compression techniques compress pictures with reference to one or more preceding and/or following pictures (often called reference or anchor pictures).
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In one common technique, an encoder using motion estimation attempts to match a current block of sample values in a current picture with a candidate block of the same size in a search area in another picture, the reference picture. A reference picture is, in general, a picture that contains sample values that may be used for prediction in the decoding process of other pictures.
For a current block, when the encoder finds an exact or “close enough” match in the search area in the reference picture, the encoder parameterizes the change in position between the current and candidate blocks as motion data such as a motion vector (“MV”). An MV is conventionally a two-dimensional value, having a horizontal MV component that indicates left or right spatial displacement and a vertical MV component that indicates up or down spatial displacement. In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data.
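The block-matching search described above can be illustrated with a minimal sketch. It assumes pictures are lists of rows of integer sample values, uses the sum of absolute differences (SAD) as the matching cost, and exhaustively tests every integer-sample candidate in a square search area. The function names `sad` and `full_search`, the cost measure, and the exhaustive search pattern are illustrative assumptions, not the method of any particular encoder.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Return (mv, cost) for the candidate block with the lowest SAD.

    mv is (horizontal, vertical) in whole samples, measured from the
    co-located position (cx, cy) in the reference picture.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


# Synthetic example: the current picture is the reference picture with its
# content moved 3 samples right and 1 sample up, so the block at (8, 4)
# matches the reference block at (5, 5).
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][max(x - 3, 0)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, cx=8, cy=4, size=4, search_range=4))  # ((-3, 1), 0)
```

In this sketch an exact match yields a cost of zero. A practical encoder would typically accept a “close enough” match and would restrict or reorder the candidates it tests to limit search time.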
I. MV Precision.
An MV can indicate a spatial displacement in terms of an integer number of samples starting from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (−3, 1) indicates position (29, 17) in the reference picture. Or, an MV can indicate a spatial displacement in terms of a fractional number of samples from a co-located position in a reference picture for a current block. For example, for a current block at position (32, 16) in a current picture, the MV (−3.5, 1.25) indicates position (28.5, 17.25) in the reference picture. To determine sample values at fractional offsets in the reference picture, the encoder typically interpolates between sample values at integer-sample positions. Such interpolation can be computationally intensive. During motion compensation, a decoder also performs the interpolation as needed to compute sample values at fractional offsets in reference pictures.
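The two examples above, and the interpolation step they imply, can be sketched as follows. This sketch uses bilinear interpolation between the four nearest integer-sample positions purely for simplicity; codec standards generally specify their own, usually longer, interpolation filters. The names `reference_position` and `sample_bilinear` are hypothetical, and the position is assumed to lie inside the reference picture.

```python
import math


def reference_position(block_pos, mv):
    """Apply an MV (horizontal, vertical) to a block position (x, y)."""
    return (block_pos[0] + mv[0], block_pos[1] + mv[1])


def sample_bilinear(ref, x, y):
    """Estimate the sample value at a fractional position (x, y).

    ref is a list of rows of sample values. Positions past the right or
    bottom edge reuse the edge sample (simple clamping).
    """
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    x1 = min(x0 + 1, len(ref[0]) - 1)
    y1 = min(y0 + 1, len(ref) - 1)
    top = (1 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    bottom = (1 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return (1 - fy) * top + fy * bottom


print(reference_position((32, 16), (-3, 1)))       # (29, 17)
print(reference_position((32, 16), (-3.5, 1.25)))  # (28.5, 17.25)
print(sample_bilinear([[0, 10], [20, 30]], 0.5, 0.25))  # 10.0
```

When both MV components are integers, the fractional weights are zero and the function returns the stored sample value unchanged, which is why integer-sample MVs avoid the interpolation cost.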
When encoding a block using motion estimation and motion compensation, an encoder often computes the sample-by-sample differences (also called residual values or error values) between the sample values of the block and its motion-compensated prediction. The residual values may then be encoded. For the residual values, encoding efficiency depends on the complexity of the residual values and how much loss or distortion is introduced as part of the compression process. In general, a good motion-compensated prediction closely approximates a block, such that the residual values include few significant values, and the residual values can be efficiently encoded. On the other hand, a poor motion-compensated prediction often yields residual values that include many significant values, which are more difficult to encode efficiently. Encoders typically spend a large proportion of encoding time performing motion estimation, attempting to find good matches and thereby improve rate-distortion performance.
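As a small illustration of the difference between a good and a poor prediction, the sketch below computes residual values for a 2×2 block and counts how many exceed a threshold in magnitude. The helper names and the threshold-based notion of a “significant” value are assumptions made for this example only.

```python
def residual(block, prediction):
    """Sample-by-sample difference between a block and its prediction."""
    return [[b - p for b, p in zip(block_row, pred_row)]
            for block_row, pred_row in zip(block, prediction)]


def count_significant(res, threshold=0):
    """Count residual values whose magnitude exceeds the threshold."""
    return sum(1 for row in res for value in row if abs(value) > threshold)


block = [[10, 12], [14, 16]]
good_prediction = [[10, 12], [14, 15]]
poor_prediction = [[0, 0], [0, 0]]

print(residual(block, good_prediction))                     # [[0, 0], [0, 1]]
print(count_significant(residual(block, good_prediction)))  # 1
print(count_significant(residual(block, poor_prediction)))  # 4
```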
Different video codec standards and formats have used MVs with different MV precisions. For integer-sample MV precision, an MV component indicates an integer number of sample values for spatial displacement. For a fractional-sample MV precision such as ½-sample MV precision or ¼-sample MV precision, an MV component can indicate an integer number of sample values or fractional number of sample values for spatial displacement. For example, if the MV precision is ¼-sample MV precision, an MV component can indicate a spatial displacement of 0 samples, 0.25 samples, 0.5 samples, 0.75 samples, 1.0 samples, 1.25 samples, and so on. When a codec uses MVs with integer-sample MV precision, an encoder and decoder need not perform interpolation operations between sample values of reference pictures for motion compensation. When a codec uses MVs with fractional-sample MV precision, an encoder and decoder perform interpolation operations between sample values of reference pictures for motion compensation (adding computational complexity), but motion-compensated predictions tend to more closely approximate blocks (leading to residual values with fewer significant values), compared to integer-sample MV precision.
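One common way to represent fractional-sample MVs, assumed here for illustration, is to store each MV component as an integer count of fractional units (for example, quarter samples). The sketch below splits such a component into a whole-sample offset and a fractional phase, and reports whether an MV requires interpolation. The function names and the `shift` parameter are hypothetical.

```python
def split_mv_component(mv_units, shift=2):
    """Split an MV component stored in 1/(2**shift)-sample units.

    Returns (integer_part, fraction), where integer_part is the floor of
    the displacement in whole samples and fraction is the remaining
    phase in fractional units. shift=2 means quarter-sample precision.
    """
    integer_part = mv_units >> shift           # floor division, also for negatives
    fraction = mv_units & ((1 << shift) - 1)   # remaining fractional units
    return integer_part, fraction


def needs_interpolation(mv, shift=2):
    """True if any component of the MV has a nonzero fractional phase."""
    mask = (1 << shift) - 1
    return any(component & mask for component in mv)


# The MV (-3.5, 1.25) in quarter-sample units is (-14, 5).
print(split_mv_component(-14))        # (-4, 2): -4 + 2/4 = -3.5
print(split_mv_component(5))          # (1, 1):   1 + 1/4 = 1.25
print(needs_interpolation((-14, 5)))  # True
print(needs_interpolation((-12, 4)))  # False: (-3, 1) in whole samples
```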
Some video codec standards and formats support switching of MV precision during encoding. Encoder-side decisions about which MV precision to use are not made effectively, however, in certain encoding scenarios. In particular, such encoder-side decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
II. Reference Picture Sets.
In some video codec standards and formats, multiple reference pictures are available at a given time for use in motion-compensated prediction. Such video codec standards/formats specify how to manage the multiple reference pictures. For example, reference pictures can be added or dropped automatically according to rules during video encoding and decoding. Or, parameters in a bitstream may indicate information about reference pictures used during video encoding and decoding.
In some video codec standards and formats, a reference picture set (“RPS”) is a set of reference pictures available for use in motion-compensated prediction at a given time. During encoding and decoding, an RPS can be updated to add newly decoded pictures and remove older pictures that are no longer used as reference pictures. In some recent codec standards (such as the H.265/HEVC standard), an RPS is updated during encoding and decoding, and syntax elements signaled in the bitstream indicate how to update the RPS.
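The bookkeeping involved in updating an RPS can be sketched with a small container. The class below is a hypothetical model: pictures are represented by identifiers, the optional `keep` argument stands in for whatever retention information an encoder decides on or a bitstream signals, and the drop-oldest rule applied at capacity is an assumption of this example rather than the behavior of any particular standard.

```python
class ReferencePictureSet:
    """Toy model of a set of reference pictures with a fixed capacity."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.pictures = []  # picture identifiers, oldest first

    def update(self, new_picture, keep=None):
        """Add a newly decoded picture to the set.

        If keep is given, only previously stored pictures listed in keep
        are retained. If the set then exceeds max_size, the oldest
        pictures are dropped.
        """
        if keep is not None:
            self.pictures = [p for p in self.pictures if p in keep]
        self.pictures.append(new_picture)
        while len(self.pictures) > self.max_size:
            self.pictures.pop(0)


rps = ReferencePictureSet(max_size=3)
for picture_id in range(4):
    rps.update(picture_id)
print(rps.pictures)  # [1, 2, 3]

rps.update(4, keep={1, 3})
print(rps.pictures)  # [1, 3, 4]
```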
Encoder-side decisions about how to update an RPS are not made effectively in certain encoding scenarios, however. In particular, such decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.
III. Sample Adaptive Offset Filtering.
A video encoder or video decoder can apply one or more filters to reconstructed sample values of pictures. According to the H.265/HEVC standard, for example, deblock filtering and sample adaptive offset (“SAO”) filtering can be applied to reconstructed sample values. Deblock filtering tends to reduce blocking artifacts due to block-based coding, and is adaptively applied to sample values at block boundaries. Within a region, SAO filtering is adaptively applied to sample values that satisfy certain conditions, such as presence of a gradient across the sample values.
According to the H.265/HEVC standard, SAO filtering can be enabled or disabled for a sequence. When enabled for a sequence, SAO filtering can be enabled or disabled on a slice-by-slice basis for luma content of a slice and/or for chroma content of the slice. SAO filtering can also be enabled or disabled for blocks within a slice. For example, SAO filtering can be enabled or disabled for coding tree blocks (“CTBs”) of a coding tree unit (“CTU”) in a slice, where a CTU typically includes a luma CTB and corresponding chroma CTBs. For a CTB, a type index indicates whether SAO filtering is disabled, uses band offsets, or uses edge offsets. If SAO filtering uses band offsets or edge offsets, additional syntax elements indicate parameters for the SAO filtering for the CTB. In some cases, a CTB can reuse syntax elements from an adjacent CTB to control SAO filtering. In any event, when SAO filtering is used, it increases the computational complexity of encoding and decoding.
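The two offset types mentioned above can be illustrated with a simplified sketch. It is not the normative H.265/HEVC process: it filters a single row of samples, uses only the horizontal direction for edge offsets, and treats the offsets and band position as given inputs rather than parsed syntax elements. All function and parameter names are hypothetical.

```python
def edge_category(a, c, b):
    """Classify sample c against its two neighbors a and b.

    Returns 1 for a local minimum, 2 or 3 for the two kinds of edge,
    4 for a local maximum, and 0 when no offset applies.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    s = sign(c - a) + sign(c - b)
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]


def sao_edge_row(row, offsets, bit_depth=8):
    """Apply edge offsets along one row; the two border samples are unchanged.

    offsets maps categories 1..4 to offset values. Classification always
    uses the unfiltered neighbor values.
    """
    max_value = (1 << bit_depth) - 1
    out = list(row)
    for i in range(1, len(row) - 1):
        category = edge_category(row[i - 1], row[i], row[i + 1])
        if category:
            out[i] = max(0, min(max_value, row[i] + offsets[category]))
    return out


def sao_band(samples, band_start, offsets, bit_depth=8):
    """Apply band offsets: the value range is split into 32 equal bands,
    and offsets apply to consecutive bands starting at band_start."""
    shift = bit_depth - 5
    max_value = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_start
        if 0 <= k < len(offsets):
            s = max(0, min(max_value, s + offsets[k]))
        out.append(s)
    return out


print(sao_edge_row([10, 8, 10, 10, 12, 10], {1: 2, 2: 1, 3: -1, 4: -2}))
# [10, 10, 9, 11, 10, 10]
print(sao_band([0, 40, 47, 48, 200], band_start=5, offsets=[3, -2, 0, 1]))
# [0, 43, 50, 46, 200]
```

Even in this reduced form, each sample requires comparisons against its neighbors or a band lookup, which illustrates the added computational cost of SAO filtering noted above.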
There are many conditions and situations in which SAO filtering should be disabled. Encoder-side decisions about when to use SAO filtering are not made effectively, however, in certain encoding scenarios. In particular, such decisions are not made effectively in various situations when encoding artificially-created video content such as screen capture content.