Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Compression can be lossless, in which case the quality of the video does not suffer, but decreases in bit rate are limited by the inherent variability (sometimes called the source entropy) of the input video data. Or, compression can be lossy, in which case the quality of the video suffers and the lost quality cannot be completely recovered, but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression: lossy compression establishes an approximation of the information, and lossless compression is then applied to represent that approximation.
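As a toy illustration of this two-stage arrangement (not the scheme of any actual codec), the sketch below quantizes sample values (the lossy step) and then measures the Shannon entropy, a lower bound on the bits per symbol achievable by an ideal lossless stage. The function names and data are illustrative only.

```python
from collections import Counter
import math

def entropy_bits_per_symbol(data):
    """Shannon entropy: a lower bound on lossless bits per symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def quantize(samples, step):
    """Lossy step: map each sample to the nearest multiple of `step`.

    Note: Python's round() uses banker's rounding for exact halves.
    """
    return [step * round(s / step) for s in samples]

samples = [10, 11, 10, 12, 50, 51, 49, 50, 10, 11]
coarse = quantize(samples, 4)   # approximation; fine detail is lost

# Quantization reduces the number of distinct symbols, lowering the
# entropy and hence the bit rate achievable by the lossless stage.
print(entropy_bits_per_symbol(samples))  # higher
print(entropy_bits_per_symbol(coarse))   # lower
```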
Quantization and other lossy processing can result in visible lines at boundaries between blocks or sub-blocks of a picture. Such “blocking artifacts” might occur, for example, if adjacent blocks in a smoothly changing region of a picture (such as a sky area) are quantized to different average levels. Blocking artifacts can be especially troublesome in pictures that are used as reference pictures for motion compensation during encoding and decoding. To reduce blocking artifacts, an encoder and decoder can use “deblock” filtering to smooth boundary discontinuities between blocks and/or sub-blocks in reference pictures. The filtering is “in-loop” in that it occurs inside a motion-compensation loop: the encoder and decoder perform it on reference pictures that are used later in encoding/decoding. Deblock filtering typically improves the quality of motion compensation, resulting in better motion-compensated prediction and a lower bit rate for prediction residuals, thereby increasing coding efficiency. For this reason, in-loop deblock filtering is usually enabled during encoding, in which case a decoder must also perform in-loop deblock filtering for correct decoding. A decoder may additionally perform “post-processing” deblock filtering on pictures output by the decoder, outside of the motion-compensation loop.
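A minimal one-dimensional sketch of the idea, assuming a simple clip-limited averaging filter rather than the adaptive filter of any particular standard:

```python
def deblock_boundary(left, right, strength=1):
    """Smooth the discontinuity across one block boundary.

    `left`/`right` hold the pixel values immediately adjacent to the
    boundary (e.g., the last column of one block and the first column
    of its right-hand neighbor). This is a simplified, clip-limited
    filter; real standards use more elaborate adaptive filters.
    """
    out_left, out_right = [], []
    for a, b in zip(left, right):
        delta = (b - a) // 4                                  # quarter of the step
        delta = max(-2 * strength, min(2 * strength, delta))  # clip the correction
        out_left.append(a + delta)
        out_right.append(b - delta)
    return out_left, out_right

# Two blocks quantized to different average levels in a flat region:
# the 8-level step at the boundary is halved, softening the edge.
left, right = deblock_boundary([100, 100, 100, 100], [108, 108, 108, 108])
```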
Various video standards and products incorporate in-loop deblock filtering. The details of the filtering vary depending on the standard or product, and can be quite complex. Even within a standard or product, the rules for applying deblock filtering can vary depending on factors such as content/smoothness, motion vectors for blocks/sub-blocks on different sides of a boundary, block/sub-block size, coded/not coded status (e.g., whether transform coefficient information is signaled in the bitstream), and progressive/interlaced field/interlaced frame mode. For example, FIG. 1 shows some block/sub-block boundaries when an encoder and decoder perform in-loop filtering in a motion-compensated progressive video frame. The encoder and decoder use transforms of varying size (8×8, 8×4, 4×8 or 4×4). A shaded block/sub-block indicates the block/sub-block is coded. Thick lines represent the boundaries that are adaptively filtered, and thin lines represent the boundaries that are not filtered. The boundary between a given block/sub-block and a neighboring block/sub-block may or may not be adaptively filtered. Generally, a boundary between a given block/sub-block and a neighboring block/sub-block is filtered unless both are inter-coded, both have the same motion vector, and both are not coded (lack transform coefficient information in the bitstream).
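The decision rule in the last sentence can be sketched as follows; the block attribute names (`inter_coded`, `motion_vector`, `has_coefficients`) are illustrative, not identifiers from any particular standard:

```python
def boundary_is_filtered(block_a, block_b):
    """Return True if the boundary between two neighboring
    blocks/sub-blocks should be adaptively filtered.

    Filtering is skipped only when both blocks are inter-coded, both
    have the same motion vector, and neither signals transform
    coefficient information in the bitstream.
    """
    skip = (block_a["inter_coded"] and block_b["inter_coded"]
            and block_a["motion_vector"] == block_b["motion_vector"]
            and not block_a["has_coefficients"]
            and not block_b["has_coefficients"])
    return not skip

# Identical motion and nothing coded: boundary is left unfiltered.
same = {"inter_coded": True, "motion_vector": (1, 0),
        "has_coefficients": False}
# Coefficients present on one side: boundary is filtered.
coded = {"inter_coded": True, "motion_vector": (1, 0),
         "has_coefficients": True}
```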
Video encoding and decoding are very computationally intensive, and in-loop deblock filtering is relatively computationally intensive even compared to other video encoding and decoding operations. This computational intensity can be problematic in various scenarios, such as decoding of high-quality, high-bit-rate video (e.g., for high-definition video). Some decoders use video acceleration to offload selected computationally intensive operations to a graphics processor. For example, a decoder uses the primary central processing unit as a host to control overall decoding and uses a graphics processor to perform repeated operations that collectively involve extensive computation. In particular, the decoder uses the graphics processor to perform filtering operations on pixel values of multiple lines or multiple blocks in parallel for in-loop deblock filtering. This low-level parallelism can be efficient in certain scenarios. Some computing devices lack a graphics processor, however, or are not configured to use the graphics processor for decoding, or are unable to use the graphics processor for decoding because it is occupied with other operations.
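The low-level parallelism described above can be sketched with a thread pool standing in for the graphics processor; the neighbor-averaging kernel here is a placeholder, not a real deblock filter:

```python
from concurrent.futures import ThreadPoolExecutor

def filter_line(line):
    """Placeholder per-line filter: average each pixel with its left
    neighbor (NOT the filter of any standard)."""
    out = line[:1]
    for prev, cur in zip(line, line[1:]):
        out.append((prev + cur) // 2)
    return out

# Many lines receive the same filtering kernel. Because the lines are
# independent, they can be processed in parallel (here by a thread
# pool; on a graphics processor, by many parallel execution units).
lines = [[100, 100, 108, 108], [96, 96, 104, 104]]
with ThreadPoolExecutor() as pool:
    filtered = list(pool.map(filter_line, lines))
```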
On the other hand, the number of processing cores available to computing systems grows nearly every year. To take advantage of multiple threads available on multi-core machines, some encoders and decoders use multi-threading to improve encoding/decoding performance. For multi-threading, operations are split into tasks that can be performed with different threads. For example, for decoding, different tasks can be used for entropy decoding, inverse frequency transforms and motion compensation, respectively. In some cases, different tasks can be performed in parallel, which improves performance. In other cases, the performance of one task is dependent on the completion of another task.
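A minimal sketch of such task decomposition, assuming toy stand-in functions for the decoding stages: entropy decoding of two pictures proceeds in parallel, while reconstruction of each picture depends on (waits for) its entropy-decoding task:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for real decoding stages (illustrative only).
def entropy_decode(bitstream):
    return [b ^ 0x5A for b in bitstream]

def inverse_transform(coeffs):
    return [c * 2 for c in coeffs]

def motion_compensate(residuals, reference):
    return [r + p for r, p in zip(residuals, reference)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Independent tasks: entropy decoding of two pictures in parallel.
    futures = [pool.submit(entropy_decode, bs)
               for bs in (bytes([1, 2, 3]), bytes([4, 5, 6]))]
    # Dependent tasks: reconstruction of each picture waits on the
    # result of its entropy-decoding task before proceeding.
    pictures = [motion_compensate(inverse_transform(f.result()),
                                  reference=[10, 10, 10])
                for f in futures]
```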