A video codec may comprise an encoder which transforms input video into a compressed representation suitable for storage and/or transmission and a decoder that can uncompress the compressed video representation back into a viewable form, or either one of them. Typically, the encoder discards some information in the original video sequence in order to represent the video in a more compact form, for example at a lower bit rate.
Typical video codecs, operating for example according to the International Telecommunication Union's ITU-T H.263 and H.264 coding standards, encode video information in two phases. In the first phase, pixel values in a certain picture area or “block” are predicted. These pixel values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously encoded video frames (or a later coded video frame) that corresponds closely to the block being coded. Additionally, pixel values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship.
Prediction approaches using image information from a previous (or a later) image can also be called as Inter prediction methods, and prediction approaches using image information within the same image can also be called as Intra prediction methods.
The second phase is one of coding the error between the predicted block of pixels and the original block of pixels. This is typically accomplished by transforming the difference in pixel values using a specified transform. This transform is typically a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized and entropy encoded.
By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation, (in other words, the quality of the picture) and the size of the resulting encoded video representation (in other words, the file size or transmission bit rate). An example of the encoding process is illustrated in FIG. 4a. 
The decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized prediction error signal in the spatial domain).
After applying pixel prediction and error decoding processes the decoder combines the prediction and the prediction error signals (the pixel values) to form the output video frame.
The decoder (and encoder) may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming frames in the video sequence.
An example of the decoding process is illustrated in FIG. 6.
In typical video codecs, the motion information is indicated by motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures). In order to represent motion vectors efficiently, motion vectors are typically coded differentially with respect to block specific predicted motion vector. In a typical video codec, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
In typical video codecs the prediction residual after motion compensation is first transformed with a transform kernel (like DCT) and then coded. The reason for this is that often there still exists some correlation among the residual and transform can in many cases help reduce this correlation and provide more efficient coding.
Typical video encoders utilize the Lagrangian cost function to find optimal coding modes, for example the desired macro block mode and associated motion vectors. This type of cost function uses a weighting factor or □ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information required to represent the pixel values in an image area.
This may be represented by the equation:C=D+□R  (1)where C is the Lagrangian cost to be minimised, D is the image distortion (for example, the mean-squared error between the pixel values in original image block and in coded image block) with the mode and motion vectors currently considered, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder (including the amount of data to represent the candidate motion vectors).
Some hybrid video codecs, such as H.264/AVC, predict the Intra coded areas by spatial means utilizing the pixel values of the already processed areas in the picture. The difference between the predicted pixel values and the original ones is coded in a lossy manner utilizing DCT-like transform. Quantization of the transform coefficients may result in artefacts in the reconstructed video signal. These artefacts are especially visible if the transformed area or part of the transformed area has no high frequency content (that is, the pixel values are almost identical or change gradually over an area). Typical examples of such cases are human faces and sky. These are both characterized by gradual spatial changes in color which is not represented satisfactorily in the decoded video when the operation bitrate is moderately low (resulting in usage of moderate quantization of transform coefficients). The effect can be more severe when the amount of changes in the pixel values is smaller than what can be represented with the quantized AC coefficients of the transform. In this case the picture with gradually changing pixel values will look blocky as its decoded pixel values are represented with the DC coefficient of the transform alone.
In some known constructions the problem with blockiness of the decoded video signal is handled with filtering the image e.g. by using a Finite Impulse Response (FIR) filter. However due to complexity reasons some filters used in video coding utilize a short tap length, which may not be enough to smooth the blocking artefacts in smooth picture areas. This problem is illustrated in FIG. 8c. 