Conventional video cameras capture image frames and process the captured image frames prior to encoding. The encoded image frames are typically provided in the form of a bit stream which may be sent over a network to a decoder. In such video encoding/decoding systems, it is often desirable to keep the bitrate as low as possible in order to minimize the load on the network and the storage space required to store the encoded video. However, at the same time it is of course desirable to maintain a high video quality with as few artefacts as possible.
Known video coding techniques, such as MPEG-4 and H.264, use inter-frame prediction to reduce the amount of video data by exploiting redundancy between the frames in a series. This involves techniques such as block-based motion compensation, where a new frame can be predicted block by block by looking for a matching block in a reference frame.
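To make block-based motion compensation concrete, the following minimal sketch predicts a block by copying a displaced block from a reference frame, forms the residual that would be coded, and reconstructs the block on the decoder side. It assumes integer-pixel motion vectors given as `(dy, dx)`, frames stored as plain lists of luma rows, and no transform or quantization of the residual. Real codecs such as H.264 additionally use sub-pixel interpolation and lossy residual coding. The names `predict_block`, `residual` and `reconstruct` are hypothetical and are not taken from any codec API.

```python
from typing import List, Tuple

Frame = List[List[int]]  # 2-D grid of luma samples, indexed [row][col]


def predict_block(ref: Frame, top: int, left: int, size: int,
                  mv: Tuple[int, int]) -> Frame:
    """Copy a size x size block from `ref`, displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def residual(cur: Frame, top: int, left: int, size: int, pred: Frame) -> Frame:
    """Difference between the actual block and its prediction (what the encoder codes)."""
    return [[cur[top + r][left + c] - pred[r][c] for c in range(size)]
            for r in range(size)]


def reconstruct(pred: Frame, res: Frame) -> Frame:
    """Decoder side: prediction plus residual restores the block (lossless here)."""
    return [[p + e for p, e in zip(prow, rrow)] for prow, rrow in zip(pred, res)]


# Usage: the current frame is the reference frame shifted one pixel to the right,
# so the block at (2, 2) is predicted from (2, 1) in the reference: mv = (0, -1).
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[ref[r][c - 1] if c > 0 else 0 for c in range(8)] for r in range(8)]
pred = predict_block(ref, 2, 2, 4, (0, -1))
res = residual(cur, 2, 2, 4, pred)
block = reconstruct(pred, res)
```

With a perfect match the residual is all zeros, which is why accurate motion vectors translate directly into fewer bits to code.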
With inter-frame prediction, each frame is classified as a certain type of frame, such as an intra-frame (sometimes referred to as an I-frame, e.g., in H.264) or an inter-frame (sometimes referred to as a P-frame or B-frame, e.g., in H.264). An intra-frame is a self-contained frame that can be independently decoded without reference to any other frames. In contrast, an inter-frame makes reference to one or more previously decoded frames.
The intra-frames and the inter-frames are arranged in a certain order in the video stream as defined by a group of pictures (GOP) structure. An intra-frame indicates the beginning of a GOP structure, and thereafter several inter-frames follow. When a decoder encounters a new GOP structure in the bit stream, it does not need any previously decoded frames in order to decode the following frames. When decoding a GOP structure, the decoder will first decode the intra-frame at the beginning of the GOP structure since the intra-frame can be decoded without reference to any other frame. Then the decoder proceeds to decode the next frame in the decoding order, which will be an inter-frame, using the decoded intra-frame as a reference frame. The decoder then proceeds to successively decode inter-frames using one or more of the decoded intra-frame and the previously decoded inter-frames of the GOP structure as reference frames until a new intra-frame indicating the beginning of a new GOP structure is encountered in the bit stream. The intra-frame at the beginning of a GOP structure thus serves as a base reference image for decoding the following inter-frames, since the following inter-frames directly, or indirectly via another reference frame, use the decoded intra-frame as a reference frame.
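The decoding order within a GOP structure can be illustrated with a small model. The sketch below walks a sequence of frame types and records which frame each one uses as its reference, then follows the reference chain back to the intra-frame that anchors a given frame. It rests on simplifying assumptions that are not part of the text above: there are no B-frames, decoding order equals display order, and every P-frame references only the frame decoded immediately before it. The functions `decode_stream` and `gop_base` are hypothetical illustrations.

```python
from typing import List, Optional


def decode_stream(frame_types: List[str]) -> List[Optional[int]]:
    """Return, for each frame, the index of its reference frame (None for I-frames).

    Assumes only "I" and "P" frames, each P-frame referencing the previously
    decoded frame. Raises ValueError if a P-frame appears before any I-frame.
    """
    refs: List[Optional[int]] = []
    last_decoded: Optional[int] = None  # most recent frame available as a reference
    for i, t in enumerate(frame_types):
        if t == "I":
            refs.append(None)  # self-contained: starts a new GOP structure
        elif t == "P":
            if last_decoded is None:
                raise ValueError(f"frame {i}: P-frame has no reference (no I-frame yet)")
            refs.append(last_decoded)
        else:
            raise ValueError(f"frame {i}: unsupported frame type {t!r}")
        last_decoded = i
    return refs


def gop_base(refs: List[Optional[int]], i: int) -> int:
    """Follow the reference chain from frame i back to its anchoring intra-frame."""
    while refs[i] is not None:
        i = refs[i]
    return i


stream = ["I", "P", "P", "P", "I", "P", "P"]
refs = decode_stream(stream)
```

Here `refs` is `[None, 0, 1, 2, None, 4, 5]`: every P-frame depends directly or indirectly on the intra-frame that opened its GOP structure, and decoding can start cleanly at the second intra-frame (`decode_stream(stream[4:])`) but not at a P-frame.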
When encoding an inter-frame, blocks of pixels in the inter-frame are compared to blocks of one or more reference frames so as to estimate motion vectors, i.e., vectors which describe the motion of the blocks in relation to the reference frames. This comparison typically includes comparing individual pixel values of a block of the inter-frame to individual pixel values of a number of candidate blocks in the reference frame, and selecting the block in the reference frame that gives the best match. Because the comparison is made on individual pixel values, the level of noise in the image frames strongly affects the accuracy of the motion vector estimation. Ultimately, this has a negative impact both on the quality of the encoded video, which will contain more artefacts, and on its bitrate.
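The pixel-by-pixel comparison described above is commonly implemented as block matching with a cost such as the sum of absolute differences (SAD). The sketch below is a minimal, assumed version: an exhaustive integer-pixel search over a square window, with the motion vector `(dy, dx)` pointing from the current block's position to the matching block in the reference frame. Production encoders use faster search patterns, sub-pixel refinement and rate-aware costs. The names `sad` and `estimate_motion` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # 2-D grid of luma samples, indexed [row][col]


def sad(cur: Frame, ref: Frame, top: int, left: int,
        dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def estimate_motion(cur: Frame, ref: Frame, top: int, left: int,
                    size: int, search: int) -> Tuple[Tuple[int, int], int]:
    """Full search over displacements in [-search, search]^2; returns (best_mv, best_sad).

    Candidates whose block would fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= h
                    and 0 <= left + dx and left + dx + size <= w):
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    assert best_cost is not None, "no candidate block fits inside the reference frame"
    return best_mv, best_cost


# Usage: content moves one row down and two columns right between frames,
# so the block at (4, 4) in `cur` matches the block at (3, 2) in `ref`.
ref = [[r * 10 + c for c in range(10)] for r in range(10)]
cur = [[ref[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(10)]
       for r in range(10)]
mv, cost = estimate_motion(cur, ref, 4, 4, 4, 3)
```

On this noise-free input the search finds the true vector `(-1, -2)` with zero cost. Adding noise to `cur` makes the SAD at the true vector nonzero, and for flat or weakly textured blocks a wrong candidate can then win, which is the sensitivity to noise described above.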
However, the solution to this problem is not as simple as just reducing the noise in the image frames, since noise reduction comes at the penalty of introducing blur, both in the temporal domain and the spatial domain, which in turn has a negative impact on the resulting video quality. There is thus a delicate trade-off between improving video quality by improving the motion vector estimates and degrading it through the introduction of blur.