In digital video systems, such as network camera monitoring systems, video sequences are compressed before transmission using various video encoding methods. In many digital video encoding systems, two main modes are used for compressing the video frames of a sequence: intra mode and inter mode. In intra mode, the luminance and chrominance channels are encoded by exploiting the spatial redundancy of the pixels within a given channel of a single frame via prediction, transform, and entropy coding. The resulting encoded frames are called intra-frames, and may also be referred to as I-frames. Within an intra-frame, blocks of pixels, also referred to as macroblocks, are encoded in intra mode, meaning that they are encoded with reference to a similar block within the same image frame, or raw-coded with no reference at all.

Inter mode instead exploits the temporal redundancy between separate frames, and relies on a motion-compensation prediction technique that predicts parts of a frame from one or more previous frames by encoding, for selected blocks of pixels, the motion of those pixels from one frame to another. The resulting encoded frames are called inter-frames, and may be referred to as P-frames (forward-predicted frames), which can refer to previous frames in decoding order, or B-frames (bi-directionally predicted frames), which can refer to two or more previously decoded frames in any display-order relationship. Within an inter-frame, macroblocks may be encoded either in inter mode, meaning that they are encoded with reference to a similar block in a previously decoded image, in intra mode, with reference to a similar block within the same image frame, or raw-coded with no reference at all.
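As an illustration of the motion-compensation prediction described above, the following is a minimal sketch of a full-search block matcher using the sum of absolute differences (SAD) as the matching cost. The function names and the fixed search window are assumptions for the example, not part of any particular encoder:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences: a common block-matching cost."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def motion_search(cur_block, ref_frame, top, left, search_range=4):
    """Full search in a small window around (top, left) in the reference
    frame; returns the best motion vector (dy, dx) and its SAD cost.
    The residual that is actually encoded is cur_block minus the block
    the motion vector points at, so a lower cost means fewer bits."""
    h, w = cur_block.shape
    best = (0, 0)
    best_cost = sad(cur_block, ref_frame[top:top + h, left:left + w])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cost = sad(cur_block, ref_frame[y:y + h, x:x + w])
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost
```

If the scene content has simply shifted between frames, the search recovers the shift and the residual cost drops to zero; real encoders use faster search patterns and rate-distortion criteria rather than this exhaustive scan.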
At times, there is a lot of noise in the captured images. This is particularly the case in low-light conditions, such as at dusk or dawn, when long exposure times and high gain are needed, leading to increased noise, or in other words a lowered signal-to-noise ratio (SNR). Since a significant portion of the noise is dynamic, it varies from one frame to another. This presents challenges for inter-frame encoding: even if a macroblock in the image to be encoded depicts the same part of the captured scene as the corresponding macroblock in the previously encoded and decoded image used as a reference frame, the two macroblocks will differ in appearance. This may lead to large residuals when encoding the macroblock, which in turn implies a high output bitrate. The differences in appearance may also make it more difficult to find a suitable macroblock to refer to in the reference frame, leading to a longer search. Sometimes no matching macroblock is found before the end of the predetermined search pattern, so that the current macroblock has to be encoded in intra mode, also increasing the output bitrate.
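The effect of frame-to-frame (dynamic) noise on the residual can be demonstrated with a small simulation. The noise levels and scene values below are arbitrary choices for illustration: two captures of an identical static scene patch are compared, once with mild noise and once with heavy low-light noise, and the SAD residual between the co-located blocks is measured:

```python
import numpy as np

rng = np.random.default_rng(42)

# A static scene patch, captured twice in consecutive frames.
scene = rng.integers(64, 192, (16, 16)).astype(np.float64)

def capture(scene, noise_sigma):
    """Simulate a captured frame: scene content plus independent,
    per-frame sensor noise (the dynamic noise component)."""
    noisy = scene + rng.normal(0.0, noise_sigma, scene.shape)
    return np.clip(noisy, 0, 255)

def sad(a, b):
    return float(np.abs(a - b).sum())

# Residual between co-located blocks of two captures of the SAME scene:
# with heavy (low-light) noise the residual grows many times larger,
# even though nothing in the scene has changed.
low_noise_residual = sad(capture(scene, 1.0), capture(scene, 1.0))
high_noise_residual = sad(capture(scene, 20.0), capture(scene, 20.0))
```

Since the residual is what the transform and entropy coder must represent, the larger high-noise residual translates directly into a higher output bitrate, as described above.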
There is consequently an interest in reducing the amount of noise in the images before encoding. Various solutions are known in which spatial and/or temporal noise filters are applied to the images before encoding. Many of these solutions may lead to satisfactory results, but some are computationally intensive and may not be suitable for real-time encoding, such as for monitoring or surveillance purposes. There is thus still a need for methods and systems for reducing the impact of noise on encoding.
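As a point of reference for the temporal filtering mentioned above, one of the simplest such pre-encoding filters is a recursive running average, sketched below. The function name and the choice of blending factor are assumptions for the example; this is not the method of any particular system, and a practical filter would also detect motion to avoid blurring moving objects:

```python
import numpy as np

def temporal_filter(prev_filtered, cur_frame, alpha=0.25):
    """Recursive temporal noise filter: blend the new frame into a
    running average. Dynamic noise averages out over time, while
    static scene content is preserved. A smaller alpha gives stronger
    smoothing but more ghosting on motion."""
    if prev_filtered is None:
        return cur_frame.astype(np.float64)
    return (1.0 - alpha) * prev_filtered + alpha * cur_frame
```

Feeding a sequence of noisy captures of a static scene through this filter reduces the deviation from the true scene well below that of any single frame, which in turn shrinks the inter-frame residuals seen by the encoder. Its appeal for real-time use is that it needs only one stored frame and one blend per pixel.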