Studio video content producers often pre-edit uncompressed video to fit different requirements for channel distribution and format conversion. Another important and common editing step may be color and film matching (also called analog support), which may require the injection of random noise into video frames to match the characteristics of analog film. Further, when the source video or film includes computer-generated graphics, a large amount of noise may be purposefully added in areas of uniform color to create natural-looking effects.
The need for color dithering may arise because of a reduction in bit precision. Analog source material may be digitized with high bit precision, e.g., 14, 12, or 10 bits per channel (each channel representing luma or chroma in YUV or RGB) in a standard color space (e.g., 4:4:4 as specified by the Society of Motion Picture and Television Engineers, or SMPTE), while the final cut (uncompressed video) may be in a lower-precision format, e.g., 8 bits per channel of YUV 4:2:2 or less. The conversion from high- to low-precision quantization may produce artifacts and require dithering to create virtual intermediate color levels not available in the final color space or bit depth. Finally, a global control of the noise level, or noise modulation, may be used to create a visually persistent overall effect for a viewer, especially in situations such as luminance fading or dimming light, which are common in, e.g., titles or low-light scenes. The final results of all this "in-house" processing may be given to distributors, e.g., the Apple iTunes store, to be encoded in multiple formats, bitrates, or standards, e.g., H.264 for video compression.
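The bit-depth reduction and dithering described above can be sketched as follows. This is an illustrative example only (the function name `dither_quantize` and the choice of uniform dither are assumptions, not taken from the source): adding random noise spanning one output quantization step before truncating a 10-bit value to 8 bits makes a flat region toggle between adjacent 8-bit levels, producing a virtual intermediate level on average.

```python
import numpy as np

def dither_quantize(frame_10bit, rng=None):
    """Reduce a 10-bit channel to 8 bits with uniform random dither.

    Illustrative sketch (name and method are assumptions): one 8-bit
    level corresponds to four 10-bit levels, so adding uniform noise in
    [0, 4) before the integer divide spreads a fractional level across
    the two adjacent 8-bit values in the right proportion.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.uniform(0.0, 4.0, size=np.shape(frame_10bit))
    return np.clip((frame_10bit + noise) // 4, 0, 255).astype(np.uint8)
```

For example, a flat region at 10-bit level 514 (i.e., 128.5 in 8-bit terms) dithers to a mix of 128s and 129s whose average approximates 128.5, rather than collapsing uniformly to one level.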
Video materials having pre-edited content (even in a 10- or 8-bit, non-compressed format) may create challenges for a block-based video encoder, e.g., an H.264-type encoder. For example, the effects of all these noise-adding processes (applied for quality purposes) may affect multiple stages of an encoder, including the motion estimation that matches blocks of images based on a sum of absolute differences (SAD) metric.
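As an illustration of the SAD-based block matching mentioned above, the following sketch performs an exhaustive search for the displacement that minimizes SAD (the names `sad` and `best_match`, and the full-search strategy, are assumptions for illustration; practical encoders use faster search patterns). Additive noise inflates the SAD of even a correct match, which is why noise can degrade motion estimation.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def best_match(ref_frame, block, top, left, search=4):
    """Full search: the displacement within +/- `search` pixels of
    (top, left) that minimizes SAD against `block`. Illustrative sketch."""
    h, w = block.shape
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # skip candidates that fall outside the frame
            cost = sad(ref_frame[y:y + h, x:x + w], block)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost
```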
A video encoder may introduce compression artifacts such as quantization artifacts and/or blocking artifacts. When compression artifacts are present in a video region that includes additive noise, they may become even more evident and visually annoying to a viewer because the geometrically defined structures of the compression artifacts appear within an otherwise random, isotropic region. Such persistent artifacts on a playback screen may create unnatural effects that degrade the perceptual quality.
In low bit rate video encoding, the additive noise from film production may also make it more difficult to achieve high perceptual quality, since high-frequency noise may affect the quantization process and the rate distortion optimization (RDOPT). In fact, some techniques used to maximize the video quality in final cut production may actually prevent low bit rate video encoding from achieving the maximum overall quality.
Since additive noise introduced during film post production may adversely affect the subsequent coding and decoding steps, it is advantageous to estimate the additive noise so that it may be properly treated in those steps.
When video is encoded at a very low bit rate (e.g., at 64 kilobits per second (kbps) or 24 kbps) using coarse quantization, it may not be possible to encode all of the noise (or dithering) information in the source video, since noise (including quantization artifacts) tends to be high-frequency information and may be removed during the quantization process. Since noise may affect components of the encoder (e.g., motion estimation, mode prediction decisions, or rate distortion optimization), a preprocessing filter is often used to generate filtered video frames. For example, the preprocessing filter may remove noise for improved coding efficiency. However, the preprocessing filter may also be a filter other than a denoise filter; for example, it may be a smoothing filter or a color correction filter.
Denoise filters may be classified as spatial filters, temporal filters, or hybrids of spatial and temporal filters. Spatial denoise filters remove noise based on pixels in a single video frame. Temporal denoise filters remove noise based only on temporal information across frames. Hybrid denoise filters use both spatial and temporal information to remove noise.
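As an illustrative example of a purely spatial denoise filter (not taken from the source), a median filter replaces each pixel with the median of its local neighborhood, suppressing impulsive noise using information from only a single frame:

```python
import numpy as np

def spatial_median_denoise(frame, radius=1):
    """Spatial denoise sketch: each pixel becomes the median of its
    (2*radius+1)^2 neighborhood. Edge padding handles the borders.
    Illustrative only; the name and interface are assumptions."""
    padded = np.pad(frame, radius, mode="edge")
    h, w = frame.shape
    k = 2 * radius + 1
    # View every k-by-k window, then take the median over each window.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows.reshape(h, w, -1), axis=2).astype(frame.dtype)
```

A single bright outlier pixel in an otherwise flat region is removed entirely, since the median of its neighborhood ignores the one extreme value.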
A commonly used type of hybrid denoise filter is the motion-compensated denoise filter, in which trajectories of pixels are computed via, e.g., commonly known optical flow methods. For a particular video frame, motion-compensated filters may average pixel values from previous and subsequent video frames based on the motion estimation to obtain a filtered version of the video frame. However, optical flow may be difficult to compute because it is an ill-posed mathematical problem; its solution may require solving an optimization problem to achieve an accurate motion estimation. Under certain scenarios, e.g., scene changes or low-light conditions, it may be even more difficult to estimate optical flow accurately.
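The averaging along motion trajectories described above can be sketched as follows, under the simplifying assumption of a single global motion vector per reference frame (real motion-compensated filters use per-pixel or per-block vectors from optical flow; the function name and interface are illustrative):

```python
import numpy as np

def mc_temporal_average(prev_f, cur_f, next_f, mv_prev, mv_next):
    """Motion-compensated temporal average (illustrative sketch).

    Aligns the previous and next frames to the current frame using one
    global (dy, dx) shift each, then averages the three frames.  For
    independent noise, averaging 3 aligned samples cuts the noise
    standard deviation by roughly sqrt(3).
    """
    prev_aligned = np.roll(prev_f, shift=mv_prev, axis=(0, 1))
    next_aligned = np.roll(next_f, shift=mv_next, axis=(0, 1))
    return (prev_aligned.astype(np.float32) + cur_f + next_aligned) / 3.0
```

When the motion vectors are wrong, the filter averages unrelated pixels, which is precisely the detail loss and mismatch problem noted below.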
The main issue with motion-compensated denoise filters is that they tend to remove details in video frames and introduce artifacts or mismatches that may be perceived as unnatural by viewers. Therefore, there is a need for more efficient ways to generate natural-looking denoised video.