Digital video recording is quickly overtaking analog video recording in popularity, in both consumer and corporate applications. In corporate settings, digital video recording can be used for surveillance, often providing 24-hour-a-day coverage with a requirement to store a certain number of days of footage. While analog video recording typically uses tape as the recording medium, digital video recording typically uses a hard drive or other computer-readable medium. If a network of digital video recorders is used, both storage consumption and the bandwidth consumed in transferring the recordings must be considered. Compression of multimedia data is therefore important, since it allows the system to use less transmission bandwidth and/or hard disk space for the recordings.
The compression techniques used in encoding digital recordings can be classified as either lossy or lossless. Lossless techniques preserve data integrity and are used where an exact replica of the originally transmitted data is required after decompression. One such technique is the well-known Lempel-Ziv-Welch (LZW) lossless compression algorithm, which is considered effective because of its ability to remove redundant data. In lossy compression techniques, the goal is not 100% replication of the original data, but rather to convey the essence of the information contained in the original signal prior to compression. Lossy techniques are typically used for audio and visual (both still and motion video) data, neither of which typically has sufficient redundancy to allow good compression ratios when lossless compression is employed. Lossy techniques allow the discarding of information determined not to be relevant to a viewer's (or listener's) perception of the data.
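To make the lossless case concrete, the following Python sketch implements textbook LZW over bytes. It is illustrative only: the function names are hypothetical, and the unbounded dictionary and plain integer output are simplifications (practical implementations cap the dictionary size and pack codes into variable-width bit fields).

```python
def lzw_compress(data: bytes) -> list[int]:
    """Return LZW codes for `data` using an unbounded dictionary (sketch)."""
    # Start with one code per possible byte value (0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the longest string already in the dictionary.
            current = candidate
        else:
            # Emit the code for the longest match, then learn the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
assert lzw_decompress(codes) == data  # exact replica after decompression
print(len(data), "bytes ->", len(codes), "codes")  # 24 bytes -> 16 codes
```

The saving comes from repeated substrings being replaced by single codes; because codes above 255 need more than 8 bits, the actual size reduction depends on how the codes are packed.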
Commonly used lossy compression techniques include H.263, H.264, MPEG-4, and Motion JPEG for digital video, as well as JPEG compression for still pictures/images. Many of these video compression techniques encode only the changes between adjacent frames. Because of noise in the captured digital video image, large portions of a frame in a video sequence may be encoded as changed even though an observer would regard those portions as holding no real information or content. Such noise can arise from factors such as lighting variations and sensor noise. With traditional video compression algorithms, the areas encoded because of noise consume bits in the encoded bitstream as the algorithm tries to reproduce this noise in the encoded video sequence.
The percentage, or proportion, of the total bits used to encode non-content areas in a digital video sequence can depend on the following factors: 1) Quality of the encoded video: the higher the quality of the reproduced video, the more visible subtle changes in these non-content areas become, so more bits are used to represent the frame; 2) Noise present in the scene/camera: lighting quality and/or camera quality affect how much noise is produced, and the more noise there is, the more bits are required to encode non-content areas in a given scene; and 3) Activity in the scene: the less activity or movement in the scene, the more dominant the non-content areas of the image become; fewer bits are then spent encoding content, which makes the proportion of bits spent on non-content areas more significant.
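The first two factors can be illustrated with a small simulation. The Python sketch below assumes a perfectly static scene captured twice with independent Gaussian sensor noise of standard deviation `noise_sigma`, so the frame difference contains only noise, and it quantizes each difference sample by truncating division, as an integer-dividing quantizer would. It works on raw pixel differences rather than transform coefficients, and the function name and parameters are hypothetical; a smaller `step` stands in for higher encoding quality.

```python
import random


def surviving_noise_samples(noise_sigma: float, step: float,
                            num_pixels: int = 10_000, seed: int = 0) -> int:
    """Count noise-only difference samples that quantize to a non-zero value.

    The static scene content cancels in the difference between two captures,
    leaving the difference of two independent Gaussian noise terms.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    rng = random.Random(seed)  # fixed seed: same noise draws on every call
    survivors = 0
    for _ in range(num_pixels):
        diff = noise_sigma * (rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0))
        # Truncating integer division, as in an integer-dividing quantizer.
        if int(diff / step) != 0:
            survivors += 1
    return survivors


for sigma in (1.0, 2.0, 4.0):        # more noise in the scene/camera
    for step in (16.0, 8.0, 4.0):    # finer step = higher encoding quality
        count = surviving_noise_samples(sigma, step)
        print(f"sigma={sigma}, step={step}: {count} of 10000 samples coded")
```

Because the same noise draws are reused for every call, the count can only grow as `noise_sigma` increases or `step` decreases, mirroring factors 1 and 2.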
A typical digital video system is illustrated in FIG. 1. Video from a given video source 10 can be compressed in a video encoder 12 either directly (intra-frame) or differentially relative to the previously encoded video frame (inter-frame). In digital video compression, a key frame is a frame in which all of the information for every component of the frame is recorded. Inter-frames are frames in which only the difference from a chosen reference frame is transmitted or recorded. Inter-frames can alternatively be referred to as difference frames.
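The key-frame/difference-frame idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: frames are small lists of integer luma values, there is no motion compensation, transform, or quantization, and the function names are hypothetical. Because nothing here is lossy, the decoded previous frame equals the original; a real encoder differences against its own reconstructed reference so that encoder and decoder stay in step.

```python
Frame = list[list[int]]


def encode_sequence(frames: list[Frame]) -> list[tuple[str, Frame]]:
    """Store the first frame whole (key frame), then per-pixel differences."""
    encoded: list[tuple[str, Frame]] = []
    previous = None
    for frame in frames:
        if previous is None:
            encoded.append(("key", [row[:] for row in frame]))
        else:
            diff = [[cur - prev for cur, prev in zip(crow, prow)]
                    for crow, prow in zip(frame, previous)]
            encoded.append(("diff", diff))
        previous = frame
    return encoded


def decode_sequence(encoded: list[tuple[str, Frame]]) -> list[Frame]:
    """Rebuild frames: copy key frames, add difference frames to the last one."""
    frames: list[Frame] = []
    for kind, data in encoded:
        if kind == "key":
            frames.append([row[:] for row in data])
        else:
            last = frames[-1]
            frames.append([[p + d for p, d in zip(prow, drow)]
                           for prow, drow in zip(last, data)])
    return frames


frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[10, 10, 10], [10, 90, 10]],  # one pixel changes
]
encoded = encode_sequence(frames)
assert decode_sequence(encoded) == frames
print(encoded[1])  # ('diff', [[0, 0, 0], [0, 80, 0]])
```

The difference frame is mostly zeros, which is what later compression stages exploit; any noise in the static pixels would make those entries non-zero.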
A typical video encoder 12 comprises the following processing blocks and functions:
a) For inter-frame coding, motion estimation block 14 finds the best motion-compensated prediction from reference frame 26 to subtract from the input frame.
b) Transform block 16 converts the pixel values of the difference frame into the frequency domain. Most methods use the discrete cosine transform (DCT), but other transforms can be used instead.
c) Lossy compression occurs in quantizer block 18, typically by integer-dividing the transform coefficients by a given quantizer step size.
d) Entropy coding block 20 losslessly encodes the resulting symbols in as few bits as possible. Typical methods are Huffman coding and arithmetic coding.
e) Inverse quantizer block 22 reconstructs approximate coefficient values using the quantization weights.
f) Inverse transform block 24 reconstructs pixel values that are then used to update the reference image or reference frame 26.
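Steps b), c), e) and f) can be illustrated on a single 8×8 block in plain Python. The sketch assumes an orthonormal DCT-II, applies quantization as truncating division by a single step size, and models the inverse quantizer as multiplication by that same step, which is a simplification of the quantization weights real codecs use. Motion estimation (block 14) and entropy coding (block 20) are omitted, and all names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch


def dct_matrix(n: int = N) -> list[list[float]]:
    """Orthonormal DCT-II basis: row u is basis function u sampled at x = 0..n-1."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)]
            for u in range(n)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def forward_dct(block):
    """Transform step: 2-D DCT computed as C * block * C^T."""
    return matmul(matmul(C, block), CT)


def inverse_dct(coeffs):
    """Inverse transform step: C^T * coeffs * C (valid because C is orthonormal)."""
    return matmul(matmul(CT, coeffs), C)


def quantize(coeffs, step: float):
    """Quantizer step: truncating division by the step size (the lossy part)."""
    return [[int(c / step) for c in row] for row in coeffs]


def dequantize(levels, step: float):
    """Inverse quantizer step, simplified to multiplication by the step size."""
    return [[q * step for q in row] for row in levels]


# A smooth horizontal ramp: energy concentrates in a few low-frequency terms.
block = [[float(16 * x) for x in range(N)] for _ in range(N)]
step = 10.0
levels = quantize(forward_dct(block), step)
recon = inverse_dct(dequantize(levels, step))
nonzero = sum(1 for row in levels for q in row if q != 0)
max_err = max(abs(r - b) for rrow, brow in zip(recon, block)
              for r, b in zip(rrow, brow))
print(f"{nonzero} of 64 levels non-zero, max pixel error {max_err:.2f}")
```

For this ramp and step size only a few of the 64 levels (three) are non-zero, and the reconstruction error is the price paid in the quantizer; the mostly-zero levels are what the entropy coder then compresses efficiently.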
A compressed bitstream 28, containing the encoded video, is output by entropy coding block 20 and transmitted to video decoder 30, which decodes the received bitstream to facilitate viewing of the encoded video. Optional intermediate steps can include storing the video in a storage device 32 and/or sending it over a network. The decoder 30 performs the inverse procedure of the encoder as shown by the blocks labelled 14i, 16i, 18i, 20i, and 26i in order to reconstruct a frame that can then be viewed.
As noted earlier, a problem with this typical video system is that areas where the only changes to the frame are attributable to noise consume excessive bandwidth and storage. This is because any region whose DCT coefficients are quantized to non-zero values will be encoded into the bitstream, and thus consumes bandwidth and requires additional storage resources. Existing methods to remedy this problem usually attempt to suppress the noise equally across all regions of the frame. Such methods include spatial and/or temporal filters applied to the video frames prior to encoding to suppress the changes observed in the image due to noise. These methods are generally expensive in CPU usage and/or memory and generally have a detrimental impact on video quality. Spatial filtering of a video frame typically reduces the resolution of the image in areas of content as well as non-content, degrading quality. Temporal filtering of a video frame sequence typically adds ghosting artifacts in areas of legitimate change in the video sequence. Some methods attempt to limit these artifacts, but they can be relatively complex.
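The ghosting effect of temporal filtering can be seen with a first-order recursive (exponential moving average) filter, one common form of temporal noise filter, assumed here for illustration rather than taken from any particular system. On a static pixel with white noise it reduces the noise variance by a factor of alpha/(2 - alpha), but when an object moves, the filtered output keeps a fading trail where the object used to be. The function name and the one-row test scene are hypothetical.

```python
def temporal_filter(frames: list[list[float]], alpha: float) -> list[list[float]]:
    """Per-pixel recursive filter: out_t = alpha * in_t + (1 - alpha) * out_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    filtered: list[list[float]] = []
    state: list[float] = []
    for frame in frames:
        if not state:
            state = [float(v) for v in frame]  # initialise from the first frame
        else:
            state = [alpha * v + (1.0 - alpha) * s for v, s in zip(frame, state)]
        filtered.append(list(state))
    return filtered


# A bright object (200) moves one pixel per frame across a dark background (50).
BACKGROUND, OBJECT, WIDTH = 50.0, 200.0, 8
frames = [[OBJECT if x == t else BACKGROUND for x in range(WIDTH)]
          for t in range(4)]
last = temporal_filter(frames, alpha=0.25)[-1]
print(last[0], last[3])  # 113.28125 87.5
```

Pixel 0, which the object left three frames earlier, still reads about 113 instead of 50 (a ghost), while the object's current position reads only 87.5 instead of 200.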
When considering digital video compression, the ability to distinguish between frame changes caused by relevant content and those caused by noise is of significant benefit. For instance, if a security guard were watching video of a person walking down a hallway, details of the person's physical features would be far more important to the guard than an exact representation of a particular shade of paint on the static hallway walls. When observing changing frames in a digital video sequence, information that does not change over time is generally not important to the eye.
It is also known to determine if a particular pixel block of an image has changed. U.S. Pat. No. 6,006,276 to MacCormack et al., the contents of which are incorporated herein by reference in their entirety, describes such an approach. MacCormack generates compressed image data for blocks in which a change in the DC component of the block is detected. Such an approach requires a comparison between the coefficient data from the current image and the corresponding coefficient data from a related reference image. Also, reference image data must be stored to make the comparison. MacCormack also suggests comparing the content of neighbouring blocks, but limits such comparison to blocks within the same coding unit.
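A minimal sketch of DC-based change detection, in the spirit of the approach summarized above, is shown below. It is not MacCormack's implementation: the block mean is used as a stand-in for the DC coefficient (for an 8×8 block and an orthonormal DCT, the DC term is 8 times the block mean), and the threshold, function names and frame layout are hypothetical. As the text notes, the comparison requires a stored reference frame.

```python
BLOCK = 8  # block size assumed for this sketch


def block_dc(frame: list[list[int]], bx: int, by: int, size: int = BLOCK) -> float:
    """Mean of one size x size block (proportional to its DCT DC coefficient)."""
    total = sum(frame[y][x]
                for y in range(by, by + size)
                for x in range(bx, bx + size))
    return total / (size * size)


def changed_blocks(current: list[list[int]], reference: list[list[int]],
                   threshold: float, size: int = BLOCK) -> list[tuple[int, int]]:
    """Return (bx, by) origins of blocks whose DC differs from the stored
    reference by more than `threshold` (a hypothetical parameter)."""
    height, width = len(current), len(current[0])
    changed = []
    for by in range(0, height - size + 1, size):
        for bx in range(0, width - size + 1, size):
            delta = block_dc(current, bx, by, size) - block_dc(reference, bx, by, size)
            if abs(delta) > threshold:
                changed.append((bx, by))
    return changed


reference = [[100] * 16 for _ in range(16)]  # stored reference frame
current = [row[:] for row in reference]
for y in range(0, 8):
    for x in range(8, 16):
        current[y][x] = 130  # a real change in the top-right block
print(changed_blocks(current, reference, threshold=5.0))  # [(8, 0)]
```

Because only the block mean is compared, a change that leaves the mean unchanged (for example, zero-mean texture) is not flagged by this sketch.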
It is, therefore, desirable to provide better compression based on content discrimination, in order to reduce disk storage space for storage of digital video, or to reduce bandwidth requirements for transmission of digital video, while retaining picture quality in regions of content.