A video sequence comprises a series of images. Each image in the series, conveniently referred to as a frame, comprises a plurality of picture elements (e.g., pixels).
A typical signal present in a decoder of a digital communications system contains distortion. Such distortion may be due to a number of different factors. For example, whenever a signal is quantized in a coder, quantizing distortion is introduced into the signal to be transmitted. Other factors may also introduce distortion into the signal to be decoded.
Digital communications systems used for video commonly include filters to help mitigate the effects of distortion, particularly when low bit rate video coding is used. The demand for good quality, low bit rate video coding is growing and will continue to grow with the use of video over existing and future networks, including ISDN and cable video distribution systems. Noise owing to quantization error is inevitably present in the video sequence at the receiver and/or decoder, especially in low bit rate video coding (currently illustrated by systems operating at 384 kbps or less). To reduce this noise, post-processing of the reconstructed video sequence has proven useful, particularly for current low bit rate algorithms.
Appropriately configured spatial and temporal post-filters are known to help alleviate the effects of distortion. This is because the intensities or characteristics of the pixels are spatially and temporally correlated.
A temporal post-filter is dependent upon time (e.g., the time-ordered position of the current image or frame with respect to other images or frames in a sequence of images or frames) and is thus related to video, which is time dependent. In other words, a temporal post-filter typically uses information from more than one frame to filter a single frame. This use of information from multiple frames makes a temporal post-filter well suited for post-filtering frames that have undergone interframe coding (i.e., coding in which a frame is dependent upon information in other frames). On the other hand, a spatial post-filter is dependent upon the location of a particular pixel or set of pixels within a particular single frame. Thus, spatial post-filters typically are not time dependent and do not typically rely on information from frames other than the current frame. It should be noted that "frame," "image," and "image frame" are used interchangeably herein.
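The distinction between the two filter types can be illustrated with a minimal sketch. The function names, the 3x3 neighborhood, and the sample intensities below are assumptions chosen for illustration only; they are not taken from any of the cited systems.

```python
from statistics import median


def temporal_average(frames, row, col):
    """Temporal filter: average one pixel position across several frames."""
    return sum(frame[row][col] for frame in frames) / len(frames)


def spatial_median(frame, row, col):
    """Spatial filter: median of the 3x3 neighborhood inside a single frame."""
    height, width = len(frame), len(frame[0])
    neighbors = [
        frame[r][c]
        for r in range(max(0, row - 1), min(height, row + 2))
        for c in range(max(0, col - 1), min(width, col + 2))
    ]
    return median(neighbors)


# Three hypothetical noisy frames of a static scene whose true intensity is 100.
frames = [
    [[100, 100, 100], [100, 106, 100], [100, 100, 100]],
    [[100, 100, 100], [100, 97, 100], [100, 100, 100]],
    [[100, 100, 100], [100, 97, 100], [100, 100, 100]],
]

print(temporal_average(frames, 1, 1))   # uses all three frames -> 100.0
print(spatial_median(frames[0], 1, 1))  # uses only the first frame -> 100
```

The temporal filter reads the same pixel position in every frame, whereas the spatial filter reads only the neighbors of the pixel in the current frame.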
Temporal filtering is frequently used for interframe coding. The simplest and most straightforward temporal filter is a frame-averaging algorithm. However, such filters tend to degrade the appearance of moving objects. Hence, it is known to use motion compensation combined with a temporal filter to improve the quality of a video sequence. In M. I. Sezan, M. K. Ozkan and S. V. Fogel, "Temporally Adaptive Filtering of Noisy Image Sequences Using a Robust Motion Estimation Algorithm," Proc. ICASSP, pp. 2429-2432, 1991 and D. S. Kalivas and A. A. Sawchuk, "Motion Compensated Enhancement of a Noisy Image," Proc. ICASSP, pp. 2121-2124, 1990, a motion estimation algorithm is applied to a noisy image sequence to estimate the motion trajectories, i.e., locations of the pixels making up the image that correspond to each other for a pre-determined number of successive image frames. Then, the intensity of a particular pixel at a current frame is estimated using the noisy image sequence intensities that are on the motion trajectory traversing that pixel. This can be done by using pixels from previous and subsequent frames that relate to the particular pixel of the current frame. The algorithm segments the video into moving and stationary components. Then, an adaptive temporal filter is applied to the components.
In many image sequences, motion can be a complex combination of translation and rotation. Such motion is difficult to estimate and may require a large amount of processing. In J. M. Boyce, "Noise Reduction of Image Sequences Using Adaptive Motion Compensated Frame Averaging," Proc. ICASSP, pp. III.461-III.464, 1992 and C. H. Lee, B. S. Jeng, R. H. Ju, H. C. Huang, K. S. Kan, J. S. Huang and T. S. Liu, "Postprocessing of Video Sequences Using Motion Dependent Median Filters," Proc. SPIE on Visual Communications and Image Processing, pp. 728-734, Boston, 1991, motion compensation is performed by using block matching (e.g., assigning a value to selected groups or blocks of pixels). Subsequently, frame averaging with motion compensation is applied (as described in J. M. Boyce, cited above) and median filtering with motion compensation is applied (as described in C. H. Lee et al., cited above).
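By way of illustration only, block matching followed by motion-compensated frame averaging may be sketched as follows. This is not the algorithm of any cited reference: the sum-of-absolute-differences cost, the exhaustive search, the two-frame average, and all names and sample data are assumptions made for the sketch.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
    return total


def best_motion_vector(cur, ref, top, left, size, search):
    """Exhaustive search for the displacement (dy, dx) into ref with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, top, left, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # skip displacements that leave the reference frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def compensated_average(cur, ref, top, left, size, search):
    """Average a block of cur with its best-matching block in ref."""
    dy, dx = best_motion_vector(cur, ref, top, left, size, search)
    return [[(cur[top + r][left + c] + ref[top + dy + r][left + dx + c]) / 2
             for c in range(size)] for r in range(size)]


def blank(n):
    return [[0] * n for _ in range(n)]


# Hypothetical data: a bright 2x2 object moves from (1, 1) in ref to (2, 3) in cur.
ref, cur = blank(6), blank(6)
for r in range(2):
    for c in range(2):
        ref[1 + r][1 + c] = 200
        cur[2 + r][3 + c] = 200

print(best_motion_vector(cur, ref, 2, 3, size=2, search=2))   # (-1, -2)
print(compensated_average(cur, ref, 2, 3, size=2, search=2))  # object keeps intensity 200.0
```

Averaging the same block position without compensation would mix the object with the background (yielding 100 rather than 200 in this example), which is the degradation of moving objects noted above.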
Spatial filtering is also frequently used to help alleviate the effects of distortion. This is because some artifacts, such as blocking and contouring, last for a few frames temporally, making it very difficult to reduce those artifacts by using a temporal filter. These artifacts are the result of grouping or blocking pixels together during the encoding process. "Blocking" as used in the previous sentence refers to the grouping together of pixels during encoding. However, the "blocking" type of artifact refers to a physical result of grouping pixels. Typically, pixels that are grouped are represented by a single intensity (in the case of non-color encoding), for instance GRAY 153 on an eight-bit scale from zero (white) to 255 (black). Pixels in the next group may be represented by GRAY 154. The human eye sees a smooth transition from GRAY 153 to GRAY 154 and thus, in this case, blocking and contouring artifacts do not exist. Blocking artifacts are typically reflected by adjacent groups of pixels appearing to be tiled (e.g., separate areas, akin to floor tiles). Contouring artifacts would exist if the adjacent groups of pixels appear to have a boundary or edge between them. The likelihood that these artifacts will exist increases if a low bit rate is used. Thus, if only six bits (i.e., 64 values) were used to represent the intensity range from 0 (white) to 255 (black), i.e., only values 0, 4, 8, 12, . . . , 248, 252 were used, then GRAY 153 would become GRAY 152 and GRAY 154 would become GRAY 156. This larger difference (four levels instead of one) has a much greater chance of being perceived by the human eye, and thus may result in blocking and/or contouring.
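The arithmetic of the six-bit example can be checked with a short sketch. The rounding rule (round to the nearest allowed level, with ties rounded up) is an assumption; it is chosen because it reproduces the values stated above, and the function name is hypothetical.

```python
def requantize(value, bits, full_bits=8):
    """Map an intensity on a full_bits scale to the nearest level of a coarser bits scale.

    Assumption: round to nearest, ties upward, and clamp to the highest allowed level.
    """
    step = 1 << (full_bits - bits)          # 4 when going from eight bits to six
    max_level = (1 << full_bits) - step     # 252, the highest allowed level
    level = ((value + step // 2) // step) * step
    return min(level, max_level)


for gray in (153, 154):
    print(gray, "->", requantize(gray, bits=6))
# 153 -> 152
# 154 -> 156

# The gap between the two neighboring groups grows from 1 level to 4 levels.
print(requantize(154, bits=6) - requantize(153, bits=6))  # 4
```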
A spatial filter was used in V. Ramamoorthy, "Removal of Staircase Effects in Coarsely Quantized Video Sequences," Proc. ICASSP pp. III.309-III.312, 1992 to reduce another distorting factor known as the staircase effect. The algorithm in the Ramamoorthy paper uses edge detection to classify sub-blocks of pixels into two classes: edge and non-edge. Subsequently, a median filter and a so-called D-filter, described, e.g., in A. Kundu and W. R. Wu, "Double-Window Hodges-Lehman (D) Filter and Hybrid D-Median Filter for Robust Image Smoothing," IEEE Trans. Acoust., Speech and Signal Processing, Vol. ASSP-37, No. 8, pp. 1293-1298, August 1989, are applied to edge and non-edge pixels respectively.
The subband coding concept was first used to code speech signals; see R. E. Crochiere, S. A. Webber and J. L. Flanagan, "Digital Coding of Speech in Subbands," Bell Syst. Tech. J., Vol. 55, pp. 1069-1085, October 1976. As applied to images, the concept is based on the decomposition of the image into different frequency subbands and the coding of each subband separately according to its statistics. Subband coding of images in two dimensions is described in J. W. Woods and S. D. O'Neil, "Subband Coding of Images," IEEE Trans. Acoust., Speech and Signal Processing, Vol. ASSP-34, pp. 1278-1288, October 1986 and H. Gharavi and A. Tabatabai, "Subband Coding of Monochrome and Color Images," IEEE Trans. Circuits and Systems, Vol. CAS-35, pp. 207-214, February 1988. Three dimensional subband coding of images is described in G. Karlsson and M. Vetterli, "Three Dimensional Subband Coding of Video," Proc. ICASSP, pp. 1100-1103, New York, 1988 and C. I. Podilchuk, N. S. Jayant and P. Noll, "Sparse Codebooks for the Quantization of Non-Dominant Subbands in Image Coding," Proc. ICASSP, pp. 2101-2104, Albuquerque 1990.
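The idea of decomposing a signal into frequency subbands can be sketched in one dimension with a two-band split. The averaging/differencing (Haar-type) filter pair used here is an assumption selected for brevity; the cited references use their own filter banks, and the names and sample values are hypothetical.

```python
def analyze(signal):
    """Split an even-length signal into a low band (averages) and a high band (differences)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def synthesize(low, high):
    """Rebuild the signal from its two subbands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


row = [10, 12, 20, 20, 8, 4, 6, 6]   # one hypothetical row of pixel intensities
low, high = analyze(row)
print(low)    # [11.0, 20.0, 6.0, 6.0]  coarse content
print(high)   # [-1.0, 0.0, 2.0, 0.0]   fine detail, mostly small values
print(synthesize(low, high) == row)   # True: exact when neither band is quantized
```

Because the high band here holds mostly small values, it has different statistics from the low band, which is why each subband may be coded separately. Applying the same split along rows, columns, and time would correspond to two and three dimensional decompositions.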
Recently, geometric vector quantization in the context of a full motion video coder based on a three dimensional subband framework was illustrated in U.S. Pat. No. 5,136,374, issued to N. S. Jayant and C. I. Podilchuk on Aug. 4, 1992, which patent is hereby incorporated by reference herein as if set forth in its entirety. The advantages of such an approach are the confinement of coding errors to individual subbands (if the quantization of subband signals is fine enough), and noise spectrum shaping due to varying bit assignment in the subbands. A very effective two-dimensional subband coder for still image compression has been developed based on perceptual modelling as described in R. J. Safranek and J. D. Johnston, "A Perceptually Tuned Subband Image Coder with Image Dependent Quantization and Post-Quantization," Proc. ICASSP, 1989 and U.S. patent application Ser. No. 08/098,561 filed Jul. 26, 1993 claiming priority to, ultimately, U.S. patent application Ser. No. 07/350,435 filed May 4, 1989, now abandoned, and assigned to the assignee of the present application. Others have made a three-dimensional subband coder for full motion video. By decomposing the data into different subbands, the 3-D system encodes the motion and spatial details by coding the relevant subband data. In contrast, the more traditional video coding technique based on motion compensation determines motion parameters by matching block data from one frame to the next; the blocks where motion compensation fails are coded using a discrete cosine transform (DCT), the use of which is known to produce a blocky type of distortion especially pronounced at lower bit rates.
By way of comparison, subband coders sometimes introduce distortion in the form of high frequency artifacts or blurring of edges at low bit rates due to having too few bits for the encoding of the high frequency details. Distortion based on temporal filtering is often less bothersome perceptually than distortion introduced by motion compensation, though geometric vector quantizers in 3-D systems can also introduce blocky artifacts at low bit rates.