1. Field of the Invention
The present invention relates to a method and apparatus for the evaluation of the visual quality of processed digital video sequences. One common form of processing is compression to reduce the bit-rate of digital video. The invention can be used in various applications such as the automatic and continuous monitoring of processing of digital video sequences for transmission as High Definition Television (HDTV) or Direct Broadcast System (DBS) TV. More particularly, the present invention relates to a Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts and the visual quality of processed video.
2. Description of Related Art
Considerable research has been conducted in the field of data compression, especially the compression of digital images. Digital images comprise a rapidly growing segment of the digital information stored and communicated by science, commerce, industry and government. Digital image transmission has gained significant importance in highly advanced television systems, such as high definition television using digital information. Because a relatively large number of digital bits are required to represent digital images, a difficult burden is placed on the infrastructure of the computer communication networks involved with the creation, transmission and re-creation of digital images. For this reason, there is a need to compress digital images to a smaller number of bits, by reducing redundancy and invisible image components of the images themselves.
A system that performs image compression is disclosed in U.S. Pat. No. 5,121,216 of Chen et al. and is incorporated herein by reference. The '216 patent describes a transform coding algorithm for a still image, wherein the image is divided into small blocks of pixels. For example, each block of pixels can be either an 8×8 or 16×16 block. Each block of pixels undergoes a two dimensional transform to produce a two dimensional array of transform coefficients. For still image coding applications, a Discrete Cosine Transform (DCT) is utilized to provide the transform.
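The block transform described above can be illustrated with a minimal sketch. It assumes the orthonormal DCT-II normalization and image dimensions that are exact multiples of the block size; the function names `dct_2d` and `split_into_blocks` are hypothetical and are not taken from the '216 patent.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of equal-length rows)."""
    n = len(block)
    # 1-D basis: basis[k][x] = c(k) * cos(pi * (2x + 1) * k / (2n)),
    # with c(0) = sqrt(1/n) and c(k) = sqrt(2/n) otherwise (assumed scaling).
    basis = [
        [
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
            for x in range(n)
        ]
        for k in range(n)
    ]
    # The 2-D transform is separable: transform each row, then each column.
    rows = [[sum(basis[v][x] * row[x] for x in range(n)) for v in range(n)]
            for row in block]
    return [[sum(basis[u][y] * rows[y][v] for y in range(n)) for v in range(n)]
            for u in range(n)]

def split_into_blocks(image, size=8):
    """Yield non-overlapping size x size blocks, row by row.

    Assumes the image height and width are exact multiples of size.
    """
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            yield [row[left:left + size] for row in image[top:top + size]]

# A uniform 8x8 block has all of its energy in the DC coefficient.
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

With this normalization, the uniform block of value 10 yields a DC coefficient of approximately 80 and all other coefficients approximately zero.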
In addition to the '216 patent, the DCT is also employed in a number of current and future international standards concerned with digital image compression, commonly referred to as JPEG and MPEG, which are acronyms for Joint Photographic Experts Group and Moving Pictures Experts Group, respectively. After a block of pixels of the '216 patent undergoes a DCT, the resulting transform coefficients are subject to compression by thresholding and quantization operations. Thresholding involves setting to zero all coefficients whose magnitude is smaller than a threshold value, whereas quantization involves dividing a coefficient by a step size and rounding off to the nearest integer.
Commonly, the quantization of each DCT coefficient is determined by an entry in a quantization matrix. It is this matrix that is primarily responsible for the perceived image quality and the bit rate of the transmission of the image. The perceived image quality is important because the human visual system can tolerate a certain amount of degradation of an image without being alerted to a noticeable error. Therefore, certain images can be transmitted at a low bit rate, whereas other images cannot tolerate degradation and should be transmitted at a higher bit rate in order to preserve their informational content.
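The thresholding and quantization operations, together with a per-coefficient quantization matrix, can be sketched as follows. This is an illustration only: the function names, the 2×2 size, and all numeric values are assumptions chosen for readability and do not come from any standard or from the '216 patent.

```python
def threshold(coeffs, limit):
    """Set to zero every coefficient whose magnitude is below limit."""
    return [[0.0 if abs(c) < limit else c for c in row] for row in coeffs]

def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer.

    Note: Python's round() resolves exact ties to the nearest even
    integer; the text does not say how ties are to be handled.
    """
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, steps)]

# Hypothetical 2x2 example; real blocks are 8x8 or 16x16.
coeffs = [[80.0, 3.0], [-13.0, 0.4]]
steps = [[16, 4], [4, 4]]           # illustrative quantization matrix

kept = threshold(coeffs, 1.0)       # [[80.0, 3.0], [-13.0, 0.0]]
levels = quantize(kept, steps)      # [[5, 1], [-3, 0]]
restored = dequantize(levels, steps)  # [[80, 4], [-12, 0]]
```

Larger step sizes produce fewer distinct levels and therefore a lower bit rate, at the cost of a larger reconstruction error (here 3.0 becomes 4 and -13.0 becomes -12).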
The '216 patent discloses a method for the compression of image information based on human visual sensitivity to quantization errors. In the method of the '216 patent, there is a quantization characteristic associated with block to block components of an image. This quantization characteristic is based on a busyness measurement of the image. The method of the '216 patent does not compute a complete quantization matrix, but rather a single scalar quantizer.
Recent years have seen the introduction and widespread acceptance of several varieties of digital video. These include digital television broadcasts from satellites (DBS-TV), the US Advanced Television System (ATV), digital movies on a compact disk (DVD), and digital video cassette recorders (DV). Such a trend is expected to continue in the near future and to expand to widespread terrestrial broadcast and cable distribution of digital television systems.
Most of these systems depend upon lossy compression of the video stream. Lossy compression can introduce visible artifacts, and indeed there is an economic incentive to reduce bit rate to the point where artifacts are almost visible. Compounding the problem is the “bursty” nature of digital video, which requires adaptive bit allocation based on visual quality metrics, and the economic need to reduce bit rate to the lowest level that yields acceptable quality.
For this reason, there is an urgent need for a reliable means to automatically evaluate the visibility of compression artifacts, and more generally, the visual quality of processed digital video sequences. Such a means is essential for the evaluation of codecs, for monitoring broadcast transmissions, and for ensuring the most efficient compression of sources and utilization of communication bandwidths.
The following references, which are incorporated herein by reference, describe visual quality metrics for evaluating, controlling, and optimizing the quality of compressed still images, and incorporate simplified models of human visual sensitivity to spatial and chromatic visual signals:
A. B. Watson, “Image Data Compression Having Minimum Perceptual Error,” U.S. Pat. No. 5,629,780 (1997).
A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, “Visibility of Wavelet Quantization Noise,” IEEE Transactions on Image Processing, 6(8), 1164-1175 (1997).
A. B. Watson, “Perceptual Optimization of DCT Color Quantization Matrices,” IEEE International Conference on Image Processing, 1, 100-104 (1994).
A. B. Watson, “Image Data Compression Having Minimum Perceptual Error,” U.S. Pat. No. 5,426,512 (1995).
It would be desirable to extend the still image metrics described in the foregoing references to cover moving images. Most, if not all, video quality metrics are inherently models of human vision. For example, if root-mean-squared-error (RMSE) is used as a quality metric, this amounts to the assumption that the human observer is sensitive to the summed squared deviations between reference and test sequences, and is insensitive to aspects such as the spatial frequency of the deviations, their temporal frequency, or their color. The DVQ metric is an attempt to incorporate many aspects of human visual sensitivity in a simple image processing algorithm. Simplicity is an important goal, since one would like the metric to run in real-time and require only modest computational resources.
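The implicit vision model behind RMSE can be made concrete with a small sketch. The helper `rmse` and the tiny 2×2 single-frame sequences are hypothetical; they only illustrate that two very differently distributed errors receive the same score.

```python
import math

def rmse(reference, test):
    """Root-mean-squared error between two sequences of frames.

    Each sequence is a list of frames; each frame is a list of rows
    of pixel values. Both sequences must have the same shape.
    """
    total = 0.0
    count = 0
    for ref_frame, test_frame in zip(reference, test):
        for ref_row, test_row in zip(ref_frame, test_frame):
            for r, t in zip(ref_row, test_row):
                total += (r - t) ** 2
                count += 1
    return math.sqrt(total / count)

reference = [[[0.0, 0.0], [0.0, 0.0]]]

# One large, isolated error ...
concentrated = [[[2.0, 0.0], [0.0, 0.0]]]
# ... versus a small, uniform offset with the same summed squared error.
spread = [[[1.0, 1.0], [1.0, 1.0]]]

score_concentrated = rmse(reference, concentrated)
score_spread = rmse(reference, spread)
```

Both scores equal 1.0, although an isolated defect and a uniform brightness shift would generally not be equally visible to an observer. A perceptual metric has to distinguish such cases.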
A number of video quality metrics have been proposed in the following references:
K. T. Tan, M. Ghanbari, and D. E. Pearson, “A Video Distortion Meter,” Picture Coding Symposium, 119-122 (1997).
T. Hamada, S. Miyaji, and S. Matsumoto, “Picture Quality Assessment System By Three-Layered Bottom-Up Noise Weighting Considering Human Visual Perception,” Society of Motion Picture and Television Engineers, 179-192 (1997).
C. J. v. d. B. Lambrecht, “Color Moving Pictures Quality Metric,” International Conference on Image Processing, I, 885-888 (1996).
A. B. Watson, “Multidimensional Pyramids In Vision And Video,” Representations of Vision: Trends and Tacit Assumptions in Vision Research, A. Gorea, 17-26, Cambridge University Press, Cambridge (1991).
A. B. Watson, “Perceptual-Components Architecture For Digital Video,” Journal of the Optical Society of America A, 7(10), 1943-1954 (1990).
A. A. Webster, C. T. Jones, M. H. Pinson, S. D. Voran, and S. Wolf, “An Objective Video Quality Assessment System Based On Human Perception,” Human Vision, Visual Processing, and Digital Display IV, SPIE Proceedings, 1913, 15-26 (1993).
J. Lubin, “A Human Vision System Model for Objective Picture Quality Measurements,” International Broadcasters' Convention, Conference Publication of the International Broadcasters' Convention, 498-503 (1997).
S. Wolf, M. H. Pinson, A. A. Webster, G. W. Cermak, and E. P. Tweedy, “Objective And Subjective Measures Of MPEG Video Quality,” Society of Motion Picture and Television Engineers, 160-178 (1997).
Some of the video quality metrics described in the foregoing references include spatial filtering operations employed to implement the multiple, bandpass, spatial filters that are characteristic of human vision. A shortcoming of these video quality metrics is that, if they are not based closely enough upon human perception, they cannot accurately measure visual quality. Alternatively, if they are based closely upon human perception, they require significant memory or computational resources that restrict the contexts in which they can be applied.
Therefore, there is still an unsatisfied need for a quality metric for digital video that is both reasonably accurate and computationally efficient.
A feature of the present invention is to provide a Digital Video Quality (DVQ) apparatus and method that incorporate a model of human visual sensitivity to predict the visibility of artifacts. The DVQ method and apparatus are used for the evaluation of the visual quality of processed or compressed digital video sequences, and for adaptively controlling the bit rate of the processed digital video sequences without compromising the visual quality. The DVQ apparatus minimizes the required amount of memory and computation.
The inventive Digital Video Quality (DVQ) apparatus can be widely used in various commercial applications including but not limited to satellite broadcasting of digital television (DBS-TV), movies on compact disc (DVD), high definition digital television (HDTV), digital video Camcorders (DV), Internet Video, digital terrestrial television broadcasting, and digital cable television distribution.
The present DVQ method offers significant advantages over conventional metrics in that the present DVQ method incorporates a reasonably accurate human vision model into a relatively simple processing architecture. A contributor to such architectural simplicity is the use of discrete cosine transforms (DCT) as a spatial filter bank, since the hardware and software to implement the DCT are widely available, due to its prevalence in most existing standards for video compression. Indeed, in some applications of the present DVQ method, the DCT may have already been computed as part of the digital video compression process.
Another contributor to the architectural simplicity of the present DVQ method is the use of Infinite Impulse Response (IIR) filters in the temporal filtering stages. This reduces the amount of computation and memory required relative to Finite Impulse Response (FIR) implementations, which must retain a window of past frames.
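The memory advantage of a recursive temporal filter can be sketched as follows. This is an illustration under assumptions: the text does not specify the filter order or coefficients, so a first-order low-pass filter with a hypothetical smoothing constant `alpha` is used, and the class name `FirstOrderIIR` is invented for the example.

```python
class FirstOrderIIR:
    """First-order recursive low-pass filter applied per coefficient.

    y[n] = alpha * y[n-1] + (1 - alpha) * x[n]

    Only the previous output frame is stored, whereas an N-tap FIR
    filter would need to keep the last N input frames.
    """

    def __init__(self, alpha):
        self.alpha = alpha      # assumed smoothing constant, 0 <= alpha < 1
        self.state = None       # previous output frame (flat list of values)

    def step(self, frame):
        if self.state is None:
            # Initialize the state with the first frame.
            self.state = list(frame)
        else:
            self.state = [
                self.alpha * prev + (1.0 - self.alpha) * x
                for prev, x in zip(self.state, frame)
            ]
        return list(self.state)

# A step input on a single coefficient: 0, then 1, then 1.
temporal_filter = FirstOrderIIR(alpha=0.5)
outputs = [temporal_filter.step(f) for f in ([0.0], [1.0], [1.0])]
```

The filtered values are 0.0, 0.5 and 0.75: the output approaches the new input level gradually, and the only state carried between frames is a single value per coefficient.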
The foregoing and other features and advantages of the present invention are achieved by a new DVQ apparatus and method. The input to the DVQ apparatus is a pair of color image sequences: the reference (R) or original non-compressed sequence, and the test (T) or processed sequence. Both sequences (R) and (T) are sampled, cropped, and subjected to color transformations. The sequences are then subjected to blocking and DCT transformation, and the results are transformed to local contrast. The next step is a time filtering operation which implements the human sensitivity to different time frequencies. The results are then converted to threshold units by dividing each DCT coefficient by its respective visual threshold. At the next stage the two sequences are subtracted to produce an error sequence. The error sequence is then subjected to a contrast masking operation, which also depends upon the reference sequence (R). The masked errors can be pooled in various ways to illustrate the perceptual error over various dimensions, and the pooled error can be converted to a visual quality (VQ) measure.
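The final stages of this sequence (conversion to threshold units, subtraction, contrast masking, and pooling) can be sketched as below. The summary gives no formulas for these stages, so everything specific here is an assumption: the masking rule, the Minkowski pooling exponent `beta`, and the function names are placeholders chosen only to show how the stages connect.

```python
import math

def to_threshold_units(coeffs, thresholds):
    """Divide each coefficient by its respective visual threshold."""
    return [c / t for c, t in zip(coeffs, thresholds)]

def masked_error(ref_units, test_units):
    """Subtract the sequences, then attenuate the error where the
    reference itself is strong.

    The divisor sqrt(1 + r^2) is a hypothetical masking rule used for
    illustration; it is not taken from the text.
    """
    return [(t - r) / math.sqrt(1.0 + r * r)
            for r, t in zip(ref_units, test_units)]

def pool(errors, beta=4.0):
    """Minkowski pooling of the masked errors (assumed pooling rule)."""
    return (sum(abs(e) ** beta for e in errors) / len(errors)) ** (1.0 / beta)

# Two coefficients with hypothetical thresholds.
thresholds = [2.0, 1.0]
ref_units = to_threshold_units([0.0, 3.0], thresholds)    # [0.0, 3.0]
test_units = to_threshold_units([4.0, 5.0], thresholds)   # [2.0, 5.0]

errors = masked_error(ref_units, test_units)
pooled = pool(errors)
```

Both coefficients differ by two threshold units, yet the error on the second is reduced because the reference is strong there, which is the qualitative behavior that contrast masking is meant to capture. Identical reference and test inputs give a pooled error of zero.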