1. Field of the Invention
This invention relates to image coding and more specifically to a source coder and method of allocating quantization bits based upon the spatial and temporal masking effects of a liquid crystal display (LCD) to improve perceptual quality.
2. Description of the Related Art
Image coding is used to compress the total number of bits used to represent a digital image, still or moving, while maintaining the quality of the reconstructed and displayed image. Image coding is used in broadcast systems to encode a video signal that is transmitted over a limited bandwidth channel, decoded and displayed. Image coding is also used to compress still or video imagery for storage purposes. The amount of bandwidth required to transmit a signal and the amount of memory needed to store a signal are directly reflected in the cost of the system. However, the desire to reduce bandwidth or memory by increasing the amount of compression must be balanced against the need to maintain a high quality signal. The balance will greatly depend upon the system constraints and the desired quality.
Most compression techniques use either predictive or transform coding algorithms. Predictive algorithms are the simplest but provide the poorest performance. Transform-based coding algorithms, such as the Discrete Cosine Transform (DCT), the Discrete Sine Transform (DST), the Karhunen-Loeve (KL) transform, or the Wavelet transform, perform far better than predictive coders but are more complex.
In general, transform algorithms decorrelate the image by projecting it onto a set of orthogonal basis functions and then quantize the coefficients. The decoder performs an inverse transform on the quantized coefficients to produce a reconstructed image. In video systems, the frame-to-frame correlation is also removed and the residual image is encoded. To do this effectively, the coder must estimate and compensate for any frame-to-frame motion.
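The transform, quantize, and inverse-transform steps described above can be sketched as follows. This is a minimal illustration only, assuming a one-dimensional 8-sample block, an orthonormal DCT-II basis, and a uniform quantizer with a hypothetical step size of 4; the function names and sample values are illustrative and do not come from any particular coder.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def forward(basis, samples):
    """Project the samples onto each basis function."""
    return [sum(b * s for b, s in zip(row, samples)) for row in basis]

def inverse(basis, coeffs):
    """Inverse of an orthonormal transform is its transpose."""
    n = len(coeffs)
    return [sum(basis[k][j] * coeffs[k] for k in range(n)) for j in range(n)]

def quantize(coeffs, step):
    """Uniform quantizer: snap each coefficient to the nearest multiple of step."""
    return [step * round(c / step) for c in coeffs]

block = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical pixel values
basis = dct_matrix(len(block))
coeffs = forward(basis, block)
recon = inverse(basis, quantize(coeffs, step=4.0))
max_err = max(abs(a - b) for a, b in zip(block, recon))
print("largest reconstruction error:", max_err)
```

Because the basis is orthonormal, the energy of the reconstruction error equals the energy of the quantization error in the coefficients, which is why the choice of quantizer for each coefficient directly controls the distortion.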
The existing Moving Picture Experts Group (MPEG) standards for coding digital video data use the DCT with motion estimation and compensation. By intelligently allocating bits to the transform coefficients, the quality of the reconstructed image for a given bit rate can be optimized or, alternatively, the number of bits required for a desired quality can be minimized. Quality is typically measured in terms of distortion, i.e. the signal-to-noise ratio (SNR) of the reconstructed image, or perceptual quality, i.e. how good the displayed image appears to a viewer.
The classical solution to the bit allocation problem is posed as:

$$D = D(b) = \sum_{i=1}^{k} W_i(b_i)\, g_i \qquad (1)$$

subject to the constraint that

$$\sum_{i=1}^{k} b_i \le B \qquad (2)$$
where $D$ is the overall distortion, $W_i(b_i)$ is the distortion incurred in optimally quantizing the $i$-th transform coefficient with $b_i$ bits, $g_i$ is a weighting coefficient, and $B$ is the total number of available bits. To optimize SNR, the weighting coefficients $g_i$ are all set to one such that the distortion in each transform coefficient is given equal weight. As a result, under the optimal allocation each transform coefficient incurs the same average distortion.
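The constrained minimization above can be illustrated with a greedy marginal-analysis sketch: each bit is given to the coefficient whose weighted distortion drops the most. The distortion model used here, in which the distortion of a coefficient with a given variance falls by a factor of four per added bit, is an assumption for illustration (a common high-resolution approximation), not something stated in the text; the variances and the function names are likewise hypothetical. For convex distortion curves such as this one, the greedy choice yields an optimal integer allocation.

```python
import heapq

def distortion(variance, bits):
    """Assumed model: distortion drops by a factor of 4 per extra bit."""
    return variance * 2.0 ** (-2 * bits)

def allocate_bits(variances, weights, total_bits):
    """Give each bit to the coefficient with the largest weighted gain."""
    bits = [0] * len(variances)

    def gain(i):
        # Weighted distortion reduction from adding one more bit to i.
        return weights[i] * (distortion(variances[i], bits[i])
                             - distortion(variances[i], bits[i] + 1))

    heap = [(-gain(i), i) for i in range(len(variances))]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return bits

variances = [400.0, 100.0, 25.0, 4.0]   # hypothetical coefficient variances
weights = [1.0, 1.0, 1.0, 1.0]          # all ones: the SNR-optimal case
bits = allocate_bits(variances, weights, total_bits=9)
print(bits)                                              # -> [4, 3, 2, 0]
print([distortion(v, b) for v, b in zip(variances, bits)])
```

With unit weights, the three coefficients that receive bits all end at the same distortion (1.5625), matching the equal-distortion property noted above. The fourth coefficient receives no bits because its unquantized distortion is still too small for any of the 9 available bits to be better spent on it.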
However, it is well known that SNR and visual quality are not perfectly correlated. The human visual system (HVS) resembles a bandpass filter that is less sensitive to very low and very high spatial frequencies. Therefore, systems that maximize SNR do not necessarily optimize image quality. The preferred, but currently unsolved, approach would be to find a better distortion measure that mirrors perceived quality. A simpler approach is to select the weighting coefficients $g_i$ based on the modulation transfer function (MTF) of the HVS to de-emphasize distortion in very low and very high frequency transform coefficients. Although this will reduce the SNR of the reconstructed image, the perceived quality of the image should improve. U.S. Pat. No. 4,780,761 describes a source coder in which the transform coefficients are quantized according to a two-dimensional model of the sensitivity of the human visual system.
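As a rough illustration of MTF-based weighting, the sketch below derives weighting coefficients from a bandpass contrast sensitivity curve. The specific curve (the Mannos-Sakrison approximation), the mapping of transform coefficients to spatial frequencies in cycles per degree, and the function names are all assumptions made for the example; the text does not specify which HVS model is used.

```python
import math

def csf(f_cpd):
    """Mannos-Sakrison contrast sensitivity approximation (assumed model).

    f_cpd is spatial frequency in cycles per degree of visual angle.
    """
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-((0.114 * f_cpd) ** 1.1))

def hvs_weights(freqs_cpd):
    """Weights normalized so the most visible frequency has weight 1."""
    sens = [csf(f) for f in freqs_cpd]
    peak = max(sens)
    return [s / peak for s in sens]

# Hypothetical center frequencies of five transform coefficients.
freqs = [0.5, 2.0, 8.0, 16.0, 30.0]
g = hvs_weights(freqs)
for f, w in zip(freqs, g):
    print(f"{f:5.1f} cycles/degree -> weight {w:.3f}")
```

Under this assumed curve the weights peak at mid frequencies and fall off toward both ends, so distortion in very low and very high frequency coefficients counts for less in the weighted sum.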
The standard display device for viewing still and video imagery and for evaluating the perceived quality of the coding algorithm has been the cathode ray tube (CRT). Known CRTs exhibit an 8-bit color depth and provide an emissive and contiguous, i.e. non-tessellated, display that turns the individual phosphors on and off very quickly, on the order of nanoseconds, such that the reconstructed image exhibits minimal after-image effects such as blurring or ghosting. Typically, the CRT's electron spot is "defocused" by increasing the aperture size to avoid aliasing problems. Any effect the CRT might have on perceived quality is incorporated into the coding algorithm through the psychovisual responses of the test subjects. Their responses may be used to fine tune the HVS model or bit allocation algorithm, but are not directly included as an MTF in the coding algorithm. In recent years, advances in liquid crystal display (LCD) technology have improved picture quality to the point that flat panel displays have become the platform of choice for many applications such as laptop computers, multimedia and video conferencing. The LCD is a passive, multi-line driven or active matrix addressed, non-emissive, tessellated display that exhibits response times orders of magnitude slower than current CRT technology and a smaller color depth, typically 6 bits.
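To make the difference between 8-bit and 6-bit color depth concrete, the sketch below requantizes an 8-bit channel value to 6 bits and expands it back to the 8-bit range. The truncation and bit-replication scheme is an assumption chosen for illustration; the text does not say how an LCD driver maps source values to its available gray levels.

```python
def requantize(value, src_bits=8, dst_bits=6):
    """Reduce a channel value to dst_bits, then expand back to src_bits.

    Assumed scheme: drop the low-order bits, then refill them by
    replicating the high-order bits so full scale maps to full scale.
    """
    shift = src_bits - dst_bits
    code = value >> shift                      # 6-bit code, 0..63
    return (code << shift) | (code >> (dst_bits - shift))

displayed = [requantize(v) for v in range(256)]
levels = len(set(displayed))
max_err = max(abs(v - d) for v, d in zip(range(256), displayed))
print(levels, "distinct levels, largest error", max_err)
```

Each channel keeps only one quarter of the original gray levels, so smooth gradients that are coded accurately can still show visible steps on the display.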
The tessellated display is caused by the discrete nature of the LCD pixels, which must be electrically and hence physically isolated from each other. In low cost LCDs used in instrumentation equipment, white light passes directly through this tessellation pattern and is perceived as a white matrix. In higher quality LCDs used in video systems, the tessellation pattern is coated with a black substance to provide a good black state and hence a decent contrast ratio. In either case, the viewer perceives this matrix overlaid on the image. James Larimer, "Visual Effects of the Black Matrix in Tessellated Displays," SID 95 DIGEST, pp. 49-51, discusses the effects in detail.
The LCD's sluggish response time causes noticeable after-image effects such as image sticking, ghosting or motion blur. For example, dragging the mouse arrow across the LCD of a laptop computer will produce one or more of these effects. The LCD's color depth is limited by the cost of providing a high resolution voltage driver to switch the LCD. The lack of color depth, the tessellated display, and the sluggish response times are viewed as limitations of flat panel displays to be overcome through advances in LCD technology, specifically by improving the voltage driver, increasing the LCD's aperture ratio, and reducing response time.