The present invention relates generally to processing of video images and, in particular, to syntactic encoding of images for later compression by standard compression techniques.
There are many types of video signals, such as digital broadcast television (TV), video conferencing, interactive TV, etc. All of these signals, in their digital form, are divided into frames, each of which consists of many pixels (image elements), each of which requires 8-24 bits to describe. The result is megabits of data per frame.
Before these signals are stored and/or transmitted, they are typically compressed using one of many standard video compression techniques, such as JPEG, MPEG, H-compression, etc. These compression standards use video signal transforms and intra- and inter-frame coding, which exploit spatial and temporal correlations among pixels within a frame and across frames.
However, these compression techniques create a number of well-known, undesirable and unacceptable artifacts, such as blockiness, low resolution and wiggles, among others. These are particularly problematic for broadcast TV (satellite TV, cable TV, etc.) or for systems with very low bit rates (video conferencing, videophone).
Much research has been performed to try to improve the standard compression techniques. The following patents and articles discuss various prior art methods for doing so:
U.S. Pat. Nos. 5,870,501, 5,847,766, 5,845,012, 5,796,884, 5,774,593, 5,586,200, 5,491,519, 5,341,442;
Raj Talluri et al., "A Robust, Scalable, Object-Based Video Compression Technique for Very Low Bit-Rate Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, No. 1, February 1997;
Awad Kh. Al-Asmari, "An Adaptive Hybrid Coding Scheme for HDTV and Digital Sequences," IEEE Transactions on Consumer Electronics, vol. 42, No. 3, pp. 926-936, August 1995;
Kwok-tung Lo and Jian Feng, "Predictive Mean Search Algorithms for Fast VQ Encoding of Images," IEEE Transactions on Consumer Electronics, vol. 41, No. 2, pp. 327-331, May 1995;
James Goel et al., "Pre-processing for MPEG Compression Using Adaptive Spatial Filtering," IEEE Transactions on Consumer Electronics, vol. 41, No. 3, pp. 687-698, August 1995;
Jian Feng et al., "Motion Adaptive Classified Vector Quantization for ATM Video Coding," IEEE Transactions on Consumer Electronics, vol. 41, No. 2, pp. 322-326, May 1995;
Austin Y. Lan et al., "Scene-Context Dependent Reference-Frame Placement for MPEG Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, No. 3, pp. 478-489, April 1999;
Kuo-Chin Fan and Kou-Sou Kan, "An Active Scene Analysis-Based Approach for Pseudoconstant Bit-Rate Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 2, pp. 159-170, April 1998;
Takashi Ida and Yoko Sambansugi, "Image Segmentation and Contour Detection Using Fractal Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 8, pp. 968-975, December 1998;
Liang Shen and Rangaraj M. Rangayyan, "A Segmentation-Based Lossless Image Coding Method for High-Resolution Medical Image Compression," IEEE Transactions on Medical Imaging, vol. 16, No. 3, pp. 301-316, June 1997;
Adrian Munteanu et al., "Wavelet-Based Lossless Compression of Coronary Angiographic Images," IEEE Transactions on Medical Imaging, vol. 18, No. 3, pp. 272-281, March 1999; and
Akira Okumura et al., "Signal Analysis and Compression Performance Evaluation of Pathological Microscopic Images," IEEE Transactions on Medical Imaging, vol. 16, No. 6, pp. 701-710, December 1997.
An object of the present invention is to provide a method and apparatus for video compression which is generally lossless vis-a-vis what the human eye perceives.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a visual lossless encoder for processing a video frame prior to compression by a video encoder. The encoder includes a threshold determination unit, a filter unit, an association unit and an altering unit. The threshold determination unit identifies a plurality of visual perception threshold levels to be associated with the pixels of the video frame, wherein the threshold levels define contrast levels above which a human eye can distinguish a pixel from among its neighboring pixels of the frame. The filter unit divides the video frame into portions having different detail dimensions. The association unit utilizes the threshold levels and the detail dimensions to associate the pixels of the video frame into subclasses. Each subclass includes pixels related to the same detail and which generally cannot be distinguished from each other. The altering unit alters the intensity of each pixel of the video frame according to its subclass.
Moreover, in accordance with a preferred embodiment of the present invention, the altering unit includes an inter-frame processor and an intra-frame processor.
Furthermore, in accordance with a preferred embodiment of the present invention, the intra-frame processor includes a controllable filter bank having a plurality of different filters and a filter selector which selects one of the filters for each pixel according to its subclass.
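As a rough illustration of how a filter selector might drive a controllable filter bank, the Python sketch below assumes that each subclass maps to a square box (mean) low-pass filter of a different radius, with radius 0 leaving a pixel untouched. The box filters, the `FILTER_BANK` mapping and the function names are hypothetical choices made for illustration; the text does not specify which filters the bank contains.

```python
from typing import Dict, List

Frame = List[List[float]]

# Hypothetical filter bank: subclass index -> box-filter radius.
# Radius 0 keeps visible detail unchanged; larger radii smooth pixels
# whose differences the eye is assumed not to distinguish.
FILTER_BANK: Dict[int, int] = {0: 0, 1: 1, 2: 2}


def box_filter_at(frame: Frame, r: int, c: int, radius: int) -> float:
    """Mean of the (2*radius+1)^2 window around (r, c), borders clamped."""
    rows, cols = len(frame), len(frame[0])
    total, count = 0.0, 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr = min(max(r + dr, 0), rows - 1)
            cc = min(max(c + dc, 0), cols - 1)
            total += frame[rr][cc]
            count += 1
    return total / count


def intra_frame_process(frame: Frame, subclass: List[List[int]]) -> Frame:
    """Filter selector: apply to each pixel the filter chosen by its subclass."""
    return [
        [box_filter_at(frame, r, c, FILTER_BANK[subclass[r][c]])
         for c in range(len(frame[0]))]
        for r in range(len(frame))
    ]
```

Selecting the filter per pixel, rather than per block, keeps the smoothing confined to pixels the classifier has judged indistinguishable.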
Further, in accordance with a preferred embodiment of the present invention, the inter-frame processor includes a low pass filter and a high pass filter operative on a difference frame between a current frame and a previous frame, large and small detail threshold elements for thresholding the filtered difference frame with a large detail threshold level and a small detail threshold level, respectively, and a summer which sums the outputs of the two filters as amended by the threshold elements.
Still further, in accordance with a preferred embodiment of the present invention, the threshold unit includes a unit for generating a plurality of parameters describing at least one of the following: the volume of information in the frame, the per-pixel color and the cross-frame change of intensity, and a unit for generating the visual perception threshold from at least one of the parameters.
There is also provided, in accordance with a preferred embodiment of the present invention, a method of visual lossless encoding of frames of a video signal. The method includes the steps of spatially and temporally separating and analyzing details of the frames, estimating parameters of the details, defining a visual perception threshold for each of the details in accordance with the estimated detail parameters, classifying the frame picture details into subclasses in accordance with the visual perception thresholds and the detail parameters, and transforming each frame detail in accordance with its associated subclass.
Additionally, in accordance with a preferred embodiment of the present invention, the step of separating and analyzing also includes the steps of spatial high pass filtering of small-dimension details and temporal filtering for detail motion analysis.
Moreover, in accordance with a preferred embodiment of the present invention, the step of estimating comprises at least one of the following steps:
determining $N\Delta_i$, a per-pixel signal intensity change between a current frame and a previous frame, normalized by a maximum intensity;
determining $NI_{XY}$, a normalized volume of intra-frame change, by high frequency filtering of the frame, summing the intensities of the filtered frame and normalizing the sum by the maximum possible amount of information within a frame;
generating $NI_F$, a volume of inter-frame changes between a current frame and its previous frame, normalized by the maximum possible amount of information within a frame;
generating $NI_{GOP}$, a normalized volume of inter-frame changes for a group of pictures, from the output of the previous generating step;
evaluating a signal-to-noise ratio $SNR$ by high pass filtering a difference frame between the current frame and the previous frame, selecting those intensities of the difference frame lower than a threshold defined as three times the noise level below which noise intensities are not perceptible to the human eye, summing the intensities of the pixels in the filtered frame and normalizing the sum by the maximum intensity and the total number of pixels in the frame;
generating $NY_i$, a normalized per-pixel intensity value;
generating a per-pixel color saturation level $p_i$;
generating a per-pixel hue value $h_i$; and
determining a per-pixel response $R_i(h_i)$ to the hue value.
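The intensity-based estimates in the list above can be sketched in Python as follows. This is a minimal sketch under several assumptions not stated in the text: frames are lists of rows of intensities; the spatial high-pass filter is the pixel minus its 3x3 neighborhood mean; the "maximum possible amount of information within a frame" is taken as the maximum intensity times the pixel count; and the group-of-pictures volume is taken as the mean of the per-frame inter-frame volumes. All function names are hypothetical, and the color terms are not covered here.

```python
from typing import List

Frame = List[List[float]]


def high_pass(frame: Frame) -> Frame:
    """Assumed spatial high-pass filter: pixel minus its 3x3 mean (borders clamped)."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += frame[rr][cc]
            row.append(frame[r][c] - total / 9.0)
        out.append(row)
    return out


def difference(cur: Frame, prev: Frame) -> Frame:
    """Difference frame: current minus previous, pixel by pixel."""
    return [[a - b for a, b in zip(rc, rp)] for rc, rp in zip(cur, prev)]


def volume(frame: Frame, max_i: float) -> float:
    """Sum of absolute intensities over max_i * pixel count (assumed normalization)."""
    n_pixels = len(frame) * len(frame[0])
    return sum(abs(v) for row in frame for v in row) / (max_i * n_pixels)


def n_delta(cur: Frame, prev: Frame, max_i: float) -> Frame:
    """Per-pixel |intensity change| normalized by the maximum intensity."""
    return [[abs(d) / max_i for d in row] for row in difference(cur, prev)]


def ni_xy(cur: Frame, max_i: float) -> float:
    """Normalized volume of intra-frame (high-frequency) change."""
    return volume(high_pass(cur), max_i)


def ni_f(cur: Frame, prev: Frame, max_i: float) -> float:
    """Normalized volume of inter-frame change."""
    return volume(difference(cur, prev), max_i)


def ni_gop(ni_f_values: List[float]) -> float:
    """Assumed: mean of the per-frame inter-frame volumes over a GOP."""
    return sum(ni_f_values) / len(ni_f_values)


def snr_estimate(cur: Frame, prev: Frame, max_i: float, noise_level: float) -> float:
    """High-pass the difference frame, keep values below 3x noise level, normalize."""
    hp = high_pass(difference(cur, prev))
    limit = 3.0 * noise_level
    kept = [[v if abs(v) < limit else 0.0 for v in row] for row in hp]
    return volume(kept, max_i)


def ny(cur: Frame, max_i: float) -> Frame:
    """Per-pixel intensity normalized by the maximum intensity."""
    return [[v / max_i for v in row] for row in cur]
```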
Further, in accordance with a preferred embodiment of the present invention, the step of defining includes the step of producing the visual perception thresholds, per-pixel, from a minimum threshold value and at least one of the parameters.
Still further, in accordance with a preferred embodiment of the present invention, the step of defining includes the step of producing the visual perception thresholds, per-pixel, according to the following equation:
$$THD_i = THD_{min}\left(1 + N\Delta_i + NI_{XY} + NI_F + NI_{GOP} + NY_i + p_i + \bigl(1 - R_i(h_i)\bigr) + 200\,SNR\right)$$
where $THD_{min}$ is a minimum threshold level.
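A direct transcription of the threshold equation into Python, together with one assumed way of obtaining the color terms, might look like the following. Using HSV saturation and hue from Python's standard `colorsys` module for $p_i$ and $h_i$ is an assumption, and because the hue response $R_i(h_i)$ is not defined in the text, it is simply passed in as a number. The function names are hypothetical.

```python
import colorsys
from typing import Tuple


def saturation_and_hue(r: float, g: float, b: float,
                       max_i: float = 255.0) -> Tuple[float, float]:
    """Assumed p_i and h_i: HSV saturation and hue, both in [0, 1]."""
    h, s, _v = colorsys.rgb_to_hsv(r / max_i, g / max_i, b / max_i)
    return s, h


def visual_threshold(thd_min: float, n_delta_i: float, ni_xy: float,
                     ni_f: float, ni_gop: float, ny_i: float, p_i: float,
                     r_hue_i: float, snr: float) -> float:
    """THD_i = THD_min * (1 + NDelta_i + NI_XY + NI_F + NI_GOP + NY_i
    + p_i + (1 - R_i(h_i)) + 200 * SNR)."""
    return thd_min * (1.0 + n_delta_i + ni_xy + ni_f + ni_gop + ny_i + p_i
                      + (1.0 - r_hue_i) + 200.0 * snr)
```

With every parameter at zero and a hue response of 1, the threshold reduces to the minimum threshold; each additional term can only raise it, so busier, brighter or more saturated regions tolerate larger intensity changes.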
Moreover, in accordance with a preferred embodiment of the present invention, the step of classifying includes the steps of comparing multiple spatial high frequency levels of a pixel against its associated visual perception threshold and processing the comparison results to associate the pixel with one of the subclasses.
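The text does not say how the comparison results are combined into a subclass. The Python sketch below assumes one simple rule purely for illustration: the subclass index is the number of spatial high-frequency levels (for example, small-detail and large-detail scales) at which the pixel falls below its visual perception threshold. Subclass 0 then means the detail is visible at every scale, and the highest index means it is invisible at every scale. The rule and the function names are hypothetical.

```python
from typing import List, Sequence


def classify_pixel(hf_levels: Sequence[float], threshold: float) -> int:
    """Hypothetical rule: count the scales at which the pixel's
    high-frequency level is below its visual perception threshold."""
    return sum(1 for level in hf_levels if abs(level) < threshold)


def classify_frame(hf_maps: List[List[List[float]]],
                   thresholds: List[List[float]]) -> List[List[int]]:
    """hf_maps[k][r][c] is the k-th high-frequency level of pixel (r, c);
    thresholds[r][c] is that pixel's visual perception threshold."""
    rows, cols = len(thresholds), len(thresholds[0])
    return [
        [classify_pixel([m[r][c] for m in hf_maps], thresholds[r][c])
         for c in range(cols)]
        for r in range(rows)
    ]
```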
Further, in accordance with a preferred embodiment of the present invention, the step of transforming includes the step of filtering each subclass with an associated two-dimensional low pass filter.
Still further, in accordance with a preferred embodiment of the present invention, the step of transforming includes the steps of generating a difference frame between the current frame and a previous transformed frame, low and high pass filtering of the difference frame, comparing the filtered frames with a large detail threshold and a small detail threshold and summing those portions of the filtered frames which are greater than the thresholds.
Additionally, in accordance with a preferred embodiment of the present invention, the large detail threshold is 2 to 5 percent.
Moreover, in accordance with a preferred embodiment of the present invention, the method includes the step of rounding the output of the step of transforming.
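The inter-frame transforming steps and the final rounding can be sketched in Python as follows. This sketch rests on assumptions not fixed by the text: the low pass filter is a 3x3 box mean and the high pass output is the difference frame minus its low-passed version; the large detail threshold is read as a percentage of the maximum intensity (3 percent is used, inside the stated 2 to 5 percent range); the small detail threshold of 4 intensity levels is a hypothetical value; and the thresholded sum is added back onto the previous transformed frame before rounding and clipping. Function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def box_mean(frame: Frame) -> Frame:
    """3x3 mean filter with clamped borders (assumed low pass filter)."""
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += frame[rr][cc]
            row.append(total / 9.0)
        out.append(row)
    return out


def inter_frame_transform(cur: Frame, prev_out: Frame, max_i: float = 255.0,
                          large_pct: float = 0.03,
                          small_thd: float = 4.0) -> List[List[int]]:
    """Keep only above-threshold changes relative to the previous transformed
    frame, then round (and clip) the result to integer intensities."""
    diff = [[a - b for a, b in zip(rc, rp)] for rc, rp in zip(cur, prev_out)]
    low = box_mean(diff)  # large-detail part of the change
    high = [[d - l for d, l in zip(rd, rl)] for rd, rl in zip(diff, low)]
    large_thd = large_pct * max_i  # e.g. 3% of the maximum intensity
    out = []
    for r in range(len(cur)):
        row = []
        for c in range(len(cur[0])):
            low_kept = low[r][c] if abs(low[r][c]) > large_thd else 0.0
            high_kept = high[r][c] if abs(high[r][c]) > small_thd else 0.0
            value = prev_out[r][c] + low_kept + high_kept  # assumed reconstruction
            row.append(int(round(min(max(value, 0.0), max_i))))
        out.append(row)
    return out
```

Under these assumptions, a small uniform change (below the large detail threshold) is suppressed and the previous output is repeated, which is what lets the downstream encoder spend fewer bits on imperceptible temporal variation.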
Finally, the intensity can be a luminance value or a chrominance value.