1. Field of the Invention
The present invention relates generally to digital video processing and, more particularly, to digital video compression.
2. Discussion of Prior Art
Data reduction occurs during various stages of digital video encoding. However, quantization, which provides one of the best data-compression opportunities, is also perhaps the least well understood.
A typical video encoder receives source data having an initial spatial resolution. Prior to actual coding, the source data is mapped to a typically lower resolution sampling grid ("down-sampled"), filtered and then analyzed for statistical coding metrics according to which coding is then conducted. During coding, an encode-subsystem compresses the pre-processed data, typically using conversion, quantization and other processing to modify successive pictures (e.g. frames, blocks, objects, etc.).
In MPEG-2, for example, block-based motion-compensated prediction enables the use of not only complete picture representations (i.e. intra or I-pictures), but also predicted (P and B) pictures represented by predicted intra-picture motion ("prediction data") and predicted-versus-actual picture or "prediction error" data. The prediction error data is then converted using a discrete cosine transform or "DCT" and then quantized. During quantization, additional bitrate reduction is achieved by replacing higher resolution pictures with lower resolution (lower-bitrate) quantized pictures. Final coding and other processing also provide incremental data optimization.
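The DCT-then-quantize step described above can be sketched as follows. This is a simplified illustration only, not MPEG-2's actual quantizer: the standard additionally applies per-coefficient weighting matrices, zig-zag scanning and entropy coding, all omitted here.

```python
import math

def dct_2d(block):
    """Naive 8x8 type-II DCT (illustrative; real encoders use fast DCTs)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
            s = 0.0
            for r in range(N):
                for c in range(N):
                    s += (block[r][c]
                          * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * c + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, step):
    """Uniform quantization: a larger step size yields fewer bits
    (more coefficients collapse to zero) at lower fidelity."""
    return [[round(x / step) for x in row] for x in [coeffs] for row in x]

# A flat block of value 128: all energy lands in the DC coefficient,
# so after quantization only out[0][0] is nonzero.
flat = [[128] * 8 for _ in range(8)]
q = quantize(dct_2d(flat), step=16)
```

Because the quantization step size divides every coefficient, it is the natural control point for the bitrate-versus-quality trade-off that the adaptive schemes discussed below attempt to steer.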
While several factors can influence the bitrate that is devoted to each picture (e.g. using a corresponding quantization step size), a particularly promising one is perceptual masking. That is, the sensitivity of the human visual system ("HVS") to distortion tends to vary in the presence of certain spatio-temporal picture attributes. It should therefore be possible to model the HVS perceptual masking characteristics in terms of spatio-temporal picture attributes. It should also be possible to determine appropriate quantization step-sizes for received pictures (e.g. in order to achieve a desired quality and/or bitrate) by analyzing the pictures, determining perceptually significant picture attributes and then applying the perceptual model.
The current understanding of perceptual masking is, however, limited and the HVS is considered so complex and the perception of quality so subjective as to elude accurate modeling. See, for example, Digital Images and Human Vision, MIT Press (1993); MPEG Video Compression Standard, Chapman and Hall (1996), and Digital Video: An Introduction to MPEG-2, Chapman and Hall (1997). Nevertheless, attempts have been made to provide some degree of perceptual modeling in order to exploit HVS perceptual masking effects.
For example, many encoders now incorporate a quantizer that modifies or "adapts" a rate-control based nominal quantization step size according to a block energy measurement. FIG. 1, for example, broadly illustrates a typical adaptive quantizer within an MPEG encoder. During quantization, rate controller 101 transfers to quantization-modifier 102 a nominal quantization value QNom, macroblock data and a macroblock-type parameter. Quantization-modifier 102 processes the macroblock data, typically using sum of differences from DC ("SDDC") or variance techniques, and then transfers to quantizer 103 a modified quantization value, MQuant.
Within quantization-modifier 102, formatter 121 organizes each received macroblock into 4 blocks, each block containing an 8-row by 8-column array of pixel values, p(r,c), according to the received (frame-or-field) type parameters. Next, block energy analyzers 122a-d perform an SDDC (or variance-based) block energy analysis for each of the blocks, as given by equations 1 and 2 respectively:
SDDC(block)=Σ(r,c=0 to 7)|p(r,c)−mean-p(block)|  Equation 1:
Variance(block)=Σ(r,c=0 to 7)(p(r,c)−mean-p(block))²  Equation 2:
Each block energy analyzer further maps the total block energy measure for a current block to a corresponding modification value according to equation 3,
Block quantization mod=(α×a+mean(a))/(a+α×mean(a))  Equation 3:
wherein "α" is a multiplier (typically equal to 2) and "a" is the minimum block-SDDC or variance in a macroblock. Minimizer 123 next determines the minimum block quantization modification. The resultant minimum is then multiplied by QNom to produce MQuant, which is transferred to quantizer 103.
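The prior-art modifier of FIG. 1 can be sketched in Python as follows. The text is ambiguous about whether "a" in equation 3 denotes the per-block or the minimum energy; this sketch computes a modification per block (against the macroblock mean energy) and then takes the minimum, per the minimizer description, and is offered only as one plausible reading.

```python
def sddc(block):
    """Equation 1: sum of absolute differences from the block mean (DC)."""
    flat = [p for row in block for p in row]
    mean_p = sum(flat) / len(flat)
    return sum(abs(p - mean_p) for p in flat)

def quant_mod(a, mean_a, alpha=2.0):
    """Equation 3: map a block-energy measure to a modification value."""
    denom = a + alpha * mean_a
    return 1.0 if denom == 0 else (alpha * a + mean_a) / denom

def mquant(q_nom, blocks, alpha=2.0):
    """Scale the nominal step size QNom by the minimum block modification
    (the role of minimizer 123 in FIG. 1)."""
    energies = [sddc(b) for b in blocks]
    mean_e = sum(energies) / len(energies)
    mods = [quant_mod(e, mean_e, alpha) for e in energies]
    return q_nom * min(mods)
```

Note that a flat (low-energy) block pulls the minimum modification below 1, reducing MQuant below QNom, i.e. the macroblock is quantized more finely where distortion would be most visible.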
Unfortunately, such a block-energy perceptual model provides only a rough approximation of how distortion generally tends to perceptually blend into a picture; it does not necessarily result in a minimized or well-distributed bitrate, and resulting decoded video often exhibits so-called halo effects, mosquito noise and other artifacts. Attempts to improve reliability, typically by formatting macroblocks in a finer 16×16 block array, not only substantially increase processing and storage requirements, but also provide only limited improvement.
Other HVS models have also been attempted. For example, one method attempts to detect characters (e.g. alphanumerics) that are particularly sensitive to distortion and then, when detected, to add appropriate "special case" quantization modifications to an existing perceptual masking model. Unfortunately, despite the sometimes extensive resources currently required for added detection and compensation, no commercially available encoder appears to include an accurate working HVS model, let alone an economically feasible one.
Accordingly, there remains a need for apparatus and methods capable of modeling the HVS and of enabling accurate and efficient video quantization in accordance with perceptual masking.
The present invention provides for accurate and efficient perceptually adaptive picture quantization and, among other capabilities, enables lower-bitrate, more optimally distributed video compression.
In one aspect, embodiments of the invention provide a perceptual model found to enable the determination of perceptual masking effects in a modifiable, yet accurate manner. In another aspect, received picture data and/or other information can be analyzed in accordance with perceptually significant picture attributes. Low-resource edge detection, as well as activity, luminance, temporal and positional perceptual significance determination and correlation (e.g. selection, combination, correspondence, etc.) are also enabled. Also provided are multiple-granularity (i.e. resolution, dimension, attribute, etc.) analysis and correlation, which are preferably used to produce perceptually-based quantization modifications.
In a preferred embodiment, adaptive quantization is provided within an MPEG-2 encoder integrated circuit ("IC"). Received picture data is analyzed to determine energy and edge attribute indicators and a multiple granularity correlation of the energy and edge attribute indicators is conducted to provide an activity-based quantization modification or "activity-modification." The received picture data is also analyzed for luminance-sensitivity, and a resulting luminance-modification is correlated with the activity-modification and a further nominal-quantization offset (e.g. reflecting temporal-masking effects) to produce an intermediate modification. The intermediate modification is then limited. Finally, a positional-sensitivity determination is formed as a perimeter offset, which is correlated with the limited intermediate modification. The resulting positionally adapted modification is then rounded or truncated to produce a quantization modification, which is used by a quantizer in performing quantization.
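The ordering of the modification chain just described might be organized as in the following sketch. All names, the combination rule, the clamping range and the rounding choice are illustrative assumptions only; the embodiment's actual analyzers, correlation operations and limits are not specified at this level of the disclosure.

```python
def combine_modifications(activity_mod, luminance_mod, temporal_offset,
                          perimeter_offset, limit=(0.5, 2.0)):
    """Correlate per-attribute modifications into one quantization
    modification, in the order described above: activity and luminance
    modifications are combined with a nominal-quantization offset, the
    intermediate result is limited (clamped), the positional (perimeter)
    offset is applied, and the result is rounded to an integer."""
    # Hypothetical combination rule; "correlation" is left abstract
    # in the text (selection, combination, correspondence, etc.).
    intermediate = activity_mod * luminance_mod + temporal_offset
    lo, hi = limit
    limited = max(lo, min(hi, intermediate))       # limiting step
    positionally_adapted = limited + perimeter_offset
    return round(positionally_adapted)             # rounding/truncation step
```

The point of the sketch is only the pipeline shape: each perceptual attribute contributes a separate term, and limiting precedes the positional adjustment so that the perimeter offset is never clamped away.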
Advantageously, embodiments of the present invention enable effective and efficient perceptual analysis, perceptual modeling, correlation and adaptive quantization. Such capability can, for example, be utilized to facilitate low-bitrate substantially transparent image compression, enhancement and other processing, as well as modifiable processing in accordance with varying image sources, types, identifiable portions and attributes, among other application parameters.
These and other objects and advantages of the present invention will become apparent to those skilled in the art after considering the following detailed specification together with the accompanying drawings.