This invention relates to compression of images, particularly sequences of digitized color video images.
The JPEG-2000 Bi-Orthogonal 9/7 Discrete Wavelet Transform
For two decades, band-split technologies such as sub-band coding, low-pass/high-pass split pairs, and wavelet sub-band codings have been applied to image compression. Notable recently are the sub-band discrete wavelet transforms (DWT) used in JPEG-2000 (see, for example, “JPEG2000, Image Compression Fundamentals, Standards, and Practice” by David S. Taubman and Michael W. Marcellin, Kluwer Academic Publishers 2002). The JPEG-2000 still image and intra-coded (i.e., no motion compensation) moving image coding system supports two “bi-orthogonal” wavelet classes in a sub-band configuration. A DWT 5/3 bi-orthogonal subband configuration is used for lossless compression, when exact bit match is required, but with only a small amount (typically 2.2:1) of compression. A DWT 9/7 bi-orthogonal subband configuration is more generally useful, and can provide a transform coding method for higher compression ratios, while preserving the “visual essence” of the image (although not bit-exact).
The fundamental merit of the DWT 9/7 bi-orthogonal subband configuration is its resemblance to a low-pass/high-pass filter pair. The “bi-orthogonality” refers to odd and even sample locations using low and high pass filters, respectively. This structure is then split into 4 sub-bands in JPEG-2000, with low horizontal and vertical (“low-low”), high horizontal and low vertical (“high-low”), low horizontal and high vertical (“low-high”), and high horizontal and vertical (“high-high”) subbands. This subband configuration can also utilize other band-split filter sets, and need not be structured bi-orthogonally at even/odd pixels. Any defined low band up-filter and high-band sum (with optional high-band filter) can yield a band-split suitable for use in compression coding.
FIG. 1 is a block diagram of a prior art 9/7 DWT bi-orthogonal subband compression system in accordance with the teachings of JPEG-2000. A higher resolution image 100 to be compressed (or a previous higher layer low-low subband image) is filtered down by a low band filter 112 (shown as having 9 taps) applied to odd pixels and a high band filter 114 (shown as having 7 taps) applied to even pixels, generating 4 subband images 120. These analytical filters 112, 114 are first applied in a horizontal pass, creating intermediate horizontal low and horizontal high subbands. These two intermediate subbands are then filtered in a vertical pass. Vertically filtering the horizontal low subband with the same analytical filters 112, 114 results in a low-low subband and a high-low subband. Vertically filtering the horizontal high subband with the same analytical filters 112, 114 results in a low-high subband and a high-high subband. During synthesis of an image from the 4 subbands, a low band filter 122 (shown as having 7 taps) is applied to odd pixels and a high band filter 124 (shown as having 9 taps) is applied to even pixels.
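The two-pass separable split described above can be sketched as follows. This is a simplified illustration of the four-subband structure only: a simple Haar average/difference pair stands in for the actual 9-tap and 7-tap analysis filters 112 and 114, and boundary handling is omitted.

```python
# Illustrative separable two-band split producing the four subbands
# (low-low, high-low, low-high, high-high). A Haar average/difference
# pair stands in for the 9/7 analysis filters; the pass structure
# (horizontal pass, then vertical pass) matches the description of FIG. 1.

def band_split_1d(samples):
    """Split one row or column into (low, high) half-resolution bands."""
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high

def transpose(img):
    return [list(col) for col in zip(*img)]

def subband_split_2d(img):
    """Horizontal pass creating intermediate bands, then vertical pass."""
    lo_rows, hi_rows = [], []
    for row in img:
        lo, hi = band_split_1d(row)
        lo_rows.append(lo)
        hi_rows.append(hi)

    def vertical(band):
        lo_cols, hi_cols = [], []
        for col in transpose(band):
            lo, hi = band_split_1d(col)
            lo_cols.append(lo)
            hi_cols.append(hi)
        return transpose(lo_cols), transpose(hi_cols)

    ll, hl = vertical(lo_rows)  # horizontal low -> low-low and high-low
    lh, hh = vertical(hi_rows)  # horizontal high -> low-high and high-high
    return ll, hl, lh, hh
```

A constant image lands entirely in the low-low subband, with the three high subbands zero, which is the expected behavior of any band-split of this kind.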
Band-split low-pass/high-pass filter pairs are most effective in separating spatial frequency energy. The bi-orthogonal DWT 9/7 is sufficiently similar to a low-pass/high-pass filter pair that it functions effectively. For idealized samples, the optimal low-pass filter is a truncated sinc function (i.e., sinc(x)=sin(x)/x) with the distance between the filter center and the first zero crossing being equal to the low-pass pixel spacing. For an octave (factor of two reduction in resolution) band-split, the spacing from filter center to the first zero crossing (in both directions) is 2.0 in source resolution units, and 1.0 in units of the half-resolution result. The low-pass filter of the DWT 9/7 roughly resembles this octave truncated sinc, although its dimensions and weights differ somewhat. While idealized linear samples never occur in practice, they form the basis of image processing theory, such as Nyquist sampling and filtering. Note that theory uses a sinc of infinite extent, which must be truncated in actual practice. A truncated sinc is not ideal according to filter theory because it is truncated, because image samples are typically non-linear, and because the samples are not ideally filtered when they are created or reconstructed. However, a truncated sinc is as close as possible to optimal in most image filtering applications.
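The octave truncated sinc described above can be computed directly. The sketch below builds a 9-tap symmetric filter with its first zero crossings at ±2.0 source-resolution units from center; normalization of the taps (dividing by their sum) is omitted for clarity.

```python
import math

def truncated_sinc_taps(zero_crossing=2.0, num_taps=9):
    """Symmetric truncated-sinc low-pass filter whose first zero
    crossings fall at +/- zero_crossing source-resolution units
    from the filter center (2.0 for an octave band-split)."""
    half = num_taps // 2
    taps = []
    for i in range(-half, half + 1):
        x = i / zero_crossing  # x = 1.0 at the first zero crossing
        taps.append(1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x))
    return taps
```

The center tap is 1.0, and the taps at offsets ±2 are (numerically) zero, matching the stated zero-crossing spacing for an octave band-split.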
Quantization
A part of most image compression coding is the use of quantization. A “quantization parameter”, often known by its initials “QP”, is divided into localized frequency coefficients in essentially every common type of non-lossless compression system. To reconstruct a compressed image, the frequency coefficients are re-multiplied by the appropriate quantization parameter. Because of the integer nature of the quantized values, the reconstructed coefficients will vary by ±half of the value of a step in the quantization parameter. For example, if the quantization parameter is 6, the reconstructed value will typically vary ±3. Further, in order to increase the number of zero coefficients, which code most efficiently, a “deadband” is usually applied around zero. Thus, for example, even with a quantization parameter of 6, the value of 0 in a coefficient may span the range of ±6 (rather than ±3 without any deadband).
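The quantization and deadband behavior described above can be sketched as follows. This is a minimal illustration of the principle, not any particular standard's quantizer; the deadband here widens the zero bin from ±QP/2 to ±QP, matching the ±6 example in the text.

```python
def quantize(coeff, qp, deadband=True):
    """Nearest-integer quantization with an optional dead band that
    widens the zero bin from +/-(qp/2) to +/-qp."""
    if deadband and abs(coeff) < qp:
        return 0
    return round(coeff / qp)

def dequantize(q, qp):
    """Reconstruct by re-multiplying with the quantization parameter."""
    return q * qp
```

With QP = 6, a coefficient of 5 reconstructs to 0 with the deadband but to 6 without it, and any coefficient outside the deadband reconstructs within ±3 of its original value.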
In JPEG-2000, the quantization parameter may be specified for each subband with a single floating-point value. The JPEG-2000 deadband is fixed at double the width of the quantization step. JPEG-2000 also uses bit truncation methods to reduce coded bits, which removes some small quantized values even though the quantized value was non-zero. Because of this, JPEG-2000 does not compress only with quantization, but also with the coded location of bits, often resulting in a relatively random additional error, above and beyond quantization error.
Coefficient Coding Structure
The coding of frequency coefficients is typically a lossless process involving some fixed structure and a variable-length coding (VLC) method (such as Huffman or arithmetic coding). It is typical for the coefficient coding structure to match the transform. For example, in MPEG-2, the structure of the coefficient coding is identical to the Discrete Cosine Transform (DCT) 8×8 pixel block. A pattern of variable length code is applied to the order of values in an 8×8 block, such as zig-zag from the corner, or a left-to-right, top-to-bottom scan. In JPEG-2000, the DWT 9/7 is coded up from the root coefficient or bottom resolution to each four-fold expansion of coefficients creating the sub-bands (low-low, high-low, low-high, and high-high). The coefficients are then synthesized into the next higher resolution, and become the low-low subband of the next layer up.
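The zig-zag scan mentioned above for 8×8 DCT blocks can be generated programmatically. The sketch below produces the familiar corner-start scan order by traversing anti-diagonals in alternating directions; it illustrates the scan-pattern idea, not any specific standard's table.

```python
def zigzag_order(n=8):
    """Generate the zig-zag scan order for an n x n block: traverse
    anti-diagonals from the top-left corner, alternating direction,
    so low-frequency coefficients are serialized first."""
    coords = [(i, j) for i in range(n) for j in range(n)]
    return sorted(coords, key=lambda p: (p[0] + p[1],
                                         p[0] if (p[0] + p[1]) % 2 else -p[0]))
```

Ordering coefficients this way groups the (usually small or zero) high-frequency values at the end of the scan, which suits run-length and variable-length coding.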
Variable Length Coding
Variable length codes used in image compression range from extremely simple, such as run-length codes and delta codes, to moderately complex such as arithmetic codes. The purpose of the variable length code is to reduce the number of bits necessary to code the coefficient values compared to using a fixed number of bits capable of coding the maximum range. For example, if 16 bits are used because the values can range between ±32767, but only a few values are larger than ±127, then 8-bits could be used with one “escape” code reserved to indicate that the next value needs an additional 16-bits. Although the large “escaped” value then needs 24 bits (8+16), it is usually infrequent enough that the average coefficient coding size will be nearer to 8-bits than 24-bits. This methodology can be extended based upon the principle that very small values, and even zero itself, are much more likely than larger values of any size. In this way, a Huffman table attempts to use the shortest codes for small and likely values, and gradually longer codes for larger and less likely values.
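The escape-code scheme in the example above can be sketched directly. The specific escape byte (0x80 here) and the 8/16-bit split are illustrative choices, not from any particular standard; values in −127..127 take one byte, and anything larger takes three.

```python
def encode_escape(values, escape=0x80):
    """Code each value in 8 bits, reserving one escape code (the byte
    0x80, i.e., -128, chosen here for illustration) to prefix a full
    16-bit value. Returns the byte list and average bits per value."""
    out = []
    for v in values:
        if -127 <= v <= 127:        # -128 is reserved as the escape code
            out.append(v & 0xFF)
        else:
            out.append(escape)
            out.append((v >> 8) & 0xFF)
            out.append(v & 0xFF)
    return out, 8 * len(out) / len(values)
```

With 99 small values and one large one, the large value costs 24 bits but the average stays near 8 bits per value, as described above.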
The arithmetic coding methodology allows multiple code values to be coded together, resulting in codes which have non-integer numbers of average bits for each coefficient code value. For example, two values may be coded with 7 bits, such that each code value uses the equivalent of 3½ bits each.
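The non-integer-bits idea can be seen even without a full arithmetic coder. As a hedged illustration (the radix of 11 is an arbitrary choice for this sketch), two symbols from an 11-value alphabet can be packed into one 7-bit code, since 11 × 11 = 121 ≤ 128, giving each symbol an effective cost of 3.5 bits instead of the 4 bits one symbol would need alone.

```python
def pack_pair(a, b, radix=11):
    """Pack two symbols from an alphabet of `radix` values into a single
    code in 0..radix*radix-1; for radix 11 this fits in 7 bits, i.e.,
    3.5 bits per symbol."""
    assert 0 <= a < radix and 0 <= b < radix
    return a * radix + b

def unpack_pair(code, radix=11):
    """Recover the two packed symbols."""
    return code // radix, code % radix
```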
It is typical in compression systems such as MPEG-2 and JPEG-2000 to combine run-length, delta, and Huffman codes.
Motion Compensation
JPEG-2000 does not offer motion compensation, since every frame stands alone. This is known as “intra” coding. MPEG-2, and many other similar coding systems, offer motion compensation, using blocks and motion vectors, for “inter” coding of images in a sequence of images. In such motion compensated coding systems, it is common practice to structure the motion blocks as a superset of the transform coding blocks. For example, in MPEG-2, the motion blocks are typically 16×16 pixels in size (16×8 for interlace), which encompasses four 8×8 DCT blocks (two for interlace). Thus, the block motion compensation structure is closely fitted to the DCT transform coding structure. MPEG-4, both as part 2 (original MPEG-4 video) and part 10 (also called the “Advanced Video Coder”), is structured similarly to MPEG-2 in these aspects.
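Block motion compensation of this kind can be sketched as an exhaustive sum-of-absolute-differences (SAD) search. This is a simplified illustration only; practical coders use much faster search strategies and sub-pixel refinement.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size pixel blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def block(frame, y, x, size):
    """Extract a size x size block with top-left corner at (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def best_motion_vector(cur, ref, y, x, size=16, search=4):
    """Exhaustive search for the motion vector minimizing SAD for one
    size x size block (16x16 as in MPEG-2), within +/-search pixels."""
    target = block(cur, y, x, size)
    best = (0, 0)
    best_sad = sad(target, block(ref, y, x, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                s = sad(target, block(ref, ry, rx, size))
                if s < best_sad:
                    best_sad, best = s, (dy, dx)
    return best, best_sad
```

For a current frame that is simply a shifted copy of the reference, the search recovers the shift exactly with a SAD of zero.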
Spatial Scalability
MPEG-2 offers a rarely used “spatial scalable” option which allows an additional resolution increasing layer to be coded. The up-filter for this option differs greatly from the theoretically optimal truncated sinc. MPEG-2 also offers signal-to-noise ratio (SNR) scalability, which is also rarely used. The basic structure of the SNR level of MPEG-2 is identical to basic MPEG-2—summing a correction to improve signal to noise in the resulting image. Neither spatial scalability nor SNR scalability is targeted at any specific goal, only a general increase in resolution and SNR, respectively. Only a single SNR and a single spatial scalability level are defined in MPEG-2.
JPEG-2000 offers the ability to prioritize and compartmentalize the order of bits in an image, such that early termination of decoding is possible (if the image is encoded with prioritization and/or compartmentalization). This allows a method of scalability for either SNR or resolution enhancement, but is limited by the bit-plane ordered coding and block-region compartmentalization properties of JPEG-2000. For example, transformed pixels of higher priority can be pre-shifted left during quantization (thus scaling by powers of two), and un-shifted during decoding, to provide a limited form of SNR scalability (limited since a left shift scaling must be a power of two). All highest significance (most significant bit) bit planes (within a tile partition or other regional compartment) are decoded first, then the next highest bit, etc., until the decoder truncates prior to decoding all of the available bits, or until all coded bits have been decoded. This method of ordered coding and optional pre-shifting allows some spatial and SNR scalability, but is limited to stopping within the boundaries of a specific pre-ordered bit plane. Thus, the scalability available within JPEG-2000 is limited to bit-planes sharing a common QP, which are by their nature separated by a factor of two in significance. Finer granularity of scalability is not possible in JPEG-2000.
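The pre-shifting mechanism described above can be sketched as follows. This is a minimal illustration (assuming non-negative coefficient values for simplicity): a higher-priority coefficient is shifted left after quantization, so that when the decoder truncates low bit planes, the shifted coefficient survives with less loss than an unshifted one.

```python
def encode_coeff(value, qp, priority_shift):
    """Quantize, then pre-shift a higher-priority coefficient left so its
    bit planes are coded (and therefore decoded) earlier.
    Assumes value >= 0 for this sketch."""
    return round(value / qp) << priority_shift

def decode_coeff(coded, qp, priority_shift, planes_dropped=0):
    """Decode after truncating the lowest `planes_dropped` bit planes
    (as when the decoder stops early), then undo the pre-shift."""
    truncated = (coded >> planes_dropped) << planes_dropped
    return (truncated >> priority_shift) * qp
```

Dropping two bit planes leaves a coefficient pre-shifted by two fully intact, while the same coefficient without the pre-shift loses its low bits.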
Floating Point
It has been common practice to mix floating-point and integer computations in reference compression coder software. For example, MPEG-2 and MPEG-4 use floating point reference DCT implementations, but integer processing for color processing, motion compensation, and most other aspects of the coding systems. JPEG-2000 uses a combination of integer and floating-point processing in its reference implementation. MPEG-4 part 10 uses an “integer transform” which combines the quantization and DCT transform steps into a single integer operation. Although the MPEG-4 part 10 implementation is not bit-exact invertible, the integer decoding is intended to exactly match between the encoder and decoder. This is a design feature of motion-compensated coding systems which the current inventor (along with David Ruhoff) has filed as patent application number 20020154693, entitled “High Precision Encoding and Decoding of Video Images”. The use of “exact match” decoding (that is, exactly matching between the decoder portion of the encoder, and all bitstream decoders) allows limited precision integer computations to be used without propagating errors when using motion compensation.
Some integer processing has been an essential ingredient of most if not all previous compression coding systems. This has been intentional, since floating-point computation has usually been substantially slower than integer computation, especially 16-bit and 8-bit integer computation.
OpenExr
Relatively recently, Lucasfilm Industrial Light and Magic (a digital special-effects production company) and Nvidia (a maker of video cards for personal computers) have teamed up to create a standard known as “OpenExr”. OpenExr is an open “Extended Range” floating point representation featuring a 16-bit floating point representation having a sign bit, a 5-bit exponent, and a 10-bit mantissa. This representation provides sufficient precision for most image processing applications, as well as allowing an extended range for white and black. The 16-bit “half” floating representation provided by OpenExr can be directly mapped to standard 32-bit IEEE floating point representation for easy interoperability.
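The 16-bit “half” layout described above (sign bit, 5-bit exponent, 10-bit mantissa) can be inspected directly, since this is the same half-precision layout supported by Python's struct module ('e' format, available in Python 3.6+).

```python
import struct

def half_fields(value):
    """Pack a float into the 16-bit half-precision layout (sign, 5-bit
    exponent, 10-bit mantissa) and return the three bit fields."""
    bits, = struct.unpack('<H', struct.pack('<e', value))
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    return sign, exponent, mantissa
```

For example, 1.0 maps to a zero sign bit, a stored exponent of 15 (the exponent bias), and a zero mantissa.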
It is common practice to display pixel values with black at zero, and white at the maximum integer value, or at a floating value of 1.0. However, digital image masters, especially those involving computer graphics, often need to represent a wider range of white and dark than is available with 8-bit or 10-bit integers having black at 0 and white at 255 or 1023, respectively. OpenExr allows white values and black values to extend substantially beyond this range. Further, concatenated computations will have higher resulting precision when using OpenExr 16-bit floating point representation, or 32-bit floating point representation, than integer computations (even when using integer computations with exact-match decoding).
OpenExr has the further benefit of allowing direct representation of linear light values, rather than requiring a non-linear (usually a video-gamma exponent) representation when using integers for pixel values.
OpenExr also offers an optional lossless compression coder (usually yielding 2:1 compression). This compression coder is based upon a combination of the simple Haar difference wavelet, a reduced-precision clustering table, and a Huffman variable-length code. The reduced-precision clustering table increases compression if many of the code values are not used. For example, such is the case if converting from 10-bit integer pixel values, since only 1024 codes (of the possible 65536) would be used.
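The Haar difference step underlying such a lossless coder can be sketched as an integer average/difference pair (often called the S-transform). Despite the truncating division in the average, the step is exactly invertible, because the bit lost in the average is recoverable from the parity of the difference; this sketch shows the principle rather than OpenExr's exact implementation.

```python
def haar_forward(a, b):
    """Integer average/difference (S-transform) step: lossless despite
    the truncating floor average, because the lost low bit is carried
    in the parity of d."""
    s = (a + b) >> 1   # floor average
    d = a - b          # difference
    return s, d

def haar_inverse(s, d):
    """Exactly invert haar_forward: a = s + floor((d + 1) / 2)."""
    a = s + ((d + 1) >> 1)
    return a, a - d
```

Round-tripping any integer pair through the forward and inverse steps reproduces the original values exactly, which is what makes this wavelet suitable for lossless compression.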