The present invention relates to data processing and, more particularly, to data compression, for example as applied to still and video images, speech and music. A major objective of the present invention is to enhance collaborative video applications over heterogeneous networks of inexpensive general purpose computers.
As computers become vehicles of human interaction, demand is rising for that interaction to be more immediate and complete. Where text-based e-mail and database services once predominated on local networks and on the Internet, the effort is now on to provide data-intensive services such as collaborative video applications, e.g., video conferencing and interactive video.
In most cases, the raw data requirements for such applications far exceed available bandwidth, so data compression is necessary to meet the demand. Effectiveness is a goal of any image compression scheme. Speed is a requirement imposed by collaborative applications to provide an immediacy to interaction. Scalability is a requirement imposed by the heterogeneity of networks and computers.
Effectiveness can be measured in terms of the amount of distortion resulting for a given degree of compression. The distortion can be expressed in terms of the square of the difference between corresponding pixels averaged over the image, i.e., mean square error (less is better). The mean square error can be: 1) weighted, for example, to take variations in perceptual sensitivity into account; or 2) unweighted.
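The distortion measure described above can be illustrated with a minimal sketch. The function name, the sample pixel values, and the choice to normalize the weighted error by the sum of the weights are assumptions made for illustration only.

```python
def mean_square_error(original, reconstructed, weights=None):
    """Return the (optionally weighted) mean square error of two pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    if weights is None:
        weights = [1.0] * len(original)  # unweighted case: every pixel counts equally
    total = sum(w * (a - b) ** 2 for w, a, b in zip(weights, original, reconstructed))
    return total / sum(weights)  # assumed normalization: divide by the total weight


# Hypothetical four-pixel image and its lossy reconstruction.
original = [10, 20, 30, 40]
reconstructed = [12, 20, 27, 40]

unweighted = mean_square_error(original, reconstructed)
# Assumed perceptual weights: errors at the third pixel count three times as much.
weighted = mean_square_error(original, reconstructed, weights=[1.0, 1.0, 3.0, 1.0])
```

With these sample values the unweighted error is 3.25, while the weighted error is larger because the biggest pixel difference falls where the assumed weight is highest.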
The extent of compression can be measured either as a compression ratio or a bit rate. The compression ratio (more is better) is the number of bits of an input value divided by the number of bits in the expression of that value in the compressed code (averaged over a large number of input values if the code is variable length). The bit rate is the number of bits of compressed code required to represent an input value. Compression effectiveness can be characterized by a plot of distortion as a function of bit rate.
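As a small worked example of the two measures, the following sketch computes an average bit rate for a variable-length code and the corresponding compression ratio. The code lengths and symbol probabilities are hypothetical values chosen for illustration.

```python
def average_bit_rate(code_lengths, probabilities):
    """Average number of compressed bits per input value for a variable-length code."""
    return sum(length * p for length, p in zip(code_lengths, probabilities))


def compression_ratio(input_bits, bit_rate):
    """Bits per input value divided by average compressed bits per input value."""
    return input_bits / bit_rate


# Hypothetical code with four codewords of 1, 2, 3 and 3 bits.
code_lengths = [1, 2, 3, 3]
probabilities = [0.5, 0.25, 0.125, 0.125]

rate = average_bit_rate(code_lengths, probabilities)   # bits per input value
ratio = compression_ratio(8, rate)                     # assuming 8-bit input values
```

Here the average rate is 1.75 bits per input value, so 8-bit inputs are compressed by a ratio of about 4.57.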
Ideally, there would be zero distortion, and there are lossless compression techniques that achieve this. However, lossless compression techniques tend to be limited to compression ratios of about 2, whereas compression ratios of 20 to 500 are desired for collaborative video applications. Lossy compression techniques always result in some distortion. However, the distortion can be acceptable, even imperceptible, while much greater compression is achieved.
Collaborative video is desired for communication between general purpose computers over heterogeneous networks, including analog phone lines, digital phone lines, and local-area networks. Encoding and decoding are often computationally intensive and thus can introduce latencies or bottlenecks in the data stream. Often dedicated hardware is required to accelerate encoding and decoding. However, requiring dedicated hardware greatly reduces the market for collaborative video applications. For collaborative video, fast, software-based compression would be highly desirable.
Heterogeneous networks of general purpose computers present a wide range of channel capacities and decoding capabilities. One approach would be to compress image data more than once and to different degrees for the different channels and computers. However, this is burdensome on the encoding end and provides no flexibility for different computing power on the receiving end. A better solution is to compress image data into a low-compression/low-distortion code that is readily scalable to greater compression at the expense of greater distortion.
State-of-the-art compression schemes have been promulgated as standards by the international Moving Picture Experts Group (MPEG); the current standards are MPEG-1 and MPEG-2. These standards are well suited for applications involving playback of video encoded off-line. For example, they are well suited to playback from CD-ROMs and DVDs. However, compression effectiveness is non-optimal, encoding requirements are excessive, and scalability is too limited. These limitations can be better understood with the following explanation.
Most compression schemes operate on digital images that are expressed as a two-dimensional array of picture elements (pixels), with one value (as in a monochrome or gray-scale image) or more (as in a color image) assigned to each pixel. Commonly, a color image is treated as a superposition of three independent monochrome images for purposes of compression.
The lossy compression techniques practically required for video compression generally involve quantization applied to monochrome (gray-scale or color component) images. In quantization, a high-precision image description is converted to a low-precision image description, typically through a many-to-one mapping. Quantization techniques can be divided into scalar quantization (SQ) techniques and vector quantization (VQ) techniques. While scalars can be considered one-dimensional vectors, there are important qualitative distinctions between the two quantization techniques.
Vector quantization can be used to process an image in blocks, which are represented as vectors in an n-dimensional space. In most monochrome photographic images, adjacent pixels are likely to be close in intensity. Vector quantization can take advantage of this fact by assigning more representative vectors to regions of the n-dimensional space in which adjacent pixels are close in intensity than to regions of the n-dimensional space in which adjacent pixels are very different in intensity. In a comparable scalar quantization scheme, each pixel would be compressed independently; no advantage is taken of the correlations between adjacent pixels. While scalar quantization techniques can be modified, at the expense of additional computations, to take advantage of correlations, comparable modifications can be applied to vector quantization. Overall, vector quantization provides for more effective compression than does scalar quantization.
Another difference between vector and scalar quantization is how the representative values or vectors are represented in the compressed data. In scalar quantization, the compressed data can include reduced-precision expressions of the representative values. Such a representation can be readily scaled simply by removing one or more least-significant bits from the representative value. In more sophisticated scalar quantization techniques, the representative values are represented by indices; however, scaling can still take advantage of the fact that the representative values have a given order in a metric dimension. In vector quantization, representative vectors are distributed in an n-dimensional space. Where n is greater than 1, there is no natural order to the representative vectors. Accordingly, they are assigned effectively arbitrary indices. There is no simple and effective way to manipulate these indices to make the compression scalable.
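The scalability of a reduced-precision scalar code can be sketched as follows. The function names and the midpoint reconstruction rule are assumptions for illustration; the point is only that dropping further least-significant bits from an already compressed value gives the same result as quantizing the original value more coarsely.

```python
def drop_lsbs(code, bits_to_drop):
    """Scale a scalar code to a lower rate by discarding least-significant bits."""
    return code >> bits_to_drop


def reconstruct(code, bits_dropped):
    """Map a reduced-precision code back to an 8-bit value (assumed: interval midpoint)."""
    if bits_dropped == 0:
        return code
    return (code << bits_dropped) + (1 << (bits_dropped - 1))


pixel = 201                      # hypothetical 8-bit pixel value
coarse = drop_lsbs(pixel, 4)     # 4-bit code
coarser = drop_lsbs(coarse, 2)   # further scaled to a 2-bit code without re-encoding

approx_4bit = reconstruct(coarse, 4)
approx_2bit = reconstruct(coarser, 6)
```

With these values the 4-bit code reconstructs to 200 and the 2-bit code to 224, so distortion grows as bits are removed, but no access to the original pixel is needed to rescale.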
The final distinction between vector and scalar quantization is more quantitative than qualitative. The computations required for quantization scale dramatically (more than linearly) with the number of pixels involved in a computation. In scalar quantization, one pixel is processed at a time. In vector quantization, plural pixels are processed at once. In the case of the popular 4×4 and 8×8 block sizes, the number of pixels processed at once becomes 16 and 64, respectively. To achieve minimal distortion, “full-search” vector quantization computes the distance in an n-dimensional space from an image vector to each representative vector. Accordingly, vector quantization tends to be much slower than scalar quantization and, therefore, limited to off-line compression applications.
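A minimal sketch of full-search encoding follows, assuming a squared Euclidean distance and a tiny hypothetical two-dimensional codebook (a real codebook for 4×4 blocks would hold 16-dimensional vectors). Every input block is compared against every codebook vector, which is the source of the computational cost.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def full_search_encode(block, codebook):
    """Return the index of the codebook vector closest to the block."""
    # One distance computation per codebook vector: cost grows with both
    # the codebook size and the block dimension.
    return min(range(len(codebook)), key=lambda i: squared_distance(block, codebook[i]))


# Hypothetical 2-pixel "blocks"; entries cluster where both pixels are similar.
codebook = [(0, 0), (64, 64), (128, 128), (255, 255)]
index = full_search_encode((70, 60), codebook)
```

For the sample block (70, 60) the closest representative vector is (64, 64), so the encoder emits index 1.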
Because of its greater effectiveness, considerable effort has been directed to accelerating vector quantization by eliminating some of the computations required. There are structured alternatives to “full-search” VQ that reduce the number of computations required per input block at the expense of a small increase in distortion. Structured VQ techniques perform comparisons in an ordered manner so as to exclude apparently unnecessary comparisons. All such techniques involve some risk that the closest codebook point will not be found. However, the risk is not large, and the consequence typically is that the second closest point is selected instead of the closest. While the net distortion is larger than with full-search VQ, it is typically smaller than that of scalar quantization performed on each dimension separately.
In “tree-structured” VQ, comparisons are performed in pairs. For example, the first two measurements can involve codebook points in symmetrical positions in the upper and the lower halves of a vector space. If an image input vector is closer to the upper codebook point, no further comparisons with codebook points in the lower half of the space are performed. Tree-structured VQ works best when the codebook has certain symmetries. However, requiring these symmetries reduces the flexibility of codebook design so that the resulting codebook is not optimal for minimizing distortion. Furthermore, while reduced, the computations required by tree-structured VQ can be excessive for collaborative video applications.
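The pairwise descent can be sketched as below. The tree layout, the test vectors, and the dictionary representation are all hypothetical; the sketch only shows that each level costs two distance computations and that the rejected half of the space is never examined again.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical depth-2 binary tree. Each internal node holds one test vector
# per child; leaves are plain integers (final codebook indices).
TREE = {
    "test": ((64, 64), (192, 192)),
    "children": (
        {"test": ((32, 32), (96, 96)), "children": (0, 1)},
        {"test": ((160, 160), (224, 224)), "children": (2, 3)},
    ),
}


def tree_encode(block, node):
    """Descend the tree, returning (codebook index, number of distance computations)."""
    comparisons = 0
    while isinstance(node, dict):
        first, second = node["test"]
        comparisons += 2  # one pair of distance computations per level
        branch = 0 if squared_distance(block, first) <= squared_distance(block, second) else 1
        node = node["children"][branch]  # the other subtree is never searched
    return node, comparisons


index, comparisons = tree_encode((100, 90), TREE)
```

For a balanced tree with N leaves this takes about 2·log2(N) distance computations rather than N, at the risk of missing the true nearest vector when it lies in a rejected subtree.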
In table-based vector quantization (TBVQ), the assignment of all possible blocks to codebook vectors is pre-computed and represented in a lookup table. No computations are required during image compression. However, in the case of 4×4 blocks of pixels, with eight bits allotted to characterize each pixel, the number of table addresses would be 256^16 (that is, 2^128), which is clearly impractical. Hierarchical table-based vector quantization (HTBVQ) separates a vector quantization table into stages; this effectively reduces the memory requirements, but at a cost of additional distortion.
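The memory argument can be checked with a short calculation. The single-table figure follows directly from the text; the hierarchical arrangement (each stage combining two 8-bit inputs into one 8-bit index, with one table per stage) is an assumption made here for illustration, since the stage structure is not specified in this passage.

```python
BITS_PER_PIXEL = 8
PIXELS_PER_BLOCK = 16  # 4x4 block

# Single-stage table: one address for every possible block.
single_table_addresses = (2 ** BITS_PER_PIXEL) ** PIXELS_PER_BLOCK

# Assumed hierarchy: each stage maps a pair of 8-bit inputs to one 8-bit index,
# halving the number of values per block: 16 -> 8 -> 4 -> 2 -> 1.
STAGE_INDEX_BITS = 8
stages = 0
values = PIXELS_PER_BLOCK
while values > 1:
    values //= 2
    stages += 1

entries_per_stage = 2 ** (2 * STAGE_INDEX_BITS)  # two 8-bit inputs form one address
hierarchical_entries = stages * entries_per_stage
```

Under these assumptions the hierarchy needs four tables of 65,536 entries each, 262,144 entries in total, compared with 2^128 addresses for a single table.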
Further, it is well known that the pixel space in which images are originally expressed is often not the best for vector quantization. Vector quantization is most effective when the dimensions differ in perceptual significance. However, in pixel space, the perceptual significance of the dimensions (which merely represent different pixel positions in a block) does not vary. Accordingly, vector quantization is typically preceded by a transform such as a wavelet transform. Thus, the value of eliminating computations during vector quantization is impaired if computations are required for transformation prior to quantization. While some work has been done on integrating a wavelet transform into an HTBVQ table, the resulting effectiveness has not been satisfactory.
It is recognized that hardware accelerators can be used to improve the encoding rate of data compression systems. However, this solution is expensive. More importantly, it is awkward from a distribution standpoint. On the Internet, images and Web pages are presented in many different formats, each requiring its own viewer or “browser”. To reach the largest possible audience without relying on a lowest-common-denominator viewing technology, image providers can download viewing applications to prospective consumers. Obviously, this download distribution system would not be applicable to hardware-based encoders. If encoders for collaborative video are to be downloadable, they must be fast enough for real-time operation in software implementations. Where the applications involve collaborative video over heterogeneous networks of general purpose computers, there is still a need for a downloadable compression scheme that provides a more optimal combination of effectiveness, speed, and scalability.
The present invention provides for data compression using a hierarchical table implementing a block transform and outputting a variable-rate, embedded code. There are several aspects of the invention that are brought together to achieve optimal benefits, but which can be used separately.
A counterintuitive aspect of the present invention is the incorporation of a codebook of a type used for structured vector quantization in a compression table. Structured vector quantization is designed to reduce the computations required for compression while accepting a small increase in distortion relative to full-search vector quantization. However, this tradeoff is a poor one in the context of tables, since all the computations are pre-computed.
In the present case, a codebook design procedure used for tree-structured vector quantization is used, not to reduce computations, but to provide a codebook that can be mapped readily to an embedded code. In an embedded code, bits are arranged in order of significance. When the least significant bit of a multi-bit index to a first codebook vector is dropped, the result is an index of a codebook vector near the first codebook vector. Thus, an embedded code is readily scaled to provide a variable-rate system.
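The scaling property of an embedded code can be sketched with a tiny tree-structured codebook. The codebook contents and the use of bit strings as indices are hypothetical; the essential feature is that every prefix of an index is itself a valid index of a nearby, coarser representative vector.

```python
# Hypothetical tree-structured codebook keyed by bit-string index.
# Bits are ordered from most to least significant; each prefix names a tree node.
CODEBOOK = {
    "0": (64, 64), "1": (192, 192),
    "00": (32, 32), "01": (96, 96),
    "10": (160, 160), "11": (224, 224),
}


def scale(index, bits_to_keep):
    """Reduce the rate by dropping least-significant bits of the index."""
    return index[:bits_to_keep]


def decode(index):
    """Look up the representative vector for a full or truncated index."""
    return CODEBOOK[index]


full = decode("01")              # two-bit index
coarse = decode(scale("01", 1))  # same code after dropping its last bit
```

Here the two-bit index decodes to (96, 96) and its one-bit truncation to (64, 64), so a receiver or intermediate node can lower the bit rate by truncation alone, without re-encoding the image.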
An embedded code can readily be made variable length to minimize entropy and reduce the bit rate for a net gain in compression effectiveness. Thus, any loss of effectiveness resulting from the use of a structured vector quantization codebook is at least partially offset by the gain in compression effectiveness resulting from the use of a variable-length code.
Another aspect of the invention is the implementation of block transforms in the table. Block transforms can express data so that information can be separated by significance. This makes it feasible to apply more compression to less significant data for a net gain in the apparent effectiveness of the compression.
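To illustrate how a block transform separates information by significance, the sketch below applies a 2×2 Haar-style transform to a smooth block. This particular transform and its scaling are assumptions chosen for illustration and are not necessarily the transform used in the invention.

```python
def haar_2x2(block):
    """Haar-style transform of a 2x2 block into one average and three detail terms."""
    (a, b), (c, d) = block
    average = (a + b + c + d) / 4      # coarse content, most significant
    horizontal = (a - b + c - d) / 4   # left-right variation
    vertical = (a + b - c - d) / 4     # top-bottom variation
    diagonal = (a - b - c + d) / 4     # diagonal variation
    return average, horizontal, vertical, diagonal


# Hypothetical smooth block: adjacent pixels are close in intensity.
coefficients = haar_2x2([[100, 102], [101, 103]])
```

For this block nearly all of the signal lands in the average term (101.5), while the detail terms are small (-1.0, -0.5 and 0.0), which is what makes it reasonable to compress the less significant terms more heavily.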
In the case of image or other sensory data compression, if the space to which the data is transformed is not perceptually linear, a perceptually weighted proximity measure can be used during codebook design. In accordance with the present invention, an unweighted or less perceptually weighted proximity measure should be used during a table fill-in procedure to minimize distortion.
A further aspect of the invention is the incorporation of considerations other than perceptually weighted or unweighted proximity measures in codebook design. For example, entropy constraints can be imposed on codebook design to enhance bit rate. In the (greedy) growing of a decision tree, a joint entropy and distortion measure can be used to select nodes to be grown or pruned. If the joint measure is applied on a node-by-node basis, virtually continuous scalability can be provided while maintaining high compression effectiveness at each available bit rate.
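One way a joint entropy and distortion measure might drive greedy tree growing is sketched below. The specific criterion (largest reduction in distortion per added bit of rate) and the candidate values are assumptions for illustration; the text does not specify the exact measure.

```python
def best_node_to_grow(candidates):
    """Pick the leaf whose split gives the largest distortion drop per added bit."""
    return max(candidates, key=lambda c: c["distortion_drop"] / c["rate_increase"])


# Hypothetical candidate splits: expected decrease in distortion and expected
# increase in average bit rate (entropy) if the leaf is split into two children.
candidates = [
    {"name": "A", "distortion_drop": 40.0, "rate_increase": 0.50},
    {"name": "B", "distortion_drop": 30.0, "rate_increase": 0.25},
    {"name": "C", "distortion_drop": 10.0, "rate_increase": 0.10},
]

chosen = best_node_to_grow(candidates)
```

Although candidate A offers the largest absolute distortion reduction, candidate B is selected because it buys more distortion reduction per bit. Growing one node at a time in this way yields a sequence of trees at closely spaced bit rates.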
A final aspect of the invention takes advantage of the lower memory requirements afforded by hierarchical tables. Hierarchical tables raise the issue of how to incorporate structures, constraints, and transforms in a table. In the case of the block transforms, the transforms are used in codebook design at every stage of the table. However, in the case of structures and constraints used to provide variable-length codes, these are best restricted to design of the last-stage table only.
It is not necessary for all aspects of the invention to be practiced together to attain advantages. However, when combined to yield a table-based data compression system with a variable-rate embedded code, the result is optimally suited for collaborative video applications. Scalability at both the encoding and decoding ends is provided by the embedded code. Speed is provided by the use of tables in which everything is pre-computed; by using hierarchical tables, memory requirements can be made reasonable. Compression effectiveness is enhanced by incorporating block transforms and entropy considerations into codebook design. Thus, the compression is suitable for software-only applications, and the compression scheme can be distributed over networks to make collaborative video applications widely available. These and other features and advantages of the invention are apparent from the description below with reference to the following drawing.