The present invention relates to digital data compression, and in particular to a scheme for temporally coherent image data compression, and more particularly to a self-adaptive compression scheme for motion digital video compression.
With the convergence of digital information in the home, a need has arisen for the integration of home computers with other information appliances. In co-pending application Ser. Nos. 08/792,003 and 08/792,361, both filed Jan. 31, 1997, and assigned to the Assignee of the present invention, an exemplary digital wireless home network was described. The network has at its heart an information furnace that allows users to enjoy a variety of multimedia content distributed from a host computer to various appliances throughout the home. Within this vision of the information furnace, the home computer is established as the central aggregation point for digital content in the home, which content is then wirelessly distributed to locations and appliances throughout the home that are optimal for its consumption. These alternative consumption locations enable new dynamics in the use of multimedia content, including mobility, comfort, social interaction, and linkages with other household appliances, such as audio/visual systems. The information furnace further allows users to consume the content in their preferred locations (and even be mobile in the home if desired), enables multiple users to simultaneously interact with the content, and reduces the cost of the appliances used to access the content (computing resources, such as the CPU, memory and modem are leveraged from a central source).
The distribution of video information as part of the home network environment presents certain challenges for the network designer. For example, with the increasing popularity of multimedia applications there is increasing use of digitally encoded visual data. Thus, digitally encoded video images will need to be transmitted across wired and wireless communication channels of the network, for applications such as video-conferencing, interactive computing, entertainment programming, etc. These digital images are, by nature of their graphical content, relatively more complex than, say, digital audio and, thus, require significant bandwidth within the communication channels to transport the complex information embodying the images. Further, multimedia applications often include "synthetic", or computer-generated, images (e.g., the image of a spreadsheet or a page generated by a word processing application) that have little or no relative motion from frame to frame, but are nevertheless high contrast images. Such images often include a very high amount of energy in their high-frequency spatial range, as compared to so-called "natural" images (e.g., a picture of a person). Transmitting these images within a communication channel also requires the use of significant bandwidth. Accordingly, to transport such information efficiently, and in real time, digital imaging applications hinge on the use of data compression techniques to reduce the amount of information to be transmitted within the network to manageable levels.
In light of the above, it is not surprising that image data compression often involves reducing the amount of data required to represent a digital image. One common basis of the reduction process is the removal of redundant data. In addition, inherent non-linearities in human visual perception can be leveraged to reduce the amount of data to be displayed in succeeding frames of a motion video. Accordingly, existing compression schemes exploit correlation in both space and time for video signals. Spatial compression is known as intra-frame compression, while temporal compression is known as inter-frame compression.
Generally, methods that achieve high compression ratios (e.g., over 50:1) are lossy, in that the data that is reconstructed from a compressed image is not identical to the original. The "losses" experienced in the compression process are manifested as distortions in the reconstructed images. While lossless compression methods do exist, their compression ratios are far lower. For most commercial, industrial and consumer applications, lossy methods are preferred because they save on required storage space and communication channel bandwidth.
Lossy compression methods tend to be acceptable because they generally exploit nonlinear aspects of the human visual system. For instance, the human eye is much more sensitive to fine detail in the luminance (or brightness) of an image than in the chrominance (or color) thereof. Also, the eye is less sensitive to distortions in the high-frequency range of an image's spatial spectrum, especially in the presence of motion. As a result, in viewing a sequence of images reconstructed from a lossy compression scheme, the human eye is more forgiving of high-frequency compression coding artifacts (e.g., distortion of edges) in a moving video than in a static image. That is, motion images may mask compression coding artifacts that would be otherwise visible in still images.
Various techniques have been adopted as industry standards for motion image compression, including Recommendation H.261 of the International Telegraph and Telephone Consultative Committee (CCITT) for video conferencing, and schemes proposed by the Moving Picture Experts Group (MPEG) for full-motion compression for digital storage media. While such video compression methods can compress data at high ratios with acceptable quality in the decompressed images, they do not necessarily provide compression ratios high enough for use in limited bandwidth environments such as home networks.
Further, these prior compression processes do not include means for correcting distortions that may be present in earlier-transmitted frames. For example, in those prior video compression schemes that attempt to improve compression efficiency by reducing inter-frame redundancy with the use of "motion estimation" and/or "motion prediction", earlier-transmitted frames are updated by compressing and transmitting the difference between a current frame and a preceding frame. In this manner, the compression process is made more efficient, as subsequent frames do not need to be compressed in their entirety if the extent of the changes between frames is limited. For example, in a video recording of a swinging pendulum in front of a static but feature-rich background, the inter-frame changes may be only those sections of the frames of the video corresponding to the swinging movements of the pendulum. Only these changes need to be compressed and transmitted, without the need to transmit the same feature-rich background in all the frames. Then to reconstruct the current frame, the preceding-transmitted frame is updated with the transmitted changes.
Although these schemes tend to conserve bandwidth, it is likely that distortions will be present in the earlier-transmitted frames. Thus, such distortions are necessarily carried through to subsequent frames. Moreover, with each new frame, additional compression distortions will be introduced into the reconstructed images. Consequently, the compression distortions tend to accumulate from frame to frame, yet these prior compression schemes do not provide means to reduce or eliminate these distortions.
In one embodiment, a method for enhancing the quality of digital images recovered from compressed data in an inter-frame redundancy-removing scheme is provided. Briefly, a self-adaptive feedback scheme is deployed in an image compression/decompression system so as to include means for the compensation of the distortion component from prior frame compression in subsequent difference frame compression. This may be implemented by storing each transmitted frame after a full compress/decompress cycle, and transmitting the difference data (which includes the inverse, or negative, of the distortion component from compression of the transmitted frame) representing the difference between the stored frame and the incoming new frame. Consequently, the quality of static regions in the recovered images may be improved with each subsequent iteration by taking the distortion component in the prior frame into consideration along with the inter-frame motion information. The feedback loop thus forms a self-adaptive iterative cycle.
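The feedback idea described above can be illustrated with a minimal sketch. It is not the claimed implementation: the frame is a short list of integers, and `lossy_codec` is a hypothetical stand-in for a full compress/decompress cycle that simply keeps the top three bit planes of the largest magnitude present. The function `run` and its `closed_loop` flag are likewise names invented for this illustration. With `closed_loop=True` the encoder differences each incoming frame against the stored, already-decoded frame, so the residual carries the negative of the earlier distortion; with `closed_loop=False` it differences against the previous original frame, as in the prior schemes discussed earlier.

```python
def lossy_codec(values, kept_planes=3):
    """Hypothetical compress/decompress cycle: keep only the top bit planes."""
    peak = max(abs(v) for v in values)
    # Number of low-order bit planes discarded, relative to the largest magnitude.
    shift = max(0, peak.bit_length() - kept_planes)
    out = []
    for v in values:
        mag = (abs(v) >> shift) << shift
        out.append(mag if v >= 0 else -mag)
    return out


def run(frames, closed_loop):
    """Return the total absolute reconstruction error after each frame."""
    reference = [0] * len(frames[0])  # what the encoder differences against
    recon = [0] * len(frames[0])      # what the decoder has reconstructed
    errors = []
    for frame in frames:
        diff = [f - r for f, r in zip(frame, reference)]
        sent = lossy_codec(diff)      # difference data after the lossy cycle
        recon = [r + s for r, s in zip(recon, sent)]
        # Closed loop: store the decoded frame, distortion included.
        # Open loop: store the original frame, ignoring the distortion.
        reference = list(recon) if closed_loop else list(frame)
        errors.append(sum(abs(f - r) for f, r in zip(frame, recon)))
    return errors


static_frame = [200, 37, 5, 129]
closed = run([static_frame] * 3, closed_loop=True)
opened = run([static_frame] * 3, closed_loop=False)
```

For this static three-frame sequence, `closed` is `[19, 3, 0]` while `opened` stays at `[19, 19, 19]`: the closed loop refines the static region on each iteration, whereas the open loop never revisits the initial distortion.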
In a further embodiment, wavelet analysis is deployed to enhance the efficiency of a data compression scheme that employs adaptive feedback. Each incoming image frame represented in the spatial domain is transformed into the wavelet domain before being compressed and transmitted (e.g., to a remote receiver). The compressed data may be fed back and stored in an accumulation buffer without wavelet synthesis. Then, difference data to be transmitted in the next frame is obtained by comparing the incoming frame and the stored frame, which are both in wavelet representation.
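Because a wavelet transform is linear, differencing can take place entirely in the wavelet domain, which is why the accumulation buffer need not be synthesized back to the spatial domain. The sketch below is illustrative only and makes several assumptions: it uses a single-level, unnormalized Haar transform on a one-dimensional signal (the text does not name a particular wavelet), it omits the lossy compression step, and the function names `haar_analysis` and `haar_synthesis` are hypothetical.

```python
def haar_analysis(signal):
    """One level of an unnormalized Haar transform on an even-length signal.

    Returns pairwise averages (low band) followed by pairwise
    half-differences (high band).
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low + high


def haar_synthesis(coeffs):
    """Inverse of haar_analysis."""
    half = len(coeffs) // 2
    out = []
    for lo, hi in zip(coeffs[:half], coeffs[half:]):
        out += [lo + hi, lo - hi]
    return out


prev_frame = [10, 12, 40, 44]
new_frame = [10, 12, 48, 40]      # only the second pair of samples changed

# Accumulation buffer held in wavelet representation (no synthesis needed).
stored = haar_analysis(prev_frame)
incoming = haar_analysis(new_frame)

# Difference data computed directly between wavelet coefficients.
diff = [n - s for n, s in zip(incoming, stored)]

# The receiver updates its own wavelet-domain buffer and synthesizes
# only when it needs spatial-domain samples for display.
receiver_buffer = [s + d for s, d in zip(stored, diff)]
displayed = haar_synthesis(receiver_buffer)
```

Here `diff` is nonzero only for the coefficients covering the changed samples, and `displayed` equals `new_frame`.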
In another embodiment, data quantization may be carried out using a number of bit allocation planes for each of a number of Mallat blocks to be transmitted and/or retained for storage, so as to meet transmission bandwidth limitations and/or data storage space limitations in a data compression scheme. The number of bit planes allocated to a given Mallat block is determined in accordance with required accuracy or resolution for a frequency range represented by that block. Low frequency blocks may be given a higher priority over high frequency blocks and, accordingly, may have more bit planes allocated than are allocated for the high frequency blocks.
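As a rough illustration of per-block bit plane allocation, the sketch below truncates the low-order bit planes of each block according to an allocation table. Everything specific in it is an assumption made for the example: the four block names (`LL`, `LH`, `HL`, `HH`), the 8-plane coefficient depth, the particular allocation values, and the helper `keep_top_planes`. Coefficients are treated as sign plus magnitude, with the sign assumed to be sent separately.

```python
TOTAL_PLANES = 8  # assumed magnitude depth of every coefficient

# Hypothetical allocation: the low-frequency block keeps every plane,
# higher-frequency blocks keep progressively fewer.
ALLOCATION = {"LL": 8, "LH": 5, "HL": 5, "HH": 3}


def keep_top_planes(coeffs, kept, total=TOTAL_PLANES):
    """Zero the (total - kept) least significant bit planes of each magnitude."""
    dropped = total - kept
    out = []
    for c in coeffs:
        mag = (abs(c) >> dropped) << dropped
        out.append(mag if c >= 0 else -mag)
    return out


blocks = {
    "LL": [201, 187],
    "LH": [37, -22],
    "HL": [-45, 9],
    "HH": [13, -6],
}

quantized = {
    name: keep_top_planes(coeffs, ALLOCATION[name])
    for name, coeffs in blocks.items()
}

# Magnitude bits actually sent versus the unquantized total.
bits_sent = sum(ALLOCATION[name] * len(c) for name, c in blocks.items())
bits_full = sum(TOTAL_PLANES * len(c) for c in blocks.values())
```

With these inputs the low-frequency block survives unchanged, the mid-frequency blocks lose their three lowest planes, and the small high-frequency coefficients quantize to zero; `bits_sent` is 42 against a `bits_full` of 64.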
In still another embodiment, a post augmentation scheme may be deployed at the decompression side of a compression/decompression system to enhance the accuracy of data that was subject to quantization (e.g., arising from Mallat blocks that were assigned an incomplete number of bit planes). According to this scheme, the actual value of the recovered data is taken to lie between its indicated value with all non-transmitted bits being one and its indicated value with all non-transmitted bits being zero. For example, the reconstructed value may be taken to be the average or median of the above two values, which statistically minimizes the quantization error, being in the middle of the "uncertainty" interval.
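The midpoint reconstruction can be sketched as follows, under stated assumptions: values are non-negative integer magnitudes (signs are assumed to be handled separately), `dropped` is the number of non-transmitted low-order bit planes, and the function name `augment` is hypothetical.

```python
def augment(received, dropped):
    """Midpoint of the uncertainty interval for a truncated magnitude.

    `received` has its `dropped` lowest bits set to zero by the encoder.
    """
    low = received                          # non-transmitted bits all zero
    high = received | ((1 << dropped) - 1)  # non-transmitted bits all one
    return (low + high) / 2


dropped = 3
true_value = 45
received = (true_value >> dropped) << dropped  # 40 after truncation
estimate = augment(received, dropped)          # 43.5, midpoint of 40..47

# Worst-case error over all 8-bit magnitudes, with and without augmentation.
worst_plain = max(v - ((v >> dropped) << dropped) for v in range(256))
worst_augmented = max(
    abs(v - augment((v >> dropped) << dropped, dropped)) for v in range(256)
)
```

For three dropped planes the worst-case error falls from 7 with plain truncation to 3.5 with the midpoint estimate, which is the halving one would expect from centering the estimate in the interval.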
Still further embodiments are discussed in the following description and its accompanying drawings.