The present invention relates to image compression systems, and in particular relates to an image compression system which provides hypercompression.
Image compression reduces the amount of data necessary to represent a digital image by eliminating spatial and/or temporal redundancies in the image information. Compression is necessary in order to efficiently store and transmit still and video image information. Without compression, most applications in which image information is stored and/or transmitted would be rendered impractical or impossible.
Generally speaking, there are two types of compression: lossless and lossy. Lossless compression reduces the amount of image data stored and transmitted without any information loss, i.e., without any loss in the quality of the image. Lossy compression reduces the amount of image data stored and transmitted with at least some information loss, i.e., with at least some loss of quality of the image.
Lossy compression is performed with a view to meeting a given available storage and/or transmission capacity. In other words, external constraints for a given system may define a limited storage space available for storing the image information, or a limited bandwidth (data rate) available for transmitting the image information. Lossy compression sacrifices image quality in order to fit the image information within the constraints of the given available storage or transmission capacity. It follows that, in any given system, lossy compression would be unnecessary if sufficiently high compression ratios could be achieved, because a sufficiently high compression ratio would enable the image information to fit within the constraints of the given available storage or transmission capacity without information loss.
The vast majority of compression standards in existence today relate to lossy compression. These techniques typically use transforms such as the discrete cosine transform (DCT) or wavelet transforms, and have a tendency to lose high frequency information due to limited bandwidth. The “edges” of images typically contain very high frequency components because they have drastic gray level changes, i.e., their dynamic range is very large. Edges also have high resolution. Loss of edge information is undesirable because resolution is lost as well as high frequency information. Furthermore, human cognition of an image is primarily dependent upon edges or contours. If this information is eliminated in the compression process, human ability to recognize the image decreases.
Fractal compression, though better than most, suffers from high transmission bandwidth requirements and slow coding algorithms. Another type of motion (video) image compression technique is the ITU-recommended H.261 standard for videophone/videoconferencing applications. It operates at integer multiples of 64 kbps and its segmentation and model based methodology splits an image into several regions of specific shapes, and then the contour and texture parameters representing the region boundaries and approximating the region pixels, respectively, are encoded. A basic difficulty with the segmentation and model-based approach is low image quality connected with the estimation of parameters in 3-D space in order to impart naturalness to the 3-D image. The shortcomings of this technique are obvious to those who have used videophone/videoconferencing applications with respect particularly to MPEG video compression.
Standard MPEG video compression is accomplished by sending an “I frame” representing motion every fifteen frames regardless of video content. The introduction of I frames asynchronously into the video bitstream in the encoder is wasteful and introduces artifacts because there is no correlation between the I frames and the B and P frames of the video. This procedure results in wasted bandwidth. Particularly, if an I frame has been inserted into B and P frames containing no motion, bandwidth is wasted because the I frame was essentially unnecessary yet, unfortunately, uses up significant bandwidth because of its full content. On the other hand, if no I frame is inserted where there is a lot of motion in the video bitstream, such overwhelming and significant errors and artifacts are created that bandwidth is exceeded. Since the bandwidth is exceeded by the creation of these errors, they will drop off and thereby create the much unwanted blocking effect in the video image. In the desired case, if an I frame is inserted where there is motion (which is where an I frame is desired and necessary) the B and P frames will already be correlated to the new motion sequence and the video image will be satisfactory. This, however, happens only a portion of the time in standard compression techniques like MPEG. Accordingly, it would be extremely beneficial to insert I frames only where warranted by video content.
The compression rates required in many applications, including tactical communications, are extremely high, as shown in the following example, making maximal compression of critical importance. Assuming 512² (512 × 512) pixels, 8-bit gray level, and a 30 Hz full-motion video rate, a bandwidth of 60 Mbps is required. To compress data into the required data rate of 128 kbps from such a full video uncompressed bandwidth of 60 Mbps, a 468:1 still image compression rate is required. The situation is even more extreme for VGA full-motion video, which requires 221 Mbps and thus a 1726:1 motion video compression rate. Such compression rates, of course, greatly exceed any compression rate achievable by state of the art technology for reasonable PSNR (peak signal to noise ratio) values of approximately 30 dB. For example, the fourth public release of JPEG has only a 30:1 compression rate and the image has many artifacts due to a PSNR of less than 20 dB, while H.320 has a 300:1 compression ratio for motion and still contains many still/motion image artifacts.
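The figures in the preceding example can be checked with a few lines of arithmetic. The following is only a sketch: the function names are illustrative, 128 kbps is taken as 128,000 bits per second, and the VGA case assumes 640 × 480 pixels at 24 bits per pixel and 30 Hz, a bit depth the text does not state but which reproduces the 221 Mbps figure.

```python
def raw_video_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def required_ratio(raw_bps, channel_bps):
    """Compression ratio needed to fit raw_bps into channel_bps."""
    return raw_bps / channel_bps


CHANNEL_BPS = 128_000  # 128 kbps target channel (decimal kilo is an assumption)

# 512 x 512 pixels, 8-bit gray level, 30 Hz
gray_bps = raw_video_bps(512, 512, 8, 30)

# VGA full-motion video; 640 x 480 at 24 bits per pixel is an assumption
vga_bps = raw_video_bps(640, 480, 24, 30)

print(gray_bps, required_ratio(gray_bps, CHANNEL_BPS))
print(vga_bps, required_ratio(vga_bps, CHANNEL_BPS))
```

Evaluated exactly, the first case gives 62,914,560 bits per second (60 × 2^20), which the text rounds to 60 Mbps. Using the rounded figures, 60,000,000 / 128,000 = 468.75 and 221,000,000 / 128,000 ≈ 1726.6, which matches the quoted 468:1 and 1726:1 ratios.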
The situation is even more stringent for continuity of communication when degradation of power budget or multi-path errors of wireless media further reduce the allowable data rate to far below 128 kbps. Consequently, state of the art technology is far from providing multi-media parallel channelization and continuity data rates at equal to or lower than 128 kbps.
Very high compression rates, high image quality, and low transmission bandwidth are critical to modern communications, including satellite communications, which require full-motion, high resolution, and the ability to preserve high-quality fidelity of digital image transmission within a small bandwidth communication channel (e.g. T1). Unfortunately, due to the above limitations, state of the art compression techniques are not able to transmit high quality video in real-time on a band-limited communication channel. As a result, it is evident that a compression technique for both still and moving pictures that has a very high compression rate, high image quality, low transmission bandwidth, and a very fast decompression algorithm would be of great benefit. In particular, a compression technique having the above characteristics that also preserves high frequency components as well as edge resolution would be especially useful.
In addition to transmission or storage of compressed still or moving images, another area where the state of the art is unsatisfactory is in automatic target recognition (ATR). There are numerous applications, both civilian and military, which require the fast recognition of objects or humans amid significant background noise. Two types of ATR are used for this purpose, soft ATR and hard ATR. Soft ATR is used to recognize general categories of objects such as tanks or planes or humans whereas hard ATR is used to recognize specific types or models of objects within a particular category. Existing methods of both soft and hard ATR are Fourier transform-based. These methods are lacking in that Fourier analysis eliminates desired “soft edge” or contour information which is critical to human cognition. Improved methods are therefore needed to achieve more accurate recognition of general categories of objects by preserving critical “soft edge” information yet reducing the amount of data used to represent such objects and thereby greatly decrease processing time, increase compression rates, and preserve image quality.
The present invention is based on Isomorphic Singular Manifold Projection (ISMP) or Catastrophe Manifold Projection (CMP). This method is based on Newtonian polynomial space and characterizes the images to be compressed with singular manifold representations called catastrophes. The singular manifold representations can be represented by polynomials which can be transformed into a few discrete numbers called “datery” (number data that represent the image) that significantly reduce information content. This leads to extremely high compression rates (CR) for both still and moving images while preserving critical information about the objects in the image.
In this method, isomorphic mapping is utilized to map between the physical boundary of a 3-D surface and its 2-D plane. A projection can be represented as a normal photometric projection by adding the physical parameter B (luminance) to the generic geometric parameters (X, Y). This projection has a unique 3-D interpretation in the form of a “canonical singular manifold”. This manifold can be described by a simple polynomial and therefore compressed into a few discrete numbers, resulting in hypercompression. In essence, any image is a highly correlated sequence of data. The present invention “kills” this correlation, and image information in the form of a digital continuum of pixels almost disappears. All differences in 2-D “texture” connected with the 2-D projection of a 3-D object are “absorbed” by a contour topology, thus preserving and emphasizing the “sculpture” of the objects in the image. This allows expansion with good fidelity of a 2-D projection of a real 3-D object into an abstract (mathematical) 3-D object and is advantageous for both still and video compression and automatic target recognition.
More particularly, using catastrophe theory, surfaces of objects may be represented in the form of simple polynomials that have single-valued (isomorphic) inverse reconstructions. According to the invention, these polynomials are chosen to represent the surfaces and are then reduced to compact tabulated normal form polynomials which comprise simple numbers, i.e., the datery, which can be represented with very few bits. This enables exceptionally high compression rates because the “sculpture” characteristics of the object are isomorphically represented in the form of simple polynomials having single-valued inverse reconstructions. Preservation of the “sculpture” and the soft edges or contours of the object is critical to human cognition of the image for both still and video image viewing and ATR. Thus, the compression technique of the present invention provides exact representation of 3-D projection edges and exact representation of all the peculiarities of moving (rotating, etc.) 3-D objects, based on a simple transition from still picture representation to moving pictures.
In a preferred embodiment the following steps may be followed to compress a still image using isomorphic singular manifold projections and highly compressed datery. The first step is to subdivide the original image, IO, into blocks of pixels, for example 16×16 or other sizes. These subdivisions of the image may be fixed in size or variable. The second step is to create a “canonical image” of each block by finding a match between one of fourteen canonical polynomials and the intensity distribution for each block or segment of pixels. The correct polynomial is chosen for each block by using standard merit functions. The third step is to create a model image, IM, or “sculpture” of the entire image by finding connections between neighboring local blocks or segments of the second step to smooth out intensity (and physical structure to some degree). The fourth step is to recapture and work on the delocalized high frequency content of the image, i.e., the “texture”. This is done by a subtraction of the model image, IM, generated during the third step from the original segmented image, IO, created during the first step. A preferred embodiment of this entire still image compression process will be discussed in detail below.
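The overall shape of these steps can be illustrated with a deliberately simplified sketch. It is not the ISMP method itself: the fourteen canonical polynomials are not listed in this section, so a least-squares plane a + b·x + c·y stands in for the per-block polynomial match, the sum of squared errors plays the role of the merit function, and the third step (connecting neighboring blocks) is omitted. All function names are hypothetical.

```python
def split_into_blocks(image, size):
    """Yield (row, col, block) for each size x size tile of a 2-D list."""
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


def fit_plane(block):
    """Least-squares fit of a + b*x + c*y to the intensities of one block.

    x and y are measured from the block centre, so the three terms are
    mutually orthogonal on the grid and each coefficient has a closed form.
    """
    h, w = len(block), len(block[0])
    xs = [j - (w - 1) / 2 for j in range(w)]
    ys = [i - (h - 1) / 2 for i in range(h)]
    a = sum(sum(row) for row in block) / (h * w)
    sxx = h * sum(x * x for x in xs)
    syy = w * sum(y * y for y in ys)
    sxi = sum(block[i][j] * xs[j] for i in range(h) for j in range(w))
    syi = sum(block[i][j] * ys[i] for i in range(h) for j in range(w))
    b = sxi / sxx if sxx else 0.0
    c = syi / syy if syy else 0.0
    return a, b, c


def evaluate_plane(coeffs, h, w):
    """Rebuild the h x w model block from its three coefficients."""
    a, b, c = coeffs
    return [[a + b * (j - (w - 1) / 2) + c * (i - (h - 1) / 2)
             for j in range(w)] for i in range(h)]


def compress_still(image, size=16):
    """Return per-block coefficients and the residual ("texture") image."""
    coeffs = {}
    residual = [[0.0] * len(image[0]) for _ in image]
    for r, c, block in split_into_blocks(image, size):
        k = fit_plane(block)
        coeffs[(r, c)] = k
        model = evaluate_plane(k, len(block), len(block[0]))
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                # Texture = original intensity minus model intensity.
                residual[r + i][c + j] = value - model[i][j]
    return coeffs, residual


# A 4 x 4 linear ramp is exactly planar, so the residual vanishes.
ramp = [[10 + 2 * j + 3 * i for j in range(4)] for i in range(4)]
ramp_coeffs, ramp_residual = compress_still(ramp, size=2)
```

In this toy version each block is reduced to three numbers, and whatever the plane cannot represent is left in the residual image, which corresponds loosely to the texture recovered in the fourth step.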
Optimal compression of video and other media containing motion may be achieved in accordance with the present invention by inserting I frames based on video content as opposed to at fixed intervals (typically every 15 frames) as in the prior art motion estimation methods. In accordance with the motion estimation techniques of the present invention, the errors between standard “microblocks” or segments of the current frame and a predicted frame are not only sent to the decoder to reconstruct the current frame, but, in addition, are accumulated and used to determine the optimal insertion points for I frames based on video content. Where the accumulated error of all the microblocks for the current frame exceeds a predetermined threshold, which itself is chosen based upon the type of video (action, documentary, nature, etc.), this indicates that the next subsequent frame after the particular frame having high accumulated error should be an I frame. Consequently, in accordance with the present invention, where the accumulated errors between the microblocks or segments of the current frame and the predicted frame exceed the threshold, the next subsequent frame is sent as an I frame which starts a new motion estimation sequence. Consequently, I frame insertion is content dependent, which greatly improves the quality of the compressed video.
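The content-dependent insertion rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder of the invention: the function name is hypothetical, the per-block errors are assumed to be already computed as non-negative numbers, every predicted frame is labeled "P" (B frames are not modeled), and the first frame is assumed to be an I frame.

```python
def choose_frame_types(block_errors_per_frame, threshold):
    """Label each frame 'I' or 'P' from accumulated per-block prediction errors.

    block_errors_per_frame[k] holds the prediction errors of the blocks of
    frame k, measured against its predicted frame. When the accumulated
    error of a predicted frame exceeds the threshold, the NEXT frame is
    sent as an I frame, which starts a new motion estimation sequence.
    """
    types = []
    force_intra = True  # assumption: the sequence starts with an I frame
    for errors in block_errors_per_frame:
        if force_intra:
            # An I frame is coded without prediction, so its errors are unused.
            types.append("I")
            force_intra = False
            continue
        types.append("P")
        if sum(errors) > threshold:
            force_intra = True
    return types


# Frame 2 accumulates an error of 11 > 10, so frame 3 becomes an I frame.
example_errors = [[0, 0], [1, 1], [5, 6], [9, 9], [1, 0], [0, 0]]
example_types = choose_frame_types(example_errors, threshold=10)
print(example_types)
```

The threshold is passed in as a parameter because the text ties it to the type of video; how it would be chosen for a given genre is not specified here.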
The I frames inserted in the above compression technique may first be compressed using standard DCT based compression algorithms or the isomorphic singular manifold projection (ISMP) still image compression technique of the present invention for maximal compression. In either case, the compression techniques used are preferably MPEG compatible.
Additionally, using the motion estimation compression technique of the present invention, compression ratios can be dynamically updated from frame to frame utilizing the accumulated error information. The compression ratio may be changed based on feedback from the receiver and, for instance, where the accumulated errors in motion estimation are high, the compression ratio may be decreased, thereby increasing bandwidth of the signal to be stored. If, on the other hand, the error is low, the compression ratio can be increased, thereby decreasing bandwidth of the signal to be stored.
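One way such a frame-to-frame update could look is sketched below. The text specifies only the direction of the adjustment, so everything else is an assumption made for illustration: the function name, the two error thresholds, the multiplicative step, and the bounds on the ratio.

```python
def update_compression_ratio(ratio, accumulated_error, low, high,
                             step=1.25, min_ratio=1.0, max_ratio=1000.0):
    """Return the compression ratio to use for the next frame.

    High accumulated error lowers the ratio (more bits are spent);
    low accumulated error raises it (fewer bits are spent).
    The thresholds, step, and bounds are illustrative assumptions.
    """
    if accumulated_error > high:
        ratio /= step
    elif accumulated_error < low:
        ratio *= step
    # Keep the ratio inside the assumed allowed range.
    return max(min_ratio, min(max_ratio, ratio))


print(update_compression_ratio(100.0, accumulated_error=50, low=10, high=40))
print(update_compression_ratio(100.0, accumulated_error=5, low=10, high=40))
```

Leaving a band between the two thresholds in which the ratio is unchanged is a design choice of this sketch; it keeps the ratio from oscillating when the error hovers near a single threshold.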
Because the present invention is a 3-D non-linear technique that produces high level descriptive image representation using polynomial terms that can be represented by a few discrete numbers or datery, it provides much higher image compression than MPEG (greater than 1000:1 versus 100:1 in MPEG), higher frame rate (up to 60 frames/sec versus 30 frames/sec in MPEG), and higher picture quality or peak signal to noise ratio (PSNR greater than 32 dB versus PSNR greater than 23 dB in MPEG). Consequently, the compression technique of the present invention can provide more video channels than MPEG for any given channel bandwidth, video frame rate, and picture quality.