Digital image processing is an important area of advancement in the field of computer science, with many current applications and a growing number of potential applications. The subject of digital image processing includes the storage, analysis and communication of images which are represented in the digital domain by a series of bits or bytes corresponding to each point in an image. A typical example of a digital image is one that appears on the screen of a computer. The screen consists of a number of monochrome or colored picture elements ("pixels"), each of which has an associated binary value which determines whether the pixel should be illuminated (and in some cases how brightly it should be illuminated). The simplest case is a black and white screen where each pixel has one bit of data associated with it. If the pixel is lit, then the value of the bit is set to one. If the pixel is not lit, then the value of the bit is set to zero. Each pixel could instead have a byte (8 bits) of data representing a distinct color, a particular shade of grey or some other information. A typical screen could have an array of 520 by 480 pixels to display an image. In order to store one complete screen containing an image where each pixel has a corresponding byte of data, approximately two megabits of data would have to be used for this example (520 × 480 pixels at 8 bits each). More pixels are used in higher resolution screens, which are becoming more and more popular today.
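The storage figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `image_storage_bits` is introduced here for illustration only and does not come from the text.

```python
def image_storage_bits(width, height, bits_per_pixel):
    """Return the number of bits needed to store one uncompressed image."""
    return width * height * bits_per_pixel


# Figures taken from the text: a 520 by 480 pixel screen.
one_bit_screen = image_storage_bits(520, 480, 1)   # 1 bit per pixel (black and white)
one_byte_screen = image_storage_bits(520, 480, 8)  # 1 byte (8 bits) per pixel

print(one_bit_screen)   # 249600 bits
print(one_byte_screen)  # 1996800 bits, i.e. roughly two megabits
```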
In order to store a large number of single images in a database for storage and processing, a data compression technique is required to make managing the database efficient and feasible for operating in real time. In addition to on-site applications, digital images can be transmitted to an outside site via a network, a dedicated line or some other type of data conduit. In order to increase the efficiency of data transmission and to represent images so that they fit within the bandwidth of the data conduit, the data must also be compressed. An imaging device for recording images, such as a digital camera, could be placed at a remote location. The image data could be digitally processed and compressed at the remote location and then transmitted to a central processing station or other final destination, where it is decoded so that an operator can view the image. The decoded image could also be matched against a database of stored images for identification purposes. If the database contained many records of images to be matched, the images stored in the database would need to be compressed in order for the database to hold and process the required number of images for a particular application. Accelerated pattern matching may be required for potential applications such as identifying a criminal caught on a bank's videotape, where batch processing of the matching operation, including storage and transmission, could take up to several hours due to the vast size of the database.
While the compression of image information is necessary for pattern matching, some conventional compression techniques can lose important image information in the process of compressing the data. An important aspect of a pattern matching technique is the ability to preserve the essential features of an object, such as its edges. The physical differences between the objects in the images could be very slight, and there may be many similar objects stored in a database to be distinguished and matched. An example is a database of people who work for a large company or live in a small town. The pattern matching technique could be used to identify persons at an entrance gate but would have to account for small differences in facial features in order to distinguish the people. Digital images of faces are already being stored in databases. In New York State and other states, the pictures on driver's licenses are digital images which are stored and can be reproduced if a license is lost. The next step is to match images of people captured on cameras at crime scenes against the images in the driver's license database to identify the criminal. Digital images of fingerprints or other objects could also be used. Pattern recognition of images should not be limited to objects in exactly the same position, because objects are not always still; instead, the recognition technique should allow objects to be rotated and placed in any position when pattern matching.
Digital image processing also includes video processing. Video is basically a time series of single images (called frames). The image frames, when shown sequentially over time, show the movement of the objects present in the images. Video image data can also be stored and replayed. One example of digital video images is the video clips that appear in popular software programs. These video clips can include clips from movies which have been digitally recorded or clips recorded by a camera and stored digitally in the computer. Video images can also be transmitted over long distances. One example is teleconferencing, which shows the image of a speaker talking at a remote location along with the speaker's movements and expressions.
Video images require a large amount of data to represent just a few seconds of video time. Each individual frame of the video must be stored and replayed to create a recognizable video image. Even if only a portion of the frames is stored, the sheer number of frames requires that the image data be compressed. Video images can also be used in pattern matching schemes which could identify particular objects in the video images. This could allow an air traffic controller to identify planes if other communication systems fail.
As the above discussion shows, a digital image encoding scheme is desired which has a high compression ratio while still preserving the important details of the features in an image, such as their edges.
One compression scheme currently in use is called "fractal encoding". Fractal encoding takes advantage of the fact that many subparts of an image are repeated, and therefore an image can be represented by a mapping of the portions of the image to only a fraction of the subparts of the image (called blocks). Because the image is mapped onto pieces of itself, a separate code book and code words relating parts of an image to other objects do not need to be stored. Fractal encoding subdivides an image to be encoded into blocks which, taken as a whole, make up the entire image. Some of the blocks may overlap and be of different sizes. In conventional fractal encoding, the image is divided into two sets of blocks. The first set is the domain blocks, which will be compared with a second set of blocks called range blocks. The domain blocks can be rotated and have mirror images created in order to create more choices of domain blocks which can be compared against the range blocks. Each domain block is compared to each range block to determine the closest match. The mapping of the domain blocks to the range blocks is stored. Only information regarding matching blocks is used, and the remaining blocks may be discarded, thus compressing the data.
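The block-matching step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a complete fractal coder: all blocks are non-overlapping and of the same size, the same set of blocks serves as both domain and range blocks, and the closest match is chosen by squared error. Practical fractal coders typically also shrink larger domain blocks and adjust their contrast and brightness, which is omitted here. All function names are hypothetical.

```python
def split_blocks(image, size):
    """Split a 2D list into non-overlapping size x size blocks keyed by (row, col)."""
    blocks = {}
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            blocks[(r, c)] = [row[c:c + size] for row in image[r:r + size]]
    return blocks


def rotate90(block):
    """Rotate a square block 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def mirror(block):
    """Flip a block left to right."""
    return [row[::-1] for row in block]


def variants(block):
    """Yield ((quarter_turns, mirrored), block) for 4 rotations and their mirrors."""
    current = block
    for turns in range(4):
        yield (turns, False), current
        yield (turns, True), mirror(current)
        current = rotate90(current)


def squared_error(a, b):
    """Sum of squared pixel differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def match_blocks(image, size):
    """Map each range block to (error, domain position, transform) of its best match."""
    blocks = split_blocks(image, size)
    mapping = {}
    for range_pos, range_block in blocks.items():
        best = None
        for domain_pos, domain_block in blocks.items():
            if domain_pos == range_pos:
                continue  # a block trivially matches itself, so skip it
            for transform, candidate in variants(domain_block):
                err = squared_error(candidate, range_block)
                if best is None or err < best[0]:
                    best = (err, domain_pos, transform)
        mapping[range_pos] = best
    return mapping


# Toy 4 x 4 image: the top-right 2 x 2 block is the top-left block rotated 90 degrees.
image = [
    [1, 2, 3, 1],
    [3, 4, 4, 2],
    [9, 9, 0, 0],
    [9, 9, 0, 5],
]
mapping = match_blocks(image, 2)
print(mapping[(0, 2)])  # (0, (0, 0), (1, False)): exact match after one quarter turn
```

Only the mapping (which domain block, under which rotation or mirror image, best approximates each range block) would be stored, which is where the compression comes from.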
Fractal encoding does generate high compression ratios relative to other known compression schemes. A compression ratio is defined as the ratio of the number of bits in the original image to the number of bits in the compressed image. However, images which have been fractally encoded tend to exhibit blocky artifacts when decompressed and reconstructed. This is due to the data being organized in blocks. A fractal encoding scheme that uses block matching alone does not provide the fine edge information which is required by advanced pattern recognition systems.
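The definition of the compression ratio can be written out directly. In the sketch below, the original size is the one-byte-per-pixel 520 by 480 screen discussed earlier; the compressed size of 99,840 bits is a purely hypothetical figure chosen for illustration, and the function name is likewise an assumption.

```python
def compression_ratio(original_bits, compressed_bits):
    """Ratio of the bits in the original image to the bits in the compressed image."""
    if compressed_bits <= 0:
        raise ValueError("compressed_bits must be positive")
    return original_bits / compressed_bits


original_bits = 520 * 480 * 8  # 1,996,800 bits for one byte per pixel
compressed_bits = 99_840       # hypothetical compressed size, for illustration only

print(compression_ratio(original_bits, compressed_bits))  # 20.0
```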
Another technique for compressing digital image information is wavelet edge detection. Wavelet compression techniques exploit the fact that images have spatial and spectral redundancies which can be eliminated to reduce the size of the data structure used to represent the image. In simple terms, a wavelet transform expresses an image signal in terms of a set of basis functions, much like a Fourier transform, which uses sines and cosines as its basis set. When the set of basis functions is applied, the original image is transformed into a set of coefficients. These coefficients can be further transformed when a derivative or gradient operator is applied to the basis set. The coefficients then take the form of edges in different frequency bands or scales, which provides an efficient means of image and video compression.
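As a concrete illustration of transforming a signal into coefficients, the sketch below applies one level of the Haar wavelet to a single row of pixel values. The choice of the Haar wavelet, the unnormalized averaging form and the function names are all assumptions made here for simplicity; the text does not name a particular wavelet. The detail coefficients behave like a local difference, so they are nonzero only where the row contains an edge.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform on an even-length sequence.

    Returns (averages, details): pairwise means and pairwise half-differences.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original sequence from one level of Haar coefficients."""
    signal = []
    for a, d in zip(averages, details):
        signal.extend([a + d, a - d])
    return signal


# One row of pixels with a single edge between the third and fourth samples.
row = [10, 10, 10, 50, 50, 50, 50, 50]
averages, details = haar_step(row)

print(averages)  # [10.0, 30.0, 50.0, 50.0]
print(details)   # [0.0, -20.0, 0.0, 0.0]: nonzero only at the edge
print(haar_inverse(averages, details) == row)  # True
```

Because most detail coefficients are zero in smooth regions, they can be stored compactly, and repeating the step on the averages yields coefficients at coarser scales.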
Wavelet transformations produce frequency scales which decrease in resolution as the scales increase. The wavelet transform, when applied with a gradient operator, can remove texture from the image, resulting in decreased reproduction quality. It would be beneficial to combine the compression qualities of fractal encoding with the shape-preserving qualities of wavelet encoding techniques.
Some techniques have recently been developed using aspects of both fractal and wavelet techniques. These techniques focus on taking fractal compression techniques, which are traditionally applied in the spatial domain, and applying them in the wavelet domain instead. However, these techniques do not take full advantage of the spatial similarities revealed by the gradient operator in the fractal portion of the technique, and thus lose image quality as the compression ratio increases.