1. Field of the Invention
The present invention relates generally to image processing systems and, more particularly, to systems and methods for processing image data based upon predetermined regions of human visual interest.
2. Background of the Invention
The Scanpath Theory of human vision, proposed by Noton and Stark in 1971, suggests that a top-down, internal cognitive model of what a person sees when actively looking at an image guides active eye movements of the person and controls and/or influences the person's perception of the image being viewed. Stated somewhat differently, Noton and Stark suggest that eye movements utilized in visually examining an image are generated based at least in part upon an internal cognitive model that has been developed by a person through experience. The term "top down processing" as used herein denotes image processing that proceeds with some assumed knowledge regarding the type of image being viewed or image data being analyzed. Thus, the Scanpath Theory posits that when a person views an image, the eye movements of the person will follow a pattern that is premised upon knowledge of the type of image that is being viewed and/or similar types of images.
The Scanpath Theory recognizes that active eye movements comprise an essential part of visual perception, because these eye movements carry the fovea, a region of high visual acuity in the retina, into each part of an image to be processed. Thus, the Scanpath Theory posits that an internal cognitive model drives human eye movements in a repetitive, sequential set of saccades and fixations ("glances") over specific regions-of-interest ("ROIs") in a scene, with the subconscious aim of confirming the top-down, internal cognitive model, the "Mind's Eye", so to speak.
Experimental investigation of the Scanpath Theory has involved presenting a complex visual stimulus (such as a scenic photograph) to a human subject and recording the eye movements made by the subject while looking at the presented image. Thus, computer-controlled experiments present an image and carefully measure the subject's eye movements using video cameras. Eye movement recordings are then represented as sequences of alternating glances (saccades and fixations), where each glance generally lasts about 300 milliseconds. Every glance the subject makes while looking at the image enables the high resolution fovea of the retina to abstract information from the image during the fixation period, identifying a fixation point on the image as a visual region-of-interest, or ROI. This is shown, for example, in FIGS. 8a, 8b and 9.
Diametrically opposed to the Scanpath Theory, current methods for computerized image processing are usually intended to detect and localize specific features in a digital image in a "bottom-up" fashion, analyzing, for example, spatial frequency, texture conformation, or other informative values of loci of the visual stimulus. The term "bottom up processing" is used herein to denote processing methods that assume no knowledge of an image being viewed or image data being processed. Prior art methods that have been proposed in the literature can be classified into three principal approaches:
1. Structural Methods are based on an assumption that images have detectable and recognizable primitives, which are distributed according to some placement rules; examples of prior art methods that use such an approach are matched filters.
2. Statistical Methods are based on statistical characteristics of the texture of the picture; examples of prior art methods that use a statistical approach are Co-Occurrence Matrices and Entropy Functions.
3. Modeling Methods hypothesize underlying processes for generating local regions of visual interest; examples of prior art that use a modeling approach are Fractal Descriptors.
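By way of illustration only, the two statistical measures named in item 2 may be sketched as follows. The function names, the list-of-rows image representation and the pixel offset are assumptions made for this example and are not taken from any cited prior art method.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    # p * log2(1 / p) summed over the observed gray levels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def cooccurrence(image, dx=1, dy=0):
    """Count gray-level pairs (g1, g2) separated by the offset (dx, dy).

    `image` is assumed to be a list of equal-length rows of gray levels.
    """
    pairs = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(image[r][c], image[r2][c2])] += 1
    return pairs
```

A uniform patch yields zero entropy, whereas a patch split evenly between two gray levels yields one bit; the co-occurrence counts additionally capture how gray levels are arranged relative to one another.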
U.S. Pat. No. 5,535,013, entitled "Image Data Compression and Expansion Apparatus, and Image Area Discrimination Processing Apparatus Therefor," teaches a method of image data compression in which an image is first divided into square pixel blocks and then encoded using an orthogonal transform. This is a statistical method. The encoding process is based upon a discrete cosine transform, and is thus a JPEG algorithm. Using the coefficients of the discrete cosine transform, the method taught by U.S. Pat. No. 5,535,013 discriminates blocks containing text from blocks containing general, non-text dot images. Then, a selective quantization method is used to identify different quantization coefficients for text blocks and non-text blocks.
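The general idea of characterizing a pixel block by its discrete cosine transform coefficients may be sketched as follows. This is a minimal illustration only: the 8x8 block size, the function names and the use of the non-DC energy fraction as a block statistic are assumptions made for this example, and the sketch does not reproduce the discrimination criterion actually taught by U.S. Pat. No. 5,535,013.

```python
import math

N = 8  # block size assumed for this sketch


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def ac_energy_fraction(block):
    """Fraction of the block's coefficient energy lying outside the DC term."""
    coeffs = dct2(block)
    total = sum(c * c for row in coeffs for c in row)
    if total == 0.0:
        return 0.0  # an all-zero block carries no energy at all
    return (total - coeffs[0][0] ** 2) / total


# Two hypothetical test blocks: a flat gray patch and a fine checkerboard.
flat = [[128] * N for _ in range(N)]
checker = [[255 * ((x + y) % 2) for y in range(N)] for x in range(N)]
```

Under these assumptions, the flat block places essentially all of its energy in the DC coefficient, whereas the checkerboard places half of its energy in the remaining coefficients. A block classifier could compare such a statistic against a threshold, although any particular threshold would have to be chosen empirically.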
Other bottom-up methods of image processing suggest that characterization and decomposition of an image can be based upon primitives such as color, texture, or shape. Such methods can be more powerful than the text/non-text discrimination method of U.S. Pat. No. 5,535,013, but still cannot overcome the important limitation that for a general, complex image, regions of interest are difficult to specify by a single parameter such as color or shape. This is shown, for example, in U.S. Pat. No. 5,579,471, entitled "Image Query System and Method."
In view of the foregoing, it is submitted that those skilled in the art would find quite useful a method and apparatus for image processing that take into account the underlying nature of human vision and perception, so as to selectively decompose an image into its most meaningful regions of visual interest, thereby providing a means for improving image compression, image query techniques and visual image enhancement systems.
In one particularly innovative aspect, the present invention is directed to systems and methods for image processing that utilize a cognitive model stored in memory to identify regions within an image that correlate with previously determined regions of visual interest for a given type of image or type of image data being processed.
In another innovative aspect, systems and methods in accordance with the present invention may select algorithms for processing collections of images by comparing algorithmic region of interest (aROI) data to stored human visual region of interest (hROI) data to select an optimal algorithm or group of algorithms to be used in transforming data comprising the collection or collections of images. The selected algorithms may then be used, for example, in data compression, image enhancement or database query functions.
In still another innovative aspect, the present invention is directed to systems and methods that utilize conventional image processing algorithms in combination with innovative clustering, sequencing, comparing and parsing techniques to predict loci of human fixations within an image or within collections of images for the purposes of, for example, data compression, image enhancement and image database query functions. Indeed, empirical analysis reveals that systems and methods in accordance with the present invention enable a prediction of human fixation loci that is comparable in measure to the ability of one human to predict the loci of eye movements of other persons viewing an image.
In still another innovative aspect, systems and methods in accordance with the present invention may detect regions of visual interest (ROIs) within an image based upon stored characteristic data representative of human visual perception. For example, using the method(s) of the present invention, algorithmic regions of interest (aROIs) having a high, or relatively high, correlation with human regions of visual interest (hROIs) may be developed for an image or collection of images, and thereafter an image or collection of images may be saved within a system using selected portions of the original picture (i.e., aROIs) as identification data. Then, the selected portions of the picture (i.e., saved aROI data) may be used in performing a query search. The query search may proceed, for example, by comparing saved aROIs in a database with ROIs specified by the system operator. Processing image data in this fashion should provide for substantial reductions in image processing time. Further, it will be appreciated that, through the use of processing algorithms and methodologies in accordance with the present invention, it is possible to take into consideration more complex features of an image, not just indications of color, shape and the like.
A system for compressing and processing collections of images in accordance with one form of the present invention may comprise, for example, means for transforming image data representative of a particular image, collection of images or type of image into a domain of "visual relevance", for example, using a database of image processing transformation functions; means for obtaining a set of algorithmic regions of interest (aROIs) from a transformed image, for example, by thresholding; means for clustering local maxima from the transformed image into a second set of only a few, very relevant algorithmic regions of interest (aROIs), such that the most relevant algorithmic regions of interest (aROIs) are properly distributed over the image; means for comparing the identified algorithmic regions of interest (aROIs) with predetermined human visual regions of interest (hROIs) to select an optimal image processing transformation function; and means for using the selected optimal image processing transformation function to compress the remainder of images within a collection or collections of images. In addition, a system in accordance with the present invention may comprise means for using the algorithmic regions of interest (aROIs) to implement image query functions and/or means for using the algorithmic region of interest (aROI) data to implement various visual image enhancement techniques.
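The sequence of transforming, thresholding, clustering, comparing and selecting described above may be sketched, purely by way of example, as follows. All function names, parameter values and the two toy transformation functions are hypothetical. The greedy distance-based suppression used for clustering and the hit-fraction similarity measure are simple stand-ins chosen for this illustration and are not asserted to be the techniques of the invention.

```python
import math


def threshold_maxima(response, fraction=0.5):
    """Return (row, col, value) for pixels at or above fraction * peak response."""
    peak = max(max(row) for row in response)
    cutoff = fraction * peak
    return [(r, c, v)
            for r, row in enumerate(response)
            for c, v in enumerate(row) if v >= cutoff]


def cluster_rois(candidates, radius, max_rois):
    """Greedy stand-in for clustering: keep the strongest candidates first,
    skipping any candidate within `radius` of an already kept aROI."""
    kept = []
    for r, c, _ in sorted(candidates, key=lambda t: -t[2]):
        if all(math.hypot(r - kr, c - kc) > radius for kr, kc in kept):
            kept.append((r, c))
            if len(kept) == max_rois:
                break
    return kept


def similarity(arois, hrois, radius):
    """Fraction of hROIs having at least one aROI within `radius` (assumed measure)."""
    if not hrois:
        return 0.0
    hits = sum(any(math.hypot(hr - ar, hc - ac) <= radius for ar, ac in arois)
               for hr, hc in hrois)
    return hits / len(hrois)


def select_transform(image, transforms, hrois,
                     cluster_radius=1.5, match_radius=1.0, max_rois=7):
    """Return (name, score) of the transform whose aROIs best match the hROIs."""
    best_name, best_score = None, -1.0
    for name, fn in transforms.items():
        candidates = threshold_maxima(fn(image))
        arois = cluster_rois(candidates, cluster_radius, max_rois)
        score = similarity(arois, hrois, match_radius)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy data: two bright spots, and hypothetical recorded fixations (hROIs) on them.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0],
]
hrois = [(1, 1), (3, 3)]
transforms = {
    "intensity": lambda img: [row[:] for row in img],
    "inverted": lambda img: [[9 - v for v in row] for row in img],
}
best = select_transform(image, transforms, hrois)
print(best)
```

In this toy case the "intensity" transform is selected, because its aROIs coincide with the stored hROIs while those of the "inverted" transform fall on the dark background. In practice the transformation functions would be drawn from a database of image processing functions, and the selected function would then be applied to the remaining images of the collection.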
It will be appreciated that systems and methods in accordance with the present invention can be utilized to process very large collections of data including, for example, large collections of pictures, scenes and works of art. It also will be appreciated that systems and methods in accordance with the present invention may be utilized, for example, to compress, search and/or enhance images ranging from natural and constructed landscapes and "cityscapes", to groups of persons and animals and objects, and to single portraits and still lifes.
Accordingly, it is an object of the present invention to provide improved systems and methods for use in the field of image processing.
It is also an object of the present invention to provide systems and methods that utilize top down image processing techniques to improve image processing functions and efficiency.
Other objects and features of the present invention will become apparent from consideration of the following description taken in conjunction with the accompanying drawings.