1. Technical Field
The invention relates generally to a method of representing and manipulating images and, more specifically, to processing digital images in a manner that is both biologically motivated and conducive to computer vision systems performing artificial intelligence tasks such as recognition and identification of patterns and man-made objects and understanding of dynamic scenes.
2. Description of the Related Art
Appendix A, included below, is a list of references referred to throughout this Specification by means of a number within square brackets, e.g. “[1]”.
Computers have dramatically changed the way society processes information. An important aspect of computer information processing is the task of object recognition, a subset of computer vision. Although great strides have been made in computer object recognition, in an active vision situation the human visual system, with its efficiency in the amount of information that must be processed to isolate an object from the background and recognize it in a perspective-independent way, is far more sophisticated than any contemporary computer vision system. Computer vision systems are also referred to as “active vision systems” and consist of such components as moving camera heads, a hardware image processor, and an image-analyzing computer. In fact, if the human visual system were to store, process and analyze pictorial information in the same way as most computer systems, the brain would have to weigh at least 5,000 pounds. Moreover, a computer system that performs pattern recognition tasks has trouble recognizing an image that has undergone a perspective transformation. For example, if a computer vision system has stored a particular watermark for document verification purposes, a document presented for verification typically must be situated in one of a few specific orientations and cannot be viewed from different vantage points.
These aspects of computer recognition relate in particular to robotic vision problems. Issues of perspective and conformal transformations arise in the context of active vision systems. Perspective image transformations arise, for example, when a mobile robotic system enters a room through one door, records and stores a particular painting on the wall, and then exits the room and reenters through a different door. In this situation, the robot has trouble recognizing the painting and, thus, orienting itself with respect to the painting.
Conformal image transformations arise when modeling a biological visual system, which is a highly desirable model for active vision systems. Experimental evidence points to a one-to-one retinotopic mapping from the retina of the eye (consisting of light-sensitive cells with the highest density around the fovea and decreasing density concentrically away from the fovea, based upon viewing angle) to the visual cortex at the back of the brain (consisting of a constant density of cells). This transformation can be modeled by a complex logarithm, a conformal transformation designed by evolution to account for foveal magnification and to behave well under scaling and rotation transformations: in log-polar coordinates, such transformations are represented by translations. Both properties result in savings of a few orders of magnitude in the amount of pictorial information processed by the brain's visual pathway.
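The translation property of the complex logarithm can be checked numerically. The following is a minimal sketch, not part of the disclosed method; the function name `to_log_polar` and the sample point, scale and angle are illustrative assumptions.

```python
import cmath
import math


def to_log_polar(z: complex) -> complex:
    """Map a nonzero retinal point z to w = ln|z| + i*arg(z)."""
    return cmath.log(z)


# Hypothetical sample point in the retinal (complex) plane.
z = complex(3.0, 4.0)

# Scale by 2 and rotate by 30 degrees about the origin (the fovea).
scale, angle = 2.0, math.pi / 6
z_transformed = scale * cmath.exp(1j * angle) * z

# In log-polar coordinates the combined scaling and rotation
# appears as a pure translation by ln(scale) + i*angle.
shift = to_log_polar(z_transformed) - to_log_polar(z)
expected_shift = complex(math.log(scale), angle)
print(shift, expected_shift)
```

The identity holds as written only while the rotated angle stays within the principal branch (-pi, pi] used by `cmath.log`; beyond that the angular coordinate wraps around by 2*pi.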
There exist several hardware and software systems for obtaining log-polar images. One approach is to use software to transform a typical Cartesian image from a standard camera, using transformation equations between the retinal plane and the Cartesian plane. This approach is computationally expensive, particularly if the images must also be processed in order to perform any other task. A second approach is a pure-hardware approach, i.e. the log-polar transformation is made directly from a sensor with a log-polar pixel distribution. However, this approach necessarily employs fixed parameters and is therefore inflexible. A third approach employs a circuit for performing log-polar image transformations in conjunction with a programmable device. This approach provides more speed and flexibility. The disclosed Projective Fourier transforms are important for the first and third approaches, but less important for the second. The disclosed Projective convolution also is important for the first and third approaches.
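As an illustration of the first, software-only approach, a Cartesian image can be resampled onto a log-polar grid. The sketch below assumes NumPy, a grayscale image, a grid centered on the image center, and nearest-neighbor sampling; the function name `log_polar_sample` and its parameters are hypothetical and are not taken from the disclosure.

```python
import numpy as np


def log_polar_sample(image, n_rings=32, n_sectors=64, r_min=1.0):
    """Resample a 2-D grayscale array onto a (ring, sector) log-polar grid.

    Rings are spaced uniformly in ln(r), sectors uniformly in angle,
    and each output cell takes the nearest Cartesian pixel.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)

    # Uniform steps in ln(r) give exponentially growing ring radii.
    log_r = np.linspace(np.log(r_min), np.log(r_max), n_rings)
    theta = np.linspace(0.0, 2.0 * np.pi, n_sectors, endpoint=False)
    r_grid, t_grid = np.meshgrid(np.exp(log_r), theta, indexing="ij")

    # Back-project each log-polar cell to Cartesian pixel indices.
    ys = np.clip(np.rint(cy + r_grid * np.sin(t_grid)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + r_grid * np.cos(t_grid)).astype(int), 0, w - 1)
    return image[ys, xs]


# Usage on a hypothetical 64x64 test image.
demo = np.arange(64 * 64, dtype=float).reshape(64, 64)
log_polar_image = log_polar_sample(demo)
print(log_polar_image.shape)
```

Every output cell requires a coordinate computation and a lookup for each frame, which is consistent with the cost concern noted for this approach; in practice the index arrays could be precomputed once and reused.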
The standard Fourier Transform (FT) used to represent digital images, which is efficiently computable by the Fast Fourier Transform (FFT) algorithm, is not well adapted to either perspective or conformal image transformations. For example, one can render translated and rotated copies of an image using one Fourier image representation, but this is no longer feasible when perspective or conformal image transformations are applied.
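The translation case can be illustrated with the discrete Fourier shift theorem: a translated copy is rendered from a single spectrum by multiplying it with a linear phase factor. This is a minimal sketch assuming NumPy and a circular (wrap-around) shift by whole pixels; the random test image and the offsets are illustrative only.

```python
import numpy as np

# Hypothetical 8x8 test image and integer offsets (rows, columns).
rng = np.random.default_rng(0)
image = rng.random((8, 8))
dy, dx = 2, 3

# One Fourier representation of the image.
spectrum = np.fft.fft2(image)

# Shift theorem: translating by (dy, dx) multiplies the spectrum
# by exp(-2*pi*i*(ky*dy + kx*dx)), with ky, kx in cycles per sample.
ky = np.fft.fftfreq(image.shape[0])[:, None]
kx = np.fft.fftfreq(image.shape[1])[None, :]
phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
shifted = np.fft.ifft2(spectrum * phase).real

# Reference: the same circular translation done directly on pixels.
reference = np.roll(image, shift=(dy, dx), axis=(0, 1))
print(np.allclose(shifted, reference))
```

No comparably simple operation on the standard spectrum produces a perspective or conformal copy of the image, which is the limitation described above.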
Other, more recently developed image representations based upon theories of computational harmonic analysis involving wavelet, Gabor and Fourier-Mellin transforms suffer from the same problem as Fourier analysis of not being well adapted to perspective and conformal transformations. The lack of perspective and conformal characteristics follows from the fact that these transforms are based upon Euclidean, affine, Heisenberg and similarity group transformations rather than a group of full-blown projective transformations. Although substantial work has been done to develop image representations well adapted to many important Lie groups and projected motion groups, e.g. [1,7,8], no attempts have been undertaken to systematically develop the corresponding image processing algorithms from the group-theoretic framework.