This invention relates to a real-time image processing system for use in computer vision systems and artificial intelligence systems. A great deal of investigatory work and research has been expended in attempting to understand the biological visual system. Apart from the intrinsic value of such an understanding, it is hoped that the knowledge gained can be applied to produce man-made machines that simulate the biological visual system through a combination of opto-electronic devices and computer data processing techniques, and thereby achieve the efficiency inherent in biological visual systems.
The research resulting in the development of the present invention is specifically directed to an understanding of how perceptual grouping occurs in the biological visual system.
The visual system segments optical input into regions that are separated by perceived contours or boundaries. This rapid, seemingly automatic, early step in visual processing is difficult to characterize, largely because many perceived contours have no obvious correlates in the optical input. A contour in a pattern of luminances is generally defined as a spatial discontinuity in luminance. Such discontinuities are usually sufficient, but by no means necessary, for sustaining perceived contours. Regions separated by visual contours also occur in the presence of: statistical differences in textural qualities (such as orientation, shape, density, or color), binocular matching of elements of differing disparities, accretion and deletion of texture elements in moving displays, and classical "subjective contours". The extent to which the types of perceived contours just named involve the same visual processes as those triggered by luminance contours is not obvious, although the former are certainly as perceptually real and generally as vivid as the latter.
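The distinction drawn above can be illustrated with a minimal sketch, which is not part of the invention and is offered only under stated assumptions. The function name `luminance_contours`, the threshold value, and the two toy images are hypothetical. The sketch marks every location where luminance jumps between adjacent pixels, which is the purely local definition of a contour. On a simple step edge it recovers the single boundary. On an image whose two halves have the same mean luminance but differently oriented stripes, it responds throughout both regions and does not isolate the one boundary an observer would perceive.

```python
def luminance_contours(image, threshold=0.5):
    """Return (row, col) positions whose right or lower neighbor differs
    in luminance by more than `threshold` (a purely local contour test)."""
    rows, cols = len(image), len(image[0])
    hits = set()
    for r in range(rows):
        for c in range(cols):
            # Horizontal discontinuity between (r, c) and (r, c + 1).
            if c + 1 < cols and abs(image[r][c + 1] - image[r][c]) > threshold:
                hits.add((r, c))
            # Vertical discontinuity between (r, c) and (r + 1, c).
            if r + 1 < rows and abs(image[r + 1][c] - image[r][c]) > threshold:
                hits.add((r, c))
    return hits


# Hypothetical test image 1: dark left half, bright right half (a step edge).
step_edge = [[0.0] * 3 + [1.0] * 3 for _ in range(4)]

# Hypothetical test image 2: vertical stripes on the left, horizontal stripes
# on the right. Both halves have the same mean luminance (0.5).
textured = [
    [float(c % 2) if c < 4 else float(r % 2) for c in range(8)]
    for r in range(4)
]

# Columns in which the local detector responds at least once.
step_columns = sorted({c for _, c in luminance_contours(step_edge)})
texture_columns = sorted({c for _, c in luminance_contours(textured)})

print(step_columns)     # responses confined to the column at the step
print(texture_columns)  # responses spread across the whole image
```

In this sketch the local test suffices for the luminance step but gives no privileged response at the texture boundary, which is consistent with the observation that luminance discontinuities are sufficient but not necessary for perceived contours.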
The visual system's segmentation of the scenic input occurs rapidly throughout all regions of that input, in a manner often described as "preattentive." That is, subjects generally describe boundaries in a consistent manner when exposure times are short (under 200 msec) and when they have no prior knowledge of the regions in a display at which boundaries are likely to occur. Thus, any theoretical account of boundary extraction for such displays must explain how early "data driven" processes rapidly converge on boundaries wherever they occur.
The second finding of the experimental work on textures complicates the implications of the first, however: the textural segmentation process is exquisitely context-sensitive. That is, a given texture element at a given location can be part of a variety of larger groupings, depending on what surrounds it. Indeed, the precise determination even of what acts as an element at a given location can depend on patterns at nearby locations.
One of the greatest sources of difficulty in understanding visual perception and in designing fast object recognition systems is such context sensitivity of perceptual units. Since the work of Gestaltists, it has been widely recognized that local features of a scene, such as edge positions, disparities, lengths, orientations, and contrasts, are perceptually ambiguous, but that combinations of these features can be quickly grouped by a perceiver to generate a clear separation between figures or between figure and ground. Indeed, a figure within a textured scene often seems to "pop out" from the ground. The "emergent" features by which an observer perceptually groups the "local" features within a scene are sensitive to the global structuring of textural elements within the scene.
The fact that these emergent perceptual units, rather than local features, are used to group a scene carries with it the possibility of scientific chaos. If every scene can define its own context-sensitive units, then perhaps object perception can only be described in terms of an unwieldy taxonomy of scenes and their unique perceptual units. One of the great accomplishments of the Gestaltists was to suggest a short list of rules for perceptual grouping that helped to organize many interesting examples. As is often the case in pioneering work, the rules were neither always obeyed nor exhaustive. No justification for the rules was given other than their evident plausibility. More seriously for practical applications, no effective computational algorithms were given to instantiate the rules.
The collective effect of these contributions has been to provide a sophisticated experimental literature about textural grouping which has identified the main properties that need to be considered. What has not been achieved is a deep analysis of the design principles and mechanisms that lie behind the properties of perceptual grouping. Expressed in another way, what is missing is the raison d'être for textural grouping and a computational framework that dynamically explains how textural elements are grouped, in real time, into easily separated figures and ground.
One manifestation of this gap in contemporary understanding can be found in the image-processing models that have been developed by workers in artificial intelligence. In this approach, curves are analyzed using models different from those that are used to analyze textures, and textures are analyzed using models different from the ones used to analyze surfaces. All of these models are built up using geometrical ideas--such as surface normal, curvature, and Laplacian--that were used to study visual perception during the 19th century. These geometrical ideas were originally developed to analyze local properties of physical processes. By contrast, the visual system's context-sensitive mechanisms routinely synthesize figural percepts that are not reducible to local luminance differences within a scenic image. Such emergent properties are not just the effect of local geometrical transformations.