1. Field of the Invention
The invention relates to video data processing systems and methods, and more particularly to systems and methods of using computers for segmenting the image plane according to surface connectivity and for identifying areas of images taken by a moving camera according to the object surfaces from which those image areas are taken. The image segmentations and object-based identifications thus constitute a topological surface representation of objects. The invention discloses a method and apparatus comprising a plurality of processing modules for extracting, from images in a video sequence, the occluding contours that delineate the images into regions in accordance with the spatial connectivity of the corresponding visible surfaces; for extracting the diffeomorphism relations between areas of images taken from different perspective centers, so as to identify image areas in different frames as belonging to the surface of the same object; and for specifying the fold contours of the surfaces that own the contours, thereby producing surface representations from video images of persistent objects taken by a moving camera.
Many of the concepts and terms used in this description of the invention can be found in Gibson's book “The Ecological Approach to Visual Perception”.
2. Description of the Related Art
Techniques that use digital computers to extract visual information from video data, including automatic object and human detection, recognition and tracking, object and human motion pattern learning, purposeful motion detection and classification, and anomaly detection and classification, have been developed for different purposes. These are related arts. At the core of these application systems are the techniques of image segmentation and object tracking, the necessary steps that transform sensory data streams composed of a tremendous quantity of transient pixels into a much smaller number of stable discrete units, upon which object recognition, object (human) motion pattern recognition, and more complex spatial-temporal event analysis may further be performed. The intermediate representation established by way of image segmentation and image tracking relates on one hand to the sensory data and on the other hand to the higher-level symbolic conceptual representation: objects, things, facts, and events.
Video image segmentation and object tracking techniques have been studied in various university research institutes as well as in commercial-sector R&D branches. Numerous research articles and books on object segmentation and tracking have been published. The current art of computational vision uses object appearance information such as pixel motion, optical flow, brightness, color, and texture to delineate an image into discrete areas and to track those areas in video sequences. These methods generally make no reference to a surface representation; they are framed in terms of the segmentation and tracking of image patterns.
The 3D reconstruction of visible scenes has been studied in the computer vision community, and algorithms and computer programs have resulted from such studies. The purpose of these works was to specify (construct) depth maps of visible surfaces from binocular stereopsis, particularly through measurement of the disparities of corresponding pixels in the overlapping regions of a stereo pair of images. These methods were not aimed at extracting surface topological information, particularly the information for specifying the extent of each individual visible surface of an object in a scene, for specifying the spatial separations between those surfaces and the manner in which they extend spatially into occluded regions, and for identifying visible surfaces seen from different perspectives via their partial overlaps in 3D space.
Scientific data from vision research in recent years have shown that the human vision system builds a surface representation of objects at an early stage of visual perception. It is through the surface representation that the human vision system is able to have general percepts of individual, identifiable, discrete, persistent objects. Surfaces are, first of all, topological objects. The existence of a surface representation indicates that the human vision system is able to extract topological information about the physical constructions of its environment: ground, objects, other humans and animals, etc. In the past decade, neurobiological data have indicated that occluding contours are extracted in the monkey's V2 area. The two sides of an occluding contour are images of spatially separated local surfaces. In the monkey's vision system, the information about the spatial continuation of a visible surface past an occluding contour is coded in the form of the border ownership of the contour. Images of a surface of an object taken from different perspectives are related by perspective mappings. Patches of different images taken with a moving video camera are related by a perspective mapping when they represent the same object surface. The perspective mappings between images, together with the occluding contours, thus constitute the surface representation of the environment.
U.S. Pat. Nos. 5,535,302 and 5,911,035, and the article by Tsao, T. and Tsao, D., “Lie group model neuromorphic geometric engine for real-time terrain reconstruction from stereoscopic aerial photos,” published in Proceedings of SPIE, Volume 3077, Applications and Science of Artificial Neural Networks III, 1997, pp. 535-544, describe methods and apparatus for extracting image affine transformations, which are the first-order Taylor approximations of perspective mappings, and present a technique for computing approximations of the perspective mappings of image patches taken from different perspectives.
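The relation between a perspective mapping and its affine approximation can be illustrated with a minimal sketch. The sketch assumes, for illustration only, that the perspective mapping between two image patches of a planar surface is modeled by a 3x3 homography; it is not the method of the cited patents or article. The function names and the matrix values are hypothetical. The affine transformation is obtained as the first-order Taylor expansion of the mapping about a patch center.

```python
import numpy as np

def apply_homography(H, p):
    """Map a 2D point p through the 3x3 homography H, including the projective division."""
    u, v, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([u / w, v / w])

def affine_approximation(H, p0):
    """Return (A, t) so that A @ p + t is the first-order Taylor expansion of H about p0."""
    p0 = np.asarray(p0, dtype=float)
    u, v, w = H @ np.array([p0[0], p0[1], 1.0])
    q0 = np.array([u / w, v / w])
    # Jacobian of (u/w, v/w) with respect to (x, y), obtained by the quotient rule:
    # d q_i / d x_j = (H[i, j] - q_i * H[2, j]) / w
    A = (H[:2, :2] - np.outer(q0, H[2, :2])) / w
    # Choose the offset so that the affine map agrees with H exactly at p0.
    t = q0 - A @ p0
    return A, t

# Hypothetical example values, chosen only to exercise the functions.
H = np.array([[1.00, 0.10,  5.0],
              [0.05, 0.90, -3.0],
              [1e-3, 2e-3,  1.0]])
p0 = np.array([10.0, 20.0])
A, t = affine_approximation(H, p0)

# Compare the exact mapping and its affine approximation at a nearby point.
p = p0 + np.array([0.5, -0.5])
error = np.linalg.norm(apply_homography(H, p) - (A @ p + t))
```

The affine map agrees with the perspective mapping exactly at the expansion point, and the discrepancy grows with the square of the distance from that point, which is why such an approximation is suited to small image patches rather than to whole images.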
Various image intensity edge detection algorithms and codes, published in computer vision journals, books, and other sources, are available to the public.