The present invention relates to methods and apparatus for computer vision. In particular, it relates to methods and systems for computer reasoning to detect an object in an image.
Reliably detecting patterns in visual data has been the primary goal of computer vision research for several years. Such patterns could be strictly spatial in nature, like static images of pedestrians, bicycles, airplanes etc., or they could be spatio-temporal in nature, like patterns of human or vehicular activity over time. Complex patterns tend to be compositional and hierarchical: a human can be thought of as composed of a head, torso and limbs; a head as composed of hair and a face; and a face as composed of eyes, a nose and a mouth. Such patterns also tend to be challenging to detect robustly as a whole, due to a high degree of variability in shape, appearance, partial occlusions, articulation, and image noise, among other factors. While the computer vision community has made significant headway in designing fairly robust low level, local feature detectors, such feature detectors only serve to detect parts of the larger pattern to be detected. Combining the detections of parts into a context sensitive, constraint satisfying set of pattern hypotheses is a non-trivial task. The key questions to be answered are how to represent knowledge of what the pattern looks like in a hierarchical, compositional manner and how this knowledge can be exploited to effectively search for the presence of the patterns of interest.
Predicate logic based reasoning approaches provide a means of formally specifying domain knowledge and manipulating symbolic information to explicitly reason about the presence of different patterns of interest. Such logic programs make it easy to model hierarchical, compositional patterns and to combine contextual information with the detection of low level parts via conjunctions, disjunctions and different kinds of negations. First order predicate logic separates the name, property or type of a logical construct from its associated parameters and, via the use of existential and universal quantifiers, allows for enumeration over its parameters.
This provides for a powerful language that can be used to specify pattern grammars to parse a set of image features to detect the presence of the pattern of interest. Such pattern grammars encode constraints about the presence/absence of predefined parts in the image, geometric relations over their parameters, interactions between these parts and scene context, and search for solutions best satisfying those constraints. Additionally, it is straightforward to generate proofs or justifications, in the form of parse trees, for the final solution thus permitting direct analysis of the final solution in a linguistic form.
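By way of illustration only, the following minimal Python sketch shows how one rule of such a pattern grammar might be evaluated over a set of detected parts. The rule (a human is hypothesized wherever a head lies above a torso), the names Part, above and find_humans, the max_dx tolerance and the sample coordinates are all assumptions made for this sketch and are not taken from any particular system. The sketch enumerates candidate bindings of the rule parameters, checks a geometric constraint, and records the parts that justify each hypothesis.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Part:
    """A low level part detection (hypothetical structure)."""
    kind: str
    x: float
    y: float  # image coordinates, y grows downward


def above(a, b, max_dx=20.0):
    """Geometric constraint: a is above b and roughly aligned horizontally.

    The max_dx tolerance is an assumed value chosen for illustration.
    """
    return a.y < b.y and abs(a.x - b.x) <= max_dx


def find_humans(parts):
    """Evaluate the assumed rule: human <- head(H), torso(T), above(H, T).

    Enumerates every head/torso pairing and keeps those that satisfy the
    constraint, together with the parts that justify the hypothesis.
    """
    heads = [p for p in parts if p.kind == "head"]
    torsos = [p for p in parts if p.kind == "torso"]
    hypotheses = []
    for head, torso in product(heads, torsos):
        if above(head, torso):
            hypotheses.append({
                "pattern": "human",
                "at": (torso.x, torso.y),
                "because": [head, torso],  # a one-level parse tree
            })
    return hypotheses


parts = [Part("head", 100, 40), Part("torso", 104, 90), Part("head", 300, 200)]
humans = find_humans(parts)
```

In this sketch the second head has no torso beneath it, so only one hypothesis is produced, and its "because" entry plays the role of the justification described above.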
While formal reasoning approaches have long been used in automated theorem proving, constraint satisfaction and computational artificial intelligence, historically, their use in the field of computer vision has remained limited. In addition to the ability to specify constraints and search for patterns satisfying those constraints, it is important to evaluate the quality of the solution as a function of the observation and model uncertainty. One of the primary inhibiting factors to a successful integration of computer vision and first order predicate logic has been the design of an appropriate interface between the binary-valued logic and probabilistic vision output. Bilattices, algebraic structures introduced by “Ginsberg, M. L.: Multivalued logics: Uniform approach to inference in artificial intelligence. Comput. Intelligence (1988)”, provide a means to design exactly such an interface to model uncertainties for logical reasoning.
Unlike traditional logics, predicate logics extended using the bilattice-based uncertainty handling formalism associate uncertainties with both logical rules (denoting degree of confidence in domain knowledge) and observed logical facts (denoting degree of confidence in observation). These uncertainties are taken from, and semantically interpreted within, a set structured as a bilattice. Modeling uncertainties in the bilattice facilitates independent representation of both positive and negative constraints about a proposition and furthermore provides tolerance for contradictory data inherent in many real-world applications. Performing inference in such a framework is also typically computationally efficient.
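The independent handling of positive and negative evidence can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes that each uncertainty is a pair (evidence for, evidence against) drawn from the unit square, and that conjunction, disjunction and evidence combination use simple min/max operators; the passage above does not fix a particular bilattice or choice of operators. The class name Evidence, its method names, the way the rule confidence is applied and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """An uncertainty value as a pair on the unit square (assumed form)."""
    pos: float  # degree of evidence for the proposition
    neg: float  # degree of evidence against the proposition

    def and_(self, other):
        """Conjunction: weakest support, strongest opposition."""
        return Evidence(min(self.pos, other.pos), max(self.neg, other.neg))

    def or_(self, other):
        """Disjunction: strongest support, weakest opposition."""
        return Evidence(max(self.pos, other.pos), min(self.neg, other.neg))

    def combine(self, other):
        """Pool evidence from separate sources, keeping both sides."""
        return Evidence(max(self.pos, other.pos), max(self.neg, other.neg))

    def negate(self):
        """Negation swaps evidence for and evidence against."""
        return Evidence(self.neg, self.pos)


# Observed facts, each with a confidence in the observation (made-up values).
head = Evidence(0.8, 0.1)
torso = Evidence(0.6, 0.0)

# Confidence in the rule itself; applying it by conjunction is one simple
# assumed choice among several possible ones.
rule_conf = Evidence(0.5, 0.0)
supported = head.and_(torso).and_(rule_conf)

# A separate negative constraint, e.g. a scene-context rule arguing against.
counter = Evidence(0.0, 0.3)

human = supported.combine(counter)  # Evidence(pos=0.5, neg=0.3)
```

The final value keeps the support (0.5) and the opposition (0.3) side by side rather than collapsing them into a single number, which is what allows contradictory inputs to be represented and inspected.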
The predicate logic based approach extended using the bilattice formalism can therefore be used to encode pattern grammars to detect whether or not a specific pattern exists in an image, where in the image the pattern exists (via instantiated parameters of the predicates), why the system thinks the pattern exists (via proofs) and finally how strongly it thinks the pattern exists (final inferred uncertainty). Due to these characteristics, bilattice based logical reasoning frameworks appear to be promising candidates for use in time-sensitive, resource-bound, computer vision applications. The applicability of such a formalism to computer vision problems such as activity recognition, identity maintenance and human detection has been shown in “Shet, V., Harwood, D., Davis, L.: Multivalued default logic for identity maintenance in visual surveillance. In: ECCV, pp. IV: 119-132 (2006)” and in “Shet, V., Neumann, J., Ramesh, V., Davis, L.: Bilattice-based logical reasoning for human detection. In: CVPR (2007)”. Arieli et al. in “Arieli, O., Cornelis, C., Deschrijver, G.: Preference modeling by rectangular bilattices. Proc. 3rd International Conference on Modeling Decisions for Artificial Intelligence (MDAI'06) (3885), 22-33 (2006)” have applied such frameworks in machine learning for preference modeling applications. Theoretical aspects of these frameworks have been studied by “Arieli, O., Cornelis, C., Deschrijver, G., Kerre, E.: Bilattice-based squares and triangles. Symbolic and Quantitative Approaches to Reasoning with Uncertainty pp. 563-575 (2005)”, “Fitting, M. C.: Bilattices in logic programming. In: 20th International Symposium on Multiple-Valued Logic, Charlotte, pp. 238-247. IEEE CS Press, Los Alamitos (1990)” and “Ginsberg, M. L.: Multivalued logics: Uniform approach to inference in artificial intelligence. Comput. Intelligence (1988).”
Accordingly, novel and improved apparatus and methods are required to represent knowledge of what an image pattern looks like in a hierarchical, compositional manner and to apply such represented knowledge to effectively search for the presence of the patterns of interest in an image.