In numerous classes of applications, including object detection and acceptance testing, an important problem is to classify objects, or make other decisions, in real time. For instance, security systems need to detect targets in real time and act on them. Robots need to decide quickly whether an observed artifact is an obstacle in order to avoid a collision. Factory inspection lines must decide as quickly as possible whether a manufactured object is faulty. Problems of this type have been addressed by work in machine learning on the induction of cost-sensitive classifiers, in particular cost-sensitive decision trees. In this line of research, the goal is to induce a classifier that minimizes the expected cost of tests and, possibly, misclassification costs. See, for example: Marlon Núñez, The use of background knowledge in decision tree induction, Machine Learning, 6: pages 231-250, 1991; Peter D. Turney, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm, Journal of Artificial Intelligence Research (JAIR), 2: pages 369-409, 1995; and Charles X. Ling, Qiang Yang, Jianning Wang, and Shichao Zhang, Decision trees with minimal costs, in International Conference on Machine Learning (ICML), 2004. Most work in this area explores various heuristics and techniques for generating such trees. An exception is Valentina Bayer Zubek and Thomas G. Dietterich, Pruning improves heuristic search for cost-sensitive learning, in ICML, pages 19-26, 2002, in which an optimal tree is obtained by solving an appropriate Markov Decision Process (MDP). However, the size of the MDP is exponential in the number of attributes, so in general an optimal solution cannot be found.
To simplify processing, Viola proposed the idea of a cascade system; see Paul A. Viola and Michael J. Jones, Rapid object detection using a boosted cascade of simple features, in Computer Vision and Pattern Recognition (CVPR) (1), pages 511-518, 2001. An example cascade is illustrated in FIG. 1 (Prior art). A cascade system is composed of simple detectors 10, each computing one test. For each detector 10, a rejection threshold, used for rejection 20 of non-object examples (also called “samples”), is learned offline. Difficult examples that pass through the entire sequence of detectors arrive at the final stage, where an AdaBoost classifier classifies them as objects or non-objects.
An important idea behind cascade architectures, as suggested by Viola, is to introduce weak classifiers 30 that can quickly classify many examples as non-objects (hereinafter referred to as “reject”), thus saving considerable computation time and leaving the remaining examples to be classified later in the cascade.
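The early-rejection behavior of the cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Viola and Jones's implementation: the `Stage` class, the `run_cascade` function, the toy scoring functions, and the convention that a score below the threshold means “reject” are all hypothetical, and the final AdaBoost stage is stood in for by an arbitrary callable.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Stage:
    """A hypothetical cascade stage: one cheap test with an offline-learned rejection threshold."""
    name: str
    score: Callable[[float], float]  # computes this stage's single test
    threshold: float                 # scores below this value reject the example


def run_cascade(example, stages: Sequence[Stage],
                final_classifier: Callable[[float], bool]) -> Tuple[bool, int]:
    """Return (is_object, number_of_stages_evaluated).

    Stages run in order; the first stage whose score falls below its threshold
    rejects the example, and no later stage runs. Examples that pass every stage
    are handed to the final classifier (a stand-in for the AdaBoost stage).
    """
    for k, stage in enumerate(stages, start=1):
        if stage.score(example) < stage.threshold:
            return False, k  # early reject: the remaining stages are skipped
    return final_classifier(example), len(stages)


if __name__ == "__main__":
    # Toy stages on scalar "examples" (hypothetical, for illustration only).
    stages = [Stage("nonnegative", lambda x: x, 0.0),
              Stage("at most 10", lambda x: 10 - x, 0.0)]
    for x in (-1, 3, 7, 12):
        print(x, run_cascade(x, stages, final_classifier=lambda v: v > 5))
```

The second element of the result makes the saving visible: an example rejected by the first stage costs one test, whereas only examples that survive every stage incur the full cascade plus the final classifier.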
Assuming that a rejection event is always correct, and that all detectors are in the cascade, the classification accuracy is independent of the ordering, since an example is classified as an object only if no detector rejects it, whatever the order. In Viola's scheme, the detectors are ordered so that those with a high reject probability are placed first, and their runtime is ignored. When some detectors require a much larger runtime than others, this is problematic, because the resulting expected runtime can be far from optimal.
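To see why ignoring runtime can hurt, consider the simplest case: tests whose reject events are statistically independent, with no ordering constraints. Test k then runs only if every earlier test passed, so the expected runtime of an ordering is the sum of each test's cost weighted by the probability that the test is reached. A standard adjacent-swap argument shows that sorting by decreasing r_i / c_i (reject probability per unit cost) minimizes this sum. This is a well-known result for independent tests, offered here only as an illustration and not as the method of the present invention; the function names and numbers below are hypothetical.

```python
from itertools import permutations


def expected_cost(order, cost, reject):
    """Expected runtime of running tests in `order`, stopping at the first reject.

    Assumes independent reject events: a test is reached only if every earlier
    test passed, which happens with probability prod(1 - r) over those tests.
    """
    total, p_reach = 0.0, 1.0
    for t in order:
        total += p_reach * cost[t]
        p_reach *= 1.0 - reject[t]
    return total


def order_by_reject_prob(reject):
    """Highest reject probability first; runtime is ignored."""
    return sorted(reject, key=lambda t: reject[t], reverse=True)


def order_by_ratio(cost, reject):
    """Decreasing r_i / c_i; optimal for independent tests without ordering constraints (costs > 0)."""
    return sorted(reject, key=lambda t: reject[t] / cost[t], reverse=True)


def best_order_brute_force(cost, reject):
    """Exhaustive check over all n! orderings (small n only)."""
    return list(min(permutations(cost), key=lambda o: expected_cost(o, cost, reject)))


if __name__ == "__main__":
    cost = {"A": 10.0, "B": 1.0, "C": 2.0}
    reject = {"A": 0.9, "B": 0.5, "C": 0.3}
    for order in (order_by_reject_prob(reject), order_by_ratio(cost, reject)):
        print(order, expected_cost(order, cost, reject))
    print(best_order_brute_force(cost, reject))
```

With costs A = 10, B = 1, C = 2 and reject probabilities 0.9, 0.5, 0.3, ordering by reject probability gives A, B, C with an expected cost of 10.2, whereas the ratio rule gives B, C, A with an expected cost of 5.5, which the exhaustive search confirms is the minimum.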
There is therefore a need for, and it would be advantageous to have, methods that optimize the runtime, preferably without impinging on overall classification accuracy.
Related art is described in Eric Horvitz and Jed Lengyel, Perception, attention, and resources: A decision-theoretic approach to graphics rendering, in Proceedings of UAI (Conference on Uncertainty in Artificial Intelligence), pages 238-249, August 1997. Methods exemplified by Horvitz et al. consist of schemes for reasoning in order to obtain optimal expected reward, one special case being optimization of expected runtime. However, in these methods, stopping a test sequence when a reject is detected is not relevant and has therefore not been considered.
There is therefore a need for, and it would be advantageous to have, methods that optimize the tests in a cascade so that “rejects” are detected quickly and the runtime of the tests in the cascade is minimized, preferably without impinging on overall classification accuracy.
In a cascade, some weak classifiers used in the related art compute, as an intermediate step, features or classifiers that other tests use. This creates a structural dependency, which also entails an ordering constraint. Hence it is the intention of the present invention to consider both statistical dependencies and ordering constraints. It is a further intention of the present invention to provide a provably optimal ordering of tests for some important cases, and near-optimal orderings for the remaining cases.
The term “ordering constraint,” as used with a cascade of tests, refers herein to a prerequisite constraint requiring that a particular test in the cascade run before another particular test. Ordering constraints are represented as a partial order, which can equivalently be written, as a notational variant, as a directed graph. Whenever A must appear before B, this constraint is denoted A→B or “A before B”. Formally, an immediate successor of a test C is a test D such that C→D and there exists no test Z with C→Z→D. In this case, C is also referred to as an immediate prerequisite of D.
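The definitions of ordering constraints and immediate successors translate directly into a small helper. In the sketch below, whose names are hypothetical, constraints are given as (a, b) pairs meaning a→b. The helper computes their transitive closure with Warshall's algorithm, rejects cyclic input (which is not a partial order), and keeps for each test C only those D with no intermediate Z, i.e. the immediate successors (the edges of the Hasse diagram). A second helper checks whether a candidate test sequence respects the partial order.

```python
def transitive_closure(tests, before):
    """reach[a] = set of tests that must come after a (Warshall's algorithm on sets)."""
    reach = {t: set() for t in tests}
    for a, b in before:
        reach[a].add(b)
    for k in tests:
        for i in tests:
            if k in reach[i]:
                reach[i] |= reach[k]  # i -> k and k -> j imply i -> j
    if any(t in reach[t] for t in tests):
        raise ValueError("ordering constraints contain a cycle, so they are not a partial order")
    return reach


def immediate_successors(tests, before):
    """D is an immediate successor of C if C->D and no test Z satisfies C->Z->D."""
    reach = transitive_closure(tests, before)
    return {c: {d for d in reach[c] if not any(d in reach[z] for z in reach[c])}
            for c in tests}


def respects(order, before):
    """True if the sequence `order` runs a before b for every (a, b) constraint."""
    pos = {t: k for k, t in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in before)


if __name__ == "__main__":
    tests = ["A", "B", "C", "D"]
    before = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
    print(immediate_successors(tests, before))
    print(respects(["A", "D", "B", "C"], before), respects(["B", "A", "C", "D"], before))
```

With A→B, B→C, A→C and A→D, the constraint A→C is implied by A→B→C, so the immediate successors of A are B and D, and A is an immediate prerequisite of B but not of C.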
The term “statistical dependency,” as used with a cascade of tests, refers herein to the fact that the reject probability of a test may depend statistically on the results of previously run tests. For a set of tests X = {x_1, x_2, ..., x_n}, the conditions under which statistical dependencies between the tests can be handled are analyzed. Denote by r_{i|S} the probability that test x_i rejects given previous occurrences S, where S typically consists of the rejects and/or non-rejects of previously run tests. For example, r_{i|j} denotes the probability that test x_i rejects given that x_j has rejected.
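When reject events are statistically dependent, the conditional probabilities r_{i|S} can be read off a joint distribution over reject outcomes, and the expected runtime of any ordering can then be evaluated exactly. The brute-force sketch below assumes, hypothetically, that the joint distribution is given as an explicit table mapping a tuple of reject outcomes to a probability, and that whether a test would reject depends only on the example, not on when the test is run. Enumerating all permutations is exponential in n, so this is meant only as a reference baseline for small cascades, not as the optimization method of the present invention; all names are illustrative.

```python
from itertools import permutations

# Hypothetical representation: joint[outcome] = probability, where outcome[i] is True
# if test i would reject the example. Outcomes are assumed fixed per example,
# regardless of the order in which the tests are run.


def cond_reject(i, given, joint):
    """r_{i|S}: probability that test i rejects, given the outcomes in `given` (test -> bool)."""
    match = {o: p for o, p in joint.items() if all(o[j] == v for j, v in given.items())}
    denom = sum(match.values())
    if denom == 0:
        raise ValueError("conditioning event has zero probability")
    return sum(p for o, p in match.items() if o[i]) / denom


def expected_cost_joint(order, cost, joint):
    """Exact expected runtime of `order`, stopping at the first test that rejects."""
    total = 0.0
    for outcome, p in joint.items():
        for t in order:
            total += p * cost[t]
            if outcome[t]:
                break
    return total


def best_order(n, cost, joint, before=()):
    """Brute-force search over all orderings satisfying the (a, b) = "a before b" constraints."""
    feasible = [perm for perm in permutations(range(n))
                if all(perm.index(a) < perm.index(b) for a, b in before)]
    return min(feasible, key=lambda perm: expected_cost_joint(perm, cost, joint))


if __name__ == "__main__":
    # Tests 0 and 1 always agree; test 2 rejects with probability 0.4, independently of them.
    joint = {
        (True, True, True): 0.2,
        (True, True, False): 0.3,
        (False, False, True): 0.2,
        (False, False, False): 0.3,
    }
    cost = [1.0, 1.0, 1.0]
    print(cond_reject(1, {0: False}, joint))            # test 1 never rejects once test 0 has passed
    print(expected_cost_joint((0, 1, 2), cost, joint))  # order suggested by the marginals
    best = best_order(3, cost, joint)
    print(best, expected_cost_joint(best, cost, joint))
```

In this example the marginal reject probabilities (0.5, 0.5, 0.4) suggest the order 0, 1, 2, with an expected cost of 2.0. Because test 1 never rejects once test 0 has passed, running it immediately after test 0 is wasted, and the best order is 0, 2, 1 (or symmetrically 1, 2, 0) with an expected cost of 1.8. Passing `before=[(1, 0)]` restricts the search to orderings that run test 1 first, showing how ordering constraints and statistical dependencies interact in the same search.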