1. Field of the Invention
The present invention relates to computer vision systems, and more particularly to a system having computationally efficient real-time object detection, tracking, and zooming capabilities.
2. Description of Prior Art
Recent advancements in processing and sensing performance are facilitating increased development of real-time video surveillance and monitoring systems.
The development of computer vision systems that meet application-specific computational and accuracy needs is important to the deployment of real-life computer vision systems. Such a computer vision system has not yet been realized.
Past works have addressed methodological issues and have demonstrated performance analysis of components and systems. However, it is still an art to engineer systems that meet given application needs in terms of computational speed and accuracy. The trend in the art is to emphasize statistical learning methods, more particularly Bayesian methods for solving computer vision problems. However, there still exists the problem of choosing the right statistical likelihood model and the right priors to suit the needs of an application. Moreover, it is still computationally difficult to satisfy real-time application needs.
Sequential decomposition of the total task into manageable sub-tasks (with reasonable computational complexity), together with the introduction of pruning thresholds, is one method to solve the problem. Yet, this introduces additional problems because of the difficulty in approximating the probability distributions of observables at the final step of the system so that Bayesian inference is plausible. This approach to perceptual Bayesian inference is described, for example, in V. Ramesh et al., “Computer Vision Performance Characterization,” RADIUS: Image Understanding for Imagery Intelligence, edited by O. Firschein and T. Strat, Morgan Kaufmann Publishers, San Francisco, 1997, incorporated herein by reference, and W. Mann and T. Binford, “Probabilities for Bayesian Networks in Vision,” Proceedings of the ARPA IU Workshop, 1994, Vol. 1, pp. 633–643. The work done by Ramesh et al. places an emphasis on performance characterization of a system, while Mann and Binford attempted Bayesian inference (using Bayesian networks) for visual recognition. The idea of gradual pruning of candidate hypotheses to tame the computational complexity of the estimation/classification problem has been presented by Y. Amit and D. Geman, “A computational model for visual selection,” Neural Computation, 1999. However, none of these works identifies how the sub-tasks (e.g., feature extraction steps) can be chosen automatically given an application context.
Therefore, a need exists for a method and apparatus for a computationally efficient, real-time camera surveillance system with defined computational and accuracy constraints.