1. Field of Invention
The invention is a new method and apparatus for detecting the presence of articulated objects (e.g., the human body) and rigid objects, and for identifying their activities in the compressed and uncompressed domains in real time. The invention is used in a multiple-camera system designed for indoor and outdoor environments. Possible applications of the invention include law enforcement (e.g., security checkpoints), home security, public places, experimental social sciences, entertainment (e.g., virtual rooms and smart rooms), monitoring of vehicle interiors (e.g., planes, cars, and trains), and monitoring of outdoor environments (e.g., streets, bus stops, and roadsides).
2. Description of Related Art
Background of the Invention
Recent advances in camera and storage systems are the main factors driving the increased popularity of video surveillance. Prices continue to drop on components such as CMOS cameras, while manufacturers have added more features. The evolution of digital video, especially digital video storage and retrieval systems, is another leading factor. Alongside expensive surveillance systems, today's PC-based, easy plug-in surveillance systems are directed at home users and small business owners who cannot afford to invest thousands of dollars in a security system. Real-time monitoring from anywhere, at any time, makes it possible to keep a watchful eye on security areas, offices, stores, houses, pools, or parking garages.
Although new advances in camera and storage systems have made these surveillance systems powerful, automatic information retrieval from video sequences, e.g., rigid and non-rigid object detection and activity recognition in the compressed and uncompressed domains, is not yet mature. These topics remain open research areas for many groups in industry, government, and academia.
Early activity recognition systems used beacons carried by the subjects. A system that uses video, however, avoids the need for beacons and can recognize activities that may then be used to command the operation of the environment.
The patents entitled “Method and Apparatus for real-time gesture recognition” by Katerina H. Nguyen, U.S. Pat. Nos. 6,072,494 and 6,256,033, describe a gesture recognition system that compares the input gesture of the subject, e.g., a human figure, with the known gestures in a database. Unlike the invention described herein, this approach is not modular, because it recognizes the gesture of the whole human body figure. The same gesture, e.g., arm flapping, can be performed by different subjects (birds, humans, etc.), yet the system does not identify the subject of interest. Another drawback of such a system is that it can easily fail when the subject figure is occluded.
The patent entitled “Method and Apparatus for Detecting and Quantifying Motion of a Body Part”, U.S. Pat. No. 5,148,477, describes a system for analyzing body part motion. Unlike the invention described herein, this approach is adapted to analyze facial movement, e.g., movement of the eyebrows. The system does not classify the different body parts of a human; it assumes that the object of interest is a face. Also unlike the system described herein, it depends purely on the pixel change between two frames and uses no classification or recognition information and no high-level semantics.
U.S. Pat. No. 6,249,606 describes a system for computer input using a cursor device in which gestures made by a person controlling the cursor are recognized. In contrast, our system is not limited to use with a cursor device or to computer input applications. U.S. Pat. Nos. 6,222,465, 6,147,678, 6,204,852, and 5,454,043 describe computer input systems that recognize hand gestures; in contrast, our system is not limited to computer control of a virtual environment or to hand gestures. U.S. Pat. Nos. 6,057,845 and 5,796,406 are also directed to computer input devices and not the more general case of activity analysis solved by our invention.
The patent application entitled “Method of detecting and tracking groups of people” by Myron D. Flickner, U.S. patent application No. 20030107649, describes a human detection and tracking system that compares objects to “silhouette” templates to identify humans and then uses a tracking algorithm to determine the trajectories of the people. This system does not try to understand the activity of the people, nor does it try to find human-object interactions, as our invention can.
A paper delivered at the Workshop on Artificial Intelligence for Web Search 2000, entitled “Visual Event Classification via Force Dynamics” and authored by Siskind, presents a system that classifies simple motion events, e.g., pick up and put down, using input from a single camera. The system uses “force-dynamic” relations to distinguish between event types. A human hand performs the pick-up and put-down gestures. The system works with a stable background and colored objects. However, the system does not identify the hand or other objects in the scene.
A paper in the IEEE Computer Vision and Pattern Recognition Proceedings 1997, entitled “Coupled Hidden Markov Models (HMM) for Complex Action Recognition” by Matthew Brand, Nuria Oliver, and Alex Pentland, describes a hand gesture recognition system. The system recognizes certain Chinese martial art movements. However, the hands are assumed to be recognized a priori; the system does not detect and classify the hands before the gesture recognition step. The movement of one hand depends on the movement of the other hand, and the freedom of motion of the hands is limited by the martial art movements.
The parameterized HMM, as reported in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 21, No. 9, Sep. 1999, in a paper entitled “Parametric Hidden Markov Models for Gesture Recognition” authored by Wilson and Bobick, can recognize complex events, e.g., an interaction of two mobile objects or gestures made with two hands (e.g., “so big”, “so small”). One drawback of the parameterized HMM is that for complex events (e.g., a combination of sub-events) the parameter training space may become very large.
In summary, most activity recognition systems are suited to a specific type of application. The invention described herein can detect a wide range of activities for different applications. To this end, the scheme detects the different object parts and their movements separately and combines them at a later stage that connects to high-level semantics. Each object part has its own freedom of motion, and activity recognition for each part is achieved by using several HMMs in parallel.
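The idea of running several HMMs in parallel, one set per object part, can be illustrated with a minimal sketch. It is not the implementation of the invention. It assumes discrete observation symbols (0 for a still part, 1 for a moving part), two-state HMMs with hand-picked probabilities, and hypothetical names (`log_likelihood`, `classify_parts`, `MODELS`, and the part and activity labels), none of which come from the text above. Each part's observation sequence is scored against that part's own HMMs with the scaled forward algorithm, and the best-scoring activity is kept for each part independently.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM.

    Assumes obs is a non-empty list of symbol indices.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def classify_parts(observations, models):
    """For each part, pick the activity whose HMM gives the highest score."""
    result = {}
    for part, obs in observations.items():
        scores = {
            activity: log_likelihood(obs, *hmm)
            for activity, hmm in models[part].items()
        }
        result[part] = max(scores, key=scores.get)
    return result

# Hypothetical hand-picked models: (start, transition, emission).
EMIT = [[0.9, 0.1], [0.2, 0.8]]  # state 0 favors symbol 0, state 1 favors symbol 1
STILL = ([0.9, 0.1], [[0.9, 0.1], [0.5, 0.5]], EMIT)
MOVING = ([0.1, 0.9], [[0.5, 0.5], [0.1, 0.9]], EMIT)
MODELS = {
    "arm": {"still": STILL, "moving": MOVING},
    "leg": {"still": STILL, "moving": MOVING},
}

if __name__ == "__main__":
    sequences = {"arm": [1, 1, 1, 1, 0, 1], "leg": [0, 0, 0, 0, 0]}
    print(classify_parts(sequences, MODELS))
```

Because each part is scored against its own models, the per-part results remain available for a later stage to combine, and adding a part or an activity only adds entries to `MODELS` rather than enlarging a single joint model.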