Due to rising labor costs and a shortage of qualified job candidates, many retail businesses and other establishments often must operate with too few employees. When there are not enough employees to perform every desired function, management must prioritize responsibilities so that the most important functions are performed, or find an alternate way to perform them. For example, many retail establishments use automated theft detection systems to replace or supplement a security staff.
In addition, many businesses do not have enough employees to adequately monitor an entire store or other location, whether for security purposes or to determine when a patron may require assistance. Many businesses and other establishments therefore position cameras at various locations to monitor the activities of patrons and employees. Although the camera images typically allow a single person at a central location to monitor the various locations, such a system still requires human monitoring to detect events of interest.
Accordingly, a number of computer vision monitoring and surveillance techniques have been proposed to automatically identify one or more predefined events in a sequence of images. Such events could include, for example, unauthorized personnel in an area, a queue that is too long, a door that is left open, or a patron requiring assistance.
Typically, computer vision systems accept an input image and compare the input image with a number of states. The image is assigned to a state when the input image sufficiently matches the state. Generally, matching is performed by comparing input image information with state image information from each of the states. The states are typically modeled using a number of known techniques, such as Hidden Markov Models, histograms, or clustering.
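The state-matching step described above can be illustrated with a minimal sketch, assuming each state is modeled as a normalized intensity histogram (one of the modeling options mentioned) and that "sufficiently matches" means a histogram-intersection score at or above a threshold. The function names, the 8-bin histogram, the 0.8 threshold, and the `door_open`/`door_closed` states are all hypothetical choices for illustration, not a method taken from any particular system.

```python
from typing import Dict, List, Optional


def histogram(pixels: List[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of 8-bit pixel values (hypothetical state model)."""
    counts = [0] * bins
    for p in pixels:
        # Map 0..255 onto 0..bins-1; min() guards against out-of-range input.
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def assign_state(
    pixels: List[int], states: Dict[str, List[float]], threshold: float = 0.8
) -> Optional[str]:
    """Assign the input image to its best-matching state, or None if no state matches well enough."""
    h = histogram(pixels)
    best_state, best_score = None, 0.0
    for name, model in states.items():
        score = intersection(h, model)
        if score > best_score:
            best_state, best_score = name, score
    return best_state if best_score >= threshold else None


# Hypothetical states built from toy "images" (flat lists of pixel intensities).
states = {
    "door_open": histogram([200] * 90 + [20] * 10),
    "door_closed": histogram([20] * 90 + [200] * 10),
}

print(assign_state([200] * 85 + [20] * 15, states))  # matches door_open
print(assign_state([100] * 100, states))  # no state is close enough
```

Returning `None` when no score clears the threshold reflects the "sufficiently matches" condition: an image is not forced into the nearest state if every state is a poor fit.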
In some systems, complex events are defined recursively in terms of simpler events using an event description language, and a parsing module processes the stream of detected simpler events to recognize the complex events. Object trajectories have been analyzed to identify various dynamic events, such as a person entering or exiting a room or a person depositing an object. Simple motions, such as a person walking or running, can be learned and recognized from spatio-temporal motion templates. In addition, probabilistic techniques, such as hidden Markov models (HMMs) and Bayesian networks, have been used extensively to recognize complex motion patterns and to learn and recognize human activities.
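The idea of defining complex events recursively from simpler ones, and recognizing them by parsing a stream of detected simple events, can be sketched as follows. This is a toy illustration only, not any specific event description language: the `Seq` (ordered sub-events, gaps allowed) and `Alt` (any one alternative) constructs, the `earliest_end` recognizer, and the event labels such as `carry_object` and `put_down` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Seq:
    """Complex event: sub-events that must occur in order (other events may intervene)."""
    parts: Tuple[Event, ...]


@dataclass(frozen=True)
class Alt:
    """Complex event: any one of several alternative sub-events."""
    options: Tuple[Event, ...]


# A simple (primitive) event is a label emitted by a lower-level detector.
Event = Union[str, Seq, Alt]


def earliest_end(event: Event, stream: List[str], start: int = 0) -> Optional[int]:
    """Index just past the earliest-ending match of `event` in stream[start:], or None."""
    if isinstance(event, str):
        for i in range(start, len(stream)):
            if stream[i] == event:
                return i + 1
        return None
    if isinstance(event, Alt):
        ends = [e for e in (earliest_end(o, stream, start) for o in event.options) if e is not None]
        return min(ends) if ends else None
    # Seq: match each part greedily; ending each part as early as possible
    # leaves the most of the stream for the remaining parts.
    pos = start
    for part in event.parts:
        end = earliest_end(part, stream, pos)
        if end is None:
            return None
        pos = end
    return pos


def recognized(event: Event, stream: List[str]) -> bool:
    return earliest_end(event, stream) is not None


# Hypothetical complex event "depositing an object", built from simpler events.
LEAVE = Alt(("exit", "walk_away"))
DEPOSIT = Seq(("enter", "carry_object", "put_down", LEAVE))

print(recognized(DEPOSIT, ["enter", "walk", "carry_object", "put_down", "walk_away"]))
print(recognized(DEPOSIT, ["carry_object", "enter", "put_down", "exit"]))
```

Because `LEAVE` is itself a complex event nested inside `DEPOSIT`, the same recognizer handles definitions of arbitrary depth.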
While such event classification techniques perform effectively for some complex events, conventional event classification techniques have been observed to perform poorly when the same event can be exhibited in many different ways, especially in the presence of viewpoint changes or a broad range of possible motions, such as when a person is falling. In addition, conventional event classification techniques do not consider the context in which an event occurs, and therefore cannot distinguish, for example, a person falling to the floor from a person lying down on a bed. A need therefore exists for an improved computer-based method and apparatus for automatically identifying complex events in an image sequence.