1. Field of the Invention
The present invention is a method and system for automatically detecting predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space, using the timestamps associated with the detected events to access a synchronized second video stream from a second means for capturing images that is positioned to observe the people more closely, and enabling an annotator to annotate each of the events with additional labels using an annotation tool.
2. Background of the Invention
Event Detection based on Shoppers' Behavior Analysis
There have been earlier attempts at event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2003/0058339 of Trajkovic, et al. (hereinafter Trajkovic) disclosed a method for detecting an event through repetitive patterns of human behavior. Trajkovic learned multi-dimensional feature data from the repetitive patterns of human behavior and computed a probability density function (PDF) from the data. Then, a method for the PDF analysis, such as Gaussian or clustering techniques, was used to identify the repetitive patterns of behavior and unusual behavior through the variance of the Gaussian distribution or cluster.
Although Trajkovic can model a repetitive behavior through the PDF analysis, Trajkovic is clearly foreign to event detection for the aggregate of non-repetitive behaviors, such as the shopper traffic in a physical space. Trajkovic did not disclose the challenges of event detection based on customers' behaviors in a video in a retail environment, such as non-repetitive behaviors. Therefore, Trajkovic is clearly foreign to the challenges that can be found in a retail environment.
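As an illustration only, the kind of Gaussian analysis described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Trajkovic's implementation: the one-dimensional feature (seconds to cross a doorway), the function names, and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev


def fit_gaussian(samples):
    """Return (mean, sample standard deviation) of one-dimensional samples."""
    return mean(samples), stdev(samples)


def is_unusual(value, mu, sigma, k=3.0):
    """Flag a value lying more than k standard deviations from the mean.

    The threshold k is an assumption chosen for illustration.
    """
    return abs(value - mu) > k * sigma


# Hypothetical repetitive behavior: seconds taken to cross a doorway.
crossing_times = [4.1, 3.9, 4.0, 4.2, 3.8, 4.0, 4.1, 3.9]
mu, sigma = fit_gaussian(crossing_times)

typical = is_unusual(4.05, mu, sigma)   # close to the learned pattern
outlier = is_unusual(9.0, mu, sigma)    # far from the learned pattern
```

A model of this kind presupposes that the observed behavior repeats; it has little to say about the aggregate of non-repetitive shopper traffic discussed above.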
U.S. Pat. Appl. Pub. No. 2006/0053342 of Bazakos, et al. (hereinafter Bazakos) disclosed a method for unsupervised learning of events in a video. Bazakos disclosed a method of creating a feature vector of a related object in a video by grouping clusters of points together within a feature space and storing the feature vector in an event library. Then, the behavioral analysis engine in Bazakos determined whether an event had occurred by comparing features contained within a feature vector in a specific instance against the feature vectors in the event library. Bazakos is primarily related to surveillance, rather than event detection based on customers' behaviors in a video.
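The comparison of an observed feature vector against an event library, as described above, can be pictured with a nearest-neighbor lookup. This is a hedged sketch, not the method of Bazakos: the Euclidean distance, the threshold, the two-element feature layout (speed, dwell seconds), and the event names are assumptions made for illustration.

```python
from math import dist


def match_event(feature_vector, event_library, threshold=1.0):
    """Return the name of the closest library event, or None if none is near.

    event_library maps a hypothetical event name to a reference vector.
    """
    best_name, best_distance = None, float("inf")
    for name, reference in event_library.items():
        d = dist(feature_vector, reference)
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= threshold else None


# Hypothetical library: (speed in m/s, dwell time in seconds).
library = {"loitering": (0.1, 30.0), "running": (4.0, 2.0)}

near = match_event((3.8, 2.3), library)     # close to the "running" entry
far = match_event((10.0, 100.0), library)   # close to nothing in the library
```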
U.S. Pat. Appl. Pub. No. 2005/0286774 of Porikli disclosed a method for event detection in a video using approximate estimates of the aggregated affinity matrix and clustering and scoring of the matrix. Porikli constructed the affinity matrix based on a set of frame-based and object-based statistical features, such as trajectories, histograms, and Hidden Markov Models of feature speed, orientation, location, size, and aspect ratio, extracted from the video.
Shoppers' Behavior Analysis
There have been earlier attempts at understanding customers' shopping behaviors captured in a video in a targeted environment, such as in a retail store, using cameras.
U.S. Pat. Appl. Pub. No. 2006/0010028 of Sorensen (hereinafter Sorensen 1) disclosed a method for tracking shopper movements and behavior in a shopping environment using a video. In Sorensen 1, a user indicated a series of screen locations in a display at which the shopper appeared in the video, and the series of screen locations were translated to store map coordinates.
The step of receiving the user input via input devices, such as a pointing device or keyboard, makes Sorensen 1 inefficient for handling a large amount of video data in a large shopping environment with a relatively complicated store layout, especially over a long period of time. The manual input by a human operator/user cannot efficiently track all of the shoppers in such cases, partially due to the possibility of human errors caused by tiredness and boredom. The manual input approach is also much less scalable as the number of shopping environments to handle for the behavior analysis increases. Therefore, an automated event detection approach is needed. The present invention utilizes an automated event detection approach for detecting predefined events from the customers' shopping interaction in a physical space.
Although U.S. Pat. Appl. Pub. No. 2002/0178085 of Sorensen, now U.S. Pat. No. 7,006,982, (hereinafter Sorensen 2) disclosed the use of a tracking device and store sensors in a plurality of tracking systems primarily based on wireless technology, such as RFID, Sorensen 2 is clearly foreign to the concept of applying computer vision based tracking algorithms to the field of understanding customers' shopping behaviors and movements. In Sorensen 2, each transmitter was typically attached to a hand-held or push-type cart. Therefore, Sorensen 2 cannot distinguish the behaviors of multiple shoppers using one cart from the behavior of a single shopper also using one cart. Although Sorensen 2 disclosed that the transmitter may be attached directly to a shopper, via a clip or other form of customer surrogate, in order to correctly track the shopper when the person is shopping without a cart, this will not be practical due to the additional cumbersome step it imposes on the shopper, not to mention the inefficiency of managing a transmitter for each individual shopper.
The present invention can embrace any type of automatic wireless sensors for the detection of the predefined events. However, in a preferred embodiment, the present invention primarily utilizes the computer vision based automated approach for the detection of the predefined events. The computer vision based event detection helps the present invention to overcome the obstacles mentioned above.
With regard to the temporal behavior of customers, U.S. Pat. Appl. Pub. No. 2003/0002712 of Steenburgh, et al. (hereinafter Steenburgh) disclosed a relevant exemplary prior art. Steenburgh disclosed a method for measuring dwell time of an object, particularly a customer in a retail store, which enters and exits an environment, by tracking the object and matching the entry signature of the object to the exit signature of the object, in order to find out how long people spend in retail stores.
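The idea of measuring dwell time by matching an entry signature to an exit signature can be sketched as follows. This is an illustrative sketch only, not Steenburgh's algorithm: the `Sighting` type, the tuple-valued appearance signature, the greedy pairing, and the distance threshold are all assumptions.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Sighting:
    timestamp: float   # seconds on a common clock
    signature: tuple   # hypothetical appearance descriptor


def dwell_times(entries, exits, max_distance=0.5):
    """Greedily pair each exit with the most similar unmatched earlier entry.

    Returns the dwell time in seconds for every pair that was matched.
    """
    unmatched = list(entries)
    result = []
    for ex in sorted(exits, key=lambda s: s.timestamp):
        candidates = [e for e in unmatched if e.timestamp < ex.timestamp]
        if not candidates:
            continue
        best = min(candidates, key=lambda e: dist(e.signature, ex.signature))
        if dist(best.signature, ex.signature) <= max_distance:
            unmatched.remove(best)
            result.append(ex.timestamp - best.timestamp)
    return result


entries = [Sighting(0.0, (0.1, 0.9)), Sighting(10.0, (0.8, 0.2))]
exits = [Sighting(300.0, (0.82, 0.18)), Sighting(130.0, (0.12, 0.88))]
times = dwell_times(entries, exits)
```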
The modeling and analysis of an activity of interest can be used as an exemplary way to detect predefined events.
U.S. Pat. Appl. Pub. No. 2002/0085092 of Choi, et al. (hereinafter Choi) disclosed a method for modeling an activity of a human body using optical flow vectors from a video and the probability distribution of the feature vectors derived from the optical flow vectors. Choi modeled a plurality of states using the probability distribution of the feature vectors and expressed the activity based on the state transitions.
Other Application Areas
There have been earlier attempts at activity analysis in areas other than understanding customers' shopping behaviors, such as surveillance and security applications. The following prior arts are not restricted to the application area of understanding customers' shopping behaviors in a physical space, but they disclosed methods for object activity modeling and analysis for the human body, using a video, in general.
Surveillance Application
U.S. Pat. Appl. Pub. No. 2003/0053659 of Pavlidis, et al. (hereinafter Pavlidis) disclosed a method for moving object assessment, including an object path of one or more moving objects in a search area, using a plurality of imaging devices and segmentation by background subtraction. In Pavlidis, the term “object” included customers, and Pavlidis also included itinerary statistics of customers in a department store. However, Pavlidis was primarily related to monitoring a search area for surveillance.
U.S. Pat. Appl. Pub. No. 2004/0113933 of Guler disclosed a method for automatic detection of split and merge events from video streams in a surveillance environment. Guler considered split and merge behaviors as key common simple behavior components for analyzing high level activities of interest in a surveillance application; these components are also used to understand the relationships among multiple objects, not just individual behavior. Guler used adaptive background subtraction to detect the objects in a video scene, and the objects were tracked to identify the split and merge behaviors. To understand the split and merge behavior-based high level events, Guler used a Hidden Markov Model (HMM).
U.S. Pat. Appl. Pub. No. 2004/0120581 of Ozer, et al. (hereinafter Ozer) disclosed a method for identifying activity of customers for a marketing purpose or activity of objects in a surveillance area, by comparing the detected objects with the graphs from a database. Ozer tracked the movement of different object parts and combined them into high-level activity semantics, using several Hidden Markov Models (HMMs) and a distance classifier.
Transaction Application
U.S. Pat. No. 6,741,973 of Dove, et al. (hereinafter Dove) disclosed a model of generating customer behavior in a transaction environment. Although Dove disclosed video cameras in a real bank branch as a way to observe human behavior, Dove is clearly foreign to the concept of automatic event detection based on the customers' behaviors, derived from visual information of the customers in other types of physical space, such as shopping path tracking and analysis in a retail environment, for the sake of annotating the customers' behaviors.
Computer vision algorithms have been shown to be an effective means for detecting and tracking people. These algorithms also have been shown to be effective in analyzing the behavior of people in the view of the means for capturing images. This allows the possibility of connecting the visual information from a scene to the behavior analysis of customers and predefined event detection.
Therefore, it is an objective of the present invention to provide a novel approach for annotating the customers' behaviors utilizing the information from the automatic behavior analysis of customers and predefined event detection. Any reliable automatic behavior analysis in the prior art may be used for the predefined event detection in the present invention. However, it is another objective of the present invention to provide a novel solution that solves the aforementioned problems found in the prior arts for automatic event detection, such as the cumbersome attachment of devices to the customers, by automatically and unobtrusively analyzing the customers' behaviors without requiring the customers to carry any cumbersome device.
Demographics
Computer vision algorithms have been shown to be an effective means for analyzing the demographic information of people in the view of the means for capturing images. Thus, there have been prior attempts at recognizing the demographic category of a person by processing the facial image using various approaches in computer vision technologies, such as a machine learning approach.
U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) disclosed a method that employs a Support Vector Machine to classify images of faces according to gender by training on images, including images of male and female faces; determining a plurality of support vectors from the training images for identifying a hyperplane for the gender decision; and reducing the resolution of the training images and the test image by sub-sampling before supplying the images to the Support Vector Machine.
U.S. Pat. Appl. Pub. No. 20030110038 of Sharma, et al. (hereinafter Sharma 20030110038) disclosed a computer software system for multi-modal human gender classification, comprising: a first-mode classifier classifying first-mode data pertaining to male and female subjects according to gender, and rendering a first-mode gender-decision for each male and female subject; a second-mode classifier classifying second-mode data pertaining to male and female subjects according to gender, and rendering a second-mode gender-decision for each male and female subject; and a fusion classifier integrating the individual gender decisions obtained from said first-mode classifier and said second-mode classifier, and outputting a joint gender decision for each of said male and female subjects.
Moghaddam and Sharma 20030110038, for demographics classification mentioned above, aim to classify a certain class of demographics profile, such as gender only, based on the image signature of faces. U.S. Provisional Pat. No. 60/808,283 of Sharma, et al. (hereinafter Sharma 60/808,283) is a much more comprehensive solution, where the automated system captures video frames, detects customer faces in the frames, tracks the faces individually, corrects the pose of the faces, and finally classifies the demographics profiles of the customers, covering both gender and ethnicity. In Sharma 60/808,283, the face tracking algorithm has been designed and tuned to improve the classification accuracy; the facial geometry correction step improves both the tracking and the individual face classification accuracy, and the tracking further improves the accuracy of the classification of gender and ethnicity over the course of visibly tracked faces by combining the individual face classification scores.
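Combining individual face classification scores over a tracked face can be illustrated with a simple average. This is a sketch under an explicit assumption: the text above does not state how Sharma 60/808,283 combines the scores, so averaging, the function name, and the score values below are hypothetical.

```python
def combine_track_scores(frame_scores):
    """Average per-frame class scores and return (label, averaged scores).

    frame_scores is a list of dicts mapping a class label to a score,
    one dict per frame in which the tracked face was classified.
    """
    if not frame_scores:
        raise ValueError("track has no classified frames")
    labels = frame_scores[0].keys()
    averaged = {
        label: sum(scores[label] for scores in frame_scores) / len(frame_scores)
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical scores for one tracked face over three frames.
track = [
    {"male": 0.6, "female": 0.4},   # this single frame alone favors "male"
    {"male": 0.3, "female": 0.7},
    {"male": 0.2, "female": 0.8},
]
label, averaged = combine_track_scores(track)
```

In this example a single noisy frame is outvoted by the rest of the track, which is the intuition behind combining scores across a tracked face rather than relying on one frame.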
Therefore, it is another objective of the present invention to detect the predefined events based on the demographic information of people in another exemplary embodiment. The invention automatically and unobtrusively analyzes the customers' demographic information, without requiring customers or operators to enter the information manually, utilizing the novel demographic analysis approaches in the prior arts.
The present invention utilizes the events detected by the automatic behavior analysis and demographic analysis in a first video stream to locate the same events in a synchronized second video stream, and allows an annotator to annotate each synchronized event through an annotation tool. The manual annotation data in the present invention can be used for various market analysis applications, such as gaining deeper insights into customers' shopping behavior in a retail store, media effectiveness measurement, and traffic analysis.
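The use of an event timestamp from the first video stream to locate the same event in the synchronized second video stream can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes both streams share a common clock and that the second stream has a constant frame rate, and the `Event` type, the padding window, and all parameter names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    label: str
    timestamp: float                      # seconds on the shared clock
    annotations: list = field(default_factory=list)


def frame_range(event, stream_start, fps, total_frames, padding=2.0):
    """Return (first, last) frame indices in the second stream around an event.

    stream_start is the shared-clock time of frame 0 of the second stream.
    padding widens the window, in seconds, on both sides of the event.
    """
    first = int((event.timestamp - padding - stream_start) * fps)
    last = int((event.timestamp + padding - stream_start) * fps)
    first = max(first, 0)                  # clamp to the start of the stream
    last = min(last, total_frames - 1)     # clamp to the end of the stream
    if first > last:
        raise ValueError("event lies outside the second stream")
    return first, last


# Hypothetical event detected in the first stream.
event = Event(label="enter_aisle", timestamp=1000.0)
window = frame_range(event, stream_start=990.0, fps=15, total_frames=10000)

# The annotator reviews the frames in `window` and adds further labels.
event.annotations.append("picked up product")
```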