1. Field of the Invention
The present invention is a method and system for segmenting a plurality of persons in a physical space. In a preferred embodiment, the segmentation is based on automatic behavior analysis of the persons. In another exemplary embodiment, the present invention can also utilize other types of visual characterization, such as demographic analysis, or additional input sources, such as sales data, to segment the plurality of persons. The processes in the present invention are based on a novel usage of a plurality of computer vision technologies to analyze the behavior and visual characterization of the persons from a plurality of input images.
2. Background of the Invention
Behavior Analysis
There have been earlier attempts at understanding people's behaviors, such as customers' shopping behaviors, from video captured by cameras in a targeted environment, such as a retail store.
U.S. Pat. Appl. Pub. No. 2006/0010028 of Sorensen (hereinafter Sorensen 1) disclosed a method for tracking shopper movements and behavior in a shopping environment using a video. In Sorensen 1, a user indicated a series of screen locations in a display at which the shopper appeared in the video, and the series of screen locations were translated to store map coordinates.
The step of receiving the user input via input devices, such as a pointing device or keyboard, makes Sorensen 1 inefficient for handling a large amount of video data in a large shopping environment with a relatively complicated store layout, especially over a long period of time. Manual input by a human operator cannot efficiently track all of the shoppers in such cases, partly because of human errors caused by tiredness and boredom. The manual input approach also scales poorly as the number of shopping environments to be handled for the behavior analysis increases. Therefore, an automated behavior analysis approach is needed for segmenting people in a physical space from a large amount of video data.
Although U.S. Pat. No. 7,006,982 (hereinafter Sorensen 2) disclosed a usage of a tracking device and store sensors in a plurality of tracking systems primarily based on wireless technology, such as RFID, Sorensen 2 is clearly foreign to the concept of applying computer vision based tracking algorithms to the field of understanding customers' shopping behaviors and movements. In Sorensen 2, each transmitter was typically attached to a hand-held or push-type cart. Therefore, Sorensen 2 cannot distinguish the behaviors of multiple shoppers using one cart from the behavior of a single shopper also using one cart. Although Sorensen 2 disclosed that the transmitter may be attached directly to a shopper, via a clip or other form of customer surrogate, in order to correctly track a shopper who is shopping without a cart, this is not practical because it imposes an additional cumbersome step on the shopper, not to mention the inefficiency of managing a transmitter for each individual shopper.
Sorensen 2 cannot efficiently provide the exact path of a shopper since it is based on creating a computer simulated field of view for each shopper based on the direction of travel. Also, the shopping behavior cannot be deciphered accurately, as it is again based on determining the products that lie within the simulated field of view of the shoppers, which could result in incorrect judgments. In contrast, the proprietary computer vision based technology in the present invention can automatically track people and their behaviors at a detailed interaction level in a physical space without using any simulation or approximation techniques, thus providing efficient behavior analysis information.
With regard to the temporal behavior of customers, U.S. Pat. Appl. Pub. No. 2003/0002712 of Steenburgh, et al. (hereinafter Steenburgh) disclosed a relevant prior art. Steenburgh disclosed a method for measuring dwell time of an object, particularly a customer in a retail store, which enters and exits an environment, by tracking the object and matching the entry signature of the object to the exit signature of the object, in order to find out how long people spend in retail stores. The method in Steenburgh can be used as one of the many methods to measure the dwell time of people in a physical space as one of the segmentation criteria.
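As an illustration only, and not as Steenburgh's actual implementation, dwell-time measurement by matching entry and exit signatures can be sketched as follows. The representation of a signature as a tuple of floats, the Euclidean distance, the greedy pairing, and the threshold `max_dist` are all assumptions introduced for this sketch.

```python
import math


def match_dwell_times(entries, exits, max_dist=0.5):
    """Pair each exit with the closest unmatched earlier entry.

    entries, exits: lists of (timestamp_seconds, signature) tuples, where
    signature is a tuple of floats (a hypothetical appearance descriptor).
    Returns the list of dwell times, in seconds, for the matched pairs.
    """
    unmatched = list(entries)
    dwell_times = []
    for exit_time, exit_sig in sorted(exits):
        # Only entries that happened before this exit are valid candidates.
        candidates = [e for e in unmatched if e[0] <= exit_time]
        if not candidates:
            continue
        best = min(candidates, key=lambda e: math.dist(e[1], exit_sig))
        # Accept the pairing only if the signatures are similar enough.
        if math.dist(best[1], exit_sig) <= max_dist:
            unmatched.remove(best)
            dwell_times.append(exit_time - best[0])
    return dwell_times
```

The resulting dwell times could then serve as one input to the segmentation criteria, for example by thresholding them into short and long visits.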
Event Detection Based on Behavior Analysis
An event can be used in the segmentation criteria as a way to decide the segmentation of the people associated with the event. There have been earlier attempts at event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2003/0058339 of Trajkovic, et al. (hereinafter Trajkovic) disclosed a method for detecting an event through repetitive patterns of human behavior. Trajkovic learned multi-dimensional feature data from the repetitive patterns of human behavior and computed a probability density function (PDF) from the data. Then, a method for the PDF analysis, such as Gaussian or clustering techniques, was used to identify the repetitive patterns of behavior and unusual behavior through the variance of the Gaussian distribution or cluster.
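The general idea of fitting a Gaussian model to behavior features and flagging unusual behavior by its deviation from that model can be sketched as follows. This is a simplified illustration, not the method of Trajkovic: the diagonal (per-dimension) Gaussian, the function names, and the threshold of three standard deviations are assumptions chosen for the example.

```python
import statistics


def fit_diagonal_gaussian(samples):
    """Estimate a per-dimension mean and standard deviation.

    samples: list of equal-length tuples of floats (hypothetical features,
    e.g. speed and time spent in a zone). Needs at least two samples.
    """
    dims = list(zip(*samples))
    means = [statistics.fmean(d) for d in dims]
    stdevs = [statistics.stdev(d) for d in dims]
    return means, stdevs


def is_unusual(x, means, stdevs, z_threshold=3.0):
    """Flag x if any dimension lies more than z_threshold standard
    deviations from the learned mean."""
    return any(
        abs(v - m) > z_threshold * s for v, m, s in zip(x, means, stdevs)
    )
```

For instance, after fitting on observations of a repeated behavior, a new observation far from the learned mean in any dimension would be reported as unusual.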
Although Trajkovic can model a repetitive behavior through the PDF analysis, Trajkovic is clearly foreign to event detection for the aggregate of non-repetitive behaviors, such as shopper traffic in a physical space. Trajkovic did not address the challenges of event detection based on customers' behaviors in a video in a retail environment, such as non-repetitive behaviors. Therefore, Trajkovic is clearly foreign to the challenges that can be found in a retail environment.
U.S. Pat. Appl. Pub. No. 2006/0053342 of Bazakos, et al. (hereinafter Bazakos) disclosed a method for unsupervised learning of events in a video. Bazakos disclosed a method of creating a feature vector of a related object in a video by grouping clusters of points together within a feature space and storing the feature vector in an event library. Then, the behavioral analysis engine in Bazakos determined whether an event had occurred by comparing features contained within a feature vector in a specific instance against the feature vectors in the event library. Bazakos is primarily related to surveillance, rather than event detection based on customers' behaviors in a video.
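Comparing an observed feature vector against an event library can be illustrated with a nearest-neighbor lookup. The sketch below is hypothetical and is not taken from Bazakos: the dictionary-based library, the Euclidean distance, and the `max_dist` threshold are assumptions made for the example.

```python
import math


def detect_event(feature_vector, event_library, max_dist=1.0):
    """Return the label of the closest library entry, or None.

    event_library: dict mapping an event label to a reference feature
    vector (tuple of floats). None is returned when no reference lies
    within max_dist of the observed feature vector.
    """
    best_label, best_dist = None, float("inf")
    for label, reference in event_library.items():
        d = math.dist(feature_vector, reference)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_dist else None
```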
U.S. Pat. Appl. Pub. No. 2005/0286774 of Porikli disclosed a method for event detection in a video using approximate estimates of the aggregated affinity matrix and clustering and scoring of the matrix. Porikli constructed the affinity matrix based on a set of frame-based and object-based statistical features, such as trajectories, histograms, and Hidden Markov Models of feature speed, orientation, location, size, and aspect ratio, extracted from the video.
Other Application Areas
There have been earlier attempts at activity analysis in areas other than understanding customers' shopping behaviors, such as surveillance and security applications. The following prior art is not restricted to the application area of understanding customers' shopping behaviors in a physical space, but discloses methods for modeling and analyzing object activity, including that of the human body, using video in general.
U.S. Pat. Appl. Pub. No. 2003/0053659 of Pavlidis, et al. (hereinafter Pavlidis) disclosed a method for moving object assessment, including an object path of one or more moving objects in a search area, using a plurality of imaging devices and segmentation by background subtraction. In Pavlidis, the term “object” included customers, and Pavlidis also included itinerary statistics of customers in a department store. However, Pavlidis was primarily related to monitoring a search area for surveillance.
U.S. Pat. Appl. Pub. No. 2004/0113933 of Guler disclosed a method for automatic detection of split and merge events from video streams in a surveillance environment. Guler considered split and merge behaviors as key common simple behavior components in order to analyze high level activities of interest in a surveillance application, which are also used to understand the relationships among multiple objects, not just individual behavior. Guler used adaptive background subtraction to detect the objects in a video scene, and the objects were tracked to identify the split and merge behaviors. To understand the split and merge behavior-based high level events, Guler used a Hidden Markov Model (HMM).
U.S. Pat. Appl. Pub. No. 2004/0120581 of Ozer, et al. (hereinafter Ozer) disclosed a method for identifying activity of customers for a marketing purpose or activity of objects in a surveillance area, by comparing the detected objects with the graphs from a database. Ozer tracked the movement of different object parts and combined them to high level activity semantics, using several Hidden Markov Models (HMMs) and a distance classifier.
U.S. Pat. No. 6,741,973 of Dove, et al. (hereinafter Dove) disclosed a model of generating customer behavior in a transaction environment. Although Dove disclosed video cameras in a real bank branch as a way to observe human behavior, Dove is clearly foreign to the concept of automatic event detection based on the customers' behaviors, derived from visual information of the customers, in other types of physical space, such as shopping path tracking and analysis in a retail environment, for the sake of segmenting the customers based on their behaviors.
Computer vision algorithms have been shown to be an effective means for detecting and tracking people. These algorithms also have been shown to be effective in analyzing the behavior of people in the view of the means for capturing images. This allows the possibility of connecting the visual information from a scene to the segmentation of the people based on the behavior analysis and predefined event detection based on the behavior.
Therefore, it is an objective of the present invention to provide a novel approach for segmenting the people based on their behaviors, utilizing the information from the automatic behavior analysis and predefined event detection. Any reliable automatic behavior analysis in the prior art may be used for the predefined event detection that will trigger the segmentation of the people in the present invention. However, it is another objective of the present invention to provide a novel solution to the aforementioned problems found in the prior art for automatic event detection, such as the cumbersome attachment of devices to the customers, by automatically and unobtrusively analyzing the customers' behaviors without requiring the customers to carry any cumbersome device.
Demographics Analysis
Computer vision algorithms have also been shown to be an effective means for analyzing the demographic information of people in the view of the means for capturing images. Thus, the present invention also applies this technological advantage to the segmentation of the people in another exemplary embodiment. There have been prior attempts at recognizing the demographic category of a person by processing the facial image using various approaches in the computer vision technologies, such as a machine learning approach.
U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) disclosed a method to employ a Support Vector Machine to classify images of faces according to gender by training on images including images of male and female faces; determining a plurality of support vectors from the training images for identifying a hyperplane for the gender decision; and reducing the resolution of the training images and the test image by sub-sampling before supplying the images to the Support Vector Machine.
U.S. Pat. Appl. Pub. No. 2003/0110038 of Sharma, et al. (hereinafter Sharma 20030110038) disclosed a computer software system for multi-modal human gender classification, comprising: a first-mode classifier classifying first-mode data pertaining to male and female subjects according to gender, and rendering a first-mode gender-decision for each male and female subject; a second-mode classifier classifying second-mode data pertaining to male and female subjects according to gender, and rendering a second-mode gender-decision for each male and female subject; and a fusion classifier integrating the individual gender decisions obtained from said first-mode classifier and said second-mode classifier, and outputting a joint gender decision for each of said male and female subjects.
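The integration of two per-mode decisions into a joint decision can be illustrated with a simple weighted score fusion. This sketch is an assumption for illustration and does not reproduce the fusion classifier of the cited reference: the score convention (a probability for one of the two classes), the weighting, and the 0.5 decision threshold are hypothetical.

```python
def fuse_decisions(score_a, score_b, weight_a=0.5, labels=("male", "female")):
    """Fuse two per-mode scores into a joint decision.

    score_a, score_b: hypothetical probabilities in [0, 1] that the subject
    belongs to labels[1], produced by the first-mode and second-mode
    classifiers respectively.
    Returns (joint_label, fused_score).
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    fused = weight_a * score_a + (1.0 - weight_a) * score_b
    # Threshold the fused score at 0.5 to obtain the joint label.
    return (labels[1] if fused >= 0.5 else labels[0]), fused
```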
Moghaddam and Sharma 20030110038, mentioned above for demographics classification, aim to classify a certain class of demographics profile, such as gender only, based on the image signature of faces. U.S. Provisional Pat. No. 60/808,283 of Sharma, et al. (hereinafter Sharma 60/808,283) is a much more comprehensive solution, where the automated system captures video frames, detects customer faces in the frames, tracks the faces individually, corrects the pose of the faces, and finally classifies the demographics profiles of the customers, covering both gender and ethnicity. In Sharma 60/808,283, the face tracking algorithm has been designed and tuned to improve the classification accuracy; the facial geometry correction step improves both the tracking and the individual face classification accuracy, and the tracking further improves the accuracy of the classification of gender and ethnicity over the course of visibly tracked faces by combining the individual face classification scores.
Therefore, it is another objective of the present invention to segment the people in a physical space based on the demographic information of people in another exemplary embodiment. The invention automatically and unobtrusively analyzes the customers' demographic information without requiring customers or operators to enter the information manually, utilizing the novel demographic analysis approaches in the prior art.
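Combining individual face classification scores over the course of a tracked face can be sketched as follows. The sketch assumes that a tracker has already assigned a track identifier to each detected face and that a classifier has produced a score per label for each detection; the data layout and the use of a mean score per track are assumptions for illustration, not the procedure of Sharma 60/808,283.

```python
from collections import defaultdict


def classify_tracks(frame_scores):
    """Pick one label per tracked face from per-frame scores.

    frame_scores: iterable of (track_id, {label: score}) pairs, one pair
    per face detection in a frame (hypothetical classifier outputs).
    Returns {track_id: label with the highest mean score over the track}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for track_id, scores in frame_scores:
        counts[track_id] += 1
        for label, score in scores.items():
            totals[track_id][label] += score
    return {
        tid: max(label_totals, key=lambda lb: label_totals[lb] / counts[tid])
        for tid, label_totals in totals.items()
    }
```

Averaging over the track means that a single misclassified frame, such as one with a poor pose, does not by itself determine the label assigned to the person.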
The present invention utilizes the automatic behavior analysis and demographic analysis in the video stream to segment the people based on segmentation criteria. The segmentation data in the present invention can be used for various market analysis applications, such as gaining deeper insights into customers' shopping behavior in a retail store, media effectiveness measurement, and traffic analysis.
The prior art above is foreign to the concept of segmenting the people based on the segmentation criteria for their behaviors by tracking and analyzing the movement information of the people in a physical space, such as a retail store. The present invention discloses a novel usage of computer vision technologies for efficiently segmenting the people based on their behaviors in a physical space by tracking and analyzing the movement information of the customers with regard to the segmentation criteria.
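The concept of assigning people to segments by applying segmentation criteria to their tracked behavior can be sketched as below. This is a minimal, hypothetical illustration rather than the disclosed implementation: the behavior fields (`dwell_seconds`, `zones`), the segment names, and the thresholds are assumptions introduced for the example.

```python
def segment_people(behaviors, criteria, default="unsegmented"):
    """Assign each person to the first segment whose predicate matches.

    behaviors: {person_id: behavior dict} derived from tracking.
    criteria: ordered list of (segment_name, predicate) pairs.
    """
    segments = {}
    for person_id, behavior in behaviors.items():
        segments[person_id] = next(
            (name for name, predicate in criteria if predicate(behavior)),
            default,
        )
    return segments


# Hypothetical criteria, evaluated in order; the first match wins.
EXAMPLE_CRITERIA = [
    ("engaged_shopper",
     lambda b: b["dwell_seconds"] >= 300 and len(b["zones"]) >= 3),
    ("quick_visitor", lambda b: b["dwell_seconds"] < 60),
]
```

Because the criteria are passed in as data, additional inputs such as demographic labels or sales data could be added to the behavior dictionary and tested by further predicates without changing the function.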