1. Field of the Invention
The present invention is a method and system for automatically analyzing a category among a plurality of categories in a physical space based on the visual characterization, such as behavior analysis or segmentation, of persons with regard to the category. The present invention captures a plurality of input images of the persons in the category by a plurality of means for capturing images and processes the plurality of input images in order to understand the shopping behavior of the persons with respect to the sub-categories in the category and to analyze the level of engagement and the decision process at the sub-category level.
2. Background of the Invention
Shoppers' Behavior Analysis:
There have been earlier attempts to understand customers' shopping behaviors captured in a video in a targeted environment, such as a retail store, using cameras.
U.S. Pat. Appl. Pub. No. 2006/0010028 of Sorensen (hereinafter Sorensen 1) disclosed a method for tracking shopper movements and behavior in a shopping environment using a video. In Sorensen 1, a user indicated a series of screen locations in a display at which the shopper appeared in the video, and the series of screen locations were translated to store map coordinates. The step of receiving the user input via input devices, such as a pointing device or keyboard, makes Sorensen 1 inefficient for handling a large amount of video data in a large shopping environment with a relatively complicated store layout, especially over a long period of time. The manual input by a human operator/user cannot efficiently track all of the shoppers in such cases, not to mention the possibility of human errors due to tiredness and boredom. The manual input approach is also much less scalable as the number of shopping environments to handle for the behavior analysis increases.
Although U.S. Pat. Appl. Pub. No. 2002/0178085 of Sorensen, now U.S. Pat. No. 7,006,982, (hereinafter Sorensen 2) disclosed the use of a tracking device and store sensors in a plurality of tracking systems primarily based on wireless technology, such as RFID, Sorensen 2 is clearly foreign to the concept of applying computer vision based tracking algorithms to the field of understanding customers' shopping behaviors and movements. In Sorensen 2, each transmitter was typically attached to a hand-held or push-type cart. Therefore, Sorensen 2 cannot distinguish the behaviors of multiple shoppers using one cart from the behavior of a single shopper also using one cart. Although Sorensen 2 disclosed that the transmitter may be attached directly to a shopper via a clip or other form of customer surrogate in order to correctly track the shopper when the person is shopping without a cart, this is not practical because it imposes an additional cumbersome step on the shopper, not to mention the inefficiency of managing a transmitter for each individual shopper.
Sorensen 2 cannot efficiently provide the exact path of a shopper since it is based on creating a computer-simulated field of view for each shopper based on the direction of travel. Also, the shopping behavior cannot be deciphered accurately, as it is again based on determining the products that lie within the simulated field of view of the shoppers, which could result in incorrect judgments. In contrast, the proprietary computer vision based technology in the present invention automatically tracks shoppers and their behaviors at the category level in the retail space without using any simulation or approximation techniques, thus providing shopper behavior information efficiently.
U.S. Pat. No. 6,741,973 of Dove et al. (hereinafter Dove) disclosed a model for generating customer behavior in a transaction environment. Although Dove disclosed video cameras in a real bank branch as a way to observe human behavior, Dove is clearly foreign to the concept of automatic and real-time analysis of customers' behaviors, such as shopping path tracking and analysis, based on visual information of the customers in a retail environment.
With regard to the temporal behavior of customers, U.S. Pat. Appl. Pub. No. 2003/0002712 of Steenburgh, et al. (hereinafter Steenburgh) is relevant prior art. Steenburgh disclosed a method for measuring the dwell time of an object, particularly a customer in a retail store, that enters and exits an environment, by tracking the object and matching the entry signature of the object to its exit signature, in order to find out how long people spend in retail stores.
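For illustration only, the general idea of measuring dwell time by matching an entry signature to an exit signature can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Steenburgh: the `Sighting` record, the use of a numeric feature tuple as the "signature", the Euclidean distance measure, the greedy matching order, and the `max_distance` threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Sighting:
    timestamp: float   # seconds since the start of the recording (assumed unit)
    signature: tuple   # hypothetical appearance feature vector


def match_dwell_times(entries, exits, max_distance=0.5):
    """Greedily pair each exit with the most similar unmatched earlier entry.

    Returns the dwell times (exit time minus entry time) of matched pairs,
    in order of exit time. Exits with no sufficiently similar entry are skipped.
    """
    unmatched = list(entries)
    dwell_times = []
    for ex in sorted(exits, key=lambda s: s.timestamp):
        # Only entries that happened before this exit are valid candidates.
        candidates = [e for e in unmatched if e.timestamp <= ex.timestamp]
        if not candidates:
            continue
        best = min(candidates, key=lambda e: dist(e.signature, ex.signature))
        if dist(best.signature, ex.signature) <= max_distance:
            unmatched.remove(best)
            dwell_times.append(ex.timestamp - best.timestamp)
    return dwell_times


entries = [Sighting(0.0, (0.1, 0.9)), Sighting(10.0, (0.8, 0.2))]
exits = [Sighting(100.0, (0.82, 0.18)), Sighting(250.0, (0.12, 0.88))]
print(match_dwell_times(entries, exits))
```

A greedy match is used here only for brevity; a real system would need a more robust assignment and a signature that tolerates changes in pose and lighting.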
U.S. Pat. Appl. Pub. No. 2003/0053659 of Pavlidis, et al. (hereinafter Pavlidis) disclosed a method for moving object assessment, including an object path of one or more moving objects in a search area, using a plurality of imaging devices and segmentation by background subtraction. In Pavlidis, the object included customers. Pavlidis was primarily related to monitoring a search area for surveillance, but Pavlidis also included itinerary statistics of customers in a department store.
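As a generic illustration of segmentation by background subtraction, which several of the references cited in this section rely on, the following minimal sketch marks as foreground each pixel that differs from a reference background image by more than a threshold. It is not the method of any cited reference: the grayscale list-of-lists image representation, the fixed background, and the `threshold` value are assumptions made for this example.

```python
def foreground_mask(frame, background, threshold=25):
    """Return a boolean mask that is True where the frame differs from the background.

    Both images are assumed to be equally sized lists of rows of grayscale
    intensities in the range 0-255.
    """
    return [
        [abs(pixel - reference) > threshold
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def foreground_area(mask):
    """Count the foreground pixels in a mask."""
    return sum(sum(1 for is_foreground in row if is_foreground) for row in mask)


background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 200, 10],
         [10, 180, 9]]

mask = foreground_mask(frame, background)
print(mask)
print(foreground_area(mask))
```

In practice the background is usually updated over time to follow lighting changes, and the foreground pixels are grouped into connected regions before tracking; both steps are omitted here.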
U.S. Pat. Appl. Pub. No. 2004/0120581 of Ozer, et al. (hereinafter Ozer) disclosed a method for identifying activity of customers for marketing purposes or activity of objects in a surveillance area, by comparing the detected objects with the graphs from a database. Ozer tracked the movement of different object parts and combined them into high-level activity semantics, using several Hidden Markov Models (HMMs) and a distance classifier.
U.S. Pat. Appl. Pub. No. 2004/0131254 of Liang, et al. (hereinafter Liang) also disclosed Hidden Markov Models (HMMs), along with rule-based label analysis and a token parsing procedure, as a way to characterize behavior. Liang disclosed a method for monitoring and classifying actions of various objects in a video, using background subtraction for object detection and tracking. Liang is particularly related to animal behavior in a lab for testing drugs.
There have also been earlier attempts at activity analysis in areas other than understanding customers' shopping behaviors, such as surveillance and security applications.
Object Activity Modeling and Analysis:
The following prior art is not restricted to the application area of understanding customers' shopping behaviors in a targeted environment, but discloses methods for object activity modeling and analysis of the human body, using a video, in general.
U.S. Pat. Appl. Pub. No. 2002/0085092 of Choi, et al. (hereinafter Choi) disclosed a method for modeling an activity of a human body using optical flow vectors from a video and a probability distribution of the feature vectors derived from the optical flow vectors. Choi modeled a plurality of states using the probability distribution of the feature vectors and expressed the activity based on the state transitions.
U.S. Pat. Appl. Pub. No. 2004/0113933 of Guler disclosed a method for automatic detection of split and merge events from video streams in a surveillance environment. Guler considered split and merge behaviors to be key, common, simple behavior components for analyzing high level activities of interest in a surveillance application; these components are also used to understand the relationships among multiple objects, not just individual behavior. Guler used adaptive background subtraction to detect the objects in a video scene, and the objects were tracked to identify the split and merge behaviors. To understand the split and merge behavior-based high level events, Guler used a Hidden Markov Model (HMM).
Event Detection based on Shoppers' Behavior Analysis:
There have been earlier attempts at event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2003/0058339 of Trajkovic, et al. (hereinafter Trajkovic) disclosed a method for detecting an event through repetitive patterns of human behavior. Trajkovic learned multi-dimensional feature data from the repetitive patterns of human behavior and computed a probability density function (PDF) from the data. Then, a method for the PDF analysis, such as Gaussian or clustering techniques, was used to identify the repetitive patterns of behavior and unusual behavior through the variance of the Gaussian distribution or cluster.
Although Trajkovic can model a repetitive behavior through the PDF analysis, Trajkovic is clearly foreign to event detection for an aggregate of non-repetitive behaviors, such as the shopper traffic in a category of a physical space. The shopping path of an individual shopper can be repetitive, but each shopping path in a group of aggregated shopping paths of multiple shoppers is not repetitive. Trajkovic did not disclose the challenges of event detection based on customers' behaviors in a video in such a retail environment, and Trajkovic is clearly foreign to those challenges.
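By way of a hedged illustration, the kind of PDF analysis described above, in which a Gaussian is fitted to observations of a repetitive behavior and an observation far from the mean is treated as unusual, can be sketched as follows. This is a one-dimensional sketch under stated assumptions, not the method of Trajkovic: the single scalar feature, the sample values, and the threshold of `k` standard deviations are hypothetical.

```python
from statistics import mean, stdev


def fit_gaussian(samples):
    """Estimate the mean and sample standard deviation of a one-dimensional feature."""
    return mean(samples), stdev(samples)


def is_unusual(value, mu, sigma, k=3.0):
    """Flag a value that lies more than k standard deviations from the mean."""
    return abs(value - mu) > k * sigma


# Hypothetical feature: seconds a person takes to perform a repeated action.
observed = [30.0, 32.0, 29.0, 31.0, 28.0]
mu, sigma = fit_gaussian(observed)

print(is_unusual(31.0, mu, sigma))  # within the learned pattern
print(is_unusual(60.0, mu, sigma))  # far outside the learned pattern
```

The sketch also shows why such a model presupposes repetition: the fitted mean and variance are only meaningful when the observations come from the same recurring behavior.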
U.S. Pat. Appl. Pub. No. 2006/0053342 of Bazakos, et al. (hereinafter Bazakos) disclosed a method for unsupervised learning of events in a video. Bazakos disclosed a method of creating a feature vector of a related object in a video by grouping clusters of points together within a feature space and storing the feature vector in an event library. Then, the behavioral analysis engine in Bazakos determined whether an event had occurred by comparing features contained within a feature vector in a specific instance against the feature vectors in the event library. Bazakos is primarily related to surveillance rather than event detection based on customers' behaviors in a video.
U.S. Pat. Appl. Pub. No. 2005/0286774 of Porikli disclosed a method for event detection in a video using approximate estimates of the aggregated affinity matrix and clustering and scoring of the matrix. Porikli constructed the affinity matrix based on a set of frame-based and object-based statistical features, such as trajectories, histograms, and Hidden Markov Models of feature speed, orientation, location, size, and aspect ratio, extracted from the video.
The prior art above is foreign to the concept of understanding customers' shopping behaviors, by tracking and analyzing the movement information of the customers, with regard to sub-categories of a “category” in a physical space, such as a retail store. In the present invention, a category is defined as a logical entity for a group of products, a group of product types, space, areas in a store, display of a group of products, or department with similar relevance. The present invention discloses a novel usage of computer vision technologies for more efficiently understanding the shoppers' behaviors in a category of a physical space, such as a retail space, by tracking and analyzing the movement information of the customers with regard to the sub-categories of the category. The present invention also discloses a novel approach to analyzing the category based on the automated measurement of the shoppers' behaviors with regard to the sub-categories.