1. Field of the Invention
The present invention is a method and system for characterizing physical space based on automatic demographics measurement, using a plurality of means for capturing images and a plurality of computer vision technologies, such as face detection, person tracking, body parts detection, and demographic classification of the people, on the captured visual information of the people in the physical space.
2. Background of the Invention
Media and Product Match
There have been attempts in the prior art to customize and distribute matching media content, such as advertising content, to customers based on customer profiles, demographic information, or the customer's purchase history.
U.S. Pat. No. 6,119,098 of Guyot, et al. (hereinafter Guyot) disclosed a method and apparatus for targeting and distributing advertisements over a distributed network, such as the Internet, to the subscriber's computer. The targeted advertisements were based on a personal profile provided by the subscriber. Guyot was primarily intended for a subscriber with a computer at home, not for a physical space such as a retail store or a public place, and the creation of targeted advertisements relied on non-automatic responses from the customer. U.S. Pat. No. 6,182,050 of Ballard disclosed a method and apparatus for distributing advertisements online using target criteria screening, which also provided a method for maintaining end-user privacy. In that disclosure, the demographic information or a desired affinity ranking was gathered from the end user, who completed a demographic questionnaire and ranked various categories of products and services. Like Guyot, Ballard is foreign to the concept of automatically gathering demographic information from customers in a physical space, such as a retail store, without requiring any cumbersome response from the end user.
U.S. Pat. No. 6,055,573 of Gardenswartz, et al. and its continuation U.S. Pat. No. 6,298,330 of Gardenswartz, et al. (hereinafter Gardenswartz) disclosed a method and apparatus for communicating with a computer in a network based on the offline purchase history of a particular customer. Gardenswartz included the delivery of a promotional incentive for a customer to comply with a particular behavioral pattern. In Gardenswartz, the customer has to supply the registration server with information about the customer, including demographics of the customer, to generate an online profile. Gardenswartz clearly lacks the feature of automatically gathering the demographic information.
U.S. Pat. No. 6,847,969 of Mathai, et al. (hereinafter Mathai) disclosed a method and system for providing personalized advertisements to customers in a public place. In Mathai, the customer inserts a personal system access card into a slot on a terminal, which automatically updates the customer profile based on the customer's usage history. The customer profile is used for targeted advertising in Mathai. However, using a system access card is cumbersome for the customer. The customer has to carry the card when shopping, and the method and apparatus are not usable if the card is lost or stolen. U.S. Pat. No. 6,529,940 of Humble also disclosed a method and system for interactive in-store marketing, using interactive display terminals that allow customers to input feedback on the distributed marketing messages.
U.S. Pat. Appl. Pub. No. 2003/0216958 of Register, et al. and its continuation-in-part U.S. Pat. Appl. Pub. No. 2004/0128198 of Register, et al. (hereinafter Register) disclosed a method and system for network-based in-store media broadcasting. Register disclosed that each of the client player devices is independently supported through communication with the internal audio/visual system installed in the business location, and also that a customizable broadcast, specific to the particular business location, is supported on each client player device. However, Register is foreign to the concept of automatically measuring the demographic information of the customers in the particular business location, using computer vision technology, as the method for customizing the content of each client player device.
U.S. Pat. Appl. Pub. No. 2006/0036485 of Duri, et al. (hereinafter Duri) disclosed a method and system for presenting personalized information to consumers in a retail environment using RFID technology. Duri very briefly mentioned computer vision techniques as a method to locate each customer, but Duri is clearly foreign to the concept of utilizing an image classifier in computer vision technologies to gather demographic information about the customers in order to customize the media content in a media network.
U.S. Pat. No. 7,003,476 of Samra, et al. (hereinafter Samra) disclosed a system and method for targeted marketing using a ‘targeting engine’, which analyzes data input and generates data output. Samra used historical data to determine a target group based on a plurality of embedded models, where the models are defined as predicted customer profiles based on historic data and are embedded in the ‘targeting engine’. In Samra, the ‘targeting engine’ maintains a customer database based on demographics, but Samra's demographic information includes income, profession, marital status, and length of residence at a specific address, none of which can be gathered automatically by computer vision algorithms from the visual information of the customers. Therefore, Samra is clearly foreign to the idea of measuring demographic information automatically using computer vision technologies to match media content to the demographics in a media network.
Media and Product Marketing Effectiveness
There have been earlier attempts to measure the media advertising effectiveness in a targeted environment, such as in a media network or in a retail store, and to understand the customers' shopping behavior by gathering various market research data.
U.S. Pat. No. 4,972,504 of Daniel, Jr., et al. (hereinafter Daniel, Jr.) and U.S. Pat. No. 5,315,093 of Stewart disclosed market research systems for sales data collection. U.S. Pat. No. 5,331,544 of Lu, et al. (hereinafter Lu) disclosed an automated system for collecting market research data. In Lu, a plurality of cooperating establishments are included in a market research test area. Each cooperating establishment is adapted for collecting and storing market research data. A computer system, remotely located from the plurality of cooperating establishments, stores market research data collected from the cooperating establishments. The collected market research data includes monitored retail sales transactions and captured video images of retail customers. The video images of customers are analyzed using a facial recognition system to verify whether they match a known gallery of frequent customers.
U.S. Pat. Appl. Pub. No. 2006/0041480 of Briggs disclosed a method for determining the advertising effectiveness of cross-media campaigns. Briggs' method provides media suggestions for each medium based on an analysis of the advertising effectiveness of the cross-media campaigns. Although Briggs disclosed strategic “six basic steps” to assess advertising effectiveness across multiple media, Briggs is clearly foreign to the concept of actually and automatically measuring the media effectiveness for an individual viewer or a group of viewers based on visual information from the viewers.
While the above-mentioned prior art attempted to deliver matching media content to customers or to measure media advertising effectiveness in a physical space, it is clearly foreign to the concept of utilizing characterization information of the physical space that is based on automatic, actual measurement of the demographic composition of the people in the physical space. With regard to media matching, the prior art collected demographic information from customers by non-automatic methods, using cumbersome portable monitors, assessment steps, customer profiles, a customer's purchase history, or various other non-automatic devices and tools. The prior-art attempts to measure media effectiveness likewise relied on cumbersome requests for customer feedback or on manual input, such as questionnaires, registration forms, or electronic devices. These attempts lack the capability of matching media content to the characteristics of the physical space based on automatic, actual measurement of its demographic composition, such as gender, age, and ethnicity ratios, using computer vision technology and without requiring any cumbersome involvement from the customer.
The present invention is a method and system for characterizing a physical space based on automatic demographics measurement, using a plurality of means for capturing images and a plurality of computer vision technologies, such as face detection, person tracking, body parts detection, and demographic classification of the people, on the captured visual information of the people in the physical space, and the present invention is called demographic-based retail space characterization (DBR). It is an objective of the present invention to provide an efficient and robust solution that solves the aforementioned problems in the prior art.
Computer vision algorithms have been shown to be an effective means for detecting and tracking people. These algorithms have also been shown to be effective in analyzing the demographic information of people in the view of the means for capturing images. This makes it possible to connect the visual information from a scene in a physical space, especially the demographic composition of the people, to the characterization of that physical space. The invention automatically and unobtrusively analyzes the customers' demographic information without requiring the customers or an operator to enter the information manually. The invention then provides the automatic, actual demographic composition measurement to the decision maker of the physical space as one of the key criteria for characterizing the physical space.
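As an illustration of how per-person classifications could be turned into the demographic composition of a space, the following Python sketch computes category ratios for one attribute. The record layout (one dict per observed person, with keys such as `gender` and `ethnicity`) and the function name are hypothetical; the document does not specify how measurements are stored.

```python
from collections import Counter

def demographic_composition(people, attribute):
    """Fraction of observed people in each category of one attribute.

    people: list of dicts such as {"gender": "female", "ethnicity": "A"}.
    People whose attribute is missing (e.g. body-only observations that
    carry gender but not ethnicity) are left out of that attribute's ratio.
    """
    counts = Counter(p[attribute] for p in people if p.get(attribute) is not None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Excluding missing values per attribute, rather than dropping the whole person, lets a gender ratio use every observation even when ethnicity is only known for people whose faces were seen.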
Body Detection and Tracking
There have been prior attempts at detecting and tracking human bodies in videos.
The article by I. Haritaoglu, et al. (hereinafter Haritaoglu), “W4: Real-Time Surveillance of People and Their Activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, disclosed a method for detecting and tracking a human body in digital images. The system first learns and models background scenes statistically to detect foreground objects, even when the background is not completely stationary. It then distinguishes people from other objects using shape and periodic motion cues. The system tracks multiple people simultaneously by constructing an appearance model for each person during tracking. It also detects and tracks six main body parts (the head, two hands, two feet, and the torso) of each person using a static shape model and second-order motion tracking of dynamic appearance models. It also determines whether a person is carrying an object, and segments the object so it can be tracked during exchanges.
U.S. Pat. No. 6,421,463 of Poggio, et al. (hereinafter Poggio) disclosed a trainable object detection system and technique for detecting objects such as people in static or video images of cluttered scenes. The system and technique can be used to detect highly non-rigid objects with a high degree of variability in size, shape, color, and texture. The system learns from examples and does not rely on any a priori (hand-crafted) models or on motion. The technique utilizes a wavelet template that defines the shape of an object in terms of a subset of the wavelet coefficients of the image. It is invariant to changes in color and texture and can be used to robustly define a rich and complex class of objects such as people. The invariant properties and computational efficiency of the wavelet template make it an effective tool for object detection.
The article by K. Mikolajczyk, et al. (hereinafter Mikolajczyk), “Human detection based on a probabilistic assembly of robust part detectors,” European Conference on Computer Vision 2004, presents a novel method for human detection in single images that can detect full bodies as well as close-up views in the presence of clutter and occlusion. The system models a human body as a flexible assembly of parts, and robust part detection is the key to the approach. The parts are represented by co-occurrences of local features, which capture the spatial layout of each part's appearance. Feature selection and the part detectors are learned from training images using AdaBoost.
The disclosed system utilizes methods similar to the prior art summarized above. As in Haritaoglu, the motion foreground is segmented to limit the search space for human bodies. As in Poggio and Mikolajczyk, a machine-learning-based approach is used to robustly detect and locate the human figure in images. However, the disclosed application assumes a frontal human body pose; therefore, the method makes use of a simpler body appearance model, in which the shapes and the spatial arrangement of body parts are encoded using a graphical Bayesian method, such as a Bayesian Network or a Hidden Markov Model. Once the body image is located, the Bayesian body model adapts to the specific person's bodily appearance and keeps the identity of the person for tracking.
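W4's background model is specific to that paper; the sketch below instead uses a generic running per-pixel mean/variance model, a simpler swapped-in technique, to illustrate what statistically modeling the background to detect foreground objects can look like. The class name and the parameters `alpha`, `k`, and `min_var` are hypothetical choices, not values from Haritaoglu.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running mean/variance background model (illustrative only).

    This is a generic statistical background model, not W4's own model.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5, min_var=4.0):
        frame = np.asarray(first_frame, dtype=np.float64)
        self.mean = frame.copy()
        self.var = np.full_like(frame, min_var)
        self.alpha = alpha      # learning rate for the running statistics
        self.k = k              # foreground threshold in standard deviations
        self.min_var = min_var  # variance floor so sensor noise is tolerated

    def apply(self, frame):
        """Return a boolean foreground mask and update the background model."""
        frame = np.asarray(frame, dtype=np.float64)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Update statistics only where the pixel looks like background, so a
        # person standing still is not absorbed into the background at once.
        bg = ~foreground
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = np.maximum(
            (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2,
            self.min_var,
        )
        return foreground
```

The selective update is what lets the model tolerate a slowly changing background while still flagging people who enter the scene.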
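To make the idea of a wavelet template more concrete, here is a hedged sketch that computes overcomplete Haar wavelet responses (vertical, horizontal, and diagonal) over a grayscale image. The window size, the half-window shift, and the use of absolute values to ignore contrast polarity are assumptions for illustration; Poggio's actual coefficient subset and normalization are not reproduced here.

```python
import numpy as np

def haar_responses(image, size=4):
    """Absolute Haar wavelet responses on a grid shifted by size // 2.

    Returns an array of shape (3, rows, cols) holding the vertical,
    horizontal, and diagonal responses of each size x size window.
    """
    img = np.asarray(image, dtype=np.float64)
    half = size // 2
    rows = range(0, img.shape[0] - size + 1, half)
    cols = range(0, img.shape[1] - size + 1, half)
    out = np.zeros((3, len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            win = img[r:r + size, c:c + size]
            tl, tr = win[:half, :half].sum(), win[:half, half:].sum()
            bl, br = win[half:, :half].sum(), win[half:, half:].sum()
            out[0, i, j] = (tl + bl) - (tr + br)  # left vs. right: vertical edge
            out[1, i, j] = (tl + tr) - (bl + br)  # top vs. bottom: horizontal edge
            out[2, i, j] = (tl + br) - (tr + bl)  # diagonal structure
    # Dropping the sign makes a dark-on-light edge look like a light-on-dark
    # one, which is one way to reduce sensitivity to clothing color.
    return np.abs(out)
```

A learned classifier would then operate on a selected subset of these coefficients rather than on raw pixels.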
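Mikolajczyk's part detectors use AdaBoost for feature selection and classification. The following is a minimal sketch of discrete AdaBoost with single-feature threshold stumps, in which each boosting round effectively selects one feature. The function names and the exhaustive threshold search are illustrative assumptions, and the co-occurrence features of the original work are not modeled.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with one-feature decision stumps.

    X: (n_samples, n_features); y: labels in {-1, +1}.
    Returns a list of (alpha, feature, threshold, polarity) stumps.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # sample weights, uniform at the start
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Pick the stump with the lowest weighted error (feature selection).
        for f in range(d):
            for thr in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, pred)
        err, f, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        # Increase the weight of misclassified samples for the next round.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append((alpha, f, thr, pol))
    return stumps

def adaboost_predict(stumps, X):
    """Sign of the alpha-weighted vote of all stumps."""
    X = np.asarray(X, dtype=np.float64)
    score = np.zeros(len(X))
    for alpha, f, thr, pol in stumps:
        score += alpha * np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```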
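The Bayesian body model is not described here in enough detail to reproduce. As a simpler stand-in, the sketch below shows the general idea of an appearance model that adapts to each tracked person and is used to maintain identities: each track stores a color histogram that is blended with new observations, and a new detection is assigned to the most similar track by the Bhattacharyya coefficient. The histogram features, update rate, similarity threshold, and all names are assumptions, not the disclosed Bayesian Network or Hidden Markov Model.

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """Normalized joint RGB histogram of an (N, 3) array of 0-255 pixels."""
    px = np.asarray(pixels, dtype=np.int64) * bins // 256
    idx = px[:, 0] * bins * bins + px[:, 1] * bins + px[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))

class AppearanceTrack:
    def __init__(self, track_id, hist, rate=0.1):
        self.track_id = track_id
        self.hist = hist
        self.rate = rate

    def update(self, hist):
        # Blend in the new observation so the model adapts to the person.
        self.hist = (1 - self.rate) * self.hist + self.rate * hist

def assign(tracks, hist, min_similarity=0.5):
    """Return the id of the most similar track, or None if none is close."""
    best = max(tracks, key=lambda t: bhattacharyya(t.hist, hist), default=None)
    if best is None or bhattacharyya(best.hist, hist) < min_similarity:
        return None
    best.update(hist)
    return best.track_id
```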
Non-Face Based Gender Classification
There have been prior attempts at classifying the gender of a person based on bodily image signatures other than the face.
The article by K. Ueki, et al. (hereinafter Ueki), “A Method of Gender Classification by Integrating Facial, Hairstyle, and Clothing Images,” International Conference on Pattern Recognition 2004, presents a method of gender classification by integrating facial, hairstyle, and clothing images. The system first separates the input image into facial, hair, and clothing regions, and then applies to each region PCA and GMM models computed independently from thousands of sample images. The classification results are then integrated into a single score using known priors based on Bayes' rule.
The disclosed invention utilizes a more general approach than Ueki for gender classification, using a bodily appearance signature. Instead of combining upper-body appearance signatures (face, hairstyle, and necktie/décolleté) in grayscale, the disclosed method combines a more comprehensive bodily appearance signature (the shape of the hair region, the body figure, and the color composition of the clothing). The bodily appearance signature is extracted using the Bayesian appearance model, according to the information provided by the body detection/tracking stage. The classifier for the appearance signature is trained on thousands of images, each annotated with a gender label. The trained classification machine serves as a stand-alone classifier when the customer's facial image is not available. The body-based classification applies only to gender.
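Ueki's final step combines per-region results with Bayes' rule. A minimal sketch of that kind of integration, assuming each region model (for example, a GMM) supplies class-conditional likelihoods and that the regions are conditionally independent given gender, adds the per-region log-likelihood ratios to the prior log-odds. The dictionary layout and function name are hypothetical, and the independence assumption is a simplification rather than Ueki's exact formulation.

```python
import math

def integrate_region_scores(region_likelihoods, prior_male=0.5):
    """Posterior P(male) from per-region likelihoods via Bayes' rule.

    region_likelihoods: dict mapping region name -> (p(x|male), p(x|female)).
    Assumes the regions are conditionally independent given gender.
    """
    log_odds = math.log(prior_male) - math.log(1 - prior_male)
    for p_male, p_female in region_likelihoods.values():
        log_odds += math.log(p_male) - math.log(p_female)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

For example, with face likelihoods (0.6, 0.4), hair likelihoods (0.3, 0.7), an uninformative clothing region, and a 0.5 prior, the posterior is 0.18 / (0.18 + 0.28), so the hair region outweighs the face and the decision leans female.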
Face Based Demographics Classification
There have been prior attempts at recognizing the demographic category of a person by processing the facial image using a machine learning approach.
U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) disclosed a method that employs a Support Vector Machine to classify images of faces according to gender by training on images that include male and female faces; determining a plurality of support vectors from the training images that identify a hyperplane for the gender decision; and reducing the resolution of the training images and the test image by sub-sampling before supplying the images to the Support Vector Machine.
U.S. Pat. Appl. Pub. No. 2003/0110038 of Sharma, et al. (hereinafter Sharma) disclosed a computer software system for multi-modal human gender classification, comprising: a first-mode classifier classifying first-mode data pertaining to male and female subjects according to gender and rendering a first-mode gender decision for each male and female subject; a second-mode classifier classifying second-mode data pertaining to male and female subjects according to gender and rendering a second-mode gender decision for each male and female subject; and a fusion classifier integrating the individual gender decisions obtained from said first-mode classifier and said second-mode classifier and outputting a joint gender decision for each of said male and female subjects.
Both of the prior art references above (Moghaddam and Sharma) aim to classify only one demographic attribute, gender, based on the image signature of faces. These approaches address a much smaller scope of problems than the claimed method; both assume that the facial regions have already been identified and address only the problem of classifying an individual face. They do not address the problem of detecting and tracking faces to determine the demographic identity of a person over the course of his or her facial exposure to the imaging device.
The proposed invention is a much more comprehensive solution in which the automated system captures video frames, detects customers in the frames, tracks the people individually, corrects the pose of the faces, and finally classifies the demographic profiles of the customers, covering both gender and ethnicity. The dedicated facial geometry correction step improves the face classification accuracy.
The present invention utilizes motion foreground segmentation to locate the regions in which customers entering a specified area can be detected. The method makes use of a frontal body appearance model, in which the shapes and the spatial arrangement of body parts are encoded using a graphical Bayesian method. Once the body image is located, the Bayesian body model adapts to the specific person's bodily appearance and keeps the identity of the person for tracking. The estimated footfall location of the person determines whether the person has entered the monitored area. If a frontal facial image is available, the learning-machine-based face classifier is utilized to determine the demographic group of the person. If a frontal facial image is not available, the demographics classifier utilizes the holistic bodily appearance signature as a means to distinguish between male and female.
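The two ingredients named in Moghaddam's disclosure, sub-sampling to reduce resolution and a support vector machine, can be sketched as follows. For brevity this uses block averaging for the sub-sampling and a linear SVM trained with the Pegasos stochastic sub-gradient method; both are swapped-in choices, and the kernel, resolution, and training procedure used by Moghaddam may differ. Function names and hyperparameters are illustrative.

```python
import numpy as np

def subsample(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    img = np.asarray(image, dtype=np.float64)
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    img = img[:h, :w]
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos stochastic sub-gradient solver for a linear SVM.

    The bias is handled as a constant feature (and is therefore
    regularized too), which is a simplification.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([np.asarray(X, dtype=np.float64), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            margin = y[i] * (X[i] @ w)     # margin under the current weights
            w *= 1 - eta * lam             # shrink: gradient of the regularizer
            if margin < 1:                 # hinge loss is active
                w += eta * y[i] * X[i]
    return w

def svm_predict(w, X):
    X = np.hstack([np.asarray(X, dtype=np.float64), np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)
```

Sub-sampling before training shrinks the feature vector (here a 16-fold reduction for `factor=4`), which is the practical motivation for reducing resolution.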
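Sharma's fusion classifier integrates the decisions of two mode-specific classifiers. One common way to realize such a fusion stage, shown here as an assumption rather than Sharma's specific design, is a logistic-regression layer trained on the two classifiers' output scores, so that the more reliable mode can receive more weight. The score format (one signed confidence per mode), learning rate, and step count are illustrative.

```python
import numpy as np

def train_fusion(mode_scores, labels, lr=0.5, steps=2000):
    """Fit a logistic-regression fusion layer by batch gradient descent.

    mode_scores: (n, 2) array, one column per mode classifier's score.
    labels: 1 for male, 0 for female.
    """
    X = np.asarray(mode_scores, dtype=np.float64)
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean log loss
    return w

def fuse(w, mode_scores):
    """Joint gender decision (1 = male, 0 = female) for each subject."""
    X = np.asarray(mode_scores, dtype=np.float64)
    X = np.hstack([X, np.ones((len(X), 1))])
    return (1 / (1 + np.exp(-X @ w)) >= 0.5).astype(int)
```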
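To illustrate what determining a person's demographic identity over the whole facial exposure could involve, the sketch below sums (equivalently, averages) per-frame class probabilities within each track and reports the most probable label. This aggregation rule, the input format, and the function name are hypothetical choices for illustration.

```python
from collections import defaultdict

def aggregate_track_labels(frame_predictions):
    """Combine per-frame classifier outputs into one label per tracked person.

    frame_predictions: iterable of (track_id, {label: probability}) pairs,
    one pair per frame in which that person's face was classified.
    Summing probabilities has the same argmax as averaging them when every
    frame reports the same labels, so no frame count is kept.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for track_id, probs in frame_predictions:
        for label, p in probs.items():
            sums[track_id][label] += p
    return {tid: max(label_sums, key=label_sums.get) for tid, label_sums in sums.items()}
```

Pooling evidence across frames means a single badly posed or blurred frame cannot flip the decision for the whole track.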
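The facial geometry correction step is not specified in detail here. One common form of it, assumed for illustration, is a similarity transform (rotation, uniform scale, and translation) that maps the detected eye centers onto fixed canonical positions; the resulting 2x3 matrix is the kind of input an affine image-warping routine such as OpenCV's `cv2.warpAffine` accepts. The canonical eye coordinates and function names below are arbitrary placeholders.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye,
                         target_left=(30.0, 40.0), target_right=(70.0, 40.0)):
    """2x3 similarity transform taking the detected eyes to canonical positions."""
    src_l, src_r = np.asarray(left_eye, float), np.asarray(right_eye, float)
    dst_l, dst_r = np.asarray(target_left, float), np.asarray(target_right, float)
    src_vec, dst_vec = src_r - src_l, dst_r - dst_l
    # Uniform scale and in-plane rotation that align the inter-eye vectors.
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rot = scale * np.array([[cos_a, -sin_a], [sin_a, cos_a]])
    # Choose the translation so that the left eye lands exactly on its target.
    trans = dst_l - rot @ src_l
    return np.hstack([rot, trans[:, None]])

def apply_transform(matrix, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    return matrix[:, :2] @ np.asarray(point, float) + matrix[:, 2]
```

Normalizing every face to the same eye positions removes in-plane rotation and scale differences before classification; it does not correct out-of-plane head rotation.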
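The decision logic described in this paragraph can be sketched as follows: estimate the footfall location from the body bounding box, test it against the monitored area, and fall back from face-based demographics to body-based gender when no frontal face is available. Taking the bottom-center of the bounding box as the footfall, working directly in image coordinates rather than a calibrated floor plane, and the callable parameters are all assumptions for illustration; the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def footfall(bbox):
    """Bottom-center of an (x, y, w, h) box; image y grows downward."""
    x, y, w, h = bbox
    return x + w / 2, y + h

def classify_person(bbox, area, face_image, classify_face,
                    classify_body_gender, body_signature):
    """Demographics for a person whose footfall is inside `area`, else None."""
    fx, fy = footfall(bbox)
    if not point_in_polygon(fx, fy, area):
        return None  # not (yet) inside the monitored area
    if face_image is not None:
        return classify_face(face_image)  # frontal face: full demographics
    return {"gender": classify_body_gender(body_signature)}  # body: gender only
```

The body-only branch returns gender alone, mirroring the statement that body-based classification applies only to gender.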