1. Field of the Invention
The present invention is a method and system for selectively executing content on a device based on automatic recognition of predefined characteristics, including visually perceptible attributes, such as a demographic profile of people, local behavior analysis, and emotional status, identified automatically using a sequence of image frames from a video stream, wherein a targeted media is selected from a set of media pools according to the automatically-extracted, visually perceptible attributes and feedback from the people, where the media comprises various kinds of stimulus, including visual stimulus, audio stimulus, and music, from various sources.
2. Background of the Invention
Companies are constantly looking for new ways to better advertise their products or services. Traditionally, advertising has followed a broadcast model: television, radio, billboard, magazine, or newspaper ads are shown to a broad population in the hope that some of the people in that population will actually look at or listen to the advertisement and be influenced by it. This broadcast model for advertising is, by definition, based on very large segments of the population.
The effectiveness of the advertisement can be very difficult to measure using a broadcast model. The model also tends to be costly to implement because, in order to influence enough people, the advertisement has to be shown very widely. With effective ad targeting, the unnecessary side effects of broadcasting can be minimized. Recent advances in processor speed and digital displays have made it possible for retail establishments to afford and utilize electronic digital billboards or displays. Although this allows much richer media to be presented, contemporary systems still suffer from the limitations of the broadcast model of advertising.
There have been attempts to distribute targeted advertising content in the prior art. For example, U.S. Pat. No. 6,119,098 of Guyot, et al. (hereinafter Guyot) disclosed a method and apparatus for targeting and distributing advertisements over a distributed network. However, in Guyot, advertisements are targeted to the subscriber based on a personal profile provided by the subscriber.
U.S. Pat. No. 6,182,050 of Ballard (hereinafter Ballard) disclosed a method and apparatus for distributing advertisements online using target criteria screening, which also provided a method for maintaining end user privacy. In the disclosure, the demographic information or a desired affinity ranking was gathered from the end user, who completed a demographic questionnaire and ranked various categories of products and services.
U.S. Pat. No. 6,298,330 of Gardenswartz, et al. (hereinafter Gardenswartz) disclosed a method and apparatus for communicating with a computer based on the offline purchase history of a particular consumer. The invention included the delivery of a promotional incentive for a consumer to comply with a particular behavioral pattern. In the disclosure, the targeted advertisements were changed based on changes in consumers' purchase history behaviors. In the disclosure, the consumer supplied the registration server with information about the consumer, including demographics of the consumer, to generate an online profile.
U.S. Pat. No. 6,385,592 of Angles, et al. (hereinafter Angles) disclosed a method and apparatus for delivering customized advertisements within interactive communication systems. In Angles, the advertising provider computer generated a customized advertisement based on the consumer's profile, upon receiving the advertising request. In Angles, the consumer, who wished to receive customized advertisement, first registered with the advertisement provider by entering the demographic information into the advertisement provider's demographic database.
U.S. Pat. No. 6,408,278 of Carney, et al. (hereinafter Carney) disclosed a method and apparatus for delivering programming content on a network of electronic out-of-home display devices. In Carney, the network includes a plurality of display devices located in public places, and the delivered programming content is changed according to the demographics of the people. Carney also suggests demographic data gathering devices, such as a kiosk and an automatic teller machine.
While Carney has interesting and versatile approaches to the programming content delivery network, there are some limitations in its data gathering devices. One limitation is that the data gathering device is collocated with the display device. However, it does not have to be that way, and sometimes it should not be. Depending on the public place environment and the business goal where an embodiment of the system is installed, it can be necessary to install the data gathering devices independently of the position of the display device. For example, some owners of public places may want to use the widely used, already installed surveillance cameras in their public places for the data gathering. The surveillance cameras are not necessarily collocated with the display devices, and usually they are not. In another example, some owners of public places may want to use wide-angle cameras or multiple cameras to gather overall demographic information and statistical data from a small group of people in a certain region of the public place during a certain time. In the above examples, the targeted content can be delivered and displayed through display devices that do not need to be collocated with the data gathering devices, i.e., the cameras. Carney also briefly mentioned the use of a camera as one of the methods for gathering demographic information. However, it did not sufficiently disclose how to process images to determine the demographic composition of the audience.
U.S. Pat. No. 6,484,148 of Boyd disclosed electronic advertising devices and methods for providing targeted advertisements based on the consumer profiles. The disclosure included a receiver for receiving identifying signals from individuals, such as signals emitted by cellular telephones, and the identifying signal was used for the targeted advertisements to be delivered to the individuals.
I. Haritaoglu, and M. Flickner, in “Attentive Billboards”, 11th International Conference on Image Analysis and Processing, Sep. 26-28, 2001, Palermo, Italy, disclosed a real-time vision system, which detected, tracked, and counted the number of people standing in front of billboards. Haritaoglu briefly mentioned their attempt for an automatic demographics measurement. However, Haritaoglu does not explicitly teach how to measure the demographics automatically at all, other than the mere indications of their attempt, whereas the present invention explicitly discloses a specific method of how to measure the demographics automatically, particularly using Support Vector Machine (SVM)-based age and gender classifiers, to determine the age and gender of the person in the images in an exemplary embodiment. The present invention further discloses key steps in the automatic demographics measurement.
Although, in the prior art, there have been attempts to customize advertising content using demographic information, the bottleneck was to gather the demographic information efficiently and automatically in real-time. This was especially true for the audiences in a public place. The conventional approaches for gathering the demographic information, which require the audiences' feedback, such as using questionnaires, registration forms, or electronic devices, are often found to be cumbersome to the audience. Furthermore, in the case of using questionnaires or an offline purchase history, the gathered information is not real-time data, so it does not reflect the current demographic information of the current audience in a specific public place during a specific time. Since they are not real-time data, but previously saved data, the effect of targeted advertisement diminishes. Thus these problems in the prior art require an automatic and real-time approach for gathering the demographic information from the audience in a public place.
In a preferred embodiment, the present invention is a system for selectively executing targeted media on a display device based on the automatic recognition of the predefined visually perceptible attributes associated with people in the view of the camera or set of cameras, providing an efficient and robust solution, which solves the aforementioned problems in the prior art. Computer vision algorithms have been shown to be an effective means for detecting people. These algorithms also have been shown to be effective at extracting relevant demographic features of the people in the view of the sensor or set of sensors. This allows for the possibility of connecting the visual information from a scene to the behavior and content of a digital media. The invention allows freedom of installation position between the data gathering devices (a set of cameras) and the display devices. The invention also automatically gathers the demographic information from the people, without the hassle of requiring the people to provide information manually. The invention also processes the demographic information gathering in an efficient and robust way, and enables real-time customization of the targeted content.
U.S. Pat. Appl. Pub. No. 20020016740 of Ogasawara (hereinafter Ogasawara) disclosed an electronic shopping system that provides customer recognition with wireless identification. Ogasawara also sends visual data to point-of-sale (POS) terminals. However, Ogasawara does not show the novel and unobvious features in the present invention. Automatic image processing to obtain demographic information and visually perceptible attributes from images of people is foreign to Ogasawara. Although Ogasawara disclosed a method for obtaining visual image data of particular customers at the time each customer enters the establishment, Ogasawara is clearly foreign to the idea of automatic image processing to obtain demographic information or the visually perceptible attributes. In Ogasawara, such demographic information is not obtained from the captured images, but rather from previously recorded demographics data accessed using the customer's ID. In Ogasawara, the demographics data is already stored in a loyalty database, and image data for each customer is made available for the establishment's personnel to recognize and greet each customer on a personal basis, rather than being used for automatic image processing to obtain demographic information.
The definition for the demographic information in Ogasawara includes features that cannot be calculated using a computer vision algorithm. The approaches as to how the demographic information and visually perceptible attributes are gathered and applied in the embodiments are significantly different between the prior arts and the present invention. In the present invention, the demographic information is a part of the visually perceptible attributes, and it refers to the visually perceptible demographic information, such as gender or age, by computer vision-based demographic classification technology. Ogasawara's definition for the demographic profile information includes a customer's name, address, telephone number, date of birth, family status, and number of children, which are stored in the information storage area of the customer ID card. These features cannot be automatically measured by the computer vision-based demographic classification technology. Therefore, not only is Ogasawara foreign to the method of automatic demographic classification based on the automatically-detected facial images of people using a computer vision-based demographic classification technology, without requesting any input from the people, but also the focus of the demographic information in Ogasawara is clearly different from the focus of the demographic information in the present invention. The present invention is clearly focused on the demographic information that can be visually measured.
In line with the goal for the automatic method of obtaining the demographic information in the present invention, the present invention does not require any involvement from the customers to obtain the demographic information in the preferred embodiment. The present invention considers the requirement for the cumbersome involvement as a problem to solve in the prior arts. Ogasawara requires a cumbersome involvement from the customers by requesting customers to carry the card with them. This is one of the limitations in prior arts that the present invention tries to overcome. Ogasawara disclosed a method that seeks to customize the sales presentation/advertisements based upon customer profile/demographics. In Ogasawara, this customization is done manually by the store employee and is not automatically based on the demographic data.
Furthermore, Ogasawara does not disclose a method for customizing advertising according to said demographic information and for displaying said targeted media on said means of displaying the content. Ogasawara disclosed a means of displaying the visual image to the store employee, but not a targeted media based on captured demographic information.
U.S. Pat. No. 5,155,591 of Wachob (hereinafter Wachob) disclosed a system that provides different commercial messages to different demographically-targeted audiences for cable television or satellite broadcasting system. Wachob noted the use of a user demographic key on a handheld remote control and household survey, diary information, known address, neighborhood locations, or known ethnic locations as ways to determine demographic types. Wachob is clearly foreign to the idea of automatic image processing to obtain demographic information or the “visually perceptible attributes,” particularly for the customers in a retail space. The definition for the demographic information in Wachob includes features that cannot be calculated using a computer vision algorithm. Wachob disclosed household survey, diary information, address, neighborhood locations, or ethnic locations as methods of determining individual viewer demographic types, which shows that the definition for the demographic information in Wachob is also different from the present invention.
Wachob does not select the content based upon automatically-captured demographic data. It selects the content based upon demographic data that is entered by a viewer using a device such as remote control or household survey. Wachob noted that the demographic data is entered by a viewer using a remote control, and then the demographic data is used for targeting the content for the television. As discussed, there is no disclosure about automatically capturing the demographic data from visual images and using the demographic information to target content.
Furthermore, the demographic information in the present invention is primarily concerned with the customer in a retail store, whereas the demographic information in Wachob is primarily concerned with the television-viewing consumers, so the approaches as to how the demographic information is gathered and applied in the embodiments are significantly different between Wachob and the present invention. The computer vision algorithms in the present invention deal with the obstacles in the retail environment, which may or may not be applicable for the customers in a television-viewing environment.
In the above prior arts, the approaches as to how the demographic information and visually perceptible attributes are gathered and applied in the embodiments are significantly different between the prior arts and the present invention. This significant difference clearly shows the novelty and unobviousness of the present invention over the prior arts. Consequently, the methods in prior arts will not work when the customers do not cooperate with the requirement of their involvement for the systems. For example, if customers do not carry the customer ID card or tag, or if they forget to bring it to the establishment, Ogasawara will not work. If the input device, such as a remote control, is broken or lost, the viewer will not be able to enter the demographic data in Wachob. Therefore, it is an objective of the present invention to overcome cumbersome involvement from the customers.
U.S. Pat. No. 6,269,173 of Hsien (hereinafter Hsien) disclosed an instant response broadcast board system that operates based on the detection of the movement of objects in front of the board. Hsien is entirely foreign to the idea of obtaining demographic information of customers. Furthermore, Hsien explicitly noted that the interactivity with customers is one of the key ideas in Hsien. By contrast, the present invention explicitly notes the need for an automatic approach for gathering the demographic information, and explicitly teaches a step for automatically obtaining demographic information of people from face images, whereby interaction by the people for obtaining said demographic information is not needed.
U.S. Pat. No. 6,904,168 of Steinberg, et al. (hereinafter Steinberg) disclosed an image analysis engine with multiple sub-engines, each dedicated to different attributes of an image. Steinberg does not teach a step of selecting digital media to be displayed on a digital device based on the automatically-extracted, visually perceptible attributes. Furthermore, Steinberg is entirely foreign to the automatically-extracted, visually perceptible attributes that comprise gender, age range, number of people, gaze characteristics, height, hair color, skin color, clothing, and time spent in front of the means for playing the content. In Steinberg, none of the “shape analysis engine,” “skin tone analysis engine,” “texture analysis engine,” “textual analysis engine,” and “curvature analysis sub-engine,” teaches the “automatically-extracted, visually perceptible attributes,” that comprise gender and age range in the present invention, and they are entirely different ideas.
U.S. Pat. No. 5,636,346 of Saxe (hereinafter Saxe) disclosed a system for delivering targeted advertisements and programming to demographically-targeted television audiences. Saxe did not explicitly teach how to automatically extract visually perceptible attributes of each person, wherein the visually perceptible attributes comprise demographic information, including gender, age, height, skin color, and hair color, using captured images of the people in front of a display.
U.S. Pat. No. 7,383,203 of Feldstein, et al. (hereinafter Feldstein) disclosed a system and method for dynamically providing a user with personalized data based on user input, and tracking the user input for providing data that is automatically updated in real time with network tracking techniques. Feldstein explicitly noted demographics based on the user-submitted information, and Feldstein is entirely foreign to the idea of automatically extracting visually perceptible attributes of each person, wherein the visually perceptible attributes comprise gender, age, height, skin color, and hair color, using at least a Support Vector Machine (SVM) that is trained for age and gender for selectively executing targeted media on a display, whereby interaction by the individual or the people for obtaining the demographic information is not needed, as disclosed in the present invention.
“Place-based media” (PBM) are the media vehicles that provide local information and advertisements to the audience present at a particular location. Digital signage is a typical example of such media. Unlike media such as cinema, television, or the Internet, place-based media does not have a captive audience. In many cases, the primary motivation for the audience present in the location is not media consumption, but something else, such as shopping, pumping gas, or working out. PBM relies on interrupting the audience with exciting information; therefore, offering relevant and highly targeted content is extremely important.
The current invention analyzes and segments the available audience in real-time, based on various visible parameters, and dynamically provides relevant content. The system can also predict the audience composition at a given time, based on historical trends, and plays content based on that information.
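As an illustration of predicting audience composition from historical trends, the following is a minimal sketch only. It assumes past observations are available as (hour of day, segment label) pairs and that a simple per-hour frequency count is an acceptable predictor; the function names, the segment labels, and the hourly granularity are hypothetical and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def build_hourly_profile(observations):
    """Count how often each audience segment was seen at each hour.

    observations: iterable of (hour_of_day, segment_label) pairs
    collected on previous days (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for hour, segment in observations:
        counts[hour][segment] += 1
    return counts


def predict_composition(profile, hour):
    """Return the expected share of each segment at the given hour.

    Returns an empty dict when no history exists for that hour.
    """
    hour_counts = profile.get(hour, Counter())
    total = sum(hour_counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in hour_counts.items()}


# Hypothetical history: three people seen at 9:00, one at 17:00.
history = [
    (9, "female_adult"),
    (9, "female_adult"),
    (9, "male_senior"),
    (17, "male_adult"),
]
profile = build_hourly_profile(history)
shares_at_nine = predict_composition(profile, 9)
```

The predicted shares could then be used to choose content ahead of time, for example by playing the content matched to the segment with the largest expected share at that hour.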
The system segments audience members based on the information from their facial features (such as age, gender, and ethnicity), emotional state (such as happy, sad, and tired), and behavior (such as dwell time, impression duration, direction of travel, party composition, etc.). The system then selects and plays the most relevant content for the audience from a bank of contents.
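The segmentation and selection step described above can be sketched as follows. This is an illustrative assumption, not the disclosed implementation: the attribute values are taken as already estimated by upstream classifiers, and the age bands, the five-second dwell threshold, the segment tuple, and the content names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Attributes assumed to be estimated upstream from video frames."""
    age: int              # estimated age in years
    gender: str           # "female" or "male", as estimated by a classifier
    dwell_seconds: float  # time spent in front of the display


def age_band(age):
    """Map an estimated age to a coarse band (thresholds are hypothetical)."""
    if age < 18:
        return "youth"
    if age < 55:
        return "adult"
    return "senior"


def segment_of(obs):
    """Combine demographic and behavioral attributes into a segment key."""
    engagement = "engaged" if obs.dwell_seconds >= 5.0 else "passing"
    return (obs.gender, age_band(obs.age), engagement)


# Hypothetical content bank keyed by segment, with a generic fallback.
CONTENT_BANK = {
    ("female", "adult", "engaged"): "ad_skincare_long",
    ("male", "senior", "passing"): "ad_pharmacy_short",
}
DEFAULT_CONTENT = "ad_store_generic"


def select_content(obs):
    """Pick the content mapped to the observation's segment, else the default."""
    return CONTENT_BANK.get(segment_of(obs), DEFAULT_CONTENT)
```

For example, `select_content(Observation(age=34, gender="female", dwell_seconds=8.0))` returns `"ad_skincare_long"`, while an observation whose segment has no entry in the bank falls back to the generic content.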
The system can also capture their feedback and analyze it to measure the affinity of various audience segments to a particular content, to intelligently match audience segments and content. Feedback captured by the system includes changes in their behavior and emotional state. It can also integrate other forms of feedback metrics, such as sales, website visits, short message service (SMS) texting, etc.
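One simple way to picture the affinity measurement is a running average of responses per (audience segment, content) pair. The sketch below assumes each piece of feedback has already been reduced to a single numeric response score, which the disclosure does not specify; the class name, method names, and scores are hypothetical.

```python
class AffinityTable:
    """Running mean response for each (segment, content) pair."""

    def __init__(self):
        self._total = {}  # (segment, content) -> sum of response scores
        self._count = {}  # (segment, content) -> number of responses

    def record(self, segment, content, response):
        """Add one response score observed for a segment and content."""
        key = (segment, content)
        self._total[key] = self._total.get(key, 0.0) + response
        self._count[key] = self._count.get(key, 0) + 1

    def affinity(self, segment, content):
        """Mean response, or None if the pair has never been observed."""
        key = (segment, content)
        if key not in self._count:
            return None
        return self._total[key] / self._count[key]

    def best_content(self, segment):
        """Content with the highest mean response for this segment, if any."""
        scored = [
            (self._total[key] / self._count[key], key[1])
            for key in self._count
            if key[0] == segment
        ]
        return max(scored)[1] if scored else None


# Hypothetical feedback: scores in [0, 1] derived from emotion or dwell changes.
table = AffinityTable()
table.record("female_adult", "content_a", 1.0)
table.record("female_adult", "content_a", 0.0)
table.record("female_adult", "content_b", 0.75)
```

Other feedback metrics mentioned above, such as sales or website visits, could be folded into the same table if they are first converted to a comparable score; how to do that conversion is left open here.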
Current video analysis technology is capable of following the movement of people in a retail space and recognizing their behaviors, emotional states, and some of their personal profiles based on their visual features. More specifically, visual tracking technology can find any person in a camera view, and track the person across multiple camera views. If such a system is deployed in a retail space, it can monitor how the customers move around the retail space to find and purchase products. The personal profiles of the customers, such as gender, age, and ethnicity, can also be recognized by analyzing their facial images. Further, facial image analysis may also monitor the changes in their emotional states, based on changes in their facial expressions or behaviors. As the needs and interests of the customers vary with their demographic background, customizing the media content toward these measurable customer profiles will improve the effectiveness of the media. The changing needs and interests of a customer can also be estimated based on her/his shopping history in that particular shopping trip and also on changes in emotional states.
The present invention combines the aforementioned technologies to measure the individual needs of the customers to customize the media content played to them in real time. First, the visual tracking technology can track a potential consumer of the media to record the person's shopping history—i.e., for which product the person has been shopping or which services the person has been using. The demographic class or the emotional state of the customer is also measured, based on facial image analysis, before the person is exposed to the media. The media control of the system then prepares and plays a customized media content. The media control may employ a fixed set of rules to find an appropriate media content among a pool of available media contents. The same kind of technology that analyzes facial images to estimate the emotional state of the customer can be used to measure the changes in emotional state and attention while the media is being played. This information—media response—may be fed back to the media control module to improve the media customization scheme, so that the effectiveness of the media selection can be improved for the next cycle.
There remains, however, the crucial issue of how to change the media customization scheme based on the feedback from the media response. One simple way would be to test many possible mappings between the media content and the audience profiles, and choose the mapping that receives the best overall responses. This scheme assumes that there is a fixed set of media content, and that the market environment, such as consumer tastes or trends, is static. However, both the available media contents and the trends are constantly changing. The present invention models the media customization as an interaction between an agent, i.e., a media control module, that controls media content and the environment of the media audience; the media control (agent) selects and displays media content to the audience (action), and the audience feeds back the response (reward) to the agent. Then, the agent makes an appropriate adjustment to the media selection rules, so that the later response feedback would be improved. The present invention employs reinforcement learning as a specific means to solve the problem. The reinforcement learning scheme iteratively explores the space of possible media selection rules, and finds a solution that yields the optimal long-term responses from the audience.
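The agent, action, and reward loop described above can be illustrated with a deliberately simplified sketch. The disclosure does not name a specific reinforcement learning algorithm; the code below swaps in an epsilon-greedy contextual bandit, which optimizes only the immediate response rather than the long-term response, and uses a constant step size so that older feedback fades as tastes change. The class name, parameters, and reward scale are hypothetical.

```python
import random


class MediaAgent:
    """Epsilon-greedy selector over media contents, per audience segment.

    An assumed stand-in for the media control module, not the
    disclosed method.
    """

    def __init__(self, contents, epsilon=0.1, step_size=0.2, seed=None):
        self.contents = list(contents)
        self.epsilon = epsilon      # probability of exploring a random content
        self.step_size = step_size  # constant step: recent rewards weigh more
        self.q = {}                 # (segment, content) -> estimated response
        self.rng = random.Random(seed)

    def select(self, segment):
        """Action: choose the content to play for this audience segment."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.contents)
        return max(self.contents, key=lambda c: self.q.get((segment, c), 0.0))

    def update(self, segment, content, reward):
        """Move the estimate toward the observed audience response (reward)."""
        key = (segment, content)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.step_size * (reward - old)
```

In use, each display cycle would call `select` with the current segment, play the returned content, measure the response, and pass it to `update`. New contents can be appended to `contents` at any time, and the constant step size lets the estimates track a changing market rather than converge to a fixed mapping.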