1. Field
The present invention relates generally to computer systems and, more specifically, to techniques for inferring consumer affinities based on shopping behavior with unsupervised machine learning models.
2. Description of the Related Art
Geolocation analytics platforms are generally used to understand human behavior. Such systems map data about places to geographic locations, and this mapping is then used to analyze patterns in human behavior based on people's presence in those geographic locations. For example, researchers may use such systems to understand patterns in health, educational, crime, or political outcomes in geographic areas. And some companies use such systems to understand the nature of their physical locations, analyzing, for instance, the demographics of customers who visit their stores, restaurants, or other facilities. Some companies use such systems to measure and understand the results of TV advertising campaigns, detecting changes in the types of customers who visit stores following a campaign. Some companies use geolocation analytics platforms to target content to geolocations, e.g., selecting content like business listings, advertisements, billboards, mailings, restaurant reviews, and the like, based on human behavior associated with locations to which the content is directed. In many contexts, location can be a useful indicator of human behavior.
In some cases, people are classified as being members of various audiences, or relatively homogenous populations in terms of expected behavior (e.g., propensity to attain an educational outcome, respond favorably to content, visit a store, etc.). One criterion for identifying audiences is a person's current geographic location. Often the location of people is indicative of various likely behaviors. The designation of audiences can be helpful in a variety of contexts. In the political sphere, swing voters constitute a type of audience. Or in the realm of education services, at-risk students can constitute another type of audience. Government services and commercial real-estate site selection may also be influenced by behaviors of audiences, e.g., a decision to position a restaurant franchise near a place members of an audience frequent during lunch hours.
Many existing systems define audiences with insufficient specificity. In some cases, a user's current location is relatively poorly correlated with behavior. Tourists in New York would likely not be interested in reviews of local plumbers, for example. Yet discovering more precise descriptors of audiences can be difficult. Often data about users is relatively high-dimensional, including, for instance, location history, purchasing behavior, social network behavior, and the like. Selecting and properly balancing among values in these various dimensions can be difficult, so those evaluating such data often disregard meaningful information in records about people.
Similar issues arise when segmenting audiences and larger populations according to their behavior as consumers. Often, the relevant segments are not known ex ante, so labeled training sets are often unavailable to construct and refine predictive models. Further, adequately processing data indicative of consumer behavior at commercially relevant scales is often beyond the capabilities of many traditional analytical systems. In many cases, the diversity of consumer behavior warrants relatively long-tailed sets of segments, and these segments are often only revealed when processing relatively large data sets, e.g., describing behavior of millions of consumers.