An advertiser, such as Ford™ or McDonald's™, generally contracts with a creative agency to produce ads for the advertiser's products, to be placed in various media. Such media may include TV, radio, Internet ads (e.g., sponsored search ads, banner display ads, textual ads, streaming ads, mobile phone ads, etc.) or print media ads (e.g., ads in newspapers, magazines, posters, etc.). The advertiser may engage one or more creative agencies, each specializing in ads for one or more of these media. An advertiser wants to show the most relevant ads to end users in order to get the most value from its ad campaign.
A company like Yahoo!™ gathers enormous amounts of user data associated with the IP (Internet Protocol) addresses of end user computers. For example, the company may gather valuable search data when end users perform search queries. The search data may include search terms, searches performed and other data. The company may also obtain information about a user by collecting the user profile information that the user enters into a web site. The company may also infer demographic information (e.g., location, age, gender, etc.) by analyzing the pages an end user visits, even if the end user never performs a search. The company may also gather event data, including data related to end user behavior on the Internet, such as clicks on ads. From the IP addresses it sees, the company can usually infer zip codes and even street-level location data. The company also sees login information and the pages that end users visit. All of this data is highly valuable to advertisers because it can help them advertise more effectively.
The search advertising marketplace generates billions of dollars in revenue each year for search engine companies like Yahoo!™. This marketplace works on a cost-per-click (CPC) model: when an end user performs a search query online and clicks on a sponsored search text ad, the advertiser whose ad was clicked pays a company like Yahoo!™. End users tend to click more often on ads that are relevant to them.
An advertiser that uses data from a search engine wants to show the most relevant ads to end users in order to get more clicks on the advertiser's ads. To do this, the advertiser needs to gather data about end users, such as user profiles, search terms, searches performed, search behavior, click behavior and other browsing behavior. The advertiser may then use this information to target relevant ads to different end users.
In the CPC model, two important kinds of events flow through a data pipeline: search events and click events. Search events occur when an end user performs a search query, and click events occur when an end user clicks on a sponsored text ad. Web servers of a company like Yahoo!™ collect search events when an end user performs a query on the company's search page, and the URLs of the ads on the search results page may contain the click event information. An advertiser may want to collect and analyze the search and click events in order to build a model of query-to-text-ad relevance. If the advertiser can learn which ads are more relevant to which queries, it can target those ads to end users and achieve a higher click-through rate (CTR), that is, a higher fraction of ad displays that result in a click.
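As an illustration of how search and click events can be combined into a CTR signal, the following is a minimal sketch, assuming each search event carries a hypothetical `search_id`, the query, and the IDs of the ads shown, and each click event carries the same `search_id` and the clicked ad ID. These field names and the join on `search_id` are illustrative assumptions, not an actual event format. The sketch counts displays (impressions) and clicks per (query, ad) pair and divides them; at petabyte scale such an aggregation would more likely run on a distributed pipeline than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    search_id: str              # hypothetical join key shared with click events
    query: str                  # search terms entered by the end user
    ads_shown: tuple[str, ...]  # IDs of sponsored text ads on the results page


@dataclass(frozen=True)
class ClickEvent:
    search_id: str              # search whose results page held the clicked ad
    ad_id: str                  # ID of the clicked ad


def ctr_by_query_ad(searches, clicks):
    """Return {(query, ad_id): clicks / impressions} for every ad shown."""
    impressions = defaultdict(int)
    shown = {}  # search_id -> (query, set of ad IDs shown on that page)
    for s in searches:
        shown[s.search_id] = (s.query, set(s.ads_shown))
        for ad in s.ads_shown:
            impressions[(s.query, ad)] += 1

    click_counts = defaultdict(int)
    for c in clicks:
        match = shown.get(c.search_id)
        # Skip clicks with no matching search event or on an ad that was not shown.
        if match is None or c.ad_id not in match[1]:
            continue
        click_counts[(match[0], c.ad_id)] += 1

    # CTR = clicks / impressions for each (query, ad) pair that was displayed.
    return {key: click_counts[key] / n for key, n in impressions.items()}


# Hypothetical sample events.
SEARCHES = [
    SearchEvent("s1", "used cars", ("ad_1", "ad_2")),
    SearchEvent("s2", "used cars", ("ad_1",)),
    SearchEvent("s3", "burgers", ("ad_3",)),
]
CLICKS = [ClickEvent("s1", "ad_1"), ClickEvent("s3", "ad_3")]

rates = ctr_by_query_ad(SEARCHES, CLICKS)
# rates[("used cars", "ad_1")] is 0.5: shown twice, clicked once.
```

Ranking the ads for a query by this rate is only a naive relevance signal; a query-to-ad relevance model would typically combine it with other features of the search and click data described above.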
The amount of data gathered by a search engine company such as Yahoo!™ is tremendous, typically on the order of petabytes per day. There is so much data that, unfortunately, not all of it is used efficiently.