Advertising is a critical economic driver of the internet ecosystem, with internet advertising revenues estimated at around US $5.9 billion in the first quarter of 2010 alone. This online revenue stream supports the explosive growth in the number of web sites and helps offset the associated infrastructure costs. There are two main types of advertising, distinguished by the nature of the ad creative: textual advertising, in which the ads contain text snippets similar to the content of a web page, and display advertising, in which the ads are graphical creatives in various formats and sizes (static images, Flash-powered interactive ads that change shape and size in response to user interaction, etc.). Text ads are typically displayed on the search results page in response to a search query, while display ads are shown on other content pages. Advertisers book display advertising campaigns by specifying the attributes of the sites where their ads should be displayed, the attributes of the users to whom the ads can be shown, or both. For example, a display advertising campaign can specify that its ads should be shown only on pages related to Sports, and only to users who visit those pages from, say, the state of California, USA. In addition, the advertiser (or an advertising agency working on behalf of the advertiser) specifies the ad creative (the physical ad image) that should be displayed in the user's browser, and the time period over which the ad should run.
Ad serving systems select the ads to show based on the relevance of the ad to the content of the page, to the user, or to both. Serving typically involves two steps: (i) a matching step, which selects the list of ads that are eligible to be displayed in an ad-serving opportunity given the requirements of the advertiser, the attributes of the page, the user, etc., and (ii) a ranking step, which orders the list of eligible ads by some objective function (relevance, expected revenue, etc.). The algorithms in these matching and ranking steps leverage data about the available ads, the content of the pages on which the ads are to be shown, the interests of the user, etc. Typical display ad campaigns do not require the advertiser to provide much information about the ads themselves, beyond meeting certain quality requirements: for example, the image should not contain any offensive content and should render correctly in the browser.
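The two-step serving flow described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each campaign targets a set of page categories and user regions (an empty set meaning no constraint) and that ranking uses a single expected-revenue score; the `Campaign` class, its fields, and the `select_ads` function are hypothetical names, not part of any actual ad server.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    # Hypothetical campaign record; all field names are illustrative assumptions.
    name: str
    page_categories: set      # empty set = no page constraint
    user_regions: set         # empty set = no user constraint
    expected_revenue: float   # stand-in for the ranking objective


def matches(campaign, page_category, user_region):
    """Matching step: check the advertiser's page and user constraints."""
    page_ok = not campaign.page_categories or page_category in campaign.page_categories
    user_ok = not campaign.user_regions or user_region in campaign.user_regions
    return page_ok and user_ok


def select_ads(campaigns, page_category, user_region, k=1):
    """Match eligible campaigns, then rank them by the objective and keep the top k."""
    eligible = [c for c in campaigns if matches(c, page_category, user_region)]
    ranked = sorted(eligible, key=lambda c: c.expected_revenue, reverse=True)
    return ranked[:k]


campaigns = [
    Campaign("sports_ca", {"Sports"}, {"California"}, 2.0),
    Campaign("sports_any", {"Sports"}, set(), 1.5),
    Campaign("finance_ca", {"Finance"}, {"California"}, 3.0),
]

top = select_ads(campaigns, "Sports", "California", k=2)
print([c.name for c in top])  # ['sports_ca', 'sports_any']
```

Note that the finance campaign has the highest score but is never ranked, because it fails the matching step for a Sports page. A production system would replace the linear scan with an index over targeting attributes and use a learned objective; the sketch only shows how the two steps compose.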
One common piece of information used in these matching and ranking steps is the category of each component entity (page, query, ad), drawn from a set of relevant user interest categories (e.g., Travel, Finance, Sports). These categories are assigned either manually by editors or by machine-learned categorization tools trained on a historically labeled set of entities. It is typically easier to train such tools to categorize content pages, queries, and text ads, using standard feature construction techniques from information retrieval such as bag-of-words representations and term frequency-inverse document frequency (tf-idf) feature weights. Display ads, on the other hand, do not lend themselves to easy feature representations. As a result, categorization of display ads typically involves large-scale manual labeling by a large team of human editorial experts.
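To make the contrast concrete, the following sketch shows how textual entities can be categorized with bag-of-words tf-idf features. It is an illustration under stated assumptions, not a description of any production categorizer: the whitespace tokenizer, the smoothed idf formula, the nearest-neighbor rule, and the three labeled examples are all chosen here for brevity. No comparably simple recipe applies to an ad that is only an image.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems normalize far more).
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every term in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    # Smoothing keeps a positive weight for terms that occur in every document.
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(text, idf):
    """Bag-of-words vector with tf-idf weights; out-of-vocabulary terms are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def categorize(text, labeled, idf):
    """Return the category of the most similar labeled example (1-nearest neighbor)."""
    query = tfidf(text, idf)
    best = max(labeled, key=lambda pair: cosine(query, tfidf(pair[0], idf)))
    return best[1]


# Hypothetical labeled text ads standing in for an editorially labeled training set.
labeled = [
    ("cheap flights and hotel deals for your vacation", "Travel"),
    ("low interest mortgage rates and savings accounts", "Finance"),
    ("live football scores and basketball playoffs", "Sports"),
]
idf = fit_idf([text for text, _ in labeled])

print(categorize("compare hotel prices and book flights", labeled, idf))  # Travel
```

The nearest-neighbor rule is used only because it needs no training loop; any standard classifier over the same tf-idf vectors would serve the same illustrative purpose.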