Many search engine services, such as Google and Yahoo, support searching for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To identify related web pages quickly, a search engine service may maintain a mapping of keywords to web pages. This mapping may be generated by "crawling" the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may start from a list of root web pages and identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service identifies web pages that may be related to the search request based on how well the keywords of a web page match the words of the query. It then displays to the user links to the identified web pages in an order based on a ranking that may reflect their relevance to the query, popularity, importance, and/or some other measure.
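The keyword-to-page mapping described above is commonly called an inverted index. The following is a minimal sketch, assuming plain page text stands in for the headline, metadata, and highlighted-word signals, and assuming that ranking simply counts how many query terms each page matches. The function names `extract_keywords`, `build_index`, and `search` are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def extract_keywords(page_text):
    """Hypothetical keyword extractor: lowercased words with punctuation stripped.

    Real services use richer signals (headlines, metadata, highlighted words).
    """
    return {w.strip(".,!?;:\"'()").lower() for w in page_text.split()} - {""}


def build_index(pages):
    """Map each keyword to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for keyword in extract_keywords(text):
            index[keyword].add(url)
    return index


def search(index, query):
    """Return URLs ordered by how many query terms each page matches."""
    scores = defaultdict(int)
    for term in extract_keywords(query):
        # .get avoids inserting empty entries into the defaultdict
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest match count first; ties broken by URL for a stable order
    return sorted(scores, key=lambda url: (-scores[url], url))
```

With pages "Cheap flights to Paris" and "Paris hotel deals", the query "paris flights" ranks the first page ahead of the second because it matches both terms. A production ranking would also fold in the popularity and importance measures the paragraph mentions.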
Search engine services obtain revenue by placing advertisements along with search results. These paid-for advertisements are commonly referred to as "sponsored links," "sponsored matches," or "paid-for search results." An advertiser who wants to place an advertisement (e.g., a link to their web page) along with certain search results provides a search engine service with an advertisement and one or more bid terms. When a search request is received, the search engine service identifies the advertisements whose bid terms match the terms of the search request. The search engine service then selects advertisements to display based on the closeness of their match, the amount of money that the advertisers are willing to pay for placing the advertisement, and other factors. Finally, it adds to the search result a sponsored link that points to a web page of the advertiser. Search engine services typically either charge for each placement of an advertisement along with search results (i.e., cost per impression) or charge only when a user actually selects a link associated with an advertisement (i.e., cost per click).
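To make the selection and charging steps concrete, here is a minimal sketch. It assumes that "closeness of match" is the fraction of an ad's bid terms found in the query and that ads are ranked by closeness multiplied by bid amount. Both rules are illustrative assumptions, not any search engine's actual auction. The `Ad` class, `select_ads`, and `charge` are hypothetical names, and `bid_amount` is treated as the price per impression or per click.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    url: str
    bid_terms: frozenset
    bid_amount: float  # assumed price per impression or per click


def match_closeness(ad, query_terms):
    """Fraction of the ad's bid terms that appear in the query (assumed measure)."""
    if not ad.bid_terms:
        return 0.0
    return len(ad.bid_terms & query_terms) / len(ad.bid_terms)


def select_ads(ads, query, slots=2):
    """Pick up to `slots` matching ads ranked by closeness * bid (assumed rule)."""
    query_terms = frozenset(query.lower().split())
    scored = []
    for ad in ads:
        closeness = match_closeness(ad, query_terms)
        if closeness > 0:  # only ads whose bid terms match the query
            scored.append((closeness * ad.bid_amount, ad))
    # Sort on the score only, so Ad objects are never compared
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad for _, ad in scored[:slots]]


def charge(bid_amount, impressions, clicks, model):
    """Cost per impression bills every placement; cost per click bills selections only."""
    if model == "cpi":
        return bid_amount * impressions
    if model == "cpc":
        return bid_amount * clicks
    raise ValueError(f"unknown pricing model: {model}")
```

Multiplying closeness by bid lets a higher bid outrank a closer match, and the reverse can also happen. This reflects the trade-off the paragraph describes between match quality and what the advertiser is willing to pay.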
Advertisers would like to maximize the effectiveness of the money they spend on advertisements. Thus, advertisers try to identify combinations of bid term, advertisement, and bid amount that yield the highest benefit (e.g., the most profit) to the advertiser. Advertisers may analyze query trends to identify bid terms, the timing of advertisement placement for those bid terms, bid amounts for those bid terms, and so on. Query trend analysis studies how the frequency of queries changes over time so that the future frequency of queries can be predicted. If query trends can be predicted accurately, advertisers can adjust their placement of advertisements to try to maximize advertising effectiveness. For example, if the frequency of a query is likely to increase in the near future, an advertiser may want to increase the bid amount for the terms of that query. However, it has been difficult to model the frequency of queries accurately, and therefore difficult to predict it accurately.
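As a simple illustration of predicting a query's future frequency, the sketch below fits an ordinary least-squares straight line to per-interval counts and extrapolates it forward. This swaps in a deliberately basic linear-trend model for illustration only. It is not the modeling approach this document goes on to describe. The names `fit_linear_trend` and `predict` are hypothetical.

```python
def fit_linear_trend(counts):
    """Ordinary least-squares fit of count = intercept + slope * t, for t = 0..n-1."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two intervals")
    t_mean = (n - 1) / 2
    c_mean = sum(counts) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(counts))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = c_mean - slope * t_mean
    return intercept, slope


def predict(counts, steps_ahead=1):
    """Extrapolate the fitted line `steps_ahead` intervals past the last observation."""
    intercept, slope = fit_linear_trend(counts)
    return intercept + slope * (len(counts) - 1 + steps_ahead)
```

For daily counts of 10, 20, 30, and 40, the fitted slope is 10 per day and the next-day forecast is 50. In the example from the paragraph, an advertiser might treat a positive slope as a signal to raise the bid. A straight line cannot capture seasonal or bursty query behavior, which is one reason accurate frequency modeling is hard.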
Because search engine services are so popular, the query logs they generate tend to be very large. A query log may include millions of entries, each of which identifies a query that a searcher submitted and the time of submission. Because of their size, query logs consume vast amounts of storage. To reduce these storage requirements, query logs are often compressed into query frequency information. Rather than storing an entry for each individual query submission, this information stores, for each query and each interval (e.g., a day), the frequency (i.e., the count of submissions) of that query during that interval. Nevertheless, because searchers can submit millions of different queries, even the query frequency information consumes large amounts of storage.
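The compression step can be sketched as a group-and-count over log entries. This minimal version assumes each log entry is a `(query, datetime)` pair and uses a calendar day as the interval. The helper `daily_series` expands the stored counts back into a per-day frequency sequence for one query, filling in zero for days with no submissions. Both function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def compress_log(entries):
    """Collapse (query, submission time) entries into counts keyed by (query, day)."""
    freq = Counter()
    for query, submitted_at in entries:
        freq[(query, submitted_at.date())] += 1
    return freq


def daily_series(freq, query, start, end):
    """Counts for `query` on each day from `start` to `end` inclusive (0 if absent)."""
    days = (end - start).days + 1
    return [freq.get((query, start + timedelta(days=i)), 0) for i in range(days)]
```

For example, two "weather" submissions on one day collapse into a single stored count of 2. Storage then grows with the number of distinct (query, day) pairs rather than with the number of submissions. However, that number can still be enormous when millions of distinct queries appear.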