The present disclosure relates to techniques for determining keywords associated with a document.
As the number of web pages on the Internet continues to increase, search engines are becoming increasingly important in directing customer traffic to specific web pages. For example, in response to a customer's search query, a search engine may generate a search expression that includes synonyms and paraphrases of the search query, as well as logical permutations of the positions of the words in the search query. This search expression is compared against information in a set of web pages crawled from the Internet to identify the web pages that are most relevant to the search query, thereby producing a set of search results. In particular, the search results are usually the top-ranked web pages when the set of web pages is ordered by their associated match scores. These match scores measure the agreement between the search expression and the information in each web page.
A variety of factors are typically considered when generating a match score for a given web page. Among these are keywords associated with the web page (which are sometimes referred to as ‘adwords’). As a consequence, the choice of the keywords associated with the given web page can have a significant impact on the ranking of the given web page in the search results and, thus, on the amount of customer traffic that is driven to the given web page.
However, choosing suitable keywords for a web page can be difficult. For example, while many search-engine providers offer tools to assist the owner of a given web page in choosing suitable keywords, these tools often return a large number of possible keywords (such as more than 100,000 keywords). Reviewing such a large list of keywords can be time-consuming. This problem is compounded by uncertainty about which keywords are likely to provide the best results (in terms of web-page traffic) at a reasonable price. In particular, the cost of purchasing the right to use popular keywords can be prohibitive for most web-page owners.