The World Wide Web may be thought of as a distributed data store comprising billions of content items accessible through the Internet. Search engines are commonly used to search the content items available on computer networks, such as the World Wide Web, to enable users to locate content items of interest. A typical search engine is capable of accessing a plurality of web pages, hypertext documents, and other content items from the Internet or other network through use of a crawler. A crawler identifies content items available from a number of sources, using various methods and algorithms. For example, a crawler may follow hyperlinks in a corpus of hypertext documents to identify other content items. The content items that the crawler identifies, or references thereto, may be stored in a database or a similar data store. Thereafter, these content items are indexed by an indexer, which may be operative to build a searchable index of the content items in the database. Existing methods for indexing may include inverted files, vector spaces, suffix structures, and hybrids thereof. For example, each web page may be broken down into words and the respective locations of each word on the page. The pages are then indexed by the words and their respective locations. A primary index of the whole database may then be broken down into a plurality of sub-indices, with a given sub-index being sent to a search node in a search node cluster.
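By way of illustration only, the word-and-location indexing described above may be sketched as a small inverted index that is then divided into per-node sub-indices. The function names (build_inverted_index, split_index), the whitespace tokenization, and the round-robin assignment of pages to search nodes are assumptions made for this sketch and are not drawn from any particular search engine.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to {page_id: [positions of the word on that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        # Assumption: words are separated by whitespace and compared in lowercase.
        for position, word in enumerate(text.lower().split()):
            index[word][page_id].append(position)
    return index


def split_index(index, num_nodes):
    """Divide a primary index into one sub-index per search node.

    Assumption: pages are assigned to nodes round-robin in sorted page-id order.
    """
    page_ids = sorted({pid for postings in index.values() for pid in postings})
    node_of = {pid: i % num_nodes for i, pid in enumerate(page_ids)}
    sub_indices = [defaultdict(dict) for _ in range(num_nodes)]
    for word, postings in index.items():
        for page_id, positions in postings.items():
            sub_indices[node_of[page_id]][word][page_id] = positions
    return sub_indices


pages = {"a": "web search engine", "b": "search the web"}
primary = build_inverted_index(pages)
sub_indices = split_index(primary, num_nodes=2)
```

In this sketch each sub-index holds the complete postings for its own subset of pages, so a search node can answer a query over its pages without consulting other nodes.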
To use a search engine, a user typically enters one or more search terms or keywords, which are sent to a dispatcher. A dispatcher may compile a list of search nodes in a cluster to execute the query and may forward the query to those selected search nodes. The search nodes in a search node cluster may search respective parts of the primary index produced by the abovementioned indexer and return sorted search results, along with an identifier and a score, to the dispatcher. The dispatcher may then merge the received results to produce a final result set for display to the user, sorted by relevance scores.
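The dispatcher flow described above may be sketched as follows, purely as an illustration. The names search_node and dispatch are hypothetical, each node is modeled as a simple mapping from document identifier to text, and the score (a count of query-term occurrences) is a stand-in chosen for brevity rather than an actual relevance function.

```python
import heapq
from itertools import islice


def search_node(documents, terms):
    """Search one node's documents; return (doc_id, score) pairs, best first."""
    hits = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # Assumption: the score is simply the number of query-term occurrences.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((doc_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


def dispatch(query, nodes, limit=10):
    """Forward the query to every node and merge the sorted per-node results."""
    terms = query.lower().split()
    per_node = [search_node(documents, terms) for documents in nodes]
    # heapq.merge combines already-sorted inputs without re-sorting them.
    merged = heapq.merge(*per_node, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, limit))


nodes = [
    {"d1": "web search web", "d2": "cats"},
    {"d3": "web web web search", "d4": "search"},
]
results = dispatch("web search", nodes)
```

Because each node returns its results already sorted, the dispatcher only needs a merge step rather than a full sort of the combined results.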
The relevance score may be a function of the query itself and the type of document produced. Factors that affect the relevance score may include: a static relevance score for the document, such as link cardinality and page quality; placement of the search terms in the document, such as in titles, metadata, and the document web address; document rank, such as the number of external data records referring to the document and the “level” of those data records; and document statistics, such as query term frequency in the document, global term frequency, and term distances within the document. For example, Term Frequency-Inverse Document Frequency (“TF/IDF”) is a statistical technique that is suitable for evaluating how important a word is to a document. According to TF/IDF, the importance of a given word increases proportionally to the number of times the given word appears in the document, but is offset by how common the word is across documents in the collection.
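As a worked illustration of the TF/IDF weighting described above, the following sketch uses one common formulation: the term's relative frequency in the document multiplied by the natural logarithm of the collection size divided by the number of documents containing the term. The function name tf_idf, the whitespace tokenization, and this particular variant of the formula are assumptions; other variants exist.

```python
import math


def tf_idf(term, document, collection):
    """Weight of `term` in `document`, relative to `collection`."""
    words = document.lower().split()
    if not words:
        return 0.0
    # Term frequency: share of the document's words that are the term.
    tf = words.count(term) / len(words)
    # Document frequency: number of documents that contain the term.
    df = sum(1 for doc in collection if term in doc.lower().split())
    if df == 0:
        return 0.0
    # Inverse document frequency offsets terms common across the collection.
    return tf * math.log(len(collection) / df)


collection = ["the cat sat", "the dog ran", "the cat ran fast"]
weight_the = tf_idf("the", collection[0], collection)
weight_cat = tf_idf("cat", collection[0], collection)
weight_sat = tf_idf("sat", collection[0], collection)
```

Here "the" appears in every document and so receives a weight of zero, while "sat", which appears in only one document, outweighs "cat", which appears in two.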
In addition to search results identified in response to the one or more terms received from a given user, a user may also be presented with one or more advertisements. For example, an advertiser may agree to pay an amount of money to a search engine operator, commonly referred to as the bid amount, in exchange for a particular position in a set of search results that is generated in response to a user's input of a particular search term. A higher bid amount may result in a more prominent placement of the advertiser's website in a set of sponsored search results. Advertisers may adjust their bids or bid amounts to control the position at which their search listings are presented in the sponsored search results. The charging system places search listings having higher-value bids higher or closer to the top of the search listings. More prominent listings are seen by more users and are more likely to be clicked through, producing traffic of potential customers to the advertiser's website.
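The bid-ordered placement described above may be illustrated with the following minimal sketch. The Bid record, the rank_sponsored_listings function, the exact-match comparison of search terms, and the sample advertisers and amounts are all hypothetical and serve only to show listings being ordered by bid amount.

```python
from typing import NamedTuple


class Bid(NamedTuple):
    advertiser: str
    search_term: str
    amount: float


def rank_sponsored_listings(bids, search_term):
    """Return the bids placed on `search_term`, highest bid amount first."""
    # Assumption: a bid matches only when its term equals the query exactly.
    matching = [bid for bid in bids if bid.search_term == search_term]
    return sorted(matching, key=lambda bid: bid.amount, reverse=True)


bids = [
    Bid("Advertiser A", "running shoes", 0.40),
    Bid("Advertiser B", "running shoes", 0.75),
    Bid("Advertiser C", "trail shoes", 0.90),
]
ranked = rank_sponsored_listings(bids, "running shoes")
```

With exact matching, a search term on which no advertiser has bid produces an empty list of sponsored listings, even when closely related terms have bids.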
Search engine operators have developed various tools suitable for use in pay-for-placement systems to help advertisers manage their bids and attract traffic. For example, a bidding tool may be used to select keywords upon which advertisers may bid to have their advertisements or websites displayed in response to a search comprising one or more terms associated with the selected keywords. Thus, when a user performs a search on a pay-for-placement search engine, or when selecting one or more advertisements for display, the sponsored results are conventionally sorted and displayed on the basis of an amount that a given advertiser has bid on a given search term. Because different users use different keywords to find the same information, it is important for an advertiser to bid on a wide variety of search terms to maximize the traffic to the advertiser's website.
Advertisers may attempt to place high bids on more than one search term to increase the likelihood that their websites will be seen as a result of a search for those terms. The better and more extensive an advertiser's list of search terms, the more traffic the advertiser will see. There are many similar search terms, however, for which the advertiser may not have bid. As a result, the advertiser can miss opportunities for advertising placement when these similar search terms are used, and the search engine operator may not receive any revenue from searches performed using such search terms for which there have been no bids.
Even in the context of non-sponsored searches, or search results that do not involve pay-for-placement listings, a search engine user is at a disadvantage in the absence of intelligent searching of search terms that are similar to those that the user provides to the search engine. This produces limited results that do not necessarily reflect user intent in conducting a search. In some systems, some spell-checking is performed on search terms that a user provides to the search engine.
Thus, there is a need for systems and methods that provide searches or suggested searches of search terms that are similar or related in meaning to the search terms that a user provides to a search engine. There is also a need for a system and method for searching unbidded search terms in a sponsored search system that are similar or related in meaning to those that a user provides.