The World Wide Web (“WWW”) is a distributed database including content items that are accessible through one or more interconnected networks, e.g., the Internet. Indexing and searching these content items to produce useful results in response to user queries is an ongoing challenge for providers of search services. A search engine is typically used to search content items available via the WWW. A search engine collects content items (or references to content items, also referred to as links) from the Internet or other sources through the use of a crawler, which aggregates content items for indexing and searching. Although many algorithms exist for collecting content items, many crawlers follow links in known hypertext documents or other content items to obtain additional content items. The crawler stores the content items it retrieves in a database, and an indexing component builds a searchable index of the content items in the database. Typical indexing methods include inverted files, vector spaces, suffix structures, and hybrids thereof, as well as other similar indexing techniques known to those of skill in the art. For example, a given web page may be broken down into words and the respective location of each word on the page; the pages are then indexed by those words and their locations. A primary index of the contents of the database is then decomposed into a plurality of sub-indices, with each sub-index sent to a search node in a search node cluster.
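The crawl, index, and decompose pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular search engine: the `fetch` callable is a hypothetical stand-in for network retrieval, tokenization is a simple lowercase regex, and the primary index is split into sub-indices by hashing each term with CRC-32 (term-based partitioning), which is only one of several possible decomposition schemes. All function names are hypothetical.

```python
import re
import zlib
from collections import defaultdict, deque


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: fetch each URL, store its text, and queue unseen links.

    `fetch` is a hypothetical callable returning (text, list_of_links) for a URL.
    """
    seen = set(seed_urls)
    queue = deque(seed_urls)
    store = {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Build an inverted index mapping each word to {page_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][page_id].append(position)
    # Convert the nested defaultdicts to plain dicts.
    return {word: dict(postings) for word, postings in index.items()}


def partition_index(index, num_nodes):
    """Split the primary index into num_nodes sub-indices by hashing each term.

    CRC-32 is used instead of hash() because str hashing is randomized per process.
    """
    sub_indices = [{} for _ in range(num_nodes)]
    for word, postings in index.items():
        node = zlib.crc32(word.encode("utf-8")) % num_nodes
        sub_indices[node][word] = postings
    return sub_indices


if __name__ == "__main__":
    # A toy two-page "web": each URL maps to (text, outgoing links).
    web = {
        "p1": ("Search engines index pages", ["p2"]),
        "p2": ("A crawler follows links to pages", ["p1"]),
    }
    pages = crawl(["p1"], lambda url: web[url])
    index = build_positional_index(pages)
    print(index["pages"])  # {'p1': [3], 'p2': [5]}
    for node, sub in enumerate(partition_index(index, 3)):
        print(node, sorted(sub))
```

Each sub-index in this sketch holds the complete posting lists for its terms, so a query term can be routed to a single node; partitioning by document instead would give every node a full vocabulary over a subset of pages.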
A search engine may receive one or more query terms, which it uses to search the index, generating sorted search results that include an identifier and a relevance score for each result. The relevance score is a function of both the query and the content item. Factors used to determine relevance include: a static relevance score for the document, such as link cardinality and page quality; prominent parts of the document, such as titles, metadata, and document headers; the authority of the document, such as external references and the “level” of those references; and document statistics, such as query term frequency in the document, global term frequency, and term distances within the document.
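One way to see how these factors can combine into a single score is a toy linear model. The sketch below is a hypothetical illustration, not a ranking function used by any real engine: the static and authority scores are assumed to be precomputed inputs, the weights in `WEIGHTS` are invented, term frequency is weighted by a smoothed inverse document frequency to stand in for global term frequency, and term distance is rewarded as the reciprocal of the smallest gap between two different query terms.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    title: list          # tokenized title (a prominent part of the document)
    body: list           # tokenized body text
    static_score: float  # assumed precomputed, e.g. from link cardinality and page quality
    authority: float     # assumed precomputed, e.g. from external references


# Hypothetical weights; a real engine would tune these.
WEIGHTS = {"static": 1.0, "authority": 1.0, "title": 2.0, "tf_idf": 1.0, "proximity": 1.0}


def idf(term, docs):
    """Smoothed inverse document frequency: terms rarer across the collection weigh more."""
    df = sum(1 for d in docs if term in d.body or term in d.title)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def min_term_distance(body, terms):
    """Smallest positional gap between occurrences of two different query terms, or None."""
    hits = [(i, t) for i, t in enumerate(body) if t in terms]
    best = None
    # The closest pair of different terms is always adjacent in position order.
    for (i, a), (j, b) in zip(hits, hits[1:]):
        if a != b and (best is None or j - i < best):
            best = j - i
    return best


def relevance(query, doc, docs, w=WEIGHTS):
    """Combine static, authority, title, term-statistic, and proximity signals linearly."""
    terms = set(query.lower().split())
    tf_idf = sum(doc.body.count(t) / max(len(doc.body), 1) * idf(t, docs) for t in terms)
    title_hits = sum(1 for t in terms if t in doc.title)
    gap = min_term_distance(doc.body, terms)
    proximity = 1.0 / gap if gap is not None else 0.0
    return (w["static"] * doc.static_score
            + w["authority"] * doc.authority
            + w["title"] * title_hits
            + w["tf_idf"] * tf_idf
            + w["proximity"] * proximity)


def search(query, docs):
    """Return (doc_id, score) pairs sorted by descending relevance."""
    scored = [(d.doc_id, relevance(query, d, docs)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        Document("a", ["search", "engine"], ["search", "engine", "ranking"], 0.5, 0.2),
        Document("b", ["cooking"], ["recipes", "search"], 0.1, 0.1),
    ]
    for doc_id, score in search("search engine", docs):
        print(doc_id, round(score, 3))
```

A linear combination keeps each factor's contribution easy to inspect; production systems typically learn such weights rather than fixing them by hand.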
The terms in a given search query and, more specifically, the topics, themes, concepts, or categories to which those terms relate may indicate trends in the consumer interests and needs of users executing searches. Such trends, and their future directions as indicated by search terms, are of interest to content providers because they may affect business economics, including advertising strategies and costs as well as the success and volume of sales and other business activities. Trends in the informational needs of users conducting searches can be difficult to predict, leaving businesses with little direction when forecasting advertising costs, sales, and the like. New techniques are needed to reduce the hardship caused by such unpredictability or, where a good trend prediction is believed to be available, to capitalize on that prediction. Businesses therefore have an economic and advertising interest in predicting the needs of information consumers as indicated by search terms. Content providers likewise have an interest in predicting trends such as who the next “hot” entertainer will be, which singer's current popularity will soon decline, or what the next consumer electronics craze will be, and in providing additional content items on those topics so that sufficient and appropriate content is available to information consumers.
Currently, no systems and methods exist by which content providers can accurately determine the content that a given user (or group of users) needs or desires for specific concepts and categories. Similarly, no systems and methods presently enable users to communicate specific content needs to information providers. Thus, there is a need in the art for systems and methods operative to determine a relationship between available content and current interests in order to identify a need for content.
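The text does not specify how such a relationship would be computed. One hypothetical way to make the idea concrete, offered only as an illustration and not as the disclosed method, is to compare query demand per category with the number of indexed content items per category and flag categories where demand outstrips supply. The input format (one category label per query and per content item), the `+ 1` smoothing, and the `threshold` value are all assumptions.

```python
from collections import Counter


def content_needs(query_categories, content_categories, threshold=2.0):
    """Flag categories where query demand outstrips available content.

    query_categories: one category label per logged query (hypothetical input).
    content_categories: one category label per indexed content item (hypothetical input).
    Returns (category, ratio) pairs with demand / (supply + 1) above threshold, highest first.
    """
    demand = Counter(query_categories)
    supply = Counter(content_categories)
    # Adding 1 to supply avoids division by zero for categories with no content at all.
    ratios = {cat: demand[cat] / (supply[cat] + 1) for cat in demand}
    flagged = [(cat, r) for cat, r in ratios.items() if r > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    queries = ["gadgets"] * 9 + ["music"] * 3
    content = ["gadgets"] * 2 + ["music"] * 5
    print(content_needs(queries, content))  # [('gadgets', 3.0)]
```

Here "gadgets" is flagged because nine queries compete for two content items, while "music" has more content than queries; a fuller treatment would also account for query trends over time rather than a single snapshot.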