Information spaces, such as the Internet, enterprise networks, document repositories, and information storage and retrieval services, allow widespread access to large collections of information. For example, users commonly use search engines to locate and select desired information on the Internet. Many entities, such as businesses, individuals, and government organizations, now use the Internet to publish information as well as to advertise goods and services. Publishers have an interest in ensuring that their content can be easily located. Also, users performing searches have an interest in locating items that are most relevant to their search.
Search engines assist users in locating items in an information space. Such items can include documents, web pages, images, videos, and many other kinds of information known in the art. The search engines typically use search algorithms that employ either literal keyword matching techniques or approximate matching of the words or symbols specified in a user's query or search request. Thus, in conventional search engines, a user searching for information must provide keywords that will hopefully match desired content. At the same time, entities who wish to provide content must attempt to anticipate how their information will be searched and then tag their content in the hope that their tags, as well as the actual text of their content, will match user-provided keywords in order to provide the most appropriate content in response to user search requests. In practice, however, this methodology is less than ideal for both content users and content providers.
A variety of keywords can map to conceptual ideas in multiple and non-unique ways, which can make tagging and keyword searching difficult. In addition, a given combination of keywords may not be the same between two users seeking similar content. Accordingly, concept matching or semantic matching within search engines can be poor. Conventional search engines can also be ineffective at ascertaining meaning that is inherent in content items. Indeed, because, for many documents, content is expressed in natural language with no convention or structure governing the meaning of the content, search engines are, in general, unable to locate the most appropriate content reliably. It is not currently feasible to rely on search engines to derive semantic meaning or significance from online content by using automated algorithms alone. For example, a user researching accidents with significant media coverage in 2014 might query a conventional search engine with the phrase “spectacular accidents 2014.” One of the first results for such a search would likely be an entirely irrelevant article entitled, “Flavie Audi: Spectacular Accidents—The young architect forges a new path in glass.”
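The mismatch described above can be illustrated with a minimal sketch of literal keyword matching. The scoring function, the tokenizer, and the second document are hypothetical and are introduced only for illustration; the first document is a shortened form of the article title quoted above. The sketch does not represent the ranking method of any actual search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query, document):
    """Count how many distinct query tokens appear literally in the document."""
    return len(tokenize(query) & tokenize(document))


query = "spectacular accidents 2014"

documents = {
    # Shortened form of the article title quoted in the text.
    "art_article": "Flavie Audi: Spectacular Accidents",
    # Hypothetical headline that a human would judge relevant to the query.
    "news_report": "Train derailment draws heavy media coverage in 2014",
}

scores = {name: keyword_score(query, text) for name, text in documents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranked)
```

Under these assumptions, the art article matches two of the three query terms while the hypothetical news report matches only one, so a purely literal matcher would rank the irrelevant item first.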
In contrast to automated search algorithms, human ingenuity is often capable of going far beyond the capabilities of existing search systems to identify new or interesting content. Certain “crowd-sourcing” techniques constitute one such set of approaches. To date, however, crowd-sourcing techniques have been limited or have been constrained to specific applications or uses.
One example of a system that attempts to enhance automated search techniques by using a crowd-sourcing approach is U.S. Pat. No. 8,825,701 to Stefano Ceri, et al. (“Ceri”). Ceri teaches an interactive social networking approach to online searching, where a given search request is proposed to a crowd of cooperating online individuals. A query execution plan is also provided by Ceri's system. While following that query execution plan, each of the cooperating individuals attempts to answer the search request. When a sufficient number of answers have been collected, the answers are processed to generate an output result, which is then presented to the original requesting user.
U.S. Pat. No. 8,055,673 to Elizabeth Churchill, et al. (“Churchill”) discloses a similar approach involving a collaborative search engine. Following Churchill's methods, a first user interacts with a search engine to initiate an Internet search. The first user can then elicit the help of search friends, who receive the results of the initial Internet search and provide additional search recommendations in response. Finally, the first user can integrate the received search recommendations and modify the initial Internet search based on those recommendations.
In the field of online product sales, companies like Amazon.com, Inc. can provide product suggestions to users based on the shopping actions of other users who viewed and/or purchased similar products in the past. U.S. Pat. No. 7,113,917 to Jennifer Jacobi, et al. (“Jacobi”) is an example of the Amazon technique. In Jacobi, a computer system maintains item selection histories of online shoppers. The item selection histories are collected and analyzed off-line to generate a set of data values that represent degrees to which specific items in Amazon's catalog are related to each other. The item relationship data are stored in a mapping structure that maps items to related items. Later, while a user is shopping, the mapping structure can be used to generate personalized recommendations of related items in the Amazon catalog.
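The general shape of such an item-to-item mapping can be sketched as follows. This is a minimal illustration only and is not the method claimed in Jacobi: the relatedness score (co-occurrence count normalized by the geometric mean of item popularity), the function names `build_related_items` and `recommend`, and the sample histories are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_related_items(histories, top_n=2):
    """Off-line step: map each item to its most related items.

    Relatedness here is an assumed score: the number of shoppers who
    selected both items, divided by sqrt(count_a * count_b).
    """
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in histories.values():
        unique = set(items)
        for item in unique:
            item_counts[item] += 1
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), both in pair_counts.items():
        score = both / (item_counts[a] * item_counts[b]) ** 0.5
        related[a].append((b, score))
        related[b].append((a, score))

    # Keep only the top_n neighbours per item, highest score first.
    return {
        item: sorted(neighbours, key=lambda p: (-p[1], p[0]))[:top_n]
        for item, neighbours in related.items()
    }


def recommend(mapping, current_items, limit=3):
    """On-line step: sum the scores of items related to the current items."""
    totals = defaultdict(float)
    for item in current_items:
        for other, score in mapping.get(item, []):
            if other not in current_items:
                totals[other] += score
    ordered = sorted(totals.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ordered[:limit]]


# Hypothetical item selection histories, keyed by shopper.
histories = {
    "shopper_1": ["tent", "sleeping_bag", "lantern"],
    "shopper_2": ["tent", "sleeping_bag"],
    "shopper_3": ["tent", "stove"],
    "shopper_4": ["novel", "lantern"],
}

mapping = build_related_items(histories)
suggestions = recommend(mapping, {"tent"})
print(suggestions)
```

The split into an off-line build step and an on-line lookup step mirrors the description above: the expensive pairwise analysis is done once, and recommendations at shopping time reduce to reading the precomputed mapping.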
In the field of online searching, companies like Google may provide users an option to view additional documents that are similar to a given search result returned in response to a user's query. By selecting a “similar” option from a pull-down list, a user is presented with a list of documents that have a high cosine similarity to an original document. This is not a crowd-sourced technique, but it represents an additional method known in the art for suggesting new content. To calculate a cosine similarity of two documents, each term in a document is typically assigned a different dimension. A multi-dimensional vector is constructed to characterize each document, where the value of each dimension in the vector corresponds to the number of times that a given term appears in the document. The cosine similarity of the two documents is then calculated from the two vectors, where similar documents will typically have vectors that point in similar directions. Cosine similarity measures are limited, however, by the fact that they compare actual terms found in documents. That is, cosine similarity calculations do not perform a separate semantic analysis of individual terms in a document prior to comparison, nor do they reliably reflect the way humans typically think about relationships among the documents.
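The calculation described above can be sketched as follows, using raw term counts as vector components. The tokenizer, the function names, and the sample phrases are assumptions made for illustration; production systems commonly apply additional weighting that is not shown here.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a term-count vector: one dimension per distinct term."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two term-count vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document has no direction to compare.
    return dot / (norm_a * norm_b)


# Hypothetical documents.
same_wording = cosine_similarity(
    term_vector("the car crashed on the highway"),
    term_vector("the car crashed on the highway"),
)
synonyms_only = cosine_similarity(
    term_vector("car crash"),
    term_vector("automobile collision"),
)

print(same_wording)
print(synonyms_only)
```

In this sketch, two phrases that a human reader would consider closely related ("car crash" and "automobile collision") share no literal terms, so their similarity is zero, which illustrates the limitation noted above.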