Internet search engines have become fundamental tools for nearly all users seeking information and sites on the World Wide Web (WWW). Users can find vast amounts of data and select the data that appears to best match specific search criteria. Free-text searches are generally performed by providing a search phrase including one or more keywords, and optionally Boolean operators. The most widely used free-text search engines currently are provided by Google, Inc. and Yahoo, Inc. Most large websites offer site-specific search tools for finding content on the webpages of the website.
Based on the search phrase provided by a user, a search engine generally returns a list of documents from which the user selects those that appear most relevant. The list typically includes a snippet from each of the documents that includes one or more of the keywords, and the URL of the document. Typically, the search engine presents the list of documents in descending order according to general, static criteria established by the search engine provider. Numerous techniques have been developed for ranking the list in order to provide the results most likely to be relevant to a typical user. Some of these techniques take into account the order of the keywords provided by the user.
Such static ranking systems often present high-ranking results that do not match the interests or skills of the searcher, or that do not correctly reflect the intended meaning of keywords having more than one meaning. For example, a software engineer looking for Java (i.e., software) and a traveler looking for Java (i.e., the island) receive the same results for a query that includes the same keywords, even though their searches have different intended meanings.
Some search engines, such as the one provided by AOL, Inc., attempt to overcome this drawback by using user profiles that specify certain static characteristics of each user. Such characteristics may include information such as the searcher's age, location, job, and education. Each user must provide this information and keep it updated as the user's interests change over time. Such information often does not accurately reflect the user's skill levels in various interest areas. Such profiles also generally fail to adequately reflect the full diversity of the user's interests.
Some search engines are configured to rank results of multi-keyword searches using merge algorithms. For example, the search engine may use criteria to separately rank the results for each of the keywords searched separately, and merge the separate rankings to produce a list of search results containing all of the keywords. Some search engines use collaborative filtering based on social networks, forums, communities, or other types of groups, in an attempt to supply more relevant search results.
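The merge approach described above for multi-keyword searches can be illustrated with a minimal sketch. It assumes one simple merge rule, summing each document's position across the per-keyword rankings, which is an illustrative choice and not the rule of any particular search engine. The function name, document IDs, and rankings are hypothetical.

```python
def merge_keyword_rankings(rankings):
    """Merge per-keyword rankings into one list.

    rankings maps each keyword to a list of document IDs, best first.
    Only documents present in every per-keyword list are kept, and they
    are ordered by the sum of their positions (lower is better), with
    ties broken by document ID. The sum-of-positions rule is an assumption.
    """
    lists = list(rankings.values())
    if not lists:
        return []
    # Keep only documents that contain all of the keywords.
    common = set(lists[0]).intersection(*lists[1:])

    def total_position(doc):
        return sum(ranked.index(doc) for ranked in lists)

    return sorted(common, key=lambda doc: (total_position(doc), doc))


# Hypothetical per-keyword rankings for the query "java island".
per_keyword = {
    "java": ["d1", "d2", "d3", "d4"],
    "island": ["d3", "d1", "d5"],
}
merged = merge_keyword_rankings(per_keyword)
print(merged)  # ['d1', 'd3']
```

Only d1 and d3 appear in both rankings; d1 has the lower position sum (0 + 1) and so precedes d3 (2 + 0).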
Internet advertisements are often targeted to website visitors. Some search engines use search queries to target advertisements to search engine users on the search results pages. For example, Google's AdWords program performs such targeting.
Internet advertisements are often presented on a webpage in the form of banner ads that comprise rectangular boxes including graphical components. When a visitor to a website selects one of these banner ads by clicking on it, embedded hypertext links typically direct the viewer to the advertiser's website. This selection process is referred to as “click-through.” The “click-through rate” of an ad is the ratio of the number of click-throughs to the number of impressions of the ad, i.e., the number of times an ad is viewed.
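The click-through rate defined above is a simple ratio, as the following minimal sketch shows. The function name and the counts are hypothetical and serve only as illustration.

```python
def click_through_rate(click_throughs, impressions):
    """Return the ratio of click-throughs to impressions of an ad."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if click_throughs < 0:
        raise ValueError("click_throughs must be non-negative")
    return click_throughs / impressions


# Hypothetical counts: an ad viewed 1000 times and clicked 25 times.
rate = click_through_rate(click_throughs=25, impressions=1000)
print(f"{rate:.1%}")  # 2.5%
```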
Internet advertisements are increasingly presented on or via a webpage in the form of widgets, which comprise portable pieces of code that can be installed and executed within a webpage or otherwise on a user's personal computer. Widget ads are often interactive, and present dynamic content provided by the advertiser. For example, Google's AdSense uses widgets (referred to as “gadgets”) as one vehicle for distributing advertisements.
International Publication WO 07/124,430 to Ismalon, which is assigned to the assignee of the present application and is incorporated herein by reference, describes a method including presenting to a user a range of levels of personalization of search results, including a personalized level, a global level that is not personalized, and a community level between the personalized level and the global level. An indication of a desired one of the levels, and a search query consisting of one or more query terms, are received from the user. Responsively to the search query, a search result listing is generated. At least a portion of the search result listing is ranked at least in part responsively to the indication, and at least a portion of the ranked search result listing is presented to the user.
U.S. Pat. No. 4,839,853 to Deerwester et al., which is incorporated herein by reference, describes a methodology for retrieving textual data objects. The information is treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in the data objects. Estimates of this latent structure are utilized to represent and retrieve objects. A user query is recouched in the new statistical domain and then processed in the computer system to extract the underlying meaning to respond to the query.
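The general idea of representing documents and queries in a latent statistical domain can be sketched with a truncated singular value decomposition of a term-by-document matrix. This is a generic illustration of latent semantic analysis under stated assumptions, not a reproduction of the method claimed in the cited patent. The toy matrix, the choice of k = 2, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical term-by-document counts (rows: terms, columns: documents).
terms = ["java", "code", "island", "beach"]
A = np.array([
    [2, 1, 1, 0],   # java
    [1, 2, 0, 0],   # code
    [0, 0, 2, 1],   # island
    [0, 0, 1, 2],   # beach
], dtype=float)

k = 2  # number of latent dimensions kept (an assumption)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]  # top-k left singular vectors span the latent term space

# Project every document (column of A) into the latent space.
doc_vecs = A.T @ U_k  # shape: (documents, k)


def to_latent(term_counts):
    """Recast a term-count vector (e.g., a query) in the latent space."""
    return term_counts @ U_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Query containing only the term "code".
query = np.array([0.0, 1.0, 0.0, 0.0])
q = to_latent(query)
scores = [cosine(q, d) for d in doc_vecs]
ranking = [int(i) for i in np.argsort(scores)[::-1]]
print(ranking)  # document indices, most similar first
```

On this toy data, the two documents dominated by "java" and "code" (indices 0 and 1) score higher than the two dominated by "island" and "beach".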
U.S. Pat. No. 5,754,938 to Herz et al., which is incorporated herein by reference, describes the customized electronic identification of desirable objects, such as news articles, in an electronic media environment. A system automatically constructs both a “target profile” for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a “target profile interest summary” for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects that are profiled on the electronic media. Users' target profile interest summaries are used to efficiently organize the distribution of information in a large scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically-based pseudonym proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.
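The word-frequency profile described above can be sketched as follows, assuming that each word's weight is its relative frequency in the article divided by its relative frequency across all articles, and that an article is scored against a user's interests by a weighted sum. Both choices are simplifying assumptions made for illustration, and the article texts, interest weights, and function names are hypothetical.

```python
from collections import Counter


def target_profile(article_words, corpus_counts, corpus_total):
    """Weight each word by its in-article frequency relative to its
    overall frequency across all articles (an assumed weighting)."""
    counts = Counter(article_words)
    n = len(article_words)
    return {
        word: (count / n) / (corpus_counts[word] / corpus_total)
        for word, count in counts.items()
    }


def score(profile, interests):
    """Weighted sum of profile weights over the user's interest terms."""
    return sum(w * profile.get(term, 0.0) for term, w in interests.items())


# Hypothetical articles, already tokenized.
articles = {
    "a1": "java compiler java bytecode".split(),
    "a2": "java island beach travel".split(),
}
corpus_counts = Counter(w for words in articles.values() for w in words)
corpus_total = sum(corpus_counts.values())
profiles = {
    name: target_profile(words, corpus_counts, corpus_total)
    for name, words in articles.items()
}

# Hypothetical interest summary for one user.
interests = {"compiler": 1.0, "bytecode": 0.5}
ranked = sorted(profiles, key=lambda a: score(profiles[a], interests), reverse=True)
print(ranked)  # ['a1', 'a2']
```

The word "java" appears in both articles and therefore receives a lower weight in a1 than "compiler" or "bytecode", which appear only there.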
U.S. Pat. No. 7,313,556 to Gallivan et al., which is incorporated herein by reference, describes techniques for dynamically evaluating latent concepts in unstructured documents. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
U.S. Pat. No. 7,152,065 to Behrens et al., which is incorporated herein by reference, describes adapting latent semantic indexing (LSI) for information retrieval and text mining operations to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains which are then exploited in determining which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner LSI can be applied to datasets that heretofore presented scalability problems. Additionally, the computation of the singular value decomposition of the term-by-document matrix can be accomplished at various distributed computers increasing the robustness of the retrieval and text mining system while decreasing search times.
U.S. Pat. No. 6,137,911 to Zhilyaev, which is incorporated herein by reference, describes a method for classifying documents into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
US Patent Application Publication 2004/0220850 to Ferrer et al., which is incorporated herein by reference, describes a method for facilitating viral marketing, in which a plurality of computer users communicate via a set of interconnected terminals and share online experiences under the direction of a single user. A plurality of terminals, each having a user interface, connect to a server or network through a portal rather than directly to the server. One terminal serves as a leader terminal. Each terminal connects to the portal so that it can send and receive data and commands between each of the plurality of terminals and the portal. The portal then connects to a server using a telecommunications connection. The server has the informational content resident thereon desired by the user and presents marketing messages (or other messages) to each of the users. The reactions of the users are recorded, with the recorded information used to identify when a leader of a group is also leading purchasing behavior for the group.
US Patent Application Publication 2004/0059708 to Dean et al., which is incorporated herein by reference, describes techniques for improving the relevance of advertisements to a user's interests. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.
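The relevance test described above, in which an advertisement is relevant to a page if its keywords belong to the page's topic list, can be sketched as a set intersection. The sketch assumes the page topics have already been determined by some content analysis, which is not shown. The ad names, keywords, and topics are hypothetical.

```python
def relevant_ads(page_topics, ads):
    """Return the names of ads having at least one keyword in the page's topics.

    page_topics: topics already determined for the page (analysis not shown).
    ads: mapping of ad name to its associated keywords.
    Matching is case-insensitive, which is an assumption of this sketch.
    """
    topics = {t.lower() for t in page_topics}
    return [
        name for name, keywords in ads.items()
        if topics & {k.lower() for k in keywords}
    ]


# Hypothetical ads and page topics.
ads = {
    "ide_ad": ["java", "programming"],
    "resort_ad": ["travel", "beach"],
    "coffee_ad": ["coffee"],
}
page_topics = ["Programming", "Software"]
matches = relevant_ads(page_topics, ads)
print(matches)  # ['ide_ad']
```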
US Patent Application Publication 2005/0091111 to Green et al., which is incorporated herein by reference, describes a method of interactive advertising for the Internet, in which a commercial link for an ad space is embedded in the text of a Web page. The contextual targeting for the page is determined by analyzing the overall content of the page or determining the presence of individual keywords within the text content of the page. A keyword that is visually distinguished from the surrounding page content triggers an ad space to display a message, which may include a product related to the text. In some embodiments, the ad space allows a user to make a purchase transaction or view an inventory of goods and services, with descriptions, all without leaving the Web page. The web page is analyzed to determine appropriate keywords within the text to associate with the ad space. Upon user interaction with the keyword, an ad space according to the invention provides a customized message that is contextually targeted to the user.
Gawronski P et al., in “The Heider balance and social distance,” Acta Physica Polonica B 36(8):2549-2558 (2005), which is incorporated herein by reference, explore the Heider balance, which is a state of a group of people with established mutual relations between them. These relations, friendly or hostile, can be measured in the Bogardus scale of social distance. The authors examine the influence of allowed ranges for these relations on system dynamics.
Axelrod R, in “The dissemination of culture: a model with local convergence and global polarization,” J Conflict Res 41(2):203-226 (1997), which is incorporated herein by reference, describes an agent-based adaptive model of social influence that reveals the effects of a mechanism of convergent social influence. The model is described as illustrating how local convergence can generate global polarization.
The following references, all of which are incorporated herein by reference, may be of interest:
US Patent Application Publication 2005/0033641 to Jha et al.
PCT Publication WO 06/103616 to Pitchers
U.S. Pat. No. 5,987,457 to Ballard
US Patent Application Publication 2005/0076003 to DuBose et al.
U.S. Pat. No. 6,732,088 to Glance
U.S. Pat. No. 6,772,150 to Whitman et al.
US Patent Application Publication 2003/0123443 to Anwar
U.S. Pat. No. 6,636,848 to Aridor et al.
U.S. Pat. No. 4,823,306 to Barbic et al.
U.S. Pat. No. 6,513,036 to Fruensgaard et al.
US Patent Application Publication 2002/0133483 to Klenk et al.
U.S. Pat. No. 5,926,812 to Hilsenrath et al.
U.S. Pat. No. 6,289,353 to Hazlehurst et al.
US Patent Application Publication 2005/0055341 to Haahr et al.
U.S. Pat. No. 6,363,379 to Jacobson et al.
U.S. Pat. No. 6,347,313 to Ma et al.
U.S. Pat. No. 6,321,226 to Garber et al.
U.S. Pat. No. 6,189,002 to Roitblat
U.S. Pat. No. 6,167,397 to Jacobson et al.
U.S. Pat. No. 5,864,845 to Voorhees et al.
U.S. Pat. No. 5,825,943 to DeVito et al.
US Patent Application Publication 2005/0144158 to Capper et al.
US Patent Application Publication 2005/0114324 to Mayer
U.S. Pat. No. 5,857,179 to Vaithyanathan et al.
U.S. Pat. No. 7,139,755 to Hammond
U.S. Pat. No. 7,152,061 to Curtis et al.
U.S. Pat. No. 6,904,588 to Reddy et al.
U.S. Pat. No. 6,842,906 to Bowman-Amuha
U.S. Pat. No. 6,539,396 to Bowman-Amuha
US Patent Application Publication 2004/0249809 to Ramani et al.
US Patent Application Publication 2003/0058277 to Bowman-Amuha
U.S. Pat. No. 6,925,460 to Kummamuru et al.
U.S. Pat. No. 6,920,448 to Kincaid et al.
US Patent Application Publication 2006/0074883 to Teevan et al.
US Patent Application Publication 2006/0059134 to Palmon et al.
US Patent Application Publication 2006/0047643 to Chaman
US Patent Application Publication 2005/0216434 to Haveliwala et al.
US Patent Application Publication 2003/0061206 to Qian
US Patent Application Publication 2002/0073088 to Beckmann et al.
US Patent Application Publication 2005/0086283 to Marshall
U.S. Pat. No. 7,249,053 to Wohlers et al.
US Patent Application Publication 2007/0265922 to Dumond et al.
International Application WO 00/62171 to Glazer
International Application WO 01/29727 to Green et al.
U.S. Pat. Nos. 6,615,238, 6,917,961, and 7,233,973 to Melet et al.
US Patent Application Publication 2007/0226082 to Leal
US Patent Application Publication 2006/0218036 to King et al.
A whitepaper entitled, “Searchable Banners: The Next Wave for Online Databases” (Borrell Associates Inc., November 2005)
Berkowitz, David, “Banner Ads: The New Search Engine,” SearchINSIDER (Dec. 6, 2005)
Hofmann T, “Probabilistic latent semantic indexing,” Proceedings of the Twenty-Second Annual International SIGIR Conference (1999)
Blei D et al., “Latent Dirichlet allocation,” Journal of Machine Learning Research 3 (2003)
Griffiths T et al., “Finding Scientific Topics,” Proceedings of the National Academy of Sciences 101 (suppl. 1):5228-5235 (2004)
Steyvers M et al., “Probabilistic topic models.” In Landauer T et al. (eds), Latent Semantic Analysis: A Road to Meaning (2007)
Dhillon I et al., “A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts,” UTCS Technical Report #TR-04-25 (Feb. 18, 2005)
Grady L et al., “Isoperimetric Graph Partitioning for Data Clustering and Image Segmentation,” IEEE Transactions On Pattern Analysis And Machine Intelligence (2004)