Due to the widespread deployment of high-speed network technology and significant advances in interactive web applications, there has been an overwhelming surge of data available on the Web over the last 10 years, including dramatic increases in end-user generated information, such as pictures, video, and comments, which was nearly non-existent a decade ago. To put things in context, in 2001 the entire Web contained 21 terabytes of information. Today, the daily growth in information is three times that number. The current underlying way of organizing this information is to infer document quality by analyzing incoming and outgoing hyperlinks, and to infer user interest with popularity-based counting methods, such as the number of comments on an article. Organizing information based on link structure is a slow, biased process, since it depends on webmasters building those links and on an accumulation of documents referencing a document before that document becomes relevant. This not only takes time, but those webmasters' interests, which may be highly influenced by the need to maximize the profits generated through online advertising, are not representative of users in general, who are far more populous and diverse. On the other hand, popularity-based methods give visibility to what is common and conceal valued unique expressions, resulting in the homogenization of information centered around popular appeal.
To make matters harder for search engines that rely on link-based analysis, the quality of much of the most interesting content on the Web, such as pictures, videos and literature, is not well characterized by link structure. This is because an intrinsic link structure that interconnects items in this category does not exist. For example, in contrast to HTML-formatted web pages, which include mutual citations, pictures, videos and literature do not include internal links citing sources for their interesting internal elements. Though there are sometimes surrounding links within or to the document in which the artistic content is embedded, they are only tangentially related to the artistic content itself and hence are poor indicators of its subjective quality and relevance.
Finally, more and more online activity is being conducted around communities, such as blogs and social networks, which are dynamic, heterogeneous environments where search engines are far less effective. Again, this is because the most successful algorithms (e.g., PageRank) were designed around the static link structure of the Internet, which was its dominant characteristic a decade ago, when content was centrally managed by webmasters and consumed by users. Within these networks, users are the primary producers and evaluators of information, and hyperlinks are largely static and far less relevant.
The result is an information overload problem that drowns out artistic expressions, unique ideas and dynamic information in favor of what has popular appeal and is static in nature. Hence, users are directed to information that has low personal relevance, making it costly to find information with the unique subjective qualities that they value most. Since the most valuable information also tends to be unique in nature, a significant amount of valuable information is lost because it is concealed from the greater part of our society.
Many technical approaches have been developed to address the problem of personalizing information. For example, search engines employ techniques to heuristically inflate the weights of certain static links to documents based on indicators gathered by observing personal behavior. In particular, documents that a user has previously shown an interest in, bookmarked, etc., are given a higher weight. Though this heuristic approach improves personal relevancy, it does not address the underlying problem that artistic content and unique ideas are poorly organized by these engines.
Online retailers, such as Amazon and Netflix, have developed product recommendation systems that recommend products to customers based on correlations with other customers who have purchased similar products. However, these systems tend to employ algorithms and heuristic techniques that are based on pair-wise matching, for example, “persons who bought item A also bought item B”. Since these pair-wise techniques do not model correlations and interactions within a network of interconnected individuals and items, they cannot be suitably adapted to environments like the World Wide Web. This is because the Web is a massive, highly dynamic database containing a complex network of inter-related humans and information.
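As an illustration of the pair-wise matching described above, the following is a minimal sketch of a co-purchase counter. It is not the method of any particular retailer; the function names (co_purchase_counts, also_bought), the basket data, and the tie-breaking rule are all hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is visited once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, counts, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    scored = [(other, n) for (first, other), n in counts.items() if first == item]
    # Highest count first; ties broken alphabetically (an arbitrary assumption).
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:top_n]]


# Hypothetical purchase histories, one list per customer.
baskets = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "D"]]
counts = co_purchase_counts(baskets)
recommendations_for_a = also_bought("A", counts)
```

Note that the sketch only looks at item pairs within a single basket. It carries no information about who the customers are or how they relate to one another, which is the limitation the paragraph above points out.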
Another technique used to provide users with more personal, unique and artistic information has been to form online communities. These allow individuals within these self-contained groups to share information that is much more personal and unique in nature. They include social networks, blogs, listservs, media sharing sites, etc. The general problem with these techniques is that there is poor technological assistance in determining which information, out of all the information circulated within these groups, is more relevant. Thus, as the groups become larger, each specific item of information that is distributed to the group members becomes less relevant to each specific person in the group, resulting in similar information overload and concealment problems. Another limitation of this approach is that the information generated within these groups does not efficiently propagate out of these groups. For example, in many cases an individual must manually forward information to another group. Thus, much of the information is again lost to the greater part of society. In the case of popular blogs, search engines provide assistance in directing individuals outside of a blog's community to the blog's content. However, the documents within small to medium-sized blogs generally do not rank high enough to surface within general-purpose search engines, such as Google Search.
Finally, numerous heuristic techniques exist that use user behavior, such as the number of comments, to influence the ranking of documents. In general, these techniques employ popularity-based algorithms. That is, if two documents are considered otherwise equally relevant, then the document with the larger number of comments ranks higher. Though variations on this basic approach exist, none of these prior-art techniques creates a unified hyperspace with an internal structure that captures the relative relationship between users and information. Within the hyperspace of the present invention, the relevance of any specific referral, such as a favorite, is relative to the human viewing the referral, and is also relative to the document being referred. For example, if the observer does not share the same values as the referrer, then the reference (favorite) has low relevance. Further, if the referrer is not knowledgeable about the subject matter to which they are referring, then their reference also has low relevance. Since this invention is unique in its ability to objectively capture these relative relationships between humans and information, the performance of applications (e.g., search, social networks, news headline services, etc.) that incorporate these principles can be greatly improved.
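The contrast between popularity counting and observer-relative relevance can be made concrete with a small sketch. This is an illustration only and does not represent the hyperspace of the invention, whose construction is not described in this section. The Referral type, the affinity and expertise tables, and the choice to multiply the two factors are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referral:
    referrer: str  # user who marked the document (e.g., as a favorite)
    document: str
    topic: str     # subject matter of the document


def popularity_score(document, referrals):
    """Prior-art style score: every referral counts the same for every viewer."""
    return sum(1 for r in referrals if r.document == document)


def relative_score(observer, document, referrals, affinity, expertise):
    """Hypothetical observer-relative score.

    Each referral is weighted by how closely the observer's values match the
    referrer's (affinity) and by how knowledgeable the referrer is about the
    topic (expertise). Missing table entries are treated as 0.0.
    """
    total = 0.0
    for r in referrals:
        if r.document != document:
            continue
        shared_values = affinity.get((observer, r.referrer), 0.0)
        knowledge = expertise.get((r.referrer, r.topic), 0.0)
        total += shared_values * knowledge
    return total


# Hypothetical data: doc1 has three referrals, doc2 has one.
referrals = [
    Referral("u1", "doc1", "jazz"),
    Referral("u2", "doc1", "jazz"),
    Referral("u3", "doc1", "jazz"),
    Referral("bob", "doc2", "jazz"),
]
affinity = {("alice", "bob"): 1.0}   # alice shares values only with bob
expertise = {("bob", "jazz"): 0.5, ("u1", "jazz"): 1.0}

doc1_popularity = popularity_score("doc1", referrals)
doc2_popularity = popularity_score("doc2", referrals)
doc1_for_alice = relative_score("alice", "doc1", referrals, affinity, expertise)
doc2_for_alice = relative_score("alice", "doc2", referrals, affinity, expertise)
```

In this example, doc1 wins on raw popularity, but doc2 scores higher for the hypothetical observer alice, because the only referrer whose values she shares, and who has some knowledge of the topic, referred doc2.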