The rapid growth of digital data storage and the overwhelming supply of on-line information provided by today's communication networks create a risk of constant information overload. One of the key problems of the modern information society is the increasing difficulty of accessing relevant information while suppressing the overwhelming mass of irrelevant data. Most importantly, the notion of relevance is highly individual and thus difficult to formalize in general terms. Information filtering refers to the general problem of separating useful and important information from nuisance data. Individual users have different preferences, opinions, judgments, tastes, and cultural backgrounds. To support different individuals in their quest for information, an automated filtering system has to take into account the diversity of preferences and the inherent relativity of information value.
One commonly distinguishes between (at least) two major approaches to information filtering. The first approach is content-based filtering, in which information organization is based on properties of the object or the carrier of information. The second approach is collaborative filtering (or social filtering), in which the preference behavior and qualities of other persons are exploited in speculating about the preferences of a particular individual. Information filtering technology has had a huge impact on the development of the Internet and the e-commerce boom.
Search engines are classical content-based systems that use pattern matching technology to quickly retrieve information based on query keywords. Search engines are fast and have proven to be scalable to data sets of the size of the Internet, but they have inherent limitations. Because direct pattern matching is simple by nature, they provide no understanding of the sense of a word or the context of its use, often resulting in an unexpected variety of search results. These results can typically be explained by word matching, but are meaningless on the intentional level. Search engines are typically unable to effectively process over-specific queries in which keywords are closely related to actual document content, yet do not appear in the actual text. Search engines are also frequently non-personalized, i.e., they provide the same results independent of the user history, and they have no effective mechanism to learn from user satisfaction.
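The limitations of direct word matching described above can be illustrated with a minimal sketch. The corpus, the document identifiers, and the helper names `tokenize` and `keyword_search` are hypothetical and chosen only for illustration; they do not describe any particular search engine.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(query, documents):
    """Return the ids of documents that contain every query keyword.

    This is pure word matching: no word sense, no context, no user history.
    """
    terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items()
            if terms <= tokenize(text)]

# Hypothetical toy corpus, invented for this illustration.
documents = {
    "d1": "The jaguar is a large cat native to the Americas.",
    "d2": "The new Jaguar sedan has a quiet engine.",
    "d3": "Automobile makers reported strong sales this year.",
}

# An ambiguous keyword matches both the animal and the vehicle.
print(keyword_search("jaguar", documents))  # ['d1', 'd2']

# A keyword related to the content of d2 and d3, but absent from the text,
# matches nothing.
print(keyword_search("car", documents))     # []
```

In this sketch the query "jaguar" returns two documents with unrelated meanings, and the query "car" misses documents about sedans and automobiles because the literal word never appears. Every user issuing the same query receives the same result list.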
At the other end of the spectrum, e-commerce sites use recommender systems to suggest products to their customers. Products can be recommended based on the top overall sellers on a site, on the demographics of the customer, or on an analysis of the customer's past buying behavior as a predictor of future buying behavior. Recommender systems aim at personalization on the Web, enabling individual treatment of each customer. However, recommender systems and their underlying collaborative filtering techniques have several shortcomings. At best, poor predictions can be made for new objects and new users (“Early rater problem”). In many domains, a given user rates only a small percentage of the possible items (“Scarcity problem”). There are often users whose interests and tastes differ from those of all other users in the database (“Grey sheep problem”). More fundamentally, while many users may share some common interest, it may be extremely difficult to find a sufficient number of users that share all interests.
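The early rater problem can be made concrete with a small sketch of user-based collaborative filtering. The rating data, the user and item names, and the choice of cosine similarity over co-rated items are assumptions made for illustration only; the text above does not prescribe any particular similarity measure or prediction rule.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob":   {"book_a": 5, "book_b": 4, "book_c": 2},
    "carol": {"book_a": 1, "book_b": 2, "book_c": 5},
}

def cosine(u, v):
    """Cosine similarity of two users, computed over co-rated items only."""
    common = set(u) & set(v)
    if not common:
        return 0.0  # no overlap, so no evidence of similarity
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no usable evidence exists, i.e. nobody has rated the
    item or the user has no ratings in common with anyone who did.
    """
    mine = ratings.get(user, {})
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = cosine(mine, theirs)
        num += weight * theirs[item]
        den += weight
    return num / den if den > 0 else None

print(predict("alice", "book_c", ratings))  # a value between 2 and 5
print(predict("alice", "book_d", ratings))  # None: new item, no ratings yet
print(predict("dave", "book_c", ratings))   # None: new user, no history
```

The prediction for an established user and item is a blend of the ratings of other users, whereas a new item or a new user yields no prediction at all. In a realistic database most entries of the rating table are missing, so the set of co-rated items that the similarity relies on is often very small.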
Therefore, in light of the foregoing deficiencies in the prior art, the applicant's invention is herein presented.