1. Field of the Invention
The present embodiments relate generally to recommendation systems and more particularly to recommendation systems that identify domain-relevant data or domain-relevant experts.
2. Description of the Prior Art
Current recommendation systems or collaborative filtering systems exist to assist users with their media experience and choices by recommending media that may be of interest to the user. Many of these systems use metrics such as author, album, genre, general ratings by others, personal user ratings, ratings based on others who like some of the same songs as the user, and so forth. Some systems have even created a database that breaks down music into components and classifies each, such as the ‘Music Genome Project’ discussed in U.S. Pat. No. 7,003,515. The general concept is that the ‘taste’ or ‘flavor’ of these media can be quantified in a manner that helps the user identify additional media they might enjoy or have interest in hearing, seeing or reading.
These previous systems can be useful for media recommendations; however, they fall short when it comes to recommending or identifying relevant data in specialized domains such as scientific research. One of the challenges in these domains is that the data is often of more than a single type (unlike music), thereby making it difficult to create a finite number of classifications. Furthermore, unlike previous systems where the correct classification was given by the data producers or by common understanding (e.g., the genre of a movie), those conducting data-driven discovery projects may not know a priori how to correctly classify data elements. In fact, discovering the correct classification may be the explicit purpose of the project.
The need for recommender systems in discovery projects is highlighted by the challenges of large data sets. In specialized domains, the volume of data may be too great for a single analyst or decision maker to handle when performing their job functions. In order to manage voluminous data, the data is distributed among multiple scientists, doctors, intelligence analysts and so forth. This distribution in turn creates a new challenge, as it tends to create silos of data that lead to inefficient or incomplete analysis.
By way of example, such data sets may include: thousands of mouse experiments containing physiological, genealogical, and genetic data for each mouse; tens of thousands of patients having similar initial findings; or millions of satellite images that share similar pixel statistics. Experts and analysts in each of these areas cannot be familiar with all of the data or, in some instances, even know of other experts or analysts in the same field.
3. Object of the Invention
There is a need to discover experts, analysts and relevant data as they pertain to a particular project, such as scientific analysis, patient diagnosis, and other areas where large data sets exist.
The embodiments described herein seek to address these needs.