1. Field of the Invention
Aspects of the invention generally relate to the field of information. Specifically, aspects of the present invention relate to a method and system that correlates information from multiple sources without compromising confidentiality requirements.
2. Description of Related Art
In the information age, vast amounts of data are continuously collected. Such collected data may be digital or non-digital. The data includes both public data, such as a congressional archive, and private data, such as patients' medical records in a hospital. Different organizations, public or private, record data to serve their different needs. For example, a common practice in an e-commerce environment is to build individualized customer profiles based on customer information collected on-line. Different businesses such as banking, investment, retail, travel, entertainment, real estate, and dating services usually design specific customer profiles to fit their business needs. For instance, for a particular customer, a travel agency may wish to gather information about the customer's hobbies, places traveled in the last 3 years, or a preference between a beach and a ski resort, while an on-line music store may wish to know the customer's preferred music categories, such as jazz or classical music, in order to target advertising to the customer on the web effectively. Similarly, for a particular patient, the patient's medical records at a dermatologist's clinic and the patient's records at an allergist's clinic will contain substantially different content. In both cases, the only common content between the two data collections for the same person may be some unique personal identification information.
The large amounts of data collected have led to inventive uses of data that create new information. When data from different collections are jointly examined, new useful information may be extracted. In the above examples, the various customer profiles collected by different businesses for a particular customer can be collectively examined so that a new profile of that person's overall spending pattern may be extracted. Another example is medical records. If patients' treatment records can be examined together with the lab test records (these two types of records are often stored in different collections), it is possible to generate information about the kinds of drugs that are effective for particular types of patients.
Such combined usage of data requires that different data sets be properly correlated. In the above example, assume that the lab test records are in data collection A and the treatment records (e.g., what drug is used with what dose at what time interval) are in data collection B. To jointly use both types of data to analyze the effectiveness of a particular drug, the lab test results for individual patients (from A) have to be correlated with the corresponding treatment records (from B) before the analysis can be performed.
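The correlation step described above can be illustrated with a minimal sketch. It assumes each record in collections A and B carries a shared key. The field names (`patient_id`, `test`, `drug`, and so on), the sample values, and the `correlate` helper are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical sample records; all field names and values are illustrative.
lab_tests_a = [
    {"patient_id": "P1", "test": "glucose", "result": 5.4},
    {"patient_id": "P2", "test": "glucose", "result": 7.9},
]
treatments_b = [
    {"patient_id": "P1", "drug": "drug-x", "dose_mg": 10, "interval_h": 12},
    {"patient_id": "P3", "drug": "drug-y", "dose_mg": 5, "interval_h": 24},
]


def correlate(collection_a, collection_b, key="patient_id"):
    """Pair each record in A with every record in B that shares the same key."""
    # Index collection B by key so each lookup from A is a dictionary access.
    index_b = defaultdict(list)
    for rec_b in collection_b:
        index_b[rec_b[key]].append(rec_b)

    pairs = []
    for rec_a in collection_a:
        # Records in A with no counterpart in B produce no pairs.
        for rec_b in index_b.get(rec_a[key], []):
            pairs.append((rec_a, rec_b))
    return pairs


pairs = correlate(lab_tests_a, treatments_b)
```

In this sketch only the patient present in both collections yields a correlated pair, and the analysis of drug effectiveness would then operate on such pairs. Note that the shared key used here is itself a unique personal identifier.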
The information on which the correlation between different data sets is based may be a unique ID, such as a patient's social security number. Such information is identifying because it reveals the identity of the individual to whom the data belongs. The use of such identifying information may pose a serious confidentiality or privacy issue. For example, even though the lab results and treatment records described above may be initially stored individually and privately, the combined information derived by correlating different data sets using identifying information exposes certain private information, such as the diagnosis, to a scope that was not originally intended. Therefore, while inventive usage of data can generate other useful information, the possible side effects related to confidentiality or privacy have to be eliminated. New methods are needed to enable effective and inventive usage of data without violating confidentiality requirements.