1. Field of the Invention
The present invention is directed to a system and process for analyzing records to determine duplicates.
2. Description of the Related Art
The concept of a master patient index (MPI) is well known in the healthcare industry. As healthcare computer systems become increasingly complex and distributed over wide areas, it is important to be able to uniquely and correctly identify individual patients over a wide array of disjoint or unconnected systems. An MPI system seeks to uniquely identify an individual based on information provided.
At its core, an MPI system stores information on a patient over time. As information changes for the patient, updates are made to the MPI system in order to identify the patient with greater granularity and accuracy. When a query is made, even with outdated information, an effective MPI system is able to return potential matches for that query with a high rate of accuracy. Even in cases where no reasonable match is available, an MPI system may return that result to the inquirer with the option of adding the queried individual to its database as a newly identified patient. In cases where possible matches exist, but with a low probability of a match, MPI systems can interface with human reviewers or artificially intelligent systems to make final decisions.
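By way of illustration only, the three outcomes described above (a confident match, a possible match referred for a final decision, and no reasonable match) may be sketched as a simple triage over scored candidates. The names `Candidate` and `triage` and the threshold values `AUTO_MATCH` and `REVIEW` are hypothetical and are not taken from any particular MPI system.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
AUTO_MATCH = 0.90  # at or above this score, accept the best candidate
REVIEW = 0.60      # between REVIEW and AUTO_MATCH, defer the decision


@dataclass
class Candidate:
    patient_id: str
    score: float  # assumed match probability in [0, 1]


def triage(candidates):
    """Return a (decision, candidates) pair for one query."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # No reasonable match: offer to add the queried individual as new.
    if not ranked or ranked[0].score < REVIEW:
        return "new_patient", []
    # Confident match: return only the best candidate.
    if ranked[0].score >= AUTO_MATCH:
        return "match", [ranked[0]]
    # Possible matches: pass them on for a final decision.
    return "needs_review", [c for c in ranked if c.score >= REVIEW]
```

For example, `triage([Candidate("p1", 0.72), Candidate("p2", 0.41)])` yields the decision `"needs_review"` with only `p1` retained, since `p2` falls below the assumed review threshold.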
An MPI, therefore, may be applicable to a variety of systems. Consider cases where insurance companies or health care providers share information with other companies or providers on a regular, ongoing basis. Over time, patient information may evolve, as addresses change, erroneous data is corrected, or missing data is obtained. Where unique patient identifiers are not available due to organizational, privacy, or legal reasons, an MPI can provide a valuable link in cleaning or otherwise aggregating knowledge of the data.
Often, the information processed by an MPI centers on demographic data. This data may be sparse or outdated, which can lead to the discontinuity or loss of important patient data. To minimize this, an MPI system may take the provided information and perform comparisons with persons already known to the system. The processes employed may make probabilistic determinations based on the relevance of certain data points, or attributes. However, these probabilistic determinations often are based primarily on subjective assessments of the data. As such, previous MPIs may have required a user to have substantial knowledge of the data and, even then, may have required the user to make several guesses in assessing the relative value of that information.
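As a minimal sketch of such a determination, assuming a simple weighted-agreement scheme, each attribute may be assigned a relevance weight and the weights of agreeing attributes summed. The attribute names, the `WEIGHTS` table, and the function `match_score` are hypothetical; the weights are hand-picked, which is precisely the kind of subjective guess described above.

```python
# Hypothetical, hand-picked relevance weights (subjective by construction).
WEIGHTS = {
    "last_name": 0.30,
    "first_name": 0.20,
    "date_of_birth": 0.35,
    "address": 0.15,
}


def match_score(query, known, weights=WEIGHTS):
    """Weighted fraction of agreeing attributes among those both records have."""
    agreed = 0.0
    compared = 0.0
    for attr, weight in weights.items():
        q, k = query.get(attr), known.get(attr)
        if q is None or k is None:
            continue  # sparse data: skip attributes missing from either record
        compared += weight
        if q.strip().lower() == k.strip().lower():
            agreed += weight
    return agreed / compared if compared else 0.0
```

Changing any single weight changes the resulting scores, so two users with different assessments of the data may reach different conclusions about the same pair of records.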
In addition, data sets on which an MPI operates often are exceedingly large, e.g., with millions, if not tens of millions, of records, each having dozens of different fields. Analysis of these data sets may be very time intensive and further may inhibit “real time” evaluations. In addition, the scale and scope of these data sets, and the need to compare each record with each other record and, therefore, the fields within each of those records, mean this task is beyond the scope of human calculation and analysis. At the same time, however, it may be desirable for the MPI to perform several tasks for which human reasoning and analysis may be beneficial. Context may be significant in determining whether a match exists or not. For example, two records may have different field values for the first name and matching values for other fields. In one instance, these two records may represent the same individual, where one of the first name field values is a nickname. In another instance, these two records may represent twins, who share the same last name, address and date of birth.
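The two difficulties described above may be illustrated with a short sketch. The function `pair_count` computes the number of record pairs when each record is compared with each other record, and `differing_fields` shows that a field-by-field comparison alone cannot distinguish the nickname case from the twin case. The function names, field names, and sample records are hypothetical and invented for illustration.

```python
import math


def pair_count(n_records):
    """Number of distinct record pairs, n * (n - 1) / 2."""
    return math.comb(n_records, 2)


def differing_fields(a, b):
    """Sorted names of the shared fields whose values differ."""
    return sorted(f for f in a.keys() & b.keys() if a[f] != b[f])


# Invented sample records: shared values for every field but the first name.
shared = {"last_name": "Doe", "address": "1 Main St", "date_of_birth": "1980-05-05"}
nickname_pair = ({"first_name": "William", **shared}, {"first_name": "Bill", **shared})
twin_pair = ({"first_name": "Anna", **shared}, {"first_name": "Emma", **shared})

print(pair_count(10_000_000))            # 49999995000000
print(differing_fields(*nickname_pair))  # ['first_name']
print(differing_fields(*twin_pair))      # ['first_name']
```

Both pairs produce the same field-level result, so deciding between one individual and two requires context beyond simple field agreement.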
What is needed is a system or process that overcomes the drawbacks described above.