Patients receiving care at several healthcare organizations may receive several distinct patient identifiers, usually one at each autonomous healthcare organization that they visit. The patient data, such as electronic medical records, medical images, and other relevant medical information, are spread across multiple sites. In order to be able to retrieve all relevant patient data regardless of the healthcare site where these data were captured and stored, the patient identifiers at the various sites and the respective patient records should be linked, without requiring the sites to adopt a common patient identifier.
The volume of data collected for a patient in the context of a complex disease such as cancer has increased tremendously, and a large portion of these data, describing the medical history of a patient, can be relevant for diagnosis and treatment of the patient. In the case of recurring cancer patients, the relevant cancer-related health episodes can go many years back. Comorbidities are often relevant as well, as they may be a constraining factor for choosing a therapy. For example, many chemotherapy agents are cardiotoxic, and in order to choose the right therapy, prior information concerning cardiac disease may be important. It is highly unlikely that the information about all these health-related episodes of a patient resides in the system of a single institution. However, the treating clinician seeing a patient should be able to extract all the relevant prior health episodes from the patient record, cancer-related as well as non-cancer-related episodes. Each patient record may include many episodes and span a few decades.
The flow of information into and out of the patient record is typically channeled through a Master Patient Index (MPI) that associates a unique medical record number (MRN) with each patient entity when a unit record exists. To obtain a view on patients across distributed data sources, the local identifiers in the individual institutions need to be reconciled. This is currently done by building an MPI that interrelates all the identifiers in hospitals that are part of a collaborating group, or enterprise. An Enterprise Master Patient Index (EMPI) is developed through integration of the individual MPIs of the sources. Generally, the integration is achieved by comparing demographic attributes such as first/last name, gender, date of birth, address, etc., to create an enterprise-level identifier, and is rarely based on a single identifier shared across the different organizations in the enterprise. Most of the existing systems deploy probabilistic algorithms which typically compare a fixed record with a number of candidates for a match, computing for each candidate a likelihood ratio (weighted score) that is compared to chosen accept and reject thresholds. This is used to decide whether to link the records or not. When the decision cannot be taken automatically (the computed likelihood falls between the accept and reject thresholds), qualified personnel review or flag the potential (mis)matches before they are accepted (or rejected). However, submitting a large number of records for manual review is very costly and may make the solution impractical.
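The probabilistic comparison described above can be illustrated with a minimal sketch in the style of Fellegi-Sunter record linkage. It is offered only as an illustration under stated assumptions, not as the method of any particular existing system: the attribute names, the per-attribute agreement probabilities in `FIELD_PROBS`, the exact-equality comparison, and the function names `match_score` and `classify` are all hypothetical.

```python
import math

# Hypothetical per-attribute probabilities (assumed for illustration only):
#   m = probability that the attribute agrees when two records are the same patient
#   u = probability that the attribute agrees when they are different patients
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "gender": (0.98, 0.5),
    "date_of_birth": (0.97, 0.001),
    "address": (0.80, 0.01),
}


def match_score(rec_a, rec_b, probs=FIELD_PROBS):
    """Return the summed log2 likelihood ratio over the compared attributes."""
    score = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            score += math.log2(m / u)  # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def classify(score, reject_threshold, accept_threshold):
    """Map a weighted score to one of the three decisions."""
    if score >= accept_threshold:
        return "link"
    if score <= reject_threshold:
        return "non-link"
    return "manual review"


fixed = {"first_name": "ann", "last_name": "lee", "gender": "f",
         "date_of_birth": "1970-01-01", "address": "1 main st"}
candidate = dict(fixed, date_of_birth="1970-01-10", address="9 elm rd")
print(classify(match_score(fixed, candidate), 0.0, 10.0))
```

In this sketch the candidate agrees on name and gender but not on date of birth or address, so its score falls between the two assumed thresholds and the pair is routed to manual review. A practical system would normally use approximate string comparison rather than exact equality.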
The number of matches automatically rejected, accepted, or submitted for manual review depends both on the weights associated with the different attributes during comparison, on the basis of which the likelihood ratio is computed, and on the chosen reject and accept thresholds. The matching process is tuned with these thresholds by trading off data consistency versus completeness. The pairs of records having scores between the reject and accept thresholds are submitted for manual review, which means that a clinical expert or dedicated personnel needs to manually review those records and decide whether to match them, or even ask the patient whether the two records belong to him or her. With a conservative approach, when both false positives and false negatives need to be avoided, this may constitute a high cost for the healthcare organization. Also, when matching across large medical record systems and other data sources such as PACS and lab systems, and when large amounts of data need to be manually reviewed and matched for each pair of potentially matching records, the number of erroneous matches may go up.
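The effect of the thresholds on the manual review workload can be shown with a small sketch. The scores and threshold values below are invented purely for illustration, and the function name `triage_counts` is hypothetical. The sketch only counts how many candidate pairs fall into each of the three outcomes for a given pair of thresholds.

```python
from collections import Counter


def triage_counts(scores, reject_threshold, accept_threshold):
    """Count pairs that are rejected, accepted, or sent to manual review."""
    if reject_threshold > accept_threshold:
        raise ValueError("reject threshold must not exceed accept threshold")
    counts = Counter(accepted=0, rejected=0, manual_review=0)
    for score in scores:
        if score >= accept_threshold:
            counts["accepted"] += 1
        elif score <= reject_threshold:
            counts["rejected"] += 1
        else:
            counts["manual_review"] += 1
    return dict(counts)


# Invented weighted scores for eight candidate record pairs.
scores = [-12.0, -3.5, 1.2, 4.8, 7.7, 9.9, 15.3, 31.4]

# Thresholds close together: few manual reviews, more automatic decisions.
print(triage_counts(scores, reject_threshold=4.0, accept_threshold=6.0))

# Conservative thresholds far apart: fewer automatic decisions, more reviews.
print(triage_counts(scores, reject_threshold=0.0, accept_threshold=12.0))
```

With these invented values, moving the thresholds apart raises the number of pairs needing manual review from one to four, which is the cost trade-off described above.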
S. J. Grannis et al., “Analysis of a probabilistic Record Linkage Technique without Human Review”, AMIA 2003 Symposium Proceedings, pages 259-263, discloses record linkage using probabilistic linkage techniques. Grannis et al. further discloses avoiding human review in such methods by means of an estimator function using an expectation maximization algorithm to establish a single true-link threshold.