Typically, patients receive care from multiple healthcare providers that are geographically dispersed across multiple sites. At each site, the patient is usually given a different patient identifier, which may be used locally at that healthcare provider. As a result, the patient data of a single patient, such as medical images and other relevant medical information, is spread across multiple sites and labeled with different local patient identifiers. In order to retrieve the patient data stored elsewhere, the patient identifiers are reconciled and the respective patient records are linked together.
Over time, the average volume of data collected for a patient in the context of a complex disease such as cancer has increased tremendously. For example, for recurring cancer patients, a large part of the medical history can be relevant to a clinician, and the relevant cancer-related health episodes can go back many years. Co-morbidities are often relevant as well, as they can strongly constrain the choice of therapy. For example, many chemotherapy agents are cardio-toxic, and in order to choose the right therapy, prior information concerning cardiac disease may be important. It is highly unlikely that the information about all these health-related episodes resides in the system of a single institution. Nevertheless, the treating clinician seeing a patient should be able to extract all the relevant prior health episodes from the patient record, both cancer- and non-cancer-related, which may include many episodes and span decades.
The flow of information into and out of the patient record is typically channeled through a Master Patient Index (MPI) that assigns a unique medical record number (MRN) to each patient of an entity when a unit record exists. Herein, the unit record comprises the actual patient data maintained by the entity. It can include the electronic health record of the patient, the patient record in the radiology information system (RIS) and all the other patient data (such as studies, images, lab data) maintained by that entity. All the data items of a unit record may be linked by the locally assigned patient identifier (e.g., MRN). To obtain a view of patients across distributed data sources, the local identifiers in the individual institutions are reconciled. This is currently done by building an Enterprise-wide Master Patient Index (EMPI) that interrelates all the identifiers in the hospitals that are part of the enterprise. The EMPI is developed through integration of the individual MPIs of the sources. Generally, the integration is achieved by comparing demographic attributes such as first/last name, gender, date of birth, address, etc., to create an enterprise-level identifier. The integration is rarely based on a single identifier shared across the different organizations in the enterprise. Most of the existing systems deploy probabilistic algorithms which typically compare a fixed record with a number of candidates for a match, computing for each candidate a likelihood ratio (weighted score) that is compared to chosen accept and reject thresholds. The result is used to decide whether to link the records or not. When the decision cannot be made automatically (the computed likelihood ratio falls between the two thresholds), qualified personnel need to review or flag the potential (mis)matches before they are accepted (or rejected).
The manual review of uncertain matches helps to minimize linkage errors, which can have far-reaching consequences, ultimately endangering a patient's health. However, submitting a large number of records for manual review is very costly and may make the entire solution impractical.
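The threshold-based decision described above can be illustrated with a minimal sketch. It assumes a simple log-likelihood-ratio weighting in which each demographic attribute contributes a positive weight on agreement and a negative weight on disagreement. The attribute names, the probabilities in `M_U`, the threshold values and the function names are all hypothetical and chosen only for illustration; they are not taken from any particular existing system.

```python
import math

# Hypothetical per-attribute probabilities (illustrative values only):
# m = P(attribute agrees | records belong to the same patient)
# u = P(attribute agrees | records belong to different patients)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "gender": (0.98, 0.5),
    "date_of_birth": (0.97, 0.001),
    "address": (0.80, 0.01),
}


def match_score(fixed, candidate):
    """Return the weighted score: a sum of log2 likelihood ratios."""
    score = 0.0
    for attr, (m, u) in M_U.items():
        if fixed.get(attr) == candidate.get(attr):
            score += math.log2(m / u)              # agreement weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight
    return score


def classify(score, reject_threshold, accept_threshold):
    """Map a score to one of the three outcomes described in the text."""
    if score >= accept_threshold:
        return "link"
    if score <= reject_threshold:
        return "no_link"
    return "manual_review"  # score falls between the two thresholds


def link_candidates(fixed, candidates, reject_threshold=0.0, accept_threshold=15.0):
    """Compare one fixed record against each candidate record."""
    return [
        (c["mrn"], classify(match_score(fixed, c), reject_threshold, accept_threshold))
        for c in candidates
    ]
```

In this sketch, narrowing the gap between the two thresholds reduces the number of records sent for manual review at the cost of more automatic decisions on uncertain scores, which is the trade-off the paragraph above describes.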
The paper “Efficient Private Record Linkage” by Mohamed Yakout et al., IEEE International Conference on Data Engineering, 2009, pp. 1283-1286, discloses a protocol for private record linkage that makes no use of a third party. The protocol consists of two phases. In phase 1, candidate pairs of records for matching are produced. In phase 2, the Euclidean distance between the records of each candidate pair is computed. Both parties participate in the Euclidean distance computations without revealing the original representations of their respective records.
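The two-phase structure can be sketched as follows. This is not the cited protocol: both phases are performed in the clear, with no privacy protection, purely to show how candidate generation limits the number of distance computations. The grid-cell blocking rule in phase 1, the numeric vector representation of records, and the names `candidate_pairs` and `euclidean_matches` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def candidate_pairs(records_a, records_b, cell=1.0):
    """Phase 1 (assumed blocking rule): pair up records whose first
    coordinate falls into the same grid cell of width `cell`."""
    buckets = defaultdict(list)
    for j, vec in enumerate(records_b):
        buckets[math.floor(vec[0] / cell)].append(j)
    pairs = []
    for i, vec in enumerate(records_a):
        for j in buckets.get(math.floor(vec[0] / cell), []):
            pairs.append((i, j))
    return pairs


def euclidean_matches(records_a, records_b, pairs, max_distance):
    """Phase 2: keep the candidate pairs whose Euclidean distance
    does not exceed `max_distance` (computed here without any privacy)."""
    return [
        (i, j)
        for i, j in pairs
        if math.dist(records_a[i], records_b[j]) <= max_distance
    ]
```

Only the pairs produced in phase 1 are passed to phase 2, so records that never share a grid cell are never compared.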