The present invention relates to the management of personal health information or data on individuals. The invention in particular relates to the assembly and use of such data in a longitudinal database in manner that maintains individual privacy.
Electronic databases of patient health records are useful for both commercial and non-commercial purposes. Longitudinal (life time) patient record databases are used, for example, in epidemiological or other population-based research studies for analysis of time-trends, causality, or incidence of health events in a population. The patient records assembled in a longitudinal database are likely to be collected from a multiple number of sources and in a variety of formats. An obvious source of patient health records is the modern health insurance industry, which relies extensively on electronically communicated patient transaction records for administering insurance payments to medical service providers. The medical service providers (e.g., pharmacies, hospitals or clinics) or their agents (e.g., data clearing houses, processors or vendors) supply individually identified patient transaction records to the insurance industry for compensation. The patient transaction records, in addition to personal information data fields or attributes, may contain other information concerning, for example, diagnosis, prescriptions, treatment or outcome. Such information acquired from multiple sources can be valuable for longitudinal studies. However, to preserve individual privacy, it is important that the patient records integrated to a longitudinal database facility are “anonymized” or “de-identified”.
A data supplier or source can remove or encrypt personal information data fields or attributes (e.g., name, social security number, home address, zip code, etc.) in a patient transaction record before transmission to preserve patient privacy. The encryption or standardization of certain personal information data fields to preserve patient privacy is now mandated by statute and government regulation. Concern for the civil rights of individuals has led to government regulation of the collection and use of personal health data for electronic transactions. For example, regulations issued under the Health Insurance Portability and Accountability Act of 1996 (HIPAA), involve elaborate rules to safeguard the security and confidentiality of personal health information. The HIPAA regulations cover entities such as health plans, health care clearinghouses, and those health care providers who conduct certain financial and administrative transactions (e.g., enrollment, billing and eligibility verification) electronically. (See e.g., http://www.hhs.gov/ocr/hipaa). Commonly invented and co-assigned U.S. patent application Ser. No. 10/892,021 filed Jul. 15, 2004, which is hereby incorporated by reference in its entirety herein, describes systems and methods of collecting and using personal health information in standardized format to comply with government mandated HIPAA regulations or other sets of privacy rules.
For further minimization of the risk of breach of patient privacy, it may be desirable to strip or remove all patient identification information from patient records that are used to construct a longitudinal database. However, stripping data records of patient identification information to completely “anonymize” them can be incompatible with the construction of the longitudinal database in which the stored data records or fields are necessarily updated patient by patient.
Commonly owned U.S. patent application Ser. No. 11/122,589 filed May 5, 2005, incorporated by reference herein, discloses a solution which allows patient data records acquired from multiple sources to be integrated each individual patient by patient into a longitudinal database without creating any risk of breaching of patient privacy.
The solution disclosed in the referenced patent application uses a two-step encryption procedure using multiple encryption keys. The procedure involves the data sources or suppliers (“DS”), the longitudinal database facility (“LDF”), and a third party implementation partner (“IP”). The IP, who may also serve as an encryption key administrator, can operate at a facility component site. At the first step, each DS encrypts selected data fields (e.g., patient-identifying attributes and/or other standard attribute data fields) in the patient records to convert the patient records into a first “anonymized” format. Each DS uses two keys (i.e., a vendor-specific key and a common longitudinal key associated with a specific LDF) to doubly encrypt the selected data fields. The doubly encrypted data records are transmitted to a facility component site, where the IP can process them further. The data records are processed into a second anonymized format, which is designed to allow the data records to be linked individual patient by patient without recovering the original unencrypted patient identification information. For this purpose, the doubly encrypted data fields in the patient records received from a DS are partially de-crypted using the specific vendor key (such that the doubly encrypted data fields still retain the common longitudinal key encryption). A third key (e.g., a token based key) may be used to further prepare the now-singly (common longitudinal key) encrypted data fields or attributes for use in a longitudinal database. Longitudinal identifiers (IDs) or dummy labels that are internal to the longitudinal database facility may be used to tag the data records so that they can be matched and linked individual ID by ID in the longitudinal database without knowledge of original unencrypted patient identification information. (See FIG. 1)
Commonly owned U.S. patent application Ser. No. 11/122,564 filed May 5, 2005, incorporated by reference herein, discloses matching algorithms, which may be used to determine if a new or previously defined ID should be assigned to a set of encrypted data attributes. Once a new or previously defined ID has been assigned, the ID may then be used to link back to tag full data records, which include detailed transaction information. The matching algorithms assign longitudinal linking tags to de-identified patient data records by matching the patient data records with reference data records. The de-identified patient data records may include both encrypted and non-encrypted data attributes. Different possible subsets of the data attributes are categorized in a hierarchy of levels. Subsets of data field values are compared with the reference data records one level at a time. Upon successful comparison or matching of a subset of data field values, a longitudinal linking tag associated with a matched reference data record is assigned to de-identified data record is assigned. When a match is not found, a new longitudinal linking tag is created and assigned to the de-identified data record. The new tag and corresponding data record attributes are then added to the reference data for future matching operations.
Further consideration is now being given to the processes for matching patient data generated or received from diverse data sources. In particular, attention is paid to matching processes and logic that, for example, allow linking of a patient's data records across different pharmacies serving the same patient.