The Health Insurance Portability and Accountability Act (HIPAA), enacted in 1996, includes provisions intended to help people maintain the privacy of their health information. Title II of HIPAA, known as the “Administrative Simplification” (AS) provisions, includes a Privacy Rule that addresses the security and privacy of health data. The Privacy Rule, which took full effect in 2004, regulates the use and disclosure of Protected Health Information (PHI) held by covered entities such as health insurers and medical service providers. PHI is defined as any information held by a covered entity that concerns health status, provision of health care, or payment for health care and that can be linked to an individual person.
Classic identifiers include information such as name, patient number, and Social Security number (SSN). In general, a classic identifier is any information that is meant to identify a person as an individual. One conventional method of protecting a patient's privacy is to hide classic identifiers such as name, SSN, medical record number (MRN), and patient identification number. It is possible, however, to identify a person based on a non-classic identifier or a combination of non-classic identifiers. That is, in some cases, a field or a combination of fields may be identifying. This may be a problem in systems that combine clinical, geographic, and demographic data. For example, the combination of cancer site, county, and race may be identifying, particularly in regions of low population density.
A conventional method for assessing the risk of a potential breach of patient confidentiality via non-classic identifiers in a set of data records is the “Record Uniqueness” method. The Record Uniqueness method is disclosed in “Method to Assess Identifiability in Electronic Data Files” by Holly L. Howe, et al., American Journal of Epidemiology, December 2006. The Record Uniqueness method generates frequencies for every variable and every combination of variables in a data set. For each frequency distribution, the method counts the number of records with a frequency of one, each of which is defined as a unique record, and the number of records with a frequency of five or fewer, which together are defined as a unique record set.
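The counting described above can be sketched in a few lines of Python. This is only an illustrative reading of the description given here, not the implementation published by Howe et al.; the function name `record_uniqueness`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def record_uniqueness(records, variables, small_set_limit=5):
    """Count unique records and small record sets for every combination of variables.

    `records` is a list of dicts; `variables` lists the non-classic fields to test.
    The threshold of five follows the description above; everything else is assumed.
    """
    results = {}
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            # Frequency of each distinct value tuple for this combination of variables.
            freq = Counter(tuple(rec[v] for v in combo) for rec in records)
            results[combo] = {
                # Records whose value tuple occurs exactly once.
                "unique_records": sum(n for n in freq.values() if n == 1),
                # Records whose value tuple occurs five or fewer times.
                "unique_record_set": sum(
                    n for n in freq.values() if n <= small_set_limit
                ),
            }
    return results


# Hypothetical sample data using the fields mentioned earlier in this section.
sample_records = [
    {"site": "lung", "county": "A", "race": "X"},
    {"site": "lung", "county": "A", "race": "X"},
    {"site": "lung", "county": "B", "race": "Y"},
    {"site": "skin", "county": "A", "race": "X"},
]
result = record_uniqueness(sample_records, ["site", "county", "race"])
```

Because every combination of variables is enumerated, the number of frequency distributions grows exponentially with the number of variables, so a sketch like this is practical only for a modest number of fields.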
The Record Uniqueness method, however, does not take classic identifiers into account. Thus, an identifying non-classic identifier may be overlooked because of data redundancy: several records may share the same values and so appear non-unique when, in fact, the redundancy is an attribute of the data records (for example, multiple records for the same person) and not of the non-classic identifier. Accordingly, assessing the risk of a breach of confidentiality without regard to classic identifiers can compromise patient confidentiality and can also compromise the integrity of the data set.
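The shortcoming described above can be illustrated with a small Python sketch that compares a record count with a count of distinct persons for each value combination. It is an illustration under assumed data only: the `patient_id` field, the helper name `records_vs_persons`, and the sample visits are hypothetical and stand in for whatever classic identifier a real data set carries.

```python
from collections import defaultdict


def records_vs_persons(records, fields, id_field):
    """Return {value tuple: (record count, distinct person count)} for `fields`."""
    record_counts = defaultdict(int)
    persons = defaultdict(set)
    for rec in records:
        key = tuple(rec[f] for f in fields)
        record_counts[key] += 1
        # Track which persons (by classic identifier) contribute to this value tuple.
        persons[key].add(rec[id_field])
    return {key: (record_counts[key], len(persons[key])) for key in record_counts}


# Hypothetical data: patient P1 appears twice with identical non-classic values.
visits = [
    {"patient_id": "P1", "site": "lung", "county": "A"},
    {"patient_id": "P1", "site": "lung", "county": "A"},
    {"patient_id": "P2", "site": "skin", "county": "A"},
]
comparison = records_vs_persons(visits, ["site", "county"], "patient_id")
```

In this sample, the combination of lung and county A occurs in two records, so a count based on records alone would not flag it as unique, yet both records belong to a single person.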
For the foregoing reasons, there is a need for a more trustworthy method to analyze data records for personal identifiability.