Personal information is continuously captured in a multitude of electronic databases. Details about health, financial status and buying habits are stored in databases managed by public and private sector organizations. These databases contain information about millions of people and can provide valuable research, epidemiologic and business insight. For example, examining a drugstore chain's prescription or over-the-counter drug sales can indicate where a flu outbreak is occurring. To extract or maximize the value contained in these databases, data custodians must often give outside organizations access to their data. To protect the privacy of the people whose data is being analyzed, a data custodian will “de-identify” information before releasing it to a third party. De-identification is intended to ensure that data cannot be traced to the person to whom it pertains. In addition, there have been strong concerns that explicit consent requirements in privacy legislation impair the ability to conduct health research. These concerns are reinforced by compelling evidence that requiring opt-in consent for participation in different forms of health research can negatively affect the process and outcomes of the research itself.
When de-identifying records, many people assume that removing names and addresses (direct identifiers) is sufficient to protect the privacy of the persons whose data is being released. The harder problem of de-identification involves those personal details that are not obviously identifying. These personal details, known as quasi-identifiers, include, to name a few, the person's age, sex, postal code, profession, ethnic origin and income, as well as dates of events such as date of birth, medical procedures or visits. No single quasi-identifier necessarily identifies a person, but a combination of them can be unique to one individual in a dataset and can therefore be used to re-identify that individual.
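The role of quasi-identifiers can be illustrated with a minimal sketch. The records, the field names and the helper `unique_records` below are hypothetical and are provided for illustration only; they are not taken from any particular dataset or method. The sketch assumes that direct identifiers have already been removed and counts how many records remain unique on their combination of quasi-identifiers.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, address) already removed.
records = [
    {"age": 34, "sex": "F", "postal_code": "K1A", "visit_date": "2009-03-02"},
    {"age": 34, "sex": "F", "postal_code": "K1A", "visit_date": "2009-03-02"},
    {"age": 71, "sex": "M", "postal_code": "K2B", "visit_date": "2009-03-05"},
    {"age": 34, "sex": "M", "postal_code": "K1A", "visit_date": "2009-03-09"},
]

# Assumed set of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("age", "sex", "postal_code", "visit_date")


def unique_records(rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


at_risk = unique_records(records)
print(len(at_risk))  # 2: the third and fourth records are unique
```

In this sketch, two of the four records are unique on the chosen quasi-identifiers even though no name or address is present, which is why removing direct identifiers alone may not be sufficient.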
Data de-identification is currently a manual process. Heuristics are used to make a best guess at how to remove identifying information prior to releasing data. Manual data de-identification has resulted in several cases where individuals have been re-identified in supposedly anonymous datasets. Accordingly, systems and methods that enable shifting dates in the de-identification of datasets remain highly desirable.
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.