Field of the Invention
The present invention relates generally to data de-identification and, in particular, to a system and method for de-identifying data using cascading token generation.
Description of Related Art
For decades, data including personally identifying information has been de-identified through the creation of tokens that uniquely identify an individual. This technology has been used in connection with consumer packaged goods data, television data, subscriber data, healthcare data, and the like.
Traditionally, methods for creating tokens for a specific record associated with an individual involved concatenating selected data elements into a string and then encrypting that string to form a token. However, there are scenarios in which concatenated substrings yield less than optimal results. Advances in computing power now allow token generation to be more complex, even across large volumes of data, thereby providing enhanced data security. Moreover, once a token is created, additional security measures are desirable to prevent reverse-engineering through statistical analysis attacks.
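By way of illustration only, the traditional concatenate-then-encrypt approach described above might be sketched as follows. The field names, the normalization rule, the delimiter, and the key are all hypothetical, and a keyed hash (HMAC-SHA256) is used here as a stand-in for the encrypting step, since no particular cipher is specified.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    # Hypothetical normalization: unify Unicode forms, trim whitespace, uppercase.
    return unicodedata.normalize("NFKC", value).strip().upper()


def make_token(record: dict, fields: list, secret_key: bytes) -> str:
    # Concatenate the selected, normalized data elements into a single string.
    concatenated = "|".join(normalize(record[name]) for name in fields)
    # Assumption: a keyed hash stands in for the "encrypting" step.
    digest = hmac.new(secret_key, concatenated.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical record and field selection, for demonstration only.
record = {"first_name": "Jane", "last_name": "Doe", "dob": "1980-01-31"}
fields = ["first_name", "last_name", "dob"]
key = b"demo-key-not-for-production"

token = make_token(record, fields, key)
print(token)
```

In this sketch, a difference in any single selected element (for example, a mistyped date of birth) produces an entirely different token, which is one scenario in which a single concatenated string yields less than optimal results.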
By law, Protected Health Information (PHI) cannot be freely disseminated. However, if the PHI is properly de-identified to the point where the risk that an individual could be re-identified is minimal, the PHI can be disclosed by a covered entity or an entity in legal possession of PHI.