1. Technical Field
The present invention relates to a method and system for selectively de-identifying or masking data and, more particularly, to a technique for dynamically de-identifying or masking data upon discovery while preserving data usability across software applications.
2. Discussion of the Related Art
Across various industries, data (e.g., data related to customers, patients, or suppliers) is shared outside secure corporate boundaries. Various initiatives (e.g., outsourcing tasks, performing tasks off-shore, etc.) have created opportunities for this data to become exposed to unauthorized parties, thereby placing data confidentiality and network security at risk. In many cases, these parties do not need the true data values to perform their job functions. Examples of data requiring de-identification include, but are not limited to, names, addresses, network identifiers, social security numbers and financial data.
Conventional data de-identification or masking techniques are developed manually and implemented independently in an ad hoc and subjective manner for each application. Because sensitive fields and information cannot be consumed into batch or real-time processes, processes such as Extract/Transform/Load (ETL) are implemented as stand-alone processes in which live data is sourced in batch or in real time. Data requiring de-identification that is located within a data source is initially discovered and profiled by a separate discovery tool. The de-identification or masking to be applied is defined by a user after manual review of the discovery tool output, and is then applied to the data. Specifically, an ETL developer manually selects and enters the various field types and the corresponding de-identification or masking for an ETL process, so that the process can de-identify or mask those fields. The resulting de-identified or masked data is subsequently delivered to other environments.
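By way of illustration only, the conventional manual approach described above might be sketched as follows. The field names, the masking functions, and the `FIELD_MASKS` table are hypothetical and are not taken from any particular ETL product; the sketch assumes that a developer has hand-entered each field and its masking after reviewing the output of a discovery tool.

```python
import hashlib


def mask_name(value: str) -> str:
    # Hypothetical rule: replace the whole value with a fixed placeholder.
    return "REDACTED"


def mask_ssn(value: str) -> str:
    # Hypothetical rule: keep only the last four digits.
    digits = [c for c in value if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def mask_hash(value: str) -> str:
    # Hypothetical rule: replace the value with a truncated SHA-256 digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Entered by hand by the ETL developer, one entry per field (assumed names).
FIELD_MASKS = {
    "name": mask_name,
    "ssn": mask_ssn,
    "address": mask_hash,
}


def transform(record: dict) -> dict:
    # Apply the manually configured masking; unlisted fields pass through.
    return {
        field: FIELD_MASKS[field](value) if field in FIELD_MASKS else value
        for field, value in record.items()
    }


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "address": "1 Main St",
    "order_id": "A17",
}
masked = transform(record)
```

In this sketch, any field that the developer did not enter in the table is delivered unchanged, which reflects the manual, per-application character of the conventional technique.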