Information is the oxygen of enterprises. Enterprises need to store and process sensitive information in their production environment databases. This information often consists of employee, customer, partner, and vendor records containing sensitive details such as names of individuals, addresses, telephone numbers, emails, social security numbers, credit card information, health insurance details, health records, and financial records, to name a few. Enterprises take steps to keep such sensitive data private both to protect their own interests and the interests of their clients, partners, and customers. Indeed, much of this data is required by law or by industry standard to be kept private. For example, the Payment Card Industry Data Security Standard (“PCI DSS”) requires companies that process credit card payments to maintain data confidentiality while storing, processing, and exchanging credit card data. Likewise, the United States Health Insurance Portability and Accountability Act (“HIPAA”) mandates maintaining the privacy of individually identifiable health data.
Still, enterprises depend on applications that use sensitive data, and these applications require maintenance and testing. Application development and testing activities need realistic data for validation. To provide realistic data to a testing environment, an enterprise may implement a data masking technique to mask actual data. Data masking may also be referred to as data obfuscation, data de-identification, data depersonalization, data scrubbing, data anonymization, data scrambling, and similar terms. Data masking modifies sensitive data to create life-like but false values. Data masking systems generally retain the look and feel of data to enable realistic testing.
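By way of illustration only, the following Python sketch shows one way a masking function might retain the look and feel of a value while replacing its contents. The function name mask_value, the secret parameter, and the hash-based substitution are assumptions made for this example; they are not taken from any particular masking product, and the sketch is not intended as a secure or complete implementation.

```python
import hashlib
import string


def mask_value(value: str, secret: str = "demo-secret") -> str:
    """Return a life-like but false value with the same shape as `value`.

    Digits are replaced by digits, uppercase letters by uppercase letters,
    and lowercase letters by lowercase letters. Every other character
    (punctuation, spaces, symbols) is kept as-is, so separators such as
    '-' or '@' stay in place. Illustrative only: not cryptographically
    vetted, and replacement characters are always ASCII.
    """
    # Derive a repeatable byte stream from the secret and the input so the
    # same input always maps to the same masked output.
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    masked = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            masked.append(string.digits[b % 10])
        elif ch.isupper():
            masked.append(string.ascii_uppercase[b % 26])
        elif ch.islower():
            masked.append(string.ascii_lowercase[b % 26])
        else:
            masked.append(ch)
    return "".join(masked)


# Hypothetical sample values, not real records.
masked_ssn = mask_value("123-45-6789")
masked_email = mask_value("Jane.Doe@example.com")
```

In this sketch a masked social security number still has the form of three digits, two digits, and four digits separated by hyphens, so test code that validates the format continues to work against the false value.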
To mask a dataset containing sensitive data, enterprises have employed in-situ data masking architectures. In these conventional architectures, a clone of a dataset is made and then a set of masking rules is applied to the cloned dataset, thereby producing a masked dataset. Developers may then be granted access to the masked dataset, or the masked dataset may be delivered in its entirety to developers for testing. This conventional method, however, requires the enterprise itself to convert a large dataset, and the masked dataset may become obsolete because the data in a production environment tends to change quickly. Additionally, because various developers may require different levels of masking (e.g., for internal development, only personal information such as credit card numbers may be masked to ensure employees cannot access them, but for external development, pricing and sales information may additionally be masked to prevent the information from leaking to competitors), an enterprise may need to create several distinct masked datasets.
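The in-situ approach described above may be sketched in Python as follows. The names redact, RULE_SETS, and mask_in_situ, the column names, and the sample row are hypothetical and chosen only for this example; a real deployment would operate on database tables rather than in-memory lists.

```python
import copy


def redact(value):
    """Replace every character of a value with 'X' (a deliberately crude rule)."""
    return "X" * len(str(value))


# Hypothetical rule sets: one per audience, each mapping a column to a rule.
RULE_SETS = {
    "internal": {"card_number": redact},
    "external": {"card_number": redact, "price": lambda v: 0.0},
}


def mask_in_situ(production_rows, rules):
    """Clone the whole dataset first, then mask the clone in place."""
    clone = copy.deepcopy(production_rows)  # an unmasked clone exists at this point
    for row in clone:
        for column, rule in rules.items():
            if column in row:
                row[column] = rule(row[column])
    return clone


# Hypothetical production data.
production = [{"name": "A. Customer", "card_number": "4111111111111111", "price": 19.99}]

# One full masked copy per audience, as the conventional approach requires.
masked_sets = {audience: mask_in_situ(production, rules)
               for audience, rules in RULE_SETS.items()}
```

The sketch makes the drawbacks visible: every audience receives its own complete copy of the dataset, an unmasked clone briefly exists before the rules run, and each copy is a snapshot that does not follow later changes to the production rows.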
More modern systems may employ a data masking architecture that applies masking rules as part of the process of moving data from a source database to a target database. This may be implemented in a cloning process, in similar fashion to in-situ architectures, to prevent an unmasked clone of the dataset from existing even temporarily. However, such on-the-fly data masking architectures are implemented at the data access layer and, thus, are specific to the technology on which they are implemented. An on-the-fly architecture must be custom built for a production environment, which is costly and time-consuming. Additionally, such conventional architectures cannot operate cross-platform, such as from one database architecture to another.
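For comparison, an on-the-fly arrangement of the kind described above may be sketched in Python as a stream that masks each row while it is being moved. The names masked_stream and copy_masked, and the use of plain lists to stand in for the source and target databases, are assumptions made for this example only.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Rule = Callable[[object], object]


def masked_stream(source_rows: Iterable[Row], rules: Dict[str, Rule]) -> Iterator[Row]:
    """Yield a masked copy of each row as it is read from the source."""
    for row in source_rows:
        yield {col: rules[col](val) if col in rules else val
               for col, val in row.items()}


def copy_masked(source_rows: Iterable[Row], rules: Dict[str, Rule], target: List[Row]) -> int:
    """Move rows from source to target; only masked rows ever reach the target."""
    count = 0
    for row in masked_stream(source_rows, rules):
        target.append(row)
        count += 1
    return count


# Hypothetical source rows and a single hypothetical masking rule.
source = [{"name": "A. Customer", "card_number": "4111111111111111"}]
rules = {"card_number": lambda v: "X" * len(str(v))}
target: List[Row] = []
moved = copy_masked(source, rules, target)
```

Because masking happens inside the transfer loop, no unmasked clone is written to the target. In practice the read and write steps would be tied to a specific database technology, which is the source of the platform dependence noted above.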
Enterprises also employ data masking for outsourcing. For example, an enterprise may desire to give an outside enterprise access to a dataset but with sensitive data masked. Conventional methods would require a cloned masked dataset to be created. This may be cumbersome, though, because additional processes would then be required to integrate changes made by the outside enterprise to the cloned masked dataset back into the enterprise's production dataset. Further, additional systems would be required to ensure that the cloned masked dataset and the production dataset do not fall out of sync as changes are made on the production side.
Further, sensitive data resides in emails, application files, and other data sources in addition to structured datasets. Sensitive data may be scattered throughout organizational computing devices and stored in varying types of files. Conventional masking solutions may assist with masking organized datasets but fail to provide data security for data outside of the datasets.
Conventional methods require expensive duplication of large datasets using cumbersome systems. These methods require a separate masked dataset to be created for each context in which the dataset will be shared. Moreover, these methods are specific to the technology on which they are implemented and generally require expensive, custom-made architectures. Improved anonymization methods and systems are desired.
While systems and methods are described herein by way of example and embodiments, those skilled in the art recognize that systems and methods for runtime data anonymization are not limited to the embodiments or drawings described. It should be understood that the drawings and description are not intended to be limiting to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.