1. Field of the Invention
The present invention relates generally to the field of data obfuscation, and more particularly, to a system and method for obfuscation of data across an enterprise.
2. Description of the Related Art
During 2006, the cost of a data breach in an enterprise ranged from approximately $200 to $22 million (U.S. dollars) per year, averaging $182 per customer record and $4.8 million per incident. The total cost of some 93 million compromised records was in the billions of dollars [1]. Based on this report and others like it, and in light of legislative efforts to address data breaches and related privacy issues at both the state and federal levels, it is evident that the protection of data containing private information has become both a legislative and a business priority. As a consequence, and for reasons relating to accountability, it has become necessary not only to obfuscate data on an enterprise level but also to have the capability to trace actions taken to protect sensitive data.
Numerous methods currently exist for the obfuscation of data. As used herein, the term “data obfuscation” means concealing or changing the underlying data and/or the relationships between data so that the original meaning of the data is not revealed. The typical purpose or rationale for obfuscation is to protect sensitive or private data when that data is shared either between organizations (for example, for analytical purposes) or between individuals within an organization who have different levels of security clearance. Existing obfuscation methods include, among others, encryption, data masking, de-identification, data scrambling, and replacing data items with a constant value. These terms are often not used consistently, and their definitions may overlap. The term “encryption” generally refers to the process of using an algorithm to alter data so that it is unintelligible to unauthorized parties and requires a significant expenditure of resources to return the data to its original form without knowledge of the algorithm. The term “data masking” is sometimes used synonymously with “data obfuscation,” but technically it refers to using a pattern of characters, bits, or bytes to control the elimination or retention of another pattern of characters, bits, or bytes. The term “de-identification” generally refers to using an algorithm to replace a value with another value taken from a particular domain of values, wherein this target domain sufficiently matches the domain of the original value. The term “data scrambling” generally refers to altering information in such a way that it is not intelligible (with or without the same algorithm). Replacing data items with a constant value obliterates the original value or values; for example, a field may be simply erased or filled with X's or asterisks.
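The distinctions among several of the obfuscation methods defined above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the method of the present invention: the function names (mask, deidentify, scramble, blank), the pattern convention in which "K" means keep, and the secret string are all hypothetical. Encryption is omitted because a sound example would require a vetted cryptographic library.

```python
import hashlib
import random
from typing import Sequence


def mask(value: str, pattern: str, fill: str = "X") -> str:
    """Data masking: a pattern controls, position by position, which
    characters are retained ('K') and which are eliminated (anything else)."""
    if len(value) != len(pattern):
        raise ValueError("pattern must be the same length as the value")
    return "".join(ch if p == "K" else fill for ch, p in zip(value, pattern))


def deidentify(value: str, domain: Sequence[str], secret: str = "demo-secret") -> str:
    """De-identification: replace a value with another value drawn from a
    domain that matches the original's domain (e.g. names for names).
    The same input always maps to the same substitute. Hypothetical scheme."""
    digest = hashlib.sha256((secret + value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(domain)
    return domain[index]


def scramble(value: str, seed: int = 0) -> str:
    """Data scrambling: reorder the characters so the value is no longer
    intelligible. The seed makes this illustrative shuffle repeatable."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def blank(value: str, fill: str = "*") -> str:
    """Constant replacement: obliterate the original value entirely."""
    return fill * len(value)


if __name__ == "__main__":
    ssn = "123-45-6789"
    print(mask(ssn, "XXXKXXKKKKK"))                   # keeps hyphens and last four digits
    print(deidentify("Alice", ["Dana", "Lee", "Sam"]))
    print(scramble(ssn, seed=7))
    print(blank(ssn))
```

In this sketch, masking and constant replacement destroy information outright, whereas the de-identified value remains usable by applications that expect a value from the same domain, which bears on challenge of matching an obfuscation method to the needs of the applications that consume the data.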
The present invention is not a new form of data obfuscation. Rather, the present invention allows these and other data obfuscation methods to be applied appropriately in an automated manner across an enterprise. The challenges associated with obfuscating data across an enterprise, and those addressed by the present invention, include: (i) determining and finding the information that needs to be obfuscated; (ii) determining the appropriate method for obfuscating the data; (iii) assuring that the method for obfuscating the data conforms to the needs of the applications that use this data; (iv) determining a strategy for obfuscating large collections of data that are distributed (for example, geographically or across different systems or technologies) across the enterprise; (v) federating the data across an enterprise so that there is a common understanding as to what that data represents; (vi) providing procedural instructions and property specifications to a system for obfuscation that are easy to express and reliable in their execution; and (vii) providing a means to test and validate obfuscation operations on the enterprise. There is also a need, addressed by the present invention, to account for how the data obfuscation was accomplished once it has been done, including providing change histories and information on the sources of such changes.
Federal, state and local regulatory demands, in addition to organizational directives, have created very stringent and difficult requirements for organizations that handle sensitive data. Industry response so far has generally been to encrypt all data collections that may contain sensitive information, to encrypt those data elements that contain sensitive information, to replace sensitive data with non-sensitive data, or to do nothing. When steps are taken to obfuscate data in an enterprise, those efforts have typically focused on simple collections of data involving a discrete number of data sets rather than on the enterprise as a whole. This piecemeal approach results in data obfuscation activity that is not sufficiently comprehensive.
Accordingly, it is an object of the present invention to provide a means for obfuscating data across an enterprise that: determines and finds the information that needs to be obfuscated; determines the appropriate method for obfuscating the data; assures that the method for obfuscating the data conforms to the needs of the applications that use this data; determines a strategy for obfuscating large collections of data that are distributed across an enterprise; federates the data across an enterprise so that there is a common understanding as to what the data represents; provides procedural instructions and property specifications to a system for obfuscation that are easy to express and reliable in their execution; and provides a means to test and validate obfuscation operations on the enterprise.
It is a further object of the present invention to provide a means for assuring that actions taken to protect the data are both recorded and traceable. Because these recorded actions may quickly become voluminous and often need to be cross-referenced, it is yet another object of the present invention to ensure that the records relating to actions taken to obfuscate data are in a form that can be readily manipulated and analyzed by computer. In this respect, it is an object of the present invention to maintain the recorded data as formally expressed elements of a database that is compatible with a wide variety of analytical techniques.
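By way of illustration only, the idea of keeping obfuscation actions as queryable database records might be sketched as follows using Python's standard sqlite3 module. The table name obfuscation_audit, its columns, and the helper functions are hypothetical and are not taken from the present disclosure; the sketch merely shows that actions stored in this form can be cross-referenced with ordinary queries.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database holding one row per obfuscation action.
    The schema is an assumption made for this example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS obfuscation_audit (
               id           INTEGER PRIMARY KEY,
               performed_at TEXT NOT NULL,   -- ISO 8601 timestamp, UTC
               data_source  TEXT NOT NULL,   -- system or data set acted upon
               field_name   TEXT NOT NULL,   -- data element obfuscated
               method       TEXT NOT NULL,   -- e.g. 'masking', 'encryption'
               performed_by TEXT NOT NULL    -- source of the change
           )"""
    )
    return conn


def record_action(conn, data_source, field_name, method, performed_by):
    """Record a single obfuscation action so that it can be traced later."""
    conn.execute(
        "INSERT INTO obfuscation_audit "
        "(performed_at, data_source, field_name, method, performed_by) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(),
         data_source, field_name, method, performed_by),
    )
    conn.commit()


def actions_by_method(conn):
    """Cross-reference the log: count recorded actions per method."""
    return conn.execute(
        "SELECT method, COUNT(*) FROM obfuscation_audit "
        "GROUP BY method ORDER BY method"
    ).fetchall()


if __name__ == "__main__":
    log = open_audit_log()
    record_action(log, "crm_db", "ssn", "masking", "batch_job_1")
    record_action(log, "crm_db", "email", "de-identification", "batch_job_1")
    record_action(log, "billing_db", "ssn", "masking", "batch_job_2")
    print(actions_by_method(log))
```

A relational store is only one possible choice here; any formally structured repository that supports querying would serve the same illustrative purpose.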