1. Field of the Invention
This invention relates to information security, and particularly to a system, method, and computer program product for implementing search- and retrieval-compatible data obfuscation.
2. Description of Background
Obfuscation is a widely used method for making extraction of information more difficult. In one application, obfuscation is used to make it difficult to reverse engineer programs (see, e.g., Sunil Gupta, “Code Obfuscation—Part 2: Obfuscating Data Structures,” http://palisade.plynt.com/issues/2005Sep/code-obfuscation-continued/). In another application, obfuscation is used to mask data to make it less recognizable (see, e.g., “The Data Masker: Data Obfuscation Made Simple,” http://www.datamasker.com/dm_sitemap.htm).
Obfuscation is also widely used in security and privacy, where the need for data privacy can conflict with the need for data sharing. Data obfuscation addresses this dilemma by extending several existing technologies and defining obfuscation properties that quantify the technologies' usefulness and privacy preservation (see, e.g., David Bakken, Rupa Parameswaran, Douglas Blough, Andy Franz and Ty Palmer, “Data Obfuscation: Anonymity and Desensitization of Usable Data Sets,” http://doi.ieeecomputersociety.org/10.1109/MSP.2004.97; and R. Agrawal and R. Srikant, “Privacy-Preserving Data Mining,” Proc. ACM SIGMOD Conf. on Management of Data, ACM Press, 2000, pp. 439-450).
Obfuscation is also used to hide actual IP addresses from malicious web sites, preventing those sites from collecting personal data from a user's PC each time the user visits them (see, e.g., NetConceal: software to hide your IP address, http://www.netconceal.com/).
One way to access documents or information in an electronic format is to create an index of some sort (e.g., using keywords or topics) and then to provide a search capability that incorporates the index, so that a desired document or portion thereof may be located quickly. Such an approach is described in U.S. Pat. No. 6,654,754 (issued November 2003), entitled “System and method of dynamically generating an electronic document based upon data analysis.”
Existing methods of obfuscation (see, e.g., U.S. Pat. No. 6,981,217 (issued December 2005), entitled “System and method of obfuscating data”) are incompatible with regular indexing, such as the indexing methods described in U.S. Pat. No. 6,654,754, and therefore make it significantly more difficult to locate information in the obfuscated documents.
What is needed, therefore, is a way to obfuscate information or documents that enables the creation of indices, as well as a way to obfuscate not only content, but also metadata and the structure and relationships between artifacts.