With the advent of online data-centric activities and the growth in electronic business transactions, organizations face constant pressure to maintain a growing volume of sensitive and non-sensitive data. The ease with which data can be collected automatically, stored in databases, and queried efficiently over the internet (or otherwise) has paradoxically worsened the privacy situation and has raised numerous ethical and legal concerns, such as private data falling into malicious hands, theft, stalking on the web, and spam.
Numerous data privacy studies indicate that more than half of security breaches originate from within the organization and are fifty times as costly as external breaches. Worldwide awareness of, and concern about, data privacy legislation has put pressure on organizations to improve their data privacy and security standards. Thus, there is a need for technological solutions that achieve privacy while balancing the tradeoff between data privacy and data utility. Several techniques, including data masking, are employed to achieve both data privacy and data usability. Data masking can be defined as a process whereby the information in a database is masked or ‘de-identified’.
Data masking is a process of masking pre-determined data within a database to ensure the security of private and confidential data. It is usually carried out to avoid dissemination of sensitive information to unauthorized persons. It enables the creation of realistic data in non-production environments without the risk of exposing sensitive information to unauthorized users, and it protects sensitive information from a multitude of threats posed both outside and inside the organization's boundary. Several techniques have been used for data privacy, such as anonymization, randomization, perturbation, cryptographic approaches, privacy policy languages, and data masking.
While data masking is a known method of protecting data, there is still a need for a data masker which can mask the data in such a fashion that the masked data acts like real data for all practical purposes. Further, existing data maskers are unable to maintain the data relationships between rows, columns, and tables; once masking is complete, the relationships within the database are lost. Furthermore, the efficiency with which existing data maskers reproduce data once the data has been masked is very low. Moreover, existing data maskers do not fit in with existing legacy databases, resulting in compatibility issues. In addition, ease of use and intelligent masking of data are a constant concern with conventional data maskers. Also, conventional data maskers employ encryption, substitution, shuffling, data-number variance, or other techniques which lack the look-and-feel of real data, are slow, are error-prone in masking, or depend on the type of database and language.
In light of the above-mentioned disadvantages, there is a need for a data masker which gives masked data a look-and-feel similar to that of real data. Further, there is a need to provide data masking at a much faster rate with minimal error. In addition, there is a need for a data masker which is independent of the type of database and language.
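The properties called for above (a realistic look-and-feel and preserved relationships between tables) can be illustrated with a minimal sketch. It is not the data masker of this disclosure; it is one possible deterministic, format-preserving substitution, assumed here purely for illustration. The key, the function names `mask_value` and `mask_rows`, and the sample tables are all hypothetical.

```python
import hashlib
import hmac
import string

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def mask_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically mask a string while keeping its format.

    ASCII digits map to digits, ASCII letters map to letters of the same
    case, and every other character (hyphens, spaces, ...) is kept as-is.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]  # reuse digest bytes for long values
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)
    return "".join(out)


def mask_rows(rows, columns):
    """Return copies of the rows with only the named columns masked."""
    return [
        {k: mask_value(v) if k in columns else v for k, v in row.items()}
        for row in rows
    ]


# Two hypothetical tables related through customer_id.
customers = [
    {"customer_id": "C-1001", "name": "Alice Smith"},
    {"customer_id": "C-1002", "name": "Bob Jones"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1002"},
    {"order_id": 3, "customer_id": "C-1001"},
]

masked_customers = mask_rows(customers, {"customer_id", "name"})
masked_orders = mask_rows(orders, {"customer_id"})
```

Because the same input always yields the same masked output, the masked `customer_id` values in both tables still join, and because the sketch operates on plain strings it does not depend on a particular database. Note the limitation: this simple mapping is not guaranteed to be collision-free, so two distinct originals could in principle receive the same masked value; a production masker would need to handle that.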