Owing to the rapid growth of information technology over the last two decades, organizations are obliged, for both business and regulatory reasons, to store huge volumes of data. Such organizations include those in banking, insurance, defense, and health care. Organizations generally store this data in a server database.
Because of the plethora of information available on the Internet and the many tools available for accessing information remotely, data and information security is a prime concern for businesses that store sensitive data such as customers' personal information. Examples of such information include a customer's banking transactions; personal details such as name, address, and customer profile data; the medical histories of patients stored in a hospital database; and classified defense information. Such information, if accessed by an unauthorized user, can be extremely damaging to the financial as well as personal security of a customer. Hence, an organization must protect sensitive information from unauthorized users. Even in the event of unauthorized access, an organization should endeavor not to reveal any customer information.
Customer information can be secured by masking data so that sensitive information about a customer is not disclosed. Various methods for data masking include, but are not limited to, nulling out, character masking, substitution, shuffling, number variance, gibberish generation, and encryption. These methods suffer from various limitations. For example, nulling out replaces all records with null values, which makes it difficult to design test cases and yields poorly approximated data for computation. Character masking is complex if the data entries include special cases, and it may leave some entries inappropriately masked. Substitution requires a large number of stored data sets and considerable preliminary effort. Shuffling requires an effective randomized shuffling algorithm and is ineffective on small data sets. As for the remaining methods, number variance can be applied only to numeric entries, gibberish generation relies on a substitution method that is complex, and the security of encrypted data depends on the strength of the encryption used.
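By way of illustration only, four of the conventional techniques named above (nulling out, character masking, shuffling, and number variance) might be sketched as follows. The function names, the `keep_last` and `max_fraction` parameters, and the sample account numbers are hypothetical and chosen for this sketch; they are not drawn from any particular masking product.

```python
import random


def null_out(values):
    """Nulling out: replace every entry with None."""
    return [None for _ in values]


def mask_characters(value, keep_last=4, mask_char="X"):
    """Character masking: hide all but the last `keep_last` characters.

    keep_last and mask_char are illustrative assumptions.
    """
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def shuffle_column(values, rng):
    """Shuffling: permute a column so values no longer line up with their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def vary_number(value, rng, max_fraction=0.10):
    """Number variance: perturb a number by up to +/- max_fraction of its value."""
    return value * (1 + rng.uniform(-max_fraction, max_fraction))


# Hypothetical sample data, used only to exercise the functions.
rng = random.Random(0)
accounts = ["1234567890", "9876543210", "5555000011"]

print(null_out(accounts))                      # [None, None, None]
print([mask_characters(a) for a in accounts])  # ['XXXXXX7890', 'XXXXXX3210', 'XXXXXX0011']
print(shuffle_column(accounts, rng))
print(vary_number(1000.0, rng))
```

The sketch also exposes some of the limitations noted above. An entry no longer than `keep_last` characters passes through `mask_characters` unmasked, which is one kind of special case that leaves data inappropriately masked. A shuffled column of only a few entries may come back in its original order. `vary_number` accepts only numeric input.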
In light of the above, there exists a need for a method of masking numerical as well as alphabetical data. Further, the masked data should minimize the possibility of identical or closely similar entries, even after random shuffling.