In the era of big data, the commercial value contained in data has been extensively exploited: data mining makes it possible to profile users accurately and to provide them with more targeted marketing recommendations and service strategies. However, this also poses a major challenge for the protection of user privacy. How to protect users' privacy while mining data becomes the most pressing issue in sensitive data masking. Sensitive data masking refers to the process of transforming certain sensitive information according to sensitive data masking rules, so that sensitive private data is reliably protected and the real data set can be used safely in development, testing, and other non-production environments, as well as in outsourcing environments.
At present, sensitive data masking for text information mainly recognizes the text content through natural language processing and then masks users' private information, such as names, ID numbers, cell-phone numbers and bank account numbers, on the basis of the recognized content. When processing images, however, the current masking method can handle only the text they contain: it first recognizes the text in the image, then identifies sensitive information in the recognized text, and finally masks that sensitive information in the image. Consequently, for images that contain little text, or text that is difficult to recognize, it is hard to mask sensitive data by means of image processing, which is a significant shortcoming of current sensitive data masking.
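For illustration only, the text-based step described above (identify sensitive fields in already-recognized text, then mask them according to rules) might be sketched as follows. The rule set, the digit-length patterns, the number of characters kept visible, and the names `RULES`, `_keep_edges` and `mask_text` are all assumptions made for this example and are not taken from the source. Names are omitted because detecting them generally requires named-entity recognition rather than pattern matching.

```python
import re

def _keep_edges(match, head, tail):
    """Replace the middle of a match with '*', keeping `head` leading
    and `tail` trailing characters visible."""
    s = match.group(0)
    return s[:head] + "*" * (len(s) - head - tail) + s[len(s) - tail:]

# Hypothetical masking rules: (label, pattern, replacement function).
# Longer digit runs are handled first so a bank account number is not
# partially treated as a phone number.
RULES = [
    ("bank_account", re.compile(r"(?<!\d)\d{16,19}(?!\d)"),
     lambda m: _keep_edges(m, 0, 4)),
    ("cell_phone", re.compile(r"(?<!\d)\d{11}(?!\d)"),
     lambda m: _keep_edges(m, 3, 4)),
]

def mask_text(text):
    """Apply every masking rule, in order, to already-recognized text."""
    for _label, pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text

if __name__ == "__main__":
    print(mask_text("Call 13812345678 about account 6222021234567890123."))
```

A sketch of this kind operates only on text that has already been recognized, which is why it cannot help with images whose text is sparse or hard to recognize.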