1. Field of the Invention
The present invention relates to a technique for supporting the processing of a character string in a document. More particularly, the invention relates to a technique that provides reference information to help a user determine how the character string in the document is to be processed.
2. Description of the Related Art
Recently, call logs of a customer support center or the like have been utilized for planning a business strategy, and project documents created in the past have been shared for reuse to enhance efficiency or quality. In many cases, however, such call logs or documents include confidential or personal information, such as an organization name or a product name. The confidential or personal information therefore needs to be masked before the call logs or the documents are used.
The “black-list” masking technique is a conventionally known technique that uses a dictionary of risky words to be masked, matches the words in a document against the risky words, and replaces the matched words with turned letters, that is, placeholder characters that hide the original text (for example, see Japanese Application Publication No. 2002-259368 and Japanese Application Publication No. 2004-227141). However, the “black-list” masking technique has a drawback: because only words detected on a dictionary or rule basis are masked, some words that should be masked might not be detected. Such detection errors need to be corrected manually. As a result, masking cannot be performed efficiently in a short time, and confidential or personal information may leak if a risky word is overlooked through human error.
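The dictionary-based matching described above can be illustrated with a minimal Python sketch. It is not taken from the cited publications; the function name `blacklist_mask`, the `*` mask character, and the sample sentence are assumptions made only for illustration.

```python
import re

def blacklist_mask(text, risky_words, mask_char="*"):
    """Replace each occurrence of a listed risky word with mask characters.

    Hypothetical helper: the dictionary is modeled as a plain set of strings.
    """
    if not risky_words:
        return text
    # Try longer entries first so that a long entry wins over its own prefix.
    ordered = sorted(risky_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in ordered))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), text)

sample = "Acme shipped the Falcon router to Globex."
masked = blacklist_mask(sample, {"Acme", "Falcon"})
print(masked)
```

In this sketch "Globex" is not in the dictionary, so it stays visible. This is the kind of detection gap that the paragraph above says must be corrected manually.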
Another approach is the “white-list” masking technique. This technique first masks all words and then unmasks any word determined to be safe based on a safe word list, which enumerates safe words common to all the documents. With the “white-list” masking technique, a safe word that is not included in the list remains masked even though it could have been unmasked, but confidential or personal information does not leak. A full-fledged safe word list improves readability. However, because the safe word list is created manually, creating it is time-consuming.
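The white-list behavior can be sketched in Python in the same way. The function name `whitelist_mask`, the word pattern `\w+`, and the sample data are assumptions for illustration. The sketch makes the mask-or-keep decision for each word in a single pass, which gives the same result as masking everything and then unmasking the listed words.

```python
import re

def whitelist_mask(text, safe_words, mask_char="*"):
    """Mask every word except those found on the safe word list.

    Hypothetical helper: words are matched case-insensitively.
    """
    safe = {word.lower() for word in safe_words}

    def keep_or_mask(match):
        word = match.group(0)
        # A word stays visible only if the safe word list contains it.
        return word if word.lower() in safe else mask_char * len(word)

    return re.sub(r"\w+", keep_or_mask, text)

sample = "Acme shipped the Falcon router to Globex."
masked = whitelist_mask(sample, {"shipped", "the", "router", "to"})
print(masked)
```

Here every name that is absent from the list is hidden by default, so nothing leaks. The cost is that a harmless word missing from the list would also be hidden, which lowers readability.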
Japanese Application Publication No. 2007-172404 and Yohei Ikawa, Daisuke Takuma, and Hiroshi Kanayama, “A Masking System for Confidential Documents by Unmasking Safe Words,” Technical Report of the Institute of Electronics, Information and Communication Engineers, DE, Data Engineering, Vol. 106, pp. 79-84, Jul. 7, 2006 disclose techniques for creating a safe word list common to all the documents. In these techniques, the importance of each word is calculated based on the frequency of appearance, the character string length, and the likeness of the word, and the words are presented in descending order of importance to prompt a user to determine whether or not to unmask each word. The report by Ikawa, Takuma, and Kanayama also shows an experimental example in which, by processing words in descending order of importance as described above, 90% readability was achieved by checking 35% of the total words.
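The idea of presenting words in descending order of importance can be sketched as follows. The scoring formula used here, frequency multiplied by character string length, is purely an assumption for illustration. The cited references do not give a formula in this passage, and the "likeness of the word" factor is omitted because it is not defined here. The function name `rank_candidates` and the sample documents are likewise hypothetical.

```python
from collections import Counter

def rank_candidates(documents):
    """Return words ordered from most to least important.

    ASSUMPTION: importance = frequency * word length. This is a stand-in
    score, not the calculation used in the cited references.
    """
    counts = Counter(
        word.lower() for doc in documents for word in doc.split()
    )
    scores = {word: freq * len(word) for word, freq in counts.items()}
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda word: (-scores[word], word))

documents = [
    "the router failed",
    "the router rebooted",
    "the customer called",
]
ranking = rank_candidates(documents)
print(ranking)
```

A reviewer would walk this list from the top and decide for each word whether it is safe to unmask, so the effort goes first to the words that the stand-in score ranks highest.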
Accordingly, the above techniques make the manual creation of a full-fledged safe word list common to all the documents more efficient. However, the aforementioned experimental example in the report by Ikawa, Takuma, and Kanayama also shows that the remaining 65% of the words need to be checked in order to increase the readability from 90% to 100%. From a cost-effectiveness viewpoint as well, it is not appropriate for a limited number of staff members, such as a creator or an administrator of a document, to take charge of checking the remaining 65% of the words to increase the readability.