1. Field of the Invention
This invention pertains to searches for sensitive data and to the reporting of the results of such searches.
2. Description of Background
Searching collections of data for sensitive information can expose that information to unintended parties. For example, search engine software can index, catalog, and store or cache any publicly visible data that can be found on the Internet. In the process, search engines may index and cache sensitive data that has been exposed inadvertently, as a result of poorly designed web sites, or intentionally, as a means of disseminating private information to malicious users. The search engine can therefore unwittingly become a tool for malicious users who devise ways to use an otherwise innocuous search string to mine for the sensitive data of others. An example of such sensitive data is a sixteen-digit sequence starting with a known four-digit prefix, such as may relate to credit cards issued by a particular financial institution.
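By way of illustration only, the sixteen-digit pattern described above can be expressed as a regular expression. The following is a minimal sketch, not part of any described system; the function name `find_prefixed_sequences` and the sample prefix are hypothetical, and the sketch assumes the digits appear contiguously, without spaces or dashes.

```python
import re


def find_prefixed_sequences(text: str, prefix: str) -> list[str]:
    """Return every standalone sixteen-digit run in `text` that starts with `prefix`.

    Hypothetical helper for illustration; assumes contiguous digits only.
    """
    if not re.fullmatch(r"\d{4}", prefix):
        raise ValueError("prefix must be exactly four digits")
    # The lookarounds reject matches embedded in a longer run of digits,
    # so only sequences of exactly sixteen digits are reported.
    pattern = re.compile(r"(?<!\d)" + re.escape(prefix) + r"\d{12}(?!\d)")
    return pattern.findall(text)


# Made-up sample text; none of these values are real account numbers.
sample = "order 1234000011112222 ref 99991234000011112222 id 5678000011112222"
matches = find_prefixed_sequences(sample, "1234")
```

Because such a pattern matches every sequence sharing the prefix, it would surface the data of all holders with that prefix rather than that of one searcher alone.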
Institutions and individuals who wish to determine whether their sensitive data has been exposed could attempt to do so using the search engine's normal facilities. In doing so, however, the institution or individual necessarily exposes the sensitive data to the search engine, and possibly to third parties if the transmission is intercepted. For example, search engine sites often provide “search history data” that disseminates prior search topics in ways that could expose such data.
Current solutions include blocking searches that contain particular patterns that may be related to such sensitive information. Such blocks, however, do not apprise institutions and individuals that their sensitive information has been exposed, nor do they identify or notify the web sites whose error-prone code is an inadvertent source of the sensitive information.
Other solutions can include provision of a pattern-based or algorithm-based search, allowing the institution or individual to perform a broad search that includes only a small, innocuous portion of the sensitive data (removed from its broader context). Although this approach avoids exposing data unnecessarily to the search engine, the results of such searches may include the sensitive data of other individuals or institutions as well as that of the searcher. As such, the pattern-based search creates a new opportunity for a malicious user to exploit the existing search engine facilities and mine for data that could be used for improper purposes.
Further solutions can include automatic removal requests for exposed data that matches such patterns or algorithms. However, any such data has already been exposed, and its removal may be of little value if a malicious user found it beforehand. Accordingly, there is a need in the art for a data search reporting arrangement that overcomes these drawbacks.