Computer software applications, such as web browsers and word processors, receive and manipulate various types of user data, including certain types of sensitive or private user data that are not meant to be shared. Examples of such sensitive data include credit card numbers, account numbers, and confidential documents. Although the user may expect the applications to maintain the privacy of this data, many applications have leaks that allow the sensitive data to escape without user authorization or knowledge. In some cases, malicious applications (i.e., malware) intentionally seek to obtain sensitive data from other software applications. In other cases, applications merely allow data to escape as part of their normal operation. Network applications, for example, may disclose various types of personal information (e.g., search terms, user terms, system configuration) to their publishers and/or to third parties. Other applications leak information via temporary copies or cached file snapshots. The user generally does not know exactly what sensitive data these applications have leaked.
Existing tools, such as file encryption and firewalls, provide only limited protection for such sensitive data when a network is in use. Such tools may fail to provide protection, however, once an application is authorized to read the user data and has access to output channels. Firewalls in particular may not protect against leaks to a file system and may not block leaks over already established connections. Other existing tools for determining when sensitive data has been leaked inspect network traffic or file content and search for possible copies of the sensitive data. Because these tools rely on pattern matching, they are prone to errors, for example, when the sensitive data is encrypted before it is leaked and therefore no longer matches the expected pattern. Moreover, the detection often happens only after the leaks have occurred (e.g., when leaked documents are already being transmitted on the network or copied into other files).
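By way of illustration only, the limitation of pattern matching noted above may be sketched as follows. The sketch is not taken from any particular existing tool; the names CARD_PATTERN, scan_payload, and xor_obfuscate, the sample card number, and the sample payloads are all hypothetical. A simple XOR transformation followed by Base64 encoding is assumed here merely as a stand-in for encryption, and is not itself a secure cipher.

```python
import base64
import re

# Hypothetical detector rule: four groups of four digits, optionally
# separated by a space or hyphen (a card-number-like pattern).
CARD_PATTERN = re.compile(rb"\b(?:\d{4}[- ]?){3}\d{4}\b")


def scan_payload(payload: bytes) -> bool:
    """Return True if the payload contains a card-number-like pattern."""
    return CARD_PATTERN.search(payload) is not None


def xor_obfuscate(data: bytes, key: bytes) -> bytes:
    """Stand-in for encryption (assumption): XOR each byte with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical sensitive value and outgoing payloads.
secret = b"4111 1111 1111 1111"
key = b"k3y"

# Leak in the clear: the pattern is present in the outgoing bytes.
plain_leak = b"POST /submit card=" + secret

# Leak after transformation: the same data leaves the system, but the
# bytes on the wire no longer resemble a card number.
hidden_leak = b"POST /submit blob=" + base64.b64encode(xor_obfuscate(secret, key))

if __name__ == "__main__":
    print("plain leak flagged:", scan_payload(plain_leak))
    print("hidden leak flagged:", scan_payload(hidden_leak))
```

In this sketch the scanner flags the first payload but not the second, even though the receiving party can recover the original value from the second payload by reversing the transformation with the same key.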
Attempts have also been made to track sensitive data as it flows or propagates through a system, but such attempts have lacked accuracy and efficiency. Hardware-level data tracking, for example, incurs significant performance and analysis overhead, which makes it unsuitable for inspecting interactive network applications. Other data tracking tools have been built directly into the operating system using system call interposition but have been unable to track data accurately when the data is transformed by the application without using the monitored system calls.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art. Accordingly, it is intended that the claimed subject matter be viewed broadly.