Field of the Invention
The present invention relates to the protection of sensitive data in computer systems. More specifically, the present invention relates to a system and method for self-protecting data.
Related Art
Computer users frequently download applications from unknown sources without being certain that the applications do nothing harmful. In cloud computing, users frequently use third-party applications, such as analytics or management programs, to process proprietary or high-value data. If users allow these applications to process confidential or sensitive data, they have to trust that the applications do not intentionally or inadvertently leak their data.
Allowing third-party applications to process sensitive data poses several challenges. First, users typically do not have source code and cannot modify the application program. They only know the application's advertised functions, but have no idea what the program actually does. Users can only execute the program binaries. Second, for a user-recipient who is authorized to access the sensitive data using the application in question, how can the sender ensure that the recipient does not then transmit the data, perhaps transformed or obfuscated, to unauthorized parties? Third, without expecting that the applications are outright malicious, users must assume that complex software will very likely have some bugs or security vulnerabilities.
Attempts have been made to resolve the aforementioned concerns. For example, the BitBlaze project combines static and dynamic analysis for application binaries for various purposes, e.g., spyware analysis and vulnerability discovery. Further, language-based techniques can prevent leaking of information by static type-checking of programs written in languages that can express information flow directly. Programmers can specify the legitimate information flows and policies in the program such that no illegal information flow would be allowed when compiling the program. This static method can be formally verified to be secure. However, it requires access to the source code and re-writing or re-compiling the applications.
Software solutions involving new operating system designs like HiStar and Asbestos proposed labeling of system objects to control information flow. A process (thread) that has accessed protected data is not allowed to send any data to the network, even if the data sent has no relation at all to the protected data. This coarse-grained information flow protection requires the application to be partitioned into components with different levels of privileges.
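The coarse-grained behavior described above can be illustrated with a minimal sketch. The `Process` class, its `read` and `send` methods, and the `NetworkBlocked` exception are hypothetical names introduced only for illustration; they are not the HiStar or Asbestos interfaces.

```python
class NetworkBlocked(Exception):
    """Raised when a labeled process attempts to use the network."""


class Process:
    """Hypothetical process carrying a single label for everything it touches."""

    def __init__(self, name):
        self.name = name
        self.tainted = False  # one label covers the whole process

    def read(self, data, protected):
        # Reading any protected object labels the entire process.
        if protected:
            self.tainted = True
        return data

    def send(self, data):
        # The check looks only at the process label, never at `data` itself.
        if self.tainted:
            raise NetworkBlocked(f"{self.name} may not use the network")
        return len(data)


def demo_coarse_grained():
    p = Process("analytics")
    p.read(b"secret", protected=True)
    try:
        # This payload has no relation to the protected data.
        p.send(b"unrelated public string")
        return "sent"
    except NetworkBlocked:
        return "blocked"
```

In this sketch the unrelated payload is refused as well, which is why such designs push the application to be partitioned into separately labeled components.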
Other software solutions use binary translation, or compiler-assisted binary re-writing, to change the program, for example, to turn implicit information flows into explicit information flows. However, such software-only information flow tracking approaches may be impractical due to prohibitive performance overhead. For example, to deal with tag assignments and bookkeeping, a single data movement instruction becomes eight instructions after binary translation, and a single arithmetic/logic or control flow instruction is replaced by 20 instructions. Even with parallel execution of the binary translation, the performance overhead is around 1.5×.
Further, hardware dynamic information flow tracking solutions include Raksha, which can detect both high-level and low-level software vulnerabilities by programming (i.e., configuring) the Raksha hardware with a small set of four security policies at a time. Thus, only these four vulnerabilities can be detected.
GLIFT is another hardware dynamic information flow tracking (DIFT) solution that tracks information flow at a much lower hardware level: the gate level. It uses a predicated architecture (implying re-writing or re-compiling applications) which executes all paths of a program to track both explicit and implicit information flow, but at a much higher cost. While this is a very interesting and potentially promising approach, all the hardware has to be re-designed from the gates up, requiring unproven new hardware design methodologies and tools. Furthermore, the GLIFT protection cannot support chip crossings and machine crossings in a distributed computing environment.
These hardware DIFT solutions either support only a few fixed policies for detecting specific vulnerabilities, or require modifying the software. Whenever hardware is used for policy enforcement, there is a semantic gap between the flexibility of policy specification required at the user and domain level, and the restricted actions that can be supported by hardware.
Suh et al. proposed architectural support for DIFT to track I/O inputs and monitor their use for integrity protection. They assume that programs can be buggy and contain vulnerabilities but are not malicious, and that the OS manages the protection and is thus trusted. One bit is used as the security tag that indicates whether the corresponding data block is authentic or potentially suspicious.
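A one-bit tagging scheme of this kind can be sketched as follows. This is a software illustration under stated assumptions, not the hardware design of Suh et al.: the `Tagged` type, the helper names, and the choice of "use as a jump target" as the monitored use are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    value: int
    suspicious: bool = False  # the single security tag bit


def from_io(value):
    # Data arriving from an I/O channel starts out potentially suspicious.
    return Tagged(value, True)


def add(a, b):
    # The result is suspicious if either operand is suspicious.
    return Tagged(a.value + b.value, a.suspicious or b.suspicious)


def use_as_jump_target(t):
    # Assumed monitored use: refuse to jump to an address derived from I/O.
    if t.suspicious:
        raise RuntimeError("suspicious value used as jump target")
    return t.value
```

For example, `add(Tagged(0x1000), from_io(8))` yields a value whose tag bit is set, so passing it to `use_as_jump_target` raises an error, while an address computed only from authentic values passes the check.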
Information flow tracking can be performed statically before the program is run, dynamically while the program is running, or both. In addition, the tracking can be done at the software level or at the hardware level. The granularity of the tracking can also be varied depending on the application, e.g., at the lowest gate level or at a higher operating system object level.
Static language-based software techniques track information by type-checking programs that are written in languages that express information flow directly. Programmers can specify the legitimate information flows and policies in the program such that no illegal information flow would be allowed once the program is compiled. This static method can be formally verified to be secure and can address most of the implicit information flow and even some side channels since the high-level semantics of the program can be checked. However, it requires access to the source code, requires re-writing or re-compiling the applications and makes the programmer responsible for specifying the information flow policy.
To track implicit information flow while incurring minimal performance overhead, a hybrid approach that combines static analysis with DIFT is desirable. RIFLE is a hybrid approach that uses compiler-assisted binary re-writing to turn implicit information flows due to condition flags into explicit tag assignments. Once the implicit flows are made explicit, the hardware can track these explicit information flows efficiently. The BitBlaze project also combines static binary analysis and dynamic tracking for application binaries for various purposes, e.g., spyware analysis and vulnerability discovery. Note that this hybrid approach is not to be confused with combining software information flow tracking with DIFT, as the static analysis does not track the information flow but merely assists the DIFT scheme by providing information that a pure DIFT scheme may miss.
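The idea of turning an implicit flow into an explicit tag assignment can be sketched as below. This illustrates the general principle only and is not RIFLE's actual transformation; the function names and the `pc_tag` variable are assumptions made for the example.

```python
def copy_bit_explicit_only(secret, secret_tag):
    """Tracker that propagates tags only through data movement."""
    out, out_tag = 0, False
    if secret == 1:  # branch on tagged data
        out = 1      # constant assignment: no tagged source operand
    return out, out_tag  # the tag is lost although out == secret


def copy_bit_rewritten(secret, secret_tag):
    """Same code after a hypothetical rewrite inserts explicit tag assignments."""
    out, out_tag = 0, False
    pc_tag = secret_tag  # inserted: tag of the branch condition
    if secret == 1:
        out = 1
    # Inserted after the branch so it applies on both paths, because the
    # path not taken also reveals the value of `secret`.
    out_tag = out_tag or pc_tag
    return out, out_tag
```

With a tagged input, the first version returns an untagged copy of the secret bit, while the rewritten version returns the same value with its tag set, which a tracker that follows only explicit flows can then handle.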
Cavallaro et al. discussed several practical examples that can be used by malware to evade dynamic taint analysis. For example, control dependence, pointer indirection, implicit flows and timing-based attacks are described.
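Pointer indirection, one of the evasion techniques listed above, can be illustrated with a short sketch. The table, the `load` helper, and the `propagate_address_tag` switch are hypothetical and are not taken from Cavallaro et al.; the sketch only assumes a tracker that may or may not propagate the tag of an address to the loaded value.

```python
TABLE = list(range(256))  # untagged constants: TABLE[i] == i


def load(table, index, index_tag, propagate_address_tag):
    """Model a memory load whose address is derived from tagged data."""
    value = table[index]
    value_tag = False  # the table entries themselves carry no tag
    if propagate_address_tag:
        # Stricter policy: the tag of the address flows to the loaded value.
        value_tag = value_tag or index_tag
    return value, value_tag
```

Calling `load(TABLE, 65, True, False)` returns the secret value 65 with its tag cleared, so a tracker that ignores address tags loses the flow. Enabling `propagate_address_tag` keeps the tag, at the cost of tagging every value loaded through a tagged address.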