Software code often contains errors. Some of these errors are easily detected by visual inspection of the printed code. More subtle errors are typically discovered only with the help of software analysis and debugging tools.
Taint analysis involves detecting the use, within source code, of untrusted data that originates outside the control of a computer program executing on a computer system. More specifically, taint analysis distinguishes between trusted data created within the address space of an executing program and data that was in some manner copied into this address space from an external, potentially untrusted source. Taint analysis typically involves a form of information-flow analysis of computer program code that establishes whether values from untrusted sources may flow into security-sensitive computer system operations. Taint analysis may involve marking untrusted data arriving from taint sources as tainted. As the data propagates through computer memory and through various operations, the taint marking is propagated along with the data itself. Taint analysis may be employed with dynamic analysis techniques, static analysis techniques, or a combination of both.
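By way of illustration only, the marking and propagation of taint described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular tool: the names Value, read_external, literal, and run_query are hypothetical, and string concatenation stands in for the "various operations" through which data propagates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A piece of data paired with its taint mark (hypothetical wrapper)."""
    data: str
    tainted: bool = False

    def __add__(self, other: "Value") -> "Value":
        # The result of an operation is tainted if any operand is tainted,
        # so the mark travels with the data.
        return Value(self.data + other.data, self.tainted or other.tainted)


def read_external(text: str) -> Value:
    """Taint source: data copied in from outside the program is marked."""
    return Value(text, tainted=True)


def literal(text: str) -> Value:
    """Trusted data created inside the program."""
    return Value(text, tainted=False)


def run_query(query: Value) -> str:
    """Security-sensitive operation (sink): refuses tainted input."""
    if query.tainted:
        raise ValueError("tainted data reached a sensitive operation")
    return "executed: " + query.data
```

In this sketch, concatenating literal("SELECT * FROM t WHERE id = ") with read_external("1 OR 1=1") yields a tainted value, and passing that value to run_query raises an error, whereas a query built only from literals is accepted.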
Dynamic software analysis tools perform run-time error checking. Software errors may be captured as they occur. For example, if control branches down a particular path in the program, an error (e.g., an out-of-bounds memory access) that occurs along that path may be detected. Although dynamic analysis tools are often invaluable in the debugging process, they are not without shortcomings. In particular, it may be difficult to exercise complex software thoroughly during testing. For example, in particularly large programs, it may be possible to rigorously test only a small percentage of all possible program behaviors before the software is released to end users. Rarely used portions of the software (e.g., rarely traveled paths in conditional branches) may never be tested before the software is deployed in the field.
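The shortcoming described above might be illustrated with the following sketch. It is an assumption-laden example rather than an actual dynamic analysis tool: lookup, its legacy_mode parameter, and run_checked are hypothetical names, and Python's built-in bounds checking stands in for run-time error detection.

```python
from typing import Callable, Sequence, Tuple


def lookup(table: Sequence[int], index: int, legacy_mode: bool = False) -> int:
    if legacy_mode:
        # Rarely traveled path with an off-by-one error: reads table[index + 1],
        # which is out of bounds when index is the last valid index.
        return table[index + 1]
    return table[index]


def run_checked(func: Callable, *args, **kwargs) -> Tuple[str, object]:
    """Stand-in for a dynamic checker: reports an error only when it occurs."""
    try:
        return ("ok", func(*args, **kwargs))
    except IndexError as exc:
        return ("error", str(exc))
```

A test run that only calls run_checked(lookup, [10, 20, 30], 2) reports no error, because control never enters the legacy_mode branch. The out-of-bounds access is reported only when a test happens to call lookup with legacy_mode=True.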
Static software analysis tools operate on static code (i.e., code that is not running during the analysis process). Static analysis is performed on computer program code to simulate operation of an actual computer system configured using the code, without actually using the code to physically configure the computer system. Static analysis provides an understanding of the code that can be used, for example, to ensure that the source code complies with prescribed coding standards, to find unwanted dependencies, and to ensure that the desired structural design of the code is maintained. Static analysis also can detect errors that are easily missed when using dynamic analysis tools alone. For example, static analysis may detect an illegal operation that is contained in a rarely traversed or otherwise hard-to-test conditional branch path. Because the path is so rarely visited during operation of the software, this error might not be detected using a dynamic analysis tool.
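As a rough illustration of detecting an illegal operation on a rarely traversed path without running the code, the sketch below uses Python's standard ast module to inspect source text. It is deliberately narrow and is only one assumed form a static check might take: SOURCE, scale, and find_division_by_zero are hypothetical names, and the check recognizes only division by a literal zero.

```python
import ast
from typing import List

# Code under analysis. It is parsed, never executed.
SOURCE = '''
def scale(value, rare_flag):
    if rare_flag:
        return value / 0   # illegal operation on a hard-to-test path
    return value / 2
'''


def find_division_by_zero(source: str) -> List[int]:
    """Return line numbers of divisions whose right operand is a literal zero."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
                and isinstance(node.right, ast.Constant)
                and node.right.value == 0):
            findings.append(node.lineno)
    return findings
```

Because the check examines every branch of the syntax tree, the division inside the rare_flag branch is reported regardless of how seldom that branch would be taken at run time.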
In the past, static taint analysis has ordinarily involved identifying taintedness sources and propagating and tracing tainted information interprocedurally and/or globally through paths of execution within a program to determine whether the tainted information reaches a taintedness sink. Unfortunately, only incomplete information about the identity of taintedness sources may be available. Moreover, even if such analysis has accurate information about the taintedness sources, it may be difficult to trace the taint through different paths of code operations to a taintedness sink due to overlapping paths, or due to simplifications and abstractions of the memory model that are necessary for practical reasons.
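The conventional source-to-sink tracing described above might be sketched as a reachability search over a value-flow graph. This is a simplified sketch under stated assumptions: the graph FLOWS, the node names, and the function tainted_sinks are all hypothetical, and a real analysis would have to derive such a graph from program code, which is where the memory-model simplifications mentioned above come into play.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical value-flow graph: an edge a -> b means "the value of a may flow into b".
# Edges crossing function boundaries model interprocedural flow.
FLOWS: Dict[str, List[str]] = {
    "read_request.return": ["handler.user_input"],
    "handler.user_input": ["build_query.arg"],    # call: argument to parameter
    "build_query.arg": ["build_query.return"],
    "build_query.return": ["handler.query"],      # return value to call site
    "handler.query": ["execute_sql.arg"],
    "config.default": ["handler.limit"],          # trusted flow, never tainted
}
SOURCES: Set[str] = {"read_request.return"}
SINKS: Set[str] = {"execute_sql.arg"}


def tainted_sinks(flows: Dict[str, List[str]],
                  sources: Iterable[str],
                  sinks: Set[str]) -> Dict[str, List[str]]:
    """Return {sink: path from a source} for each sink reachable from a source."""
    results: Dict[str, List[str]] = {}
    for source in sources:
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node in sinks and node not in results:
                # Walk the parent links back to the source to recover the path.
                path, cur = [], node
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                results[node] = path[::-1]
            for nxt in flows.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
    return results
```

With the sources listed, the search reports a six-step path from read_request.return to execute_sql.arg. If the set of sources is incomplete, for example empty, the same flow goes unreported, which mirrors the limitation noted above.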