This invention relates generally to analysis of software programs such as object code, byte code, source code, executable code, and libraries, and, more specifically, relates to static analysis of software programs.
Many software programs are divided into two parts, an application portion and a library portion. The library portion is typically written in a generic form to enable interfacing with many different application portions. The software program is created by a developer, and the developer generally only has control over the application portion of the software program.
Although the developer only has control over the application portion of the program, the developer or another user can still be interested in security risks created by the application portion and its interaction with the library portion. For instance, in a taint analysis of a software program, information paths are tracked from untrusted methods and parameters (called “sources” herein) in the application portion into security-sensitive areas (called “sinks” herein) in the library portion. Such information paths are computed by tracking data flows through the program. Each node in an information path is typically a program statement, and each edge represents the flow of data between statements. Optionally, control flows can be part of this computation as well, in which case an edge in an information path can also represent a control flow. These paths can be analyzed to determine whether downgrading actions (such as endorsers and declassifiers) can be used in the information paths to increase security.
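The notion of an information path can be illustrated with a minimal sketch. The statement names, the flow graph, and the helper `information_paths` below are hypothetical and chosen only for illustration; they are not taken from any particular analysis tool. The sketch treats each statement as a node, each data flow as an edge, enumerates every path from a source to a sink, and then flags the paths that pass through no downgrader.

```python
from typing import Dict, List, Set

# Hypothetical data-flow graph: each key is a program statement (node) and
# each value lists the statements that receive data from it (edges).
flow: Dict[str, List[str]] = {
    "read_param": ["concat"],
    "concat": ["sanitize", "exec_query"],
    "sanitize": ["exec_query"],
}
sources: Set[str] = {"read_param"}    # untrusted input (assumed)
sinks: Set[str] = {"exec_query"}      # security-sensitive operation (assumed)
downgraders: Set[str] = {"sanitize"}  # endorser/declassifier (assumed)


def information_paths(
    flow: Dict[str, List[str]], sources: Set[str], sinks: Set[str]
) -> List[List[str]]:
    """Enumerate every cycle-free path from any source to any sink."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node in sinks:
            paths.append(path)
            return
        for nxt in flow.get(node, []):
            if nxt not in path:  # skip statements already on this path
                walk(nxt, path + [nxt])

    for src in sorted(sources):
        walk(src, [src])
    return paths


paths = information_paths(flow, sources, sinks)

# Paths that reach a sink without passing through any downgrader.
unsanitized = [p for p in paths if not any(n in downgraders for n in p)]

for p in paths:
    status = "UNSAFE" if p in unsanitized else "downgraded"
    print(" -> ".join(p), f"[{status}]")
```

In this toy graph, two paths reach the sink, and only the one that bypasses `sanitize` is flagged. A real analysis would derive the graph from the program itself rather than from a hand-written dictionary.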
One way to perform this analysis is via static analysis of the software program. A static analysis evaluates the program statically: that is, the program is not executed during this analysis. Certain models (such as call graphs and points-to graphs) may be created from the software program, based on a line-by-line interpretation of the program. Such models may be analyzed during the static analysis to determine information about the software program, such as the information paths described above.
One of the problems with a static analysis of information paths is that the analysis generates a large report. This is because each path from a source to a sink is typically reported, and even moderately sized programs have many such paths.
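To give a rough sense of how quickly the number of reported paths can grow, the following sketch counts source-to-sink paths in a hypothetical flow graph made of consecutive two-way branches. The graph shape and the helpers `diamond_chain` and `count_paths` are assumptions made purely for illustration; actual programs will have different structure.

```python
from functools import lru_cache
from typing import Dict, List


def diamond_chain(k: int) -> Dict[str, List[str]]:
    """Build a hypothetical flow graph of k consecutive two-way branches.

    Node n{i} flows to a{i} and b{i}, and both flow on to n{i+1}.
    """
    flow: Dict[str, List[str]] = {}
    for i in range(k):
        flow[f"n{i}"] = [f"a{i}", f"b{i}"]
        flow[f"a{i}"] = [f"n{i + 1}"]
        flow[f"b{i}"] = [f"n{i + 1}"]
    return flow


def count_paths(flow: Dict[str, List[str]], source: str, sink: str) -> int:
    """Count distinct paths from source to sink in an acyclic flow graph."""

    @lru_cache(maxsize=None)
    def count(node: str) -> int:
        if node == sink:
            return 1
        return sum(count(nxt) for nxt in flow.get(node, ()))

    return count(source)


for k in (5, 10, 20):
    total = count_paths(diamond_chain(k), "n0", f"n{k}")
    print(f"{k} branches -> {total} paths")
```

Each additional branch doubles the count in this sketch, so a report that lists every path separately grows exponentially with the number of branches, even though the graph itself grows only linearly.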