This specification relates to static analysis of computer software source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code in a code base is typically maintained by developers using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a snapshot. Each snapshot includes the source code of files of the code base as the files existed at a particular point in time, or data from which those source code files can be reconstructed.
Relationships among snapshots of the code base can be represented as a directed acyclic revision graph. Each node in the revision graph represents a commit of the source code. A commit represents a snapshot as well as information about ancestor snapshots of the node in the revision graph. A directed edge from a first node to a second node in the revision graph indicates that the snapshot of the commit represented by the first node is a parent snapshot, i.e., an immediately preceding snapshot, of the snapshot of the commit represented by the second node.
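As an illustration only, a revision graph of this kind could be sketched in Python as follows. The names Commit, RevisionGraph, add_commit, edges, and ancestors are hypothetical and are not taken from any particular version control system. The sketch assumes that each snapshot is held as a mapping from file path to file contents and that a commit may only name parents already present in the graph, which is one simple way to keep the graph acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Commit:
    """Hypothetical node: a snapshot plus the ids of its parent commits."""
    commit_id: str
    snapshot: Dict[str, str]  # assumed layout: file path -> file contents
    parents: List[str] = field(default_factory=list)


class RevisionGraph:
    """Minimal directed acyclic revision graph keyed by commit id."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}

    def add_commit(self, commit: Commit) -> None:
        # Rejecting duplicate ids and unknown parents means a new node can
        # only point back at existing nodes, so no cycle can be formed.
        if commit.commit_id in self.commits:
            raise ValueError(f"duplicate commit id: {commit.commit_id}")
        for parent_id in commit.parents:
            if parent_id not in self.commits:
                raise KeyError(f"unknown parent: {parent_id}")
        self.commits[commit.commit_id] = commit

    def edges(self) -> List[Tuple[str, str]]:
        # Each directed edge runs from a parent commit to its child commit.
        return [
            (parent_id, commit.commit_id)
            for commit in self.commits.values()
            for parent_id in commit.parents
        ]

    def ancestors(self, commit_id: str) -> Set[str]:
        # Walk parent links transitively to collect every ancestor commit.
        seen: Set[str] = set()
        stack = list(self.commits[commit_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current].parents)
        return seen
```

In this sketch a merge commit is simply a commit with more than one parent id, so branching and merging histories are both representable.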
Identifying characteristic segments of source code is a task that will be referred to as analysis. A characteristic segment of source code is a segment of source code having a particular attribute. For example, an analysis task can identify source code segments that include violations of a particular coding standard, e.g., a segment of source code that compares variables of different types. Analysis tasks may build source code of a particular snapshot, e.g., by compiling source code files and linking resulting object files and libraries. Analysis tasks can then identify characteristic segments of source code by examining relationships between source code constructs in the snapshot, e.g., between variables, functions, and classes.
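By way of a simplified example, an analysis task of the kind described above might be sketched in Python using the standard ast module, which parses source code without executing it. The Violation record and the find_mixed_type_comparisons function are hypothetical names introduced for illustration. The sketch assumes a deliberately narrow rule: it flags only comparisons between literal constants of different types, whereas a practical analysis would typically also infer the types of variables.

```python
import ast
from typing import List, NamedTuple, Optional


class Violation(NamedTuple):
    """Hypothetical record describing one characteristic segment."""
    path: str
    line: int
    segment: Optional[str]


def find_mixed_type_comparisons(path: str, source: str) -> List[Violation]:
    """Flag comparisons whose adjacent operands are literals of different types."""
    violations: List[Violation] = []
    tree = ast.parse(source, filename=path)  # parses only; nothing is executed
    for node in ast.walk(tree):
        if not isinstance(node, ast.Compare):
            continue
        # A chained comparison such as a < b < c has operands [a, b, c].
        operands = [node.left, *node.comparators]
        for left, right in zip(operands, operands[1:]):
            if (
                isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and type(left.value) is not type(right.value)
            ):
                segment = ast.get_source_segment(source, node)
                violations.append(Violation(path, node.lineno, segment))
                break  # report each comparison expression at most once
    return violations
```

Each reported Violation carries the file path, the line number, and the text of the offending segment, which is the information a later attribution step would need.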
Identifying a responsible entity for each characteristic segment of source code is a task that will be referred to as attribution. Attribution for a particular snapshot generally includes comparing the characteristic source code segments that occur in the snapshot with characteristic source code segments found in each of one or more parent snapshots. For example, if a violation is absent in a parent snapshot, but occurs in a snapshot that is a child of the parent snapshot according to a revision graph, the violation may be attributed to a developer who committed the child snapshot.
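The comparison described above can be illustrated with a minimal Python sketch. The function name attribute_new_violations and the ViolationKey layout are hypothetical. The sketch assumes that a violation is identified by its file path and segment text, ignoring line numbers so that unrelated edits elsewhere in a file do not make an existing violation look new, and that a violation is attributed to the committer of the child snapshot only when it is absent from every parent snapshot.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed identity of a violation: (file path, offending segment text).
ViolationKey = Tuple[str, str]


def attribute_new_violations(
    child_violations: Iterable[ViolationKey],
    parent_violation_sets: Iterable[Set[ViolationKey]],
    child_author: str,
) -> Dict[ViolationKey, str]:
    """Attribute violations that first appear in the child to its author."""
    # Collect every violation that occurs in at least one parent snapshot.
    seen_in_parents: Set[ViolationKey] = set()
    for parent_violations in parent_violation_sets:
        seen_in_parents.update(parent_violations)
    # Anything in the child but in no parent is treated as newly introduced.
    return {
        violation: child_author
        for violation in child_violations
        if violation not in seen_in_parents
    }
```

Under these assumptions, a snapshot with no parents has all of its violations attributed to its committer, and a violation carried into a merge from any parent is not attributed to the developer who performed the merge.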