This specification relates to static analysis of software source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program. Static analysis systems analyze source code to determine various properties about source code in a code base and properties of developers who commit code to the code base.
Developers typically maintain source code in a code base using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a commit or a snapshot. Each snapshot includes the source code of files of the code base as the files existed at a particular point in time.
Relationships among snapshots stored in a version control system can be represented as a directed, acyclic revision graph. Each node in the revision graph represents a commit of some portion of the source code of the code base. Each commit identifies source code of a particular snapshot as well as other pertinent information about the snapshot, such as the author of the snapshot and data about ancestors of the commit in the revision graph. A directed edge from a first node to a second node in the revision graph indicates that a commit represented by the first node occurred before a commit represented by the second node, and that no intervening commits exist in the version control system.
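As an illustration only, the revision graph described above might be modeled as follows. This is a minimal sketch; the `Commit` class, its field names, and the helper functions are hypothetical and are not taken from any particular version control system.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Commit:
    """Hypothetical node of a revision graph."""
    commit_id: str
    author: str
    parents: tuple[str, ...] = ()  # direct ancestors; empty for a root commit


def revision_edges(commits):
    """Return directed (earlier, later) edges, one per parent/child pair."""
    return [(parent, c.commit_id) for c in commits for parent in c.parents]


def ancestors_first_order(commits):
    """Order commit ids so that every commit follows all of its ancestors.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    so this also checks that the revision graph is acyclic.
    """
    predecessors = {c.commit_id: set(c.parents) for c in commits}
    return list(TopologicalSorter(predecessors).static_order())


# Assumed example history: two branches from a1 that merge at d4.
commits = [
    Commit("a1", "alice"),
    Commit("b2", "bob", ("a1",)),
    Commit("c3", "alice", ("a1",)),
    Commit("d4", "bob", ("b2", "c3")),
]
edges = revision_edges(commits)
order = ancestors_first_order(commits)
```

Storing only the direct parents on each commit is enough to recover every edge, because an edge exists only between commits with no intervening commit.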
A static analysis system can analyze source code of a particular snapshot of the code base to identify characteristic segments of source code in the snapshot. For example, a static analysis system can identify segments of the source code that violate a particular set of coding standards. A static analysis system can also identify a responsible contributor for each characteristic segment of source code and attribute the characteristic segment to the responsible contributor, e.g., to a particular developer or group of developers.
A static analysis system can rank developers according to the number of lines of code added, deleted, or changed by each particular developer, which will be referred to as the churn or the number of lines of churn. Churn is a rough proxy for the productivity of a developer because it represents how many lines of code the developer has changed in the code base.
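A ranking by churn can be sketched as below. The record layout `(developer, added, deleted, changed)` and the function name `rank_by_churn` are assumptions made for illustration; the specification does not prescribe a particular data format or tie-breaking rule.

```python
from collections import Counter


def rank_by_churn(records):
    """Rank developers by total lines of churn, highest first.

    Each record is an assumed tuple (developer, added, deleted, changed).
    Ties are broken alphabetically by developer name, which is an
    arbitrary choice made for this sketch.
    """
    totals = Counter()
    for developer, added, deleted, changed in records:
        totals[developer] += added + deleted + changed
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Assumed example data, not measurements from any real code base.
records = [
    ("alice", 10, 2, 3),
    ("bob", 4, 1, 0),
    ("alice", 1, 0, 0),
    ("carol", 20, 0, 0),
]
ranking = rank_by_churn(records)
```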
Churn can be computed for a developer between any arbitrary pair of snapshots in a revision graph. However, churn is typically computed between adjacent snapshots in the revision graph that were both committed by the same developer.
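One possible way to compute churn between adjacent snapshots committed by the same developer is sketched below. It assumes each snapshot is a mapping from file path to file text, uses Python's `difflib.SequenceMatcher` as a stand-in line differ, and counts a replaced region as the larger of its old and new line counts. All of these are illustrative assumptions rather than the method of any particular system.

```python
import difflib


def file_churn(old_text, new_text):
    """Count lines added, deleted, or changed between two file versions."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    churn = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            churn += j2 - j1
        elif tag == "delete":
            churn += i2 - i1
        elif tag == "replace":
            # Assumption: a changed region counts as the larger side.
            churn += max(i2 - i1, j2 - j1)
    return churn


def snapshot_churn(old_snapshot, new_snapshot):
    """Sum file churn over every path present in either snapshot."""
    paths = set(old_snapshot) | set(new_snapshot)
    return sum(
        file_churn(old_snapshot.get(p, ""), new_snapshot.get(p, ""))
        for p in paths
    )


def churn_by_developer(edges, authors, snapshots):
    """Total churn over adjacent snapshot pairs with the same author."""
    totals = {}
    for parent, child in edges:
        if authors[parent] != authors[child]:
            continue  # skip pairs committed by different developers
        dev = authors[child]
        totals[dev] = totals.get(dev, 0) + snapshot_churn(
            snapshots[parent], snapshots[child]
        )
    return totals
```

For example, with edges `s1 -> s2 -> s3` where only `s1` and `s2` share an author, only the `s1 -> s2` pair contributes to that author's churn.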