This specification relates to static analysis of computer software source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code is typically maintained by developers in a code base of source code using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a snapshot. Each snapshot includes the source code of files of the code base as the files existed at a particular point in time.
Relationships among snapshots stored in a version control system can be represented as a directed, acyclic revision graph. Each node in the revision graph represents a commit of some portion of the source code of the code base. Each commit identifies source code of a particular snapshot as well as other pertinent information about the snapshot, such as the author of the snapshot and data about ancestors of the commit in the revision graph. A directed edge from a first node to a second node in the revision graph indicates that a commit represented by the first node occurred before a commit represented by the second node, and that no intervening commits exist in the version control system.
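For illustration only, a revision graph of this kind can be sketched as commits that record their direct parents. The names `Commit`, `edges`, and `ancestors`, and the four sample commits, are hypothetical and are not taken from any particular version control system.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One node of the revision graph (hypothetical structure)."""
    commit_id: str
    author: str
    # Identifiers of the commits that directly precede this one.
    parents: list = field(default_factory=list)


def edges(commits):
    """Return the directed (earlier, later) edges of the revision graph."""
    return [(p, c.commit_id) for c in commits.values() for p in c.parents]


def ancestors(commits, commit_id):
    """Return the identifiers of every commit reachable through parent links."""
    seen = set()
    stack = list(commits[commit_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current].parents)
    return seen


# Illustrative data: "b" and "c" both follow "a", and "d" merges them.
commits = {
    "a": Commit("a", "alice"),
    "b": Commit("b", "bob", ["a"]),
    "c": Commit("c", "carol", ["a"]),
    "d": Commit("d", "alice", ["b", "c"]),
}
```

In this sketch each edge joins a commit to its immediate successor, so no intervening commit lies between the two endpoints of an edge, and the ancestor data of a commit can be recovered by following parent links.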
A static analysis system can analyze source code of a particular snapshot of the code base to identify characteristic segments of source code in the snapshot. For example, a static analysis system can identify segments of the source code that violate a particular set of coding standards. A static analysis system can also identify a responsible contributor for each characteristic segment of source code and attribute the characteristic segment to the responsible contributor, e.g., to a particular developer or group of developers.
A static analysis system can rank developers according to violation counts. For example, the system can keep track of how many violations each developer introduces into the code base and how many violations each developer removes from the code base.
Violation counts are influenced significantly by the number of lines of code that each particular developer adds, deletes, or changes, also referred to as the churn or the number of lines of churn. Thus, a developer with high churn is likely to have removed more violations than a developer with low churn, e.g., a developer who is new to the team.
A static analysis system can also rank developers according to a violation density score. The violation density score is generally computed as a net of violation introductions n and violation removals f attributed to the developer, divided by the churn c attributed to the developer. Thus, the violation density score d can be computed according to:
$$ d = \frac{n - f}{c}. $$
Violation density scores are also influenced significantly by the number of lines of churn a particular developer has contributed. For example, a developer who has 10 net violations in 100 lines of churn may have a relatively high violation density. However, this score may not be a meaningful indicator of the developer's effectiveness until more data establishes that the violations are an ongoing pattern rather than an aberration due to the low churn or due to a problematic segment of source code.
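As a minimal sketch of the computation described above, the following assumes per-developer totals are already available. The names `DeveloperStats`, `violation_density`, and `rank_by_density`, the sample figures, and the choice to rank lowest density first are all illustrative assumptions rather than part of any particular system.

```python
from dataclasses import dataclass


@dataclass
class DeveloperStats:
    """Per-developer totals (hypothetical structure)."""
    name: str
    introduced: int  # n: violation introductions attributed to the developer
    removed: int     # f: violation removals attributed to the developer
    churn: int       # c: lines added, deleted, or changed


def violation_density(stats):
    """Compute d = (n - f) / c for one developer."""
    if stats.churn == 0:
        raise ValueError("violation density is undefined for zero churn")
    return (stats.introduced - stats.removed) / stats.churn


def rank_by_density(all_stats):
    """Order developers from lowest to highest density (an assumed convention)."""
    return sorted(all_stats, key=violation_density)


# Illustrative figures only: 10 net violations in 100 lines of churn,
# compared with 50 net violations in 10,000 lines of churn.
new_dev = DeveloperStats("new_dev", introduced=12, removed=2, churn=100)
veteran = DeveloperStats("veteran", introduced=300, removed=250, churn=10_000)
ranking = [s.name for s in rank_by_density([new_dev, veteran])]
```

With these figures the low-churn developer has a density of 10/100 = 0.1 and the high-churn developer has a density of 50/10,000 = 0.005, which shows how a small amount of churn can produce a comparatively high score from only a handful of violations.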