This specification relates to static analysis of computer software source code. Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code is typically maintained by developers in a code base using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a snapshot. Each snapshot includes the source code of files of the code base as the files existed at a particular point in time.
Snapshots stored in a version control system can be represented as a directed, acyclic revision graph. Each node in the revision graph represents a commit of the source code. A commit represents a snapshot as well as other pertinent information about the snapshot, such as the author of the snapshot and data about ancestor commits of the node in the revision graph. A directed edge from a first node to a second node in the revision graph indicates that a commit represented by the first node immediately precedes a commit represented by the second node, that is, no intervening commits exist between them in the version control system.
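As an illustration only, the revision graph described above can be sketched as a small data structure. The names `Commit`, `RevisionGraph`, `add_commit`, `edges`, and `ancestors` are hypothetical and are not taken from any particular version control system. The sketch assumes that a commit can be added only after its parent commits, which keeps the graph acyclic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Commit:
    """Hypothetical commit record: a snapshot plus metadata about it."""
    commit_id: str
    author: str
    snapshot: Dict[str, str]        # file path -> file contents at this point in time
    parents: Tuple[str, ...] = ()   # ids of the immediately preceding commits


class RevisionGraph:
    """Directed acyclic graph whose nodes are commits."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}

    def add_commit(self, commit: Commit) -> None:
        if commit.commit_id in self.commits:
            raise ValueError(f"duplicate commit id: {commit.commit_id}")
        # Requiring parents to exist first means no cycle can ever be formed.
        for parent_id in commit.parents:
            if parent_id not in self.commits:
                raise KeyError(f"unknown parent commit: {parent_id}")
        self.commits[commit.commit_id] = commit

    def edges(self) -> List[Tuple[str, str]]:
        """Return (preceding commit, following commit) pairs."""
        return [(p, c.commit_id) for c in self.commits.values() for p in c.parents]

    def ancestors(self, commit_id: str) -> Set[str]:
        """Return the ids of every commit reachable by following parent links."""
        seen: Set[str] = set()
        stack = list(self.commits[commit_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current].parents)
        return seen


# Example: c2 and c3 both follow c1, and c4 has both c2 and c3 as parents.
graph = RevisionGraph()
graph.add_commit(Commit("c1", "alice", {"main.py": "v1"}))
graph.add_commit(Commit("c2", "bob", {"main.py": "v2"}, ("c1",)))
graph.add_commit(Commit("c3", "carol", {"main.py": "v3"}, ("c1",)))
graph.add_commit(Commit("c4", "alice", {"main.py": "v4"}, ("c2", "c3")))

print(graph.edges())
print(sorted(graph.ancestors("c4")))
```

In this sketch, each edge connects a commit directly to the commit that follows it, and the ancestor data of a node is recovered by following parent links transitively.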
Branching is the process of making a copy of a snapshot of the code base that is then developed independently. Thus, subsequent modifications on the new branch do not affect later commits on the previous branch. Merging is the process of incorporating two branches into a single branch. Branching and merging processes allow parallel development to occur along multiple versions of the code base. Developers working in parallel on different branches can create new features in the branches, and the developed features can then be merged back together at a later time. Branches that are used to create such new features may thus be referred to as feature branches.
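In terms of a revision graph, branching and merging can be recognized from the shape of the graph alone. The following is a minimal sketch under the assumption that the graph is given as a hypothetical mapping `parents` from each commit id to the ids of its immediately preceding commits. A commit with more than one child is a point at which a branch was created, and a commit with more than one parent is a merge.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical history: a feature branch (f1, f2) splits off at c1 and is
# merged back with the original branch (c2) by merge commit m1.
parents: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "f1": ["c1"],
    "f2": ["f1"],
    "m1": ["c2", "f2"],
}


def branch_points(parents: Dict[str, List[str]]) -> List[str]:
    """Commits that are followed by more than one commit."""
    children = defaultdict(list)
    for commit_id, parent_ids in parents.items():
        for parent_id in parent_ids:
            children[parent_id].append(commit_id)
    return sorted(c for c, kids in children.items() if len(kids) > 1)


def merge_commits(parents: Dict[str, List[str]]) -> List[str]:
    """Commits that incorporate more than one preceding commit."""
    return sorted(c for c, parent_ids in parents.items() if len(parent_ids) > 1)


print(branch_points(parents))  # ['c1']
print(merge_commits(parents))  # ['m1']
```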
Aspects of static analysis include attributing source code contributions and generating data representing trends in code bases. Attributing source code contributions means attributing changes introduced by a snapshot to a particular developer entity responsible for committing the snapshot. A developer entity can be a single developer or a group of multiple developers. For example, a developer entity can be a lone developer, developers on a team, developers within an organization or within a department of an organization, or any other appropriate group of developers.
Accurately attributing source code contributions can be difficult for real-world code bases that have multiple branches. As one example, consider three branches that each contain an instance of the same violation. If a developer entity implements the same fix in all three branches and then merges the branches, the developer entity may be credited with fixing three problems in the code base, even though the developer entity fixed only one problem.
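The over-counting in this example can be reproduced with a short sketch. The record layout, the names `fixes`, `naive_credit`, and `distinct_credit`, and the use of a (file, rule) pair as the identity of a violation are all assumptions made for illustration. Counting each distinct violation once is shown only as a contrast to the naive count and is not presented as the attribution technique of this specification.

```python
from collections import Counter
from typing import List, Tuple

Violation = Tuple[str, str]            # hypothetical identity: (file, rule)
Fix = Tuple[str, str, Violation]       # (branch, developer entity, violation fixed)

# The same violation is fixed by the same developer entity on three branches.
fixes: List[Fix] = [
    ("branch_a", "dev_team", ("util.py", "unused-variable")),
    ("branch_b", "dev_team", ("util.py", "unused-variable")),
    ("branch_c", "dev_team", ("util.py", "unused-variable")),
]


def naive_credit(fixes: List[Fix]) -> Counter:
    """Credit one fix per commit, regardless of which violation it fixed."""
    return Counter(developer for _, developer, _ in fixes)


def distinct_credit(fixes: List[Fix]) -> Counter:
    """Credit each (developer entity, violation) pair at most once."""
    seen = set()
    credit: Counter = Counter()
    for _, developer, violation in fixes:
        if (developer, violation) not in seen:
            seen.add((developer, violation))
            credit[developer] += 1
    return credit


print(naive_credit(fixes)["dev_team"])     # 3
print(distinct_credit(fixes)["dev_team"])  # 1
```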