This specification relates to static analysis of computer software source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code is typically maintained by developers in a code base using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a snapshot. Each snapshot is a representation of the source code of the code base as the source code existed at a particular point in time. A snapshot may be thought of as including all the source code as of a particular point in time, although all the source code need not be explicitly stored for every snapshot.
Snapshots stored in a version control system can be represented as a directed acyclic commit graph. Each node in the commit graph represents a commit of the source code. A commit represents a snapshot as well as other pertinent information about the snapshot, such as the author of the snapshot and data about ancestor commits of the node in the commit graph. A directed edge from a first node to a second node in the commit graph indicates that the commit represented by the first node is a previous commit to the commit represented by the second node, and that no intervening commits exist between them in the version control system.
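The commit graph described above can be illustrated with a minimal sketch. The class names, field names, and sample commits below are hypothetical and chosen only for illustration; they do not reflect the data model of any particular version control system.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One node in the commit graph (all field names are illustrative)."""
    commit_id: str
    author: str
    snapshot: dict                                # file path -> file contents
    parents: list = field(default_factory=list)   # ids of immediate ancestor commits


class CommitGraph:
    """Directed acyclic graph of commits, keyed by commit id."""

    def __init__(self):
        self.commits = {}

    def add(self, commit):
        # Rejecting duplicate ids and requiring every parent to exist already
        # means edges can only point from older to newer commits, so no cycle
        # can ever be formed.
        if commit.commit_id in self.commits:
            raise ValueError(f"duplicate commit id: {commit.commit_id}")
        for parent_id in commit.parents:
            if parent_id not in self.commits:
                raise KeyError(f"unknown parent: {parent_id}")
        self.commits[commit.commit_id] = commit

    def edges(self):
        """Directed (parent, child) edges; the parent is the earlier commit."""
        return [(p, c.commit_id) for c in self.commits.values() for p in c.parents]


graph = CommitGraph()
graph.add(Commit("c1", "alice", {"main.c": "v1"}))
graph.add(Commit("c2", "bob", {"main.c": "v2"}, ["c1"]))                  # one branch
graph.add(Commit("c3", "carol", {"main.c": "v1", "util.c": "v1"}, ["c1"]))  # another branch
graph.add(Commit("c4", "alice", {"main.c": "v2", "util.c": "v1"}, ["c2", "c3"]))  # merge
```

In this sketch, a commit with two parents, such as `c4`, corresponds to a merge, and two commits sharing a parent, such as `c2` and `c3`, correspond to a branch point.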
Branching is the process of making a copy of a source snapshot of the code base that is subsequently developed independently of the source snapshot. Thus, subsequent modifications on the new branch do not affect later commits on the previous branch. Merging is the process of incorporating two branches into a single branch. Branching and merging processes allow parallel development to occur along multiple versions of the code base. Developers working in parallel on different branches can create new features in the branches, and the developed features can then be merged back together at a later time.
Aspects of static analysis include attributing source code contributions using a commit graph. Attributing source code contributions means attributing, to a developer entity responsible for a snapshot, changes introduced by the snapshot relative to a previous snapshot. A developer entity can be a single developer or a group of multiple developers. For example, a developer entity can be a lone developer, developers on a team, developers within a department of an organization, or any other appropriate group of developers.
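As a rough illustration of attribution, the following sketch counts, for each developer entity, the files changed by each snapshot relative to the previous snapshot. It assumes a linear history and file-level granularity, and the function names, author names, and team names are hypothetical; a real analysis would typically attribute finer-grained changes and would walk a commit graph rather than a list.

```python
def changed_paths(previous, current):
    """Paths added, removed, or modified between two snapshots.

    Each snapshot is a dict mapping file path -> file contents.
    """
    paths = set(previous) | set(current)
    return sorted(p for p in paths if previous.get(p) != current.get(p))


def attribute(history, entity_of):
    """Count changed files per developer entity.

    history:   list of (author, snapshot) pairs in commit order (assumed linear).
    entity_of: maps an author to a developer entity, e.g. a team; authors
               missing from the map are treated as single-developer entities.
    """
    totals = {}
    previous = {}  # the empty snapshot precedes the first commit
    for author, snapshot in history:
        entity = entity_of.get(author, author)
        totals[entity] = totals.get(entity, 0) + len(changed_paths(previous, snapshot))
        previous = snapshot
    return totals


history = [
    ("alice", {"a.c": "v1"}),               # adds a.c
    ("bob", {"a.c": "v2", "b.c": "v1"}),    # modifies a.c, adds b.c
    ("carol", {"b.c": "v1"}),               # removes a.c
]
entity_of = {"alice": "team-x", "bob": "team-x", "carol": "team-y"}
contributions = attribute(history, entity_of)
```

Here `alice` and `bob` are grouped into one developer entity, so their changes are credited together, while `carol` is credited separately.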
Some version control systems use a directory-based branching system rather than a graph-based branching system. One such example is Apache Subversion. In a directory-based branching system, no revision graph is explicitly maintained. Rather, each branch is identified by a branch path, e.g., a path in a file system to a branch directory, and each revision is identified by a branch path and a revision number. Typically, revision numbers are assigned incrementally across all revisions in a project.
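The identification scheme described above can be sketched as a toy model in which each revision is a pair of branch path and revision number, with a single counter shared by the whole project. The class names and paths are hypothetical, and the model is a simplification for illustration rather than a description of how Apache Subversion is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """A revision identified by a branch path and a revision number."""
    branch_path: str
    number: int


class DirectoryRepository:
    """Toy directory-based repository with one project-wide revision counter."""

    def __init__(self):
        self._next_number = 1
        self.revisions = []

    def commit(self, branch_path):
        # The revision number increases across the whole project,
        # regardless of which branch directory the commit touches.
        revision = Revision(branch_path, self._next_number)
        self._next_number += 1
        self.revisions.append(revision)
        return revision

    def branch_history(self, branch_path):
        """Revision numbers committed under one branch path, in order."""
        return [r.number for r in self.revisions if r.branch_path == branch_path]


repo = DirectoryRepository()
repo.commit("/trunk")
repo.commit("/branches/feature-a")
repo.commit("/trunk")
```

In this toy model, the list of revisions carries no edges between branches: nothing in a `Revision` says which revision of which branch `/branches/feature-a` was derived from.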
New branches can be created by creating a copy of a working directory for a snapshot. Often the new copy will have a name that conforms to a particular naming convention enforced by an organization. Thus, given a particular naming convention, some prior art software tools can generate a commit graph for a project that is maintained by a directory-based version control system.
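To make the convention-based approach concrete, the sketch below infers branch points from directory names. The convention it assumes, in which a branch copied from `/trunk` at revision N is named `/branches/<name>_from_r<N>`, is invented purely for illustration, as are the function names and sample paths; actual conventions vary by organization.

```python
import re

# Hypothetical naming convention (an assumption for this example only):
# a branch copied from /trunk at revision N lives at /branches/<name>_from_r<N>.
BRANCH_PATTERN = re.compile(r"^/branches/(?P<name>[\w.-]+?)_from_r(?P<rev>\d+)$")


def infer_branch_point(branch_path):
    """Return (parent_path, parent_revision), or None if the name does not conform."""
    match = BRANCH_PATTERN.match(branch_path)
    if match is None:
        return None
    return ("/trunk", int(match.group("rev")))


def infer_edges(branch_paths):
    """Split branch paths into inferred graph edges and unresolved paths."""
    edges, unresolved = [], []
    for path in branch_paths:
        point = infer_branch_point(path)
        if point is None:
            unresolved.append(path)        # name gives no hint about its origin
        else:
            edges.append((point, path))    # edge from branch point to new branch
    return edges, unresolved


edges, unresolved = infer_edges([
    "/branches/login_from_r12",
    "/branches/bobs-experiment",
])
```

Under this assumed convention, the first path yields an edge from revision 12 of `/trunk`, while the second path does not match the pattern and is returned as unresolved.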
However, such naming practices are merely organization-enforced conventions, and thus many exceptions and unusual circumstances abound in practice. For example, many software projects go through naming convention changes over their lifetimes. This can happen, for example, due to a change in organization practices, a change in management or ownership, a change in industry practice, or a change to the underlying version control system. Therefore, in any sufficiently large or sufficiently old software project, a naming convention change is likely to have occurred at some point in its history. In addition, enforcement can simply be haphazard or nonexistent. Thus, not all software projects adhere to such naming conventions even when they are present.
Therefore, in general it is not possible with prior art software tools to automatically construct a commit graph for a project maintained in a directory-based version control system. Rather, the commit graph must be constructed piecemeal by manually and painstakingly deciphering all the old naming conventions, or lack thereof, used in the software project.
However, these kinds of manual processes to construct a commit graph are unsuitable for a static analysis system that seeks to automatically analyze thousands or tens of thousands of software projects.