This specification relates to static analysis of source code.
Source code is typically maintained by developers in a code base using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a commit or a snapshot. Each snapshot includes the source code files of the code base as the files existed at a particular point in time. A snapshot may include additional data, e.g., about the snapshot or used for analysis of the snapshot, in addition to the source code files.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program. A static analysis system can analyze the source code in a snapshot and generate static analysis results. The static analysis results can be stored in static analysis results files. The static analysis results can include characteristic segments of extracted source code identified by the static analysis system. A characteristic segment of source code is a segment of source code having a particular attribute. Static analysis results can include data specifying where, in the project, the characteristic segments of source code occur.
Coding defects are one example of characteristic segments of source code that a static analysis system may identify. Coding defects are segments of source code that violate one or more coding standards. Data representing such coding defects may be referred to as violations. Thus, a violation can identify a location in a source code file of a coding defect, a type of the coding defect, and the segment of source code that causes the coding defect. For example, a segment of source code that compares variables of incomparable types is a coding defect, which can be represented by a corresponding violation that identifies the location of the source code, and a type of “comparison between variables of incomparable types.” A violation may identify the source code for which the coding defect was determined or may reference the source code using the location of the source code, e.g., in a file, which can be used to look up the source code for later analysis.
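As an illustration only, a violation of the kind described above might be represented as a small record holding a location, a defect type, and an optional copy of the offending source code. The following Python sketch is not taken from any particular static analysis system; the names Location, Violation, and resolve_snippet, and the use of 1-based inclusive line ranges, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    """Hypothetical location of a coding defect within a source code file."""
    file_path: str
    start_line: int  # 1-based, inclusive (an assumption of this sketch)
    end_line: int    # 1-based, inclusive


@dataclass(frozen=True)
class Violation:
    """Hypothetical record representing one coding defect."""
    location: Location
    defect_type: str
    # The violation may carry the source code itself, or omit it and rely
    # on the location to look the source code up later.
    snippet: Optional[str] = None


def resolve_snippet(violation: Violation, files: Dict[str, str]) -> str:
    """Return the defect's source code, looking it up by location if needed.

    `files` maps file paths to file contents for a single snapshot.
    """
    if violation.snippet is not None:
        return violation.snippet
    loc = violation.location
    lines = files[loc.file_path].splitlines()
    return "\n".join(lines[loc.start_line - 1:loc.end_line])


# Example: a violation that references its source code by location only.
snapshot_files = {"a.py": "x = 1\nif x == 'a':\n    pass\n"}
violation = Violation(
    location=Location("a.py", 2, 2),
    defect_type="comparison between variables of incomparable types",
)
offending_code = resolve_snippet(violation, snapshot_files)
```

Storing only the location keeps each violation record small, at the cost of requiring the corresponding snapshot to be available when the source code is looked up.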
A static analysis system can store source code snapshots and static analysis results files in a version control repository, which can be a content-addressable storage (CAS) system. A CAS system generates file identifiers that are based on the content of the file. Thus, if two static analysis results files are the same, a CAS system will store only one copy of the two files. However, if two static analysis results files are different, a CAS system will need to store either information representing the difference or both files in their entirety.
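The deduplication behavior described above can be sketched with a minimal in-memory store, assuming that a file's identifier is the SHA-256 hash of its content. The class name ContentAddressableStore and its methods are hypothetical and are not the interface of any particular version control system. The sketch stores differing files in their entirety rather than as differences.

```python
import hashlib
from typing import Dict


class ContentAddressableStore:
    """Hypothetical in-memory CAS keyed by the SHA-256 hash of the content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content and return its content-derived identifier."""
        identifier = hashlib.sha256(content).hexdigest()
        # Identical content yields the same identifier, so a second
        # put of the same bytes does not add another copy.
        self._blobs.setdefault(identifier, content)
        return identifier

    def get(self, identifier: str) -> bytes:
        """Return the content stored under the given identifier."""
        return self._blobs[identifier]

    def __len__(self) -> int:
        """Return the number of distinct blobs held in the store."""
        return len(self._blobs)


store = ContentAddressableStore()
results_a = b"file.py:10: comparison between variables of incomparable types\n"
results_b = b"file.py:12: comparison between variables of incomparable types\n"

id_first = store.put(results_a)
id_same = store.put(results_a)   # same content, same identifier
id_other = store.put(results_b)  # differs by one character, new identifier
```

In this sketch, two results files that differ in even a single line number receive different identifiers and are both stored in full, which is the storage cost the passage above points to.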