This specification relates to static analysis of source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Source code is typically maintained by developers in a code base using a version control system. Version control systems generally maintain multiple revisions of the source code in the code base, each revision being referred to as a snapshot. Each snapshot includes the source code of the files of the code base as the files existed at a particular point in time.
A static analysis system can analyze the source code in a snapshot and generate static analysis results, which can be stored in static analysis results files. The static analysis results can include characteristic segments of source code that the static analysis system has identified and extracted. A characteristic segment of source code is a segment of source code having a particular attribute. Static analysis results can also include data specifying where in the code base the characteristic segments of source code occur.
One example of characteristic segments of source code that a static analysis system may identify is coding defects. Coding defects are segments of source code that violate one or more coding standards. Data representing such coding defects may be referred to as violations. Thus, a violation can identify a location in a source code file of a coding defect, a type of the coding defect, and the segment of source code that causes the coding defect. For example, a segment of source code that compares variables of incomparable types is a coding defect, which can be represented by a corresponding violation that identifies the location of the source code, the source code itself, and a type of “comparison between variables of incomparable types.”
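As a purely illustrative sketch, a violation of the kind described above could be represented as a simple record holding the location, the type, and the offending segment. The class name, the field names, and the example file path, line number, and source segment below are all hypothetical and are not prescribed by this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Hypothetical record for one coding defect found by static analysis."""
    file_path: str       # source code file containing the defect (assumed field)
    line: int            # line at which the defect occurs (assumed field)
    violation_type: str  # type of the coding defect
    segment: str         # segment of source code that causes the defect


# Illustrative values only; the path, line, and segment are invented.
example = Violation(
    file_path="src/Account.java",
    line=42,
    violation_type="comparison between variables of incomparable types",
    segment="count.equals(name)",
)
```

A location could equally be expressed as a character offset or a line range; a single line number is used here only to keep the sketch small.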
A static analysis system can store source code snapshots and static analysis results files in a version control repository, which can be a content-addressable storage (CAS) system. A CAS system generates file identifiers, each of which is based on the content of the corresponding file. Thus, if two static analysis results files are identical, a CAS system will store only one copy. However, if two static analysis results files differ, a CAS system will need to store either information representing the difference or both files in their entirety.
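The storage behavior described above can be illustrated with a minimal in-memory sketch. It assumes that the file identifier is a SHA-256 digest of the file content, which is one common choice but is not stated in this specification; the class name `ContentAddressableStore` and its methods are hypothetical. The sketch stores differing files in their entirety and does not attempt to store differences.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory CAS keyed by a digest of the file content."""

    def __init__(self):
        self._blobs = {}  # maps file identifier -> file content

    def put(self, content: bytes) -> str:
        # The identifier is derived only from the content (SHA-256 is assumed).
        file_id = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same identifier, so it is stored once.
        self._blobs.setdefault(file_id, content)
        return file_id

    def get(self, file_id: str) -> bytes:
        return self._blobs[file_id]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentAddressableStore()
id_a = store.put(b"violation at line 42\n")
id_b = store.put(b"violation at line 42\n")  # identical results file
id_c = store.put(b"violation at line 43\n")  # differs only in the line number
```

In this sketch the two identical results files share one identifier and one stored copy, while the third file, which differs by a single character, receives a different identifier and is stored separately in full.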