This specification relates to static analysis of computer software source code.
Static analysis refers to techniques for analyzing computer software source code without executing the source code as a computer software program.
Static analysis techniques include techniques for identifying potential problems in, or opportunities for improving, software projects. In this specification, the term “software project,” or for brevity, a “project,” refers to a collection of source code files organized in a particular way, e.g., in a hierarchical directory structure, with each source code file in the project having a respective path. Each project has one or more respective owners and one or more source code developers who contribute source code to the project. Typically, the source code files in a project provide one or more related functionalities.
A static analysis system can analyze projects using a collection of static analysis rules, which for brevity can simply be referred to as rules. Each rule is defined to identify characteristic segments of source code. A characteristic segment of source code is a segment of source code having a particular attribute of interest. Each characteristic segment of source code typically identifies a potential problem or an opportunity for improving source code in a particular programming language. Each characteristic segment of source code can thus be referred to as a coding defect. Data elements representing such defects will be referred to as violations.
Typically, the characteristic segments of source code, while not ideal in some respects, are nevertheless valid programming language statements. That is, the static analysis rules can identify characteristic segments of source code that do not generate errors or warnings when compiled, linked, interpreted, or executed.
Each static analysis rule specifies one or more attributes for one or more source code elements, one or more relationships between source code elements, or some combination of these. For example, a rule can be defined to identify when a function call has an unexpected number of arguments, e.g., more arguments than a number of arguments that are specified by the definition of the function. A function call having an unexpected number of arguments can be a bug, in which case the rule can indicate that the function call is a bug that should be fixed.
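The argument-count rule described above can be illustrated with a minimal sketch in Python, using the standard `ast` module to inspect source code without executing it. The function name `find_excess_argument_calls`, the sample source, and the shape of the returned tuples are assumptions made for illustration only. The sketch considers only module-level functions called by their plain name and ignores variadic parameters and starred arguments; it is not a description of any particular static analysis system.

```python
import ast


def find_excess_argument_calls(source):
    """Return (function name, line, arguments passed, parameters declared)
    for each call that passes more positional arguments than the definition
    of the called function declares. The source is parsed, never executed."""
    tree = ast.parse(source)

    # Attribute of interest: how many positional parameters each
    # module-level function declares. Functions with *args are skipped
    # because they accept any number of positional arguments.
    declared = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.args.vararg is None:
            declared[node.name] = len(node.args.posonlyargs) + len(node.args.args)

    findings = []
    for node in ast.walk(tree):
        # Relationship of interest: a call by plain name to a known function.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        limit = declared.get(node.func.id)
        if limit is None:
            continue
        # A starred argument hides the real count, so such calls are skipped.
        if any(isinstance(arg, ast.Starred) for arg in node.args):
            continue
        if len(node.args) > limit:
            findings.append((node.func.id, node.lineno, len(node.args), limit))
    return findings


# Hypothetical source code to analyze.
SAMPLE = """\
def area(width, height):
    return width * height

ok = area(2, 3)
bad = area(2, 3, 4)
"""

if __name__ == "__main__":
    print(find_excess_argument_calls(SAMPLE))
```

For the sample above, only the call on line 5 is reported, because it passes three arguments to a function that declares two parameters.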
Static analysis rules can identify more than bugs. For example, static analysis rules can identify characteristic segments of source code that present long-term risks to the ability to work with the source code. In these cases, even though the code may work perfectly fine, the static analysis rules can nevertheless flag these segments of source code as opportunities to improve the quality of the project. For example, static analysis rules can identify potential problems with correctness, e.g., in code for concurrent processes; maintainability, e.g., duplicate code segments; readability, e.g., code having excessive complexity; and framework usage, e.g., code that incorrectly uses code libraries, to name just a few examples. Such static analysis rules can be defined using one or more formalized coding standards. A static analysis system can use any appropriate set of coding standards, e.g., the NASA Jet Propulsion Laboratory Institutional Coding Standard for the Java Programming Language, available at http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_Java.pdf.
A static analysis system can analyze the source code of a project to find instances in which source code elements satisfy rules in the collection of static analysis rules. Some static analysis systems define rules using database query languages, e.g., Datalog or SQL. For example, a static analysis system can parse the source code in a project to populate a database that stores various properties of source code elements in the project. A static analysis system can then use a query language to query the database to identify instances of source code elements that satisfy one or more static analysis rules.
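The database-backed approach described above can be sketched as follows, using Python's standard `sqlite3` module and a rule expressed as an SQL query. The schema (tables `functions` and `calls`), the rule text, and the example rows are all hypothetical and chosen only for illustration; a real system would populate the database by parsing the project's source code rather than from hard-coded rows.

```python
import sqlite3

# A rule expressed as a query: find calls that pass more arguments than
# the called function declares. Table and column names are assumptions.
RULE_TOO_MANY_ARGS = """
SELECT c.path, c.line, c.callee, c.arg_count, f.param_count
FROM calls AS c
JOIN functions AS f ON f.name = c.callee
WHERE c.arg_count > f.param_count
ORDER BY c.path, c.line
"""


def build_database():
    """Create an in-memory database holding properties of source code
    elements. The rows below stand in for the output of a parser."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE functions (
            name TEXT PRIMARY KEY,
            param_count INTEGER NOT NULL
        );
        CREATE TABLE calls (
            callee TEXT NOT NULL,
            path TEXT NOT NULL,
            line INTEGER NOT NULL,
            arg_count INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO functions VALUES (?, ?)",
        [("area", 2), ("log", 1)],
    )
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?, ?, ?)",
        [
            ("area", "src/geometry.py", 4, 2),
            ("area", "src/geometry.py", 5, 3),
            ("log", "src/util.py", 12, 2),
        ],
    )
    return conn


def run_rule(conn, rule_sql):
    """Return every row of source code element properties that satisfies the rule."""
    return conn.execute(rule_sql).fetchall()


if __name__ == "__main__":
    for row in run_rule(build_database(), RULE_TOO_MANY_ARGS):
        print(row)
```

Keeping the rule as a query separates what is being looked for from how the source code is parsed, so new rules can be added without changing the code that populates the database.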
When a rule is satisfied by the properties of one or more source code elements, a static analysis system can generate an alert. An alert is data that specifies information about the rule that has been satisfied, e.g., which source code elements are involved, which rule has been satisfied, the location in the project of the implicated source code elements, and an alert description that includes a user-readable message about the alert. A static analysis system can then present alerts in a user interface presentation for consumption by one or more developers of the project. For brevity, an alert can be referred to as occurring in a particular project when the alert identifies source code that occurs in that project.
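One possible representation of an alert, as described above, is sketched below in Python. The class name `Alert`, its field names, the rule identifier, and the one-line text format are assumptions made for illustration; the description above does not prescribe any particular data layout or presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Data describing one instance of a satisfied rule (hypothetical layout)."""
    rule_id: str        # which rule has been satisfied
    elements: tuple     # names of the source code elements involved
    path: str           # file containing the implicated source code
    line: int           # line number within that file
    description: str    # user-readable message about the alert


def format_alert(alert):
    """Render an alert as a single line suitable for a simple text listing."""
    return f"{alert.path}:{alert.line}: [{alert.rule_id}] {alert.description}"


def alerts_in_project(alerts, project_root):
    """Select the alerts that occur in a project, i.e. those that identify
    source code located under the project's root directory."""
    prefix = project_root.rstrip("/") + "/"
    return [alert for alert in alerts if alert.path.startswith(prefix)]


if __name__ == "__main__":
    example = Alert(
        rule_id="too-many-arguments",
        elements=("area",),
        path="geometry/src/shapes.py",
        line=5,
        description="Call to 'area' passes 3 arguments; 2 are declared.",
    )
    print(format_alert(example))
```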
The alerts can guide the developers on how to improve the quality of the source code in the project. For example, a team of developers can manually address each alert in turn, e.g., by fixing the problem identified by the alert so that the corresponding static analysis rule is no longer satisfied by the changed source code.
However, manually inspecting and addressing alerts can be a very time-consuming and resource-intensive activity, and the number of alerts generated for real-world projects of any considerable size can be overwhelming. Addressing each and every alert identified for a project is therefore often an unrealistic goal. There is often a mismatch between the number of alerts generated and the developer resources that are available to address alerts in the project, because developers have responsibilities other than addressing alerts, e.g., writing new features and running tests. Thus, most projects have developer resources to address only a small fraction of the total alerts that are identified.