To produce high-quality software, it is important to test and analyze the program's functionality. Dynamic analysis runs the program on well-chosen inputs to verify its actual behavior. However, dynamic analysis is not always practical, especially when the characteristics of the environment in which the program will run are unknown or varied. Furthermore, dynamic analysis can be performed only once the program is complete (possibly using stub classes and functions) and able to run.
Static analysis inspects the source or program code without running it. Path-sensitive dataflow analysis attempts to predict, exhaustively and precisely, the program's behavior along every path over an abstract domain. This is highly useful for diagnosing problems such as security or localizability defects. In path-sensitive dataflow analysis, a component called a "client" may collect data defined specifically for the problem the client is designed to detect. The client is given the program in an intermediate representation, which consists of program statements and control flow edges. The client then computes the outgoing state of every statement from its incoming state.
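As an illustration only, the sketch below shows a toy client for a path-sensitive analysis. The intermediate representation (tuples such as `("deref", "p")`), the null-dereference client, and the names `transfer` and `analyze` are hypothetical assumptions, not the representation of any particular tool. `transfer` computes a statement's outgoing state from its incoming state, and `analyze` carries a separate state along each path of an acyclic control flow graph, so facts from different branches are never merged.

```python
from typing import Dict, List, Tuple

# Hypothetical IR: a statement is a tuple (opcode, operands...); control flow
# edges map a statement index to the indices of its successors.
Stmt = Tuple[str, ...]
# Abstract domain of a toy null-dereference client: variable -> "null" | "nonnull".
State = Dict[str, str]


def transfer(stmt: Stmt, incoming: State) -> Tuple[State, List[str]]:
    """Compute a statement's outgoing state from its incoming state,
    and report any defect the client detects at that statement."""
    outgoing = dict(incoming)
    defects: List[str] = []
    op = stmt[0]
    if op == "new":            # x = new Object()
        outgoing[stmt[1]] = "nonnull"
    elif op == "assign_null":  # x = null
        outgoing[stmt[1]] = "null"
    elif op == "deref":        # x.f
        if incoming.get(stmt[1]) == "null":
            defects.append(f"null dereference of {stmt[1]}")
    # Any other opcode leaves the state unchanged.
    return outgoing, defects


def analyze(stmts: List[Stmt], edges: Dict[int, List[int]], entry: int = 0):
    """Path-sensitive analysis of an acyclic CFG: every path keeps its own
    state, so facts from different branches are never merged."""
    reports = []
    worklist = [(entry, {}, (entry,))]
    while worklist:
        node, state, path = worklist.pop()
        outgoing, found = transfer(stmts[node], state)
        reports.extend((path, d) for d in found)
        for succ in edges.get(node, []):
            worklist.append((succ, outgoing, path + (succ,)))
    return reports


# p is set to null on one branch only; only that path reports a defect.
stmts = [("new", "p"), ("assign_null", "p"), ("nop",), ("deref", "p")]
edges = {0: [1, 2], 1: [3], 2: [3]}
print(analyze(stmts, edges))  # [((0, 1, 3), 'null dereference of p')]
```

Enumerating paths explicitly like this grows exponentially and does not handle loops; it is meant only to show the per-statement transfer of state along individual paths, not how a production analyzer bounds its search.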
However, the intermediate representation of some functions may be missing, too complex, or generic, forcing the client to make assumptions of varying accuracy. This can lead both to reporting false defects ("noise") and to missing real defects.
Sometimes only part of the whole program is analyzed at a time, to make the analysis scalable. In this case, missing external components can cause tools to over-approximate the possible program behavior, leading them to report false defects ("noise") or to miss real defects. For example, setting and then getting a property of an externally defined class can cause noise when the tool assumes that the value read back could differ from the value set earlier on the same defect path.
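The property example can be made concrete with a small sketch, assuming a simple abstract state keyed by (object, property). All function names here are hypothetical. The "unmodeled" pair mimics an external class with no intermediate representation, so the getter may return any value (`TOP`); the "modeled" pair records the set value on the current path.

```python
# Sentinel for "any value": what an analysis assumes when it knows nothing.
TOP = object()


def set_unmodeled(state, obj, prop, value):
    # No IR for the external setter: its effect is not recorded.
    return state


def get_unmodeled(state, obj, prop):
    # No IR for the external getter: it could return anything.
    return TOP


def set_modeled(state, obj, prop, value):
    # Model of a plain property: the setter stores the value.
    new_state = dict(state)
    new_state[(obj, prop)] = value
    return new_state


def get_modeled(state, obj, prop):
    # The getter returns the last value set on this path, if known.
    return state.get((obj, prop), TOP)


def reports_defect(setter, getter):
    """Analyze one path: obj.Name = "x"; v = obj.Name; if (v != "x") defect."""
    state = setter({}, "obj", "Name", "x")
    v = getter(state, "obj", "Name")
    # The defect branch is feasible if v may be anything other than "x".
    return v is TOP or v != "x"


print(reports_defect(set_unmodeled, get_unmodeled))  # True: false defect (noise)
print(reports_defect(set_modeled, get_modeled))      # False
```

Without a model, the defect branch looks feasible even though the value read back is always the one just set.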
Heavily used external components with well-known behavior, such as .Net or STL data structures, usually have code that is too large or too complex to include fully in the analysis. However, they cause a significant amount of noise (or missed real defects) if the analysis approximates them away completely, for example by assuming that calling an external Application Program Interface (API) can produce any result. For example, a C++ STL map is usually implemented as a balanced tree. It can be prohibitively difficult to deduce from the complex mechanics of its various operations that, say, insert("a", 1) followed by retrieve("a") returns 1 (assuming no other code runs concurrently).
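One way to picture the alternative to analyzing the tree is a small abstract model of the map's observable behavior. The sketch below is an assumption for illustration: `MapModel`, `retrieve`, and the `Abstract` values are hypothetical names (the STL itself has no `retrieve`), and `insert` follows the documented `std::map::insert` rule that an existing key is not overwritten. The model tracks key-to-value facts on one analysis path and falls back to "unknown" only when the map's prior contents are genuinely unknown.

```python
from enum import Enum


class Abstract(Enum):
    UNKNOWN = "unknown"  # the analysis cannot tell what the value is
    ABSENT = "absent"    # the key is definitely not in the map


class MapModel:
    """Abstract model of an ordered map (hypothetical, not an STL API).
    Tracks key -> value facts on one analysis path instead of
    analyzing the balanced-tree implementation."""

    def __init__(self, contents_known: bool = True):
        self.entries = {}
        # True when every entry is known, e.g. for a freshly constructed map.
        self.contents_known = contents_known

    def insert(self, key, value):
        """Like std::map::insert: has no effect if the key is already present."""
        if key in self.entries:
            return
        if self.contents_known:
            self.entries[key] = value
        else:
            # The key may already hold another value, which insert would keep:
            # record that the key is present but its value is unknown.
            self.entries[key] = Abstract.UNKNOWN

    def retrieve(self, key):
        if key in self.entries:
            return self.entries[key]
        return Abstract.ABSENT if self.contents_known else Abstract.UNKNOWN


m = MapModel()          # empty map: contents fully known
m.insert("a", 1)
print(m.retrieve("a"))  # 1
m.insert("a", 2)        # key present: insert does not overwrite
print(m.retrieve("a"))  # 1
print(m.retrieve("b"))  # Abstract.ABSENT
```

A map whose earlier contents came from unanalyzed code would be created with `contents_known=False`, in which case `retrieve` stays conservative instead of guessing.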
Generic functions in .Net yield a parameterized intermediate representation, in which statements are parameterized by a type. Such a representation usually contains generics-related statements whose meaning may depend on the concrete instantiation. For example, creating an object of a parameterized type can mean allocating a heap object and calling a constructor, for instantiations with reference types, or creating and initializing a stack variable, for instantiations with value types. Tools are then forced into complicated logic to interpret the generics-related statements according to the concrete instantiation at each call to an instantiated generic API.
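To show why one generic statement carries different meanings, the sketch below translates a generic `target = new T()` statement into concrete statements for a single instantiation. The type table, the type names `MyList` and `Point`, and the output opcodes (`heap_alloc`, `call_ctor`, `stack_var`, `init`) are hypothetical; they follow the reference-type and value-type cases described above rather than actual .Net IL.

```python
from typing import Dict, List, Tuple

# Hypothetical type table: True for reference types, False for value types.
IS_REFERENCE_TYPE = {"MyList": True, "Point": False, "System.Int32": False}


def lower_new_generic(target: str, type_param: str,
                      type_args: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Translate a generic 'target = new T()' statement into concrete
    (hypothetical) IR statements for one instantiation of T."""
    concrete = type_args[type_param]
    if IS_REFERENCE_TYPE[concrete]:
        # Reference type: allocate a heap object, then run its constructor.
        return [("heap_alloc", target, concrete), ("call_ctor", target, concrete)]
    # Value type: create a stack variable and initialize it in place.
    return [("stack_var", target, concrete), ("init", target, concrete)]


print(lower_new_generic("x", "T", {"T": "MyList"}))
# [('heap_alloc', 'x', 'MyList'), ('call_ctor', 'x', 'MyList')]
print(lower_new_generic("x", "T", {"T": "Point"}))
# [('stack_var', 'x', 'Point'), ('init', 'x', 'Point')]
```

The same generic statement produces entirely different concrete statements depending on `type_args`, which is the per-call-site interpretation burden the paragraph describes.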