As software systems increase in complexity and the existing inventory of legacy code continues to be integrated into current systems, identifying the impact of change management requests is becoming more complex and time-consuming. Some of the legacy code still in use has insufficient or inadequate documentation. This increasingly becomes a reliability issue, since that legacy code is still being incorporated into modern systems.
Traditional analysis of software systems includes both static and dynamic analysis. Static analysis examines the code for dependencies without relying on any external resources or dynamic events. Dynamic analysis, on the other hand, focuses on how the system reacts to external inputs, for example user input from a keyboard or other input device. Another facet of dynamic analysis is testing how the system behaves when interacting with external data sources such as files and databases.
As an example, consider that Structured Query Language (SQL) queries in programs written in modern programming languages such as Visual Basic, Java, C++ and C# are often computed dynamically at run time as strings, which are then sent to the database for execution. These strings contain the names of databases, tables, and fields, and can come from external sources such as user input, configuration files, or databases. Therefore, these names cannot in general be discovered by static analysis alone, and the data dependencies between the program and the database component remain hidden. Other examples of external data accessed by software applications include files and CICS transactions.
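The point about run-time query strings can be illustrated with a minimal sketch. It is written in Python with the standard `sqlite3` module rather than in one of the languages named above, and the table names, the `config` dictionary, and the helper functions `build_query` and `run` are all hypothetical. The string concatenation is shown only to expose the analysis problem; it is not a recommended way to build queries.

```python
import sqlite3


def build_query(config):
    # The table and column names come from external configuration,
    # so they never appear as literals in the source code.
    return f'SELECT {config["column"]} FROM {config["table"]}'


def run(config):
    # Hypothetical in-memory database with two candidate tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT)")
    conn.execute("CREATE TABLE suppliers (name TEXT)")
    conn.execute("INSERT INTO customers VALUES ('Ada')")
    conn.execute("INSERT INTO suppliers VALUES ('Acme')")
    # Which table is actually read is decided only here, at run time.
    rows = conn.execute(build_query(config)).fetchall()
    conn.close()
    return rows
```

Reading the source of `build_query` alone reveals that some table is queried, but not whether it is `customers` or `suppliers`; that dependency exists only in the external configuration value.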
A further example is a heavily customizable application such as SAP's enterprise resource planning (ERP) software application. SAP's ERP is a single application that depends on a great deal of information from various configuration tables, typically stored in an external database. The control flow of the application is heavily dependent on the configuration data, which is external to the application itself. Configuration is accomplished via numerous “if” statements in the application, which refer to values in the external configuration data. The configuration data can also include references to external data sources such as databases. Without access to dynamic information, static analysis must assume all possible values, which (if the analysis is conservative) will cause it to report many false dependencies. Furthermore, in the case of infinite domains (such as names of databases, files, or transactions), static analysis will be unable to report any result.
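The effect of conservative assumptions can be sketched as follows. This is an illustrative Python model only, not a description of how any particular analysis tool or ERP product works; the module names, resource names, and the `active_module` configuration key are hypothetical.

```python
# Hypothetical mapping from a configuration value to the external
# resource touched by the corresponding "if" branch in the application.
BRANCH_TARGETS = {
    "invoice": "INVOICE_DB",
    "payroll": "PAYROLL_DB",
    "inventory": "INVENTORY_DB",
}


def conservative_dependencies():
    # Without the configuration value, every branch must be assumed
    # reachable, so every target is reported as a dependency.
    return set(BRANCH_TARGETS.values())


def actual_dependencies(config):
    # With the external configuration known, only the selected branch is live.
    module = config["active_module"]
    return {BRANCH_TARGETS[module]} if module in BRANCH_TARGETS else set()


def false_dependencies(config):
    # Dependencies reported conservatively that cannot occur for this configuration.
    return conservative_dependencies() - actual_dependencies(config)
```

For a configuration that activates only the payroll branch, two of the three reported dependencies are false, and the ratio worsens as the number of configuration-controlled branches grows.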
Some dependencies may occur in every execution of the application, while others may only happen on certain executions, depending on user input, external data and the non-deterministic nature of the running system. The problem becomes even more acute with modern technologies, which are inherently dynamic. For example, dynamic loading in Java may lead to references between classes that are dynamically determined (and thus cannot be statically detected); reflection can lead to complex logic that may stand behind each dependency; and the execution of segments of code may well depend on environmental factors. Another example is the composition of SQL queries as strings that may depend on external data. The difficulty in understanding the dependency model is that we want to detect every link that may occur in any possible execution, while keeping the list as accurate as possible (i.e., not adding a link that can never occur). In addition, we want to focus on the code sections that are responsible for a specific link, for example by identifying those statements in the code that perform the method invocations which yield the subject link. This information is essential in “what-if” scenarios.
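A rough analogue of the dynamic loading and reflection described above can be shown in Python, using the standard `importlib.import_module` and `getattr` in place of Java class loading and reflection. The `invoke` helper and the shape of the `config` dictionary are hypothetical.

```python
import importlib


def invoke(config, *args):
    # The module to load is named by external data, so the reference
    # between this code and that module does not exist in the source text.
    module = importlib.import_module(config["module"])
    # The function is looked up by name at run time (reflection).
    func = getattr(module, config["function"])
    return func(*args)
```

With `{"module": "math", "function": "sqrt"}` the call depends on `math`; with `{"module": "json", "function": "dumps"}` it depends on `json`. The single statement responsible for either link is the `import_module` call, which is the kind of code section one would want to single out for closer analysis.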
Present strategies for detecting dependencies either use complex static analysis techniques or are based on dynamic techniques, which execute applications and infer the behavior and the dependencies from those runs. Strategies based on static analysis rely on highly complex data-flow and control-flow analysis, and there are cases where they end up detecting many false links due to the exponential nature of the problem. In addition, static analysis is limited by its nature, since it cannot infer dependencies that are driven by dynamic events (e.g., user input) or by external resources (e.g., a value kept in a database or in property files that are external to the application code, or classes that are loaded based on configuration files). On the other hand, engines based on dynamic analysis can have very high overhead, since they require setting up the runtime environment and processing enough data to simulate all possible runs. Since dynamic engines infer only those links that are active during the executions of the application, dynamic analysis is generally not able to cover all possible links.
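The coverage limitation of dynamic analysis can be sketched with a toy recorder. This is an illustrative Python model under stated assumptions, not an actual dynamic analysis engine; the `process` function, the service names, and the `ALL_LINKS` set are hypothetical.

```python
# Links observed while the application runs.
observed = set()


def record(source, target):
    # Stand-in for instrumentation that logs a dependency when it is exercised.
    observed.add((source, target))


def process(order):
    # Which service is used depends on the input data.
    if order["express"]:
        record("process", "courier_service")
    else:
        record("process", "postal_service")


# Every link that could occur in some execution (known here by construction).
ALL_LINKS = {("process", "courier_service"), ("process", "postal_service")}


def missed_links(test_inputs):
    # Run the application on the given inputs and report links never exercised.
    observed.clear()
    for order in test_inputs:
        process(order)
    return ALL_LINKS - observed
```

A run set containing only non-express orders never exercises the courier link, so a purely dynamic engine would not report it; covering both links requires inputs that drive both branches.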
Therefore, there is a need for a more precise mechanism to detect dependencies without incurring the excessive overhead generally associated with comprehensive dynamic analysis. The mechanism should identify which code sections are prime candidates for further analysis, and should expand the range of dependencies that can be detected using current analysis techniques.