Software applications typically include flaws or defects that cause the software to operate in an unintended or undesired manner. Some defects can also be exploited to gain unauthorized access to the software and/or data associated therewith. Static vulnerability analysis techniques, which analyze the source code and/or one or more compiled binary files corresponding to a software application, can be used to detect such vulnerabilities. A static analyzer that can analyze compiled binary files is useful at least in situations where the source code is unavailable, e.g., when the owner of the source code does not wish to disclose it to the tester performing the static analysis, when a portion of the binary files is obtained from a third party and the corresponding source code is not available from that third party, etc. As used herein, binary files are not limited to machine code and may include files that are derived by processing source code and that contain representations other than source code, such as byte code, object code, etc.
In many instances, a static analyzer supports one or more programming languages (e.g., Java, C, C++, Python, etc.), i.e., the analyzer is customized or configured specifically for one or more selected programming languages. The definition of a programming language can include a particular version thereof as well. For example, Java version 6 can be considered a different programming language from Java version 8. A static analyzer thus configured can be used to analyze a software system/application represented in binary and/or source code formats only if the source code is specified in one or more of the supported languages and any binary portion of the software application is derived from one or more supported programming languages. In the discussion below, a language supported by a particular static analyzer is called a directly modeled language, and a language that is not supported by that static analyzer is generally called an indirectly modeled language. By definition, a static analyzer does not directly analyze source code specified in an indirectly modeled language, nor does it directly analyze binary code derived from such source code.
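The terminology above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an analyzer's supported languages are recorded as (language, version) pairs; the `SUPPORTED` set and the `classify` function are hypothetical names and do not describe any particular analyzer.

```python
# Hypothetical set of (language, version) pairs one analyzer is configured for.
SUPPORTED = frozenset({("Java", "8"), ("C", "C11")})

def classify(language: str, version: str, supported=SUPPORTED) -> str:
    """Classify a language relative to one analyzer's supported set."""
    if (language, version) in supported:
        return "directly modeled"
    return "indirectly modeled"

java8 = classify("Java", "8")
java6 = classify("Java", "6")       # a different version counts as a different language
vtl = classify("VTL", "1.7")        # a template language outside the supported set
```

Because the classification is relative to a given analyzer, the same language may be directly modeled for one analyzer and indirectly modeled for another.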
In some situations, a software application includes not only a directly modeled language portion but also an indirectly modeled language portion. The directly modeled language portion may include one or more compiled binaries and/or source code files. The indirectly modeled language portion typically includes source code, e.g., scripts, specified in one or more languages other than the directly modeled languages of the static analyzer that is to be used to analyze the software system/application. Additionally or in the alternative, the indirectly modeled language portion may include an intermediate representation derived from an indirectly modeled language specification, as long as a syntax tree can be generated from the intermediate representation. Examples of indirectly modeled languages include Velocity Template Language (VTL), FreeMarker, etc. A script written in an indirectly modeled language can directly access one or more data objects in the directly modeled language portion and can thus read and/or modify information associated with those data objects. This allows an indirectly modeled language script to dynamically adapt the behavior of the software application as desired, e.g., in response to the characteristics of the environment in which the application is executed, the identity of the user on whose behalf the application is executed, etc.
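As a purely illustrative sketch of the access described above, the following Python fragment stands in for both portions: a host data object, and a toy script evaluator that reads and modifies that object. The `Account` class, the `render` function, and the `#set(...)` and `$name.attr` directive syntax are hypothetical simplifications introduced here; they are not the syntax or API of VTL or FreeMarker.

```python
import re
from dataclasses import dataclass

@dataclass
class Account:
    """Data object defined in the host (directly modeled) portion."""
    owner: str
    greeting: str = "Hello"

# Hypothetical directive syntax, loosely modeled on template languages:
#   $name.attr                 reads an attribute of a context object
#   #set($name.attr = "text")  writes an attribute of a context object
SET_RE = re.compile(r'#set\(\$(\w+)\.(\w+)\s*=\s*"([^"]*)"\)')
REF_RE = re.compile(r'\$(\w+)\.(\w+)')

def render(script: str, context: dict) -> str:
    """Evaluate a toy script against live host objects.

    Simplification: all #set directives run before any reference is expanded.
    """
    def do_set(match):
        obj, attr, value = match.groups()
        setattr(context[obj], attr, value)       # the script modifies host data
        return ""

    def do_ref(match):
        obj, attr = match.groups()
        return str(getattr(context[obj], attr))  # the script reads host data

    without_sets = SET_RE.sub(do_set, script)
    return REF_RE.sub(do_ref, without_sets)

account = Account(owner="alice")
script = '#set($account.greeting = "Welcome back")$account.greeting, $account.owner!'
output = render(script, {"account": account})
```

After `render` returns, the host object itself has changed: `account.greeting` holds the value written by the script, which is the kind of direct read and write access described above.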
The ability to directly access one or more data objects specified in the directly modeled language portion, however, can also expose certain vulnerabilities in the software application and may even introduce new vulnerabilities. For example, the indirectly modeled language script can be used to access data without authorization, and a user input received by the indirectly modeled language script can be used to modify or delete important application data, either unintentionally or with malice. As described above, a static analyzer customized for a selected group of programming languages (i.e., directly modeled languages) typically cannot analyze source and/or binary code specified in and/or derived from an indirectly modeled language, i.e., a language that is not included in the selected group. It is not uncommon, however, for a software system/application and/or a web application to be specified using both directly and indirectly modeled languages. Therefore, generally available static analyzers may not adequately detect the vulnerabilities that may be introduced by and/or exist within an indirectly modeled language portion of the software. Even when two different static analyzers are used (a first one configured for a group of directly modeled languages and a second one configured for languages that are indirectly modeled from the perspective of the first static analyzer), certain vulnerabilities may nevertheless go undetected, because the two analyzers generally do not exchange their respective analyses and inferences so as to perform a comprehensive analysis of the overall software/web application. Therefore, there is a need for an improved system and/or method for detecting defects and/or vulnerabilities in software and/or web applications.
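The gap left by two analyzers that do not exchange their inferences can be illustrated with a small data-flow sketch in Python. It is only an illustration of the problem, not a description of any particular analyzer: the node names (`$request.id`, `Store.delete(id)`, `sql.execute`), the edge sets, and the `tainted_sinks` function are all hypothetical, and simple graph reachability stands in for whatever analysis a real tool would perform.

```python
from collections import deque

def tainted_sinks(edges, sources, sinks):
    """Return the sinks reachable from any source along data-flow edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    seen = set(sources)
    queue = deque(sources)
    while queue:                       # breadth-first reachability
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & set(sinks)

# Facts visible to a hypothetical analyzer of the host-language portion:
# it knows the method reaches a sensitive operation, but sees no user input.
host_edges = {("Store.delete(id)", "sql.execute")}
host_sources = set()
host_sinks = {"sql.execute"}

# Facts visible to a hypothetical analyzer of the script portion:
# it sees user input passed to a host method, but not what that method does.
script_edges = {("$request.id", "Store.delete(id)")}
script_sources = {"$request.id"}
script_sinks = set()

# Each analyzer run in isolation, results merely unioned afterwards.
separate = (tainted_sinks(host_edges, host_sources, host_sinks)
            | tainted_sinks(script_edges, script_sources, script_sinks))

# The same facts considered together.
combined = tainted_sinks(host_edges | script_edges,
                         host_sources | script_sources,
                         host_sinks | script_sinks)
```

In this sketch `separate` is empty while `combined` contains `sql.execute`: the flow from user input to the sensitive operation crosses the language boundary, so it is visible only when the facts from both portions are considered together.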