This invention relates generally to analysis of program code and, more specifically, relates to analysis of the use of strings in program code.
This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section. Acronyms that appear in the text or drawings are defined below, prior to the claims.
Static analysis examines the code of programs, such as Web programs, without executing that code. One or more models of the code are created to estimate what would happen when the code is actually executed. String analysis is a form of static analysis in which the tracked properties relate to strings and their dynamic values. This type of analysis has several important applications, including accessibility (e.g., in websites), typestate analysis (where file names and other resource identifiers are tracked precisely), and security.
There is a vast body of work on string analysis. The paper by Tateishi, Pistoia, and Tripp, entitled “Path- and Index-sensitive String Analysis Based on Monadic Second-order Logic” (ISSTA '11, Jul. 17-21, 2011, Toronto, ON, Canada) provides a rich discussion of recent work in this space, with special emphasis on index-sensitive string analyses. A key feature of such analyses is that they model both string values and integral values, such that string operations like “substring” can be modeled accurately.
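As an illustration of why integral values matter, consider the following minimal sketch. The function name truncate_at_bracket and the choice of the “<” character are hypothetical and are not taken from the cited paper. The returned string is free of “<” only because of how the index i was computed, so an analysis can establish that property only if it relates the integral value i to the string value s.

```python
def truncate_at_bracket(s: str) -> str:
    """Return the prefix of s before the first '<', or s itself if there is none.

    Hypothetical example: proving that the result never contains '<'
    requires knowing that i is the index of the FIRST '<' in s.
    """
    i = s.find("<")  # integral value: index of the first '<', or -1 if absent
    if i >= 0:
        return s[:i]  # substring from 0 (inclusive) to i (exclusive)
    return s          # no '<' was found, so s is returned unchanged


if __name__ == "__main__":
    print(truncate_at_bracket("hello<script>"))
```

An analysis that discards the relationship between i and s would have to treat s[:i] as an arbitrary substring of s, which may still contain “<”.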
A problem with this approach is that the scalability of the analysis is limited: instead of tracking only specific string values, the analysis must also account for integral variables in the program and model transformations on integral values in a sound and precise manner. Other approaches, such as modeling strings as regular expressions or context-free grammars, are significantly more scalable, but do not provide adequate support for index-based string manipulations.
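The loss of precision can be sketched with a deliberately coarse regular approximation. The class CharSetAbstraction below is a hypothetical abstract domain invented for this illustration, not a technique described in the cited work: it represents a set of strings as the regular language of all strings over a given set of characters. Because the abstraction carries no information about indices, the only sound model of a substring operation is to return the same abstraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSetAbstraction:
    """Hypothetical abstract string: the regular language [chars]*."""
    chars: frozenset

    @staticmethod
    def of(*samples: str) -> "CharSetAbstraction":
        # Abstract a set of concrete strings by the characters they use.
        return CharSetAbstraction(frozenset("".join(samples)))

    def substring_unknown_indices(self) -> "CharSetAbstraction":
        # With no index information, a substring may retain any character
        # of the original, so the sound result is the same abstraction.
        return self

    def may_contain(self, ch: str) -> bool:
        return ch in self.chars


def truncate_at_bracket(s: str) -> str:
    """Concrete program under analysis (hypothetical example)."""
    i = s.find("<")
    return s[:i] if i >= 0 else s


if __name__ == "__main__":
    abstract_in = CharSetAbstraction.of("hello<script>")
    abstract_out = abstract_in.substring_unknown_indices()
    # The abstraction cannot rule out '<', although the concrete result has none.
    print(abstract_out.may_contain("<"))
    print("<" in truncate_at_bracket("hello<script>"))
```

The abstract result still admits “<”, whereas the concrete result never contains it. Removing this kind of false positive is what index-sensitive modeling provides, at the cost in scalability noted above.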