This invention relates generally to analysis of program code and, more specifically, relates to analysis of script content in a webpage or other program.
This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application and is not admitted to be prior art by inclusion in this section. Acronyms used in the specification or drawings are presented below.
Functional testing of web applications is a central problem. Such functional testing includes testing web applications for security vulnerabilities, responsiveness, broken/incorrect workflows, and the like. A major challenge in functional testing is to obtain satisfactory coverage of the business logic of the subject web application. This is a goal of a functional crawler (e.g., running as a first phase of testing), which visits links and crawls through webpages in the same way as text crawlers, but has the objective of increasing functional rather than content coverage.
In recent years, rich internet applications (RIAs) have become increasingly widespread. Such applications make intensive use of JavaScript and AJAX calls to enable a smooth and dynamic user experience. For the functional crawler, this is a primary challenge. It is no longer sufficient to crawl the page in its initial form, because the JavaScript programs the page contains may interact with the server side and/or transform the webpage such that new possibilities of interaction arise.
This source of complication mandates new ways of deciding which JavaScript functions to execute as part of functional crawling. The naïve approach of simply running all functions in some arbitrary order, which is how existing tools like IBM Security AppScan Standard and Enterprise Edition address this difficulty, may be problematic because of the performance costs and potential side effects of JavaScript functions.
To appreciate this, note that HTML pages featured by industry-scale websites can easily be over 10,000 lines long and contain hundreds of JavaScript functions. This is especially true of auto-generated HTML pages created by client-side and/or server-side web frameworks like Struts and jQuery. Struts is a free, open-source, Model View Controller (MVC) framework for creating elegant, modern Java web applications. Java is a programming language and computing platform first released by Sun Microsystems in 1995. jQuery is a cross-platform JavaScript library designed to simplify the client-side scripting of HTML. As one example of auto-generated HTML pages, a table appearing in the HTML page may associate “Edit”, “Delete”, “Insert”, and other operations with every row of the table, with each operation linking to an auto-generated JavaScript handler.
For a commercial-grade website with many thousands of webpages, executing all the JavaScript functionality to improve coverage is often intractable. Hence, tools like AppScan often resort to user-provided bounds and configurations of different kinds that effectively constrain the web crawler in terms of the number of webpages the web crawler visits and the depth to which the web crawler processes and explores these webpages. Fixed, ad hoc bounds lead to obvious problems and limitations in coverage, and are thus best avoided.
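The coverage limitation of fixed bounds can be illustrated with a minimal sketch. The sketch is not taken from AppScan or any other tool; the function `bounded_crawl`, the parameters `max_pages` and `max_depth`, and the in-memory link graph `SITE` are all hypothetical names assumed purely for illustration. The sketch performs a breadth-first crawl that stops at a page-count bound and a depth bound, and reports which pages are never reached.

```python
from collections import deque


def bounded_crawl(links, start, max_pages, max_depth):
    """Breadth-first crawl of an in-memory link graph under fixed bounds.

    links: dict mapping a page name to the list of pages it links to
           (a hypothetical stand-in for fetching and parsing real pages).
    Returns the pages visited, in visiting order.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth bound: do not follow links from this page
        for target in links.get(page, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page-count bound reached
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited


# Hypothetical site: the checkout page sits three links from the home page.
SITE = {
    "home": ["catalog", "login"],
    "catalog": ["item"],
    "item": ["checkout"],
    "login": [],
    "checkout": [],
}

reached = bounded_crawl(SITE, "home", max_pages=10, max_depth=2)
missed = sorted(set(SITE) - set(reached))
print("reached:", reached)
print("missed:", missed)
```

In this assumed example, a depth bound of 2 leaves the checkout page unvisited even though the page-count bound is never reached, which is the kind of coverage gap that a fixed bound may introduce regardless of how important the missed functionality is.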