Internet web sites are exposed to a number of different security vulnerabilities. Such vulnerabilities can include susceptibility to attacks such as cross-site scripting (XSS) and cross-site request forgery (CSRF), as well as other types of security vulnerabilities. Therefore, operators of web sites commonly attempt to determine whether their web sites are vulnerable to such attacks.
One way to determine whether a web site is vulnerable is to employ a vulnerability scanner. Such a scanner includes crawling functionality that collects all the uniform resource locator (URL) addresses of the web site that may become the targets of attackers. To collect the URL addresses, a scanner loads the top page of a web site, and scans this web page to collect the URL addresses within it that refer to other web pages within the same web site. This process is repeated at each web page so collected, and is generally referred to as crawling. Each web page and its content can then be scanned for vulnerabilities.
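The crawling process described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of any particular scanner. The names `LinkExtractor`, `crawl`, and `PAGES`, the `example.test` addresses, and the use of an in-memory dictionary in place of real network requests are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(top_url, fetch):
    """Breadth-first crawl starting at top_url.

    fetch(url) is assumed to return the page's HTML, or None if unavailable.
    Returns the same-site URL addresses in the order they were discovered.
    """
    site = urlparse(top_url).netloc
    seen = [top_url]
    queue = deque([top_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Keep only links that stay within the same web site.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.append(absolute)
                queue.append(absolute)
    return seen


# Hypothetical three-page site held in memory (stands in for HTTP requests).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://other.test/">ext</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A again</a>',
}

found = crawl("http://example.test/", PAGES.get)
print(found)
```

In this sketch the crawler only ever sees URL addresses that appear as links in the HTML it loads, which is the assumption that breaks down for the dynamic web sites discussed next.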
However, such crawling is difficult for some types of web sites that employ web applications with complex client-side logic. For example, asynchronous JavaScript and Extensible Markup Language (XML), or Ajax, technologies employ client-side JavaScript to update the presentation of a web page by dynamically modifying the document object model (DOM) of the web page and its style sheet (such that the web site is considered to be dynamic). Other types of asynchronous communication also permit dynamic updates of data on a web page without having to reload the entire web page. JAVA is a trademark of Oracle Corporation, of Redwood Shores, Calif.
Crawling dynamic web sites employing such technologies is difficult, because the DOM of a dynamic web site is generated and modified at run-time. Thus, a web application at the same URL address may have different vulnerabilities that originate from different DOM states. The modification of the DOM can occur in any order, depending on how a user interacts with the web site in question. This flexible and dynamic nature of web applications renders them difficult to scan for security vulnerabilities, because there is no static collection of URL addresses that a vulnerability scanner can crawl to look for such vulnerabilities.
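The difficulty can be illustrated with a toy model, offered only as a sketch under stated assumptions. A single URL address is modeled as a set of DOM states connected by user events. The state names, the event names, and the `TRANSITIONS` table are hypothetical and are not taken from any real web application.

```python
from collections import deque

# Hypothetical page served at one URL address:
# DOM state -> {user event: resulting DOM state}
TRANSITIONS = {
    "initial": {"click #menu": "menu-open", "click #search": "search-open"},
    "menu-open": {"click #search": "menu+search", "click #menu": "initial"},
    "search-open": {"click #menu": "menu+search", "click #search": "initial"},
    "menu+search": {},
}


def reachable_states(start, transitions):
    """Return every DOM state reachable from start through any order of events."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for next_state in transitions[state].values():
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return seen


states = reachable_states("initial", TRANSITIONS)
print(len(states), sorted(states))
```

A crawler that collects only URL addresses would record one entry for this page, whereas the model contains four distinct DOM states, each of which could in principle expose different vulnerabilities.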
For these and other reasons, there is a need for the present invention.