1. Field of the Invention
The present invention relates generally to web application scanning technology, and more particularly to systems and methods of detecting and monitoring server side states during the scanning of web applications.
2. Description of the Related Art
Web application security scanners, such as the IBM Rational AppScan, are tools that automatically spider or scan and test a web application in order to find vulnerabilities in it. These tools scan the site either automatically, by “clicking” on all the links, executing JavaScript, and submitting forms, or manually, by following a user who browses through the application. They then test the application by resending the user's requests with various modifications to trigger database (DB) exceptions and to uncover logic flaws and other vulnerabilities.
A significant challenge is maintaining a server-side state throughout the scan. The most widely faced example of such a server-side state is a login state. Web applications usually grant access to specific pages only while the user is logged in, and otherwise redirect the user to the login page or deny access. Staying logged in is therefore a key factor for web application scanners. If the scanner does not remain logged in, the web application may assume that the scanner is invoking a page with malicious input, and the scanner may be redirected to the wrong page.
For scanners, remaining logged in to the web application is not simple. A logout link may be accidentally triggered, the web application may detect suspicious activity during a session and invalidate the session, or the scanning session may become logged out for many other reasons. Therefore, existing scanners have the capability to identify when a session is no longer valid; in those cases the scanner replays the login sequence, re-establishes its session, and continues scanning.
There are two ways by which a scanner identifies whether it is still logged in to a site. First, a scanner will review the responses to all tests and regular requests sent to the site, to detect whether a pattern indicating that it is logged out is present in the response (such as “Your session timed out, please re-login”) or whether a pattern expected while logged in is missing from it. This technique is often inaccurate, since different pages of the web application may respond in different (and not always predictable) ways when the session is logged out, sometimes simply returning an error, sometimes redirecting to another page, and sometimes giving a “no can do” message.
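The first technique can be illustrated with a minimal Python sketch. The function name `looks_logged_out` and both patterns are hypothetical choices made for illustration; they are not taken from any particular scanner.

```python
import re
from typing import Optional, Pattern


def looks_logged_out(body: str,
                     out_pattern: Optional[Pattern[str]] = None,
                     in_pattern: Optional[Pattern[str]] = None) -> bool:
    """Guess the login state from one response body (illustrative only).

    out_pattern: text that appears only when the session is logged out.
    in_pattern:  text that appears only while the session is logged in.
    """
    if out_pattern is not None and out_pattern.search(body):
        return True   # a logged-out marker is present
    if in_pattern is not None and not in_pattern.search(body):
        return True   # a logged-in marker is missing
    return False


# Hypothetical pattern, configured by the user for one application.
TIMED_OUT = re.compile(r"session timed out", re.IGNORECASE)
```

With this pattern, a body of “Your session timed out, please re-login” is flagged, but a page that answers a logged-out request with only “no can do” is not, which is the kind of inaccuracy described above.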
Second, a scanner will send a specific “heartbeat” request every few seconds (or every X requests), and look for a pattern in its response. This technique removes the unpredictability of how each individual page responds when the scanner is logged out and the session has ended. However, it incurs performance overhead: the scanner does not learn that it has been logged out until the next heartbeat fails, and it must then re-send the requests that were sent between the last successful heartbeat and the first failed one.
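The overhead of the heartbeat technique can be seen in a small Python sketch. It assumes a heartbeat after every `interval` requests; `scan_with_heartbeat` and `FakeSite` are hypothetical names, and `FakeSite` is only a stand-in for a real web application whose session drops once during the scan.

```python
from typing import Callable, List


def scan_with_heartbeat(requests: List[str],
                        send: Callable[[str], None],
                        is_logged_in: Callable[[], bool],
                        login: Callable[[], None],
                        interval: int = 3) -> int:
    """Send all requests, probing the session after each batch.

    Returns how many requests had to be sent a second time. A real
    scanner would also cap the number of login attempts.
    """
    resent = 0
    start = 0
    while start < len(requests):
        batch = requests[start:start + interval]
        for request in batch:
            send(request)
        if is_logged_in():            # heartbeat succeeded: batch is trusted
            start += len(batch)
        else:                         # heartbeat failed: log in, redo batch
            login()
            resent += len(batch)
    return resent


class FakeSite:
    """Stand-in server whose session drops once, at the n-th request."""

    def __init__(self, expire_at: int) -> None:
        self.expire_at = expire_at
        self.count = 0
        self.logged_in = True
        self.accepted: List[str] = []

    def send(self, request: str) -> None:
        self.count += 1
        if self.count == self.expire_at:
            self.logged_in = False    # session silently invalidated
        if self.logged_in:
            self.accepted.append(request)

    def heartbeat(self) -> bool:
        return self.logged_in

    def login(self) -> None:
        self.logged_in = True
```

In this sketch, a session that drops at the fourth of seven requests with an interval of three causes the whole second batch to be repeated, even though the scanner cannot tell which request in that batch was the first to fail.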
In addition, both techniques require that the user configure the “Session Identifiers” that need to be refreshed. Because Hypertext Transfer Protocol (HTTP) is a stateless protocol, web applications carry the session identifiers in cookies, HTTP parameters, the HTTP path, or other parts of the HTTP requests (e.g., the JSESSIONID cookie or parameter). When the scanner logs in again, it needs to know which parts of the HTTP requests to “track” and refresh in its subsequent requests, so that the web application treats those requests as part of the new session and not the old one.
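As a minimal Python sketch of this refresh step, assume a recorded request is modeled as a dictionary of request parts (here `cookies` and `params`), each mapping names to values. The function `refresh_session_ids` and this request model are hypothetical and serve only to show why the set of tracked names must be known in advance.

```python
from typing import Dict, Set

Request = Dict[str, Dict[str, str]]   # e.g. {"cookies": {...}, "params": {...}}


def refresh_session_ids(request: Request,
                        fresh: Dict[str, str],
                        tracked: Set[str]) -> Request:
    """Return a copy of a recorded request with tracked values refreshed.

    Only names listed in `tracked` are replaced, and only when the new
    login produced a value for them; everything else is left untouched.
    """
    updated: Request = {}
    for part, values in request.items():
        updated[part] = {
            name: fresh[name] if name in tracked and name in fresh else value
            for name, value in values.items()
        }
    return updated


# Request recorded during the old session (values are made up).
recorded = {
    "cookies": {"JSESSIONID": "old-session", "lang": "en"},
    "params": {"account": "42"},
}
replayed = refresh_session_ids(recorded,
                               fresh={"JSESSIONID": "new-session"},
                               tracked={"JSESSIONID"})
```

If `JSESSIONID` were not listed in `tracked`, the replayed request would still carry the old value and the web application would associate it with the invalidated session.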
The configuration that both techniques require is not always simple for users. While users may know how to log in to the application, it is often complicated and time-consuming for them to search for the session IDs or to configure the patterns used to identify the login state.
Although remaining logged in is the most common server-side state problem, modern web applications rely on server-side state in many other parts of the application. One example is an online flight reservation application that requires users to select the departure and arrival cities, then prompts them to select the fares and hours, then the seating, and so on. The application does not, however, allow users to change the order of selection. Another example is an online banking application that has a multi-step process for transferring funds, which requires multiple verification actions by the user. For these web applications, a scanner needs to recognize that every time it sends a request that is triggered later in the sequence (the later request), for example, request 5, it must first send the requests that are necessarily triggered before the later request, for example, requests 1-4.
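The ordering constraint can be sketched in a few lines of Python. The step names and the function `requests_to_send` are hypothetical; the sketch assumes the sequence is a simple ordered list that has already been configured.

```python
from typing import List


def requests_to_send(sequence: List[str], target: str) -> List[str]:
    """Return the target request preceded by every request that the
    application requires before it, in their original order."""
    position = sequence.index(target)   # raises ValueError if unknown
    return sequence[:position + 1]


# Hypothetical five-step reservation flow.
booking_flow = [
    "select_cities",
    "select_fare_and_time",
    "select_seat",
    "enter_passenger_details",
    "confirm_booking",
]
```

Testing the fifth request in this flow therefore costs five requests per test, since `requests_to_send(booking_flow, "confirm_booking")` returns the entire list.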
These situations are currently handled only by requiring users to manually configure each sequence. These sequences often contain HTTP variables that themselves need tracking (as in the login sequence). For example, in a banking application, the process for transferring funds might involve creating a “transaction-id” parameter that needs to be tracked throughout the sequence. As such, sequences that modify server-side state require a lot of user time and attention.