Web applications are enabled by a web server in communication with a client device executing a web client, such as a web browser that renders Hypertext Markup Language (HTML) pages and executes JavaScript code. Web crawling is the act of exploring the various pages of a web application from the point of view of the web client. As web technology becomes more complex (e.g., implementing Asynchronous JavaScript and XML (AJAX), WebSockets, web workers, etc.), web crawling must adapt to handle increasingly complex behaviors and interactions with web applications.
Web crawling may be useful for security purposes. For example, web application vulnerability scanners traverse web applications via web crawling to identify vulnerable portions of those applications. Similarly, web application firewalls traverse web applications to construct a baseline of normal interactions between the web server and the web client, which may be useful for anomaly detection. To be effective, web crawling for security purposes should be complete; in other words, all pages should be visited under all relevant input conditions. For example, a common heuristic is to follow all links and trigger all actions on each page until no new pages are reached.
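The "follow all links until no new pages are reached" heuristic can be sketched as a breadth-first traversal. The following is a minimal illustration, not a production crawler: the `PAGES` dictionary is a hypothetical in-memory stand-in for a web application, and real crawlers would additionally fetch pages over the network and trigger JavaScript-driven actions.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web application": path -> HTML body.
PAGES = {
    "/": '<a href="/login">Login</a> <a href="/about">About</a>',
    "/login": '<a href="/">Home</a> <a href="/admin">Admin</a>',
    "/about": '<a href="/">Home</a>',
    "/admin": '<a href="/login">Back</a>',
}

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def crawl(start="/"):
    """Breadth-first crawl: follow every link until no new pages appear."""
    visited = set()
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(PAGES.get(url, ""))
        for link in parser.links:
            if link not in visited:
                frontier.append(link)
    return visited

print(sorted(crawl()))  # → ['/', '/about', '/admin', '/login']
```

The traversal terminates because each page is visited at most once; completeness in the sense described above additionally requires exercising every action (form submissions, event handlers) on each visited page, which this link-only sketch omits.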