Automated tools are often used to interact autonomously with Internet-based computer software applications, or “web” applications, for example to discover the various components of a web application for mapping purposes, or to identify programming errors and security vulnerabilities in a web application. One such automated tool, commonly known as a “crawler,” is often used to navigate a web application by traversing its web pages and other computer-based documents along hyperlinks embedded in the documents, such as Uniform Resource Locators (URLs), that indicate the locations of other documents. Another such automated tool, commonly known as a “black-box tester,” is often used to interact with a web application by activating interface elements such as its menus, buttons, and hyperlinks, and by providing data input through interface elements such as textboxes, and then searching for evidence that an interaction exposed a known type of programming error or security vulnerability.
When traversing a web application, such automated tools may encounter the same web page multiple times, in which case it is often desirable to avoid duplicating previous interactions with the web page, especially where the time, processing, and/or networking resources available for interacting with the web application are limited. Unfortunately, avoiding such duplication is often complicated by web applications that produce web pages that are equivalent yet not identical. For example, two instances of the same web page may be encountered, where each instance includes a different, dynamically-generated advertisement while otherwise being identical. Moreover, in some cases it is desirable to classify genuinely different web pages as equivalent from a functional standpoint when deciding whether to interact with a web page. For example, a web application may have multiple static web pages, each with information on a different topic, but where each web page includes a single “OK” button that closes the web page. In this instance there is likely no benefit in having an automated tool interact with each of the web pages.
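The notion of functional equivalence described above can be illustrated with a minimal sketch. It is not a method taken from this document. It assumes, purely for illustration, that two pages are functionally equivalent when they expose the same sequence of interactive elements, and that non-interactive content such as advertisement images and topic text can be ignored. The names `INTERACTIVE_TAGS`, `InteractiveElementCollector`, `functional_fingerprint`, and `functionally_equivalent` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only these tags count as "interactive".
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class InteractiveElementCollector(HTMLParser):
    """Collects a coarse description of each interactive element on a page."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            # Keep only the tag plus its type and name attributes; link
            # targets, text, and images are deliberately ignored.
            self.elements.append(
                (tag, attr_map.get("type"), attr_map.get("name"))
            )


def functional_fingerprint(html):
    """Return a hashable summary of the page's interactive elements."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    return tuple(collector.elements)


def functionally_equivalent(html_a, html_b):
    """Treat two pages as equivalent when their fingerprints match."""
    return functional_fingerprint(html_a) == functional_fingerprint(html_b)


# Two pages that differ in advertisement and topic text but share an OK button.
page_a = ('<html><body><div><img src="ad1.png"></div>'
          '<p>Topic A</p><button name="ok">OK</button></body></html>')
page_b = ('<html><body><div><img src="ad2.png"></div>'
          '<p>Topic B</p><button name="ok">OK</button></body></html>')
# A page that additionally offers a text input.
page_c = ('<html><body><form><input type="text" name="q">'
          '<button name="ok">OK</button></form></body></html>')
```

Under these assumptions, `functionally_equivalent(page_a, page_b)` is true and `functionally_equivalent(page_a, page_c)` is false, so a tool that stored fingerprints of visited pages could skip `page_b` after interacting with `page_a`. The comparison is deliberately coarse: because link targets are ignored, pages whose hyperlinks lead to different places would also be treated as equivalent, which may or may not be appropriate for a given tool.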