The World Wide Web is a collection of hypertext documents, or “web pages”, which may contain text, pictures and/or sound. A hypertext document is written in a markup language that a browser interprets in order to present the page. Since hypertext documents are written in generally compatible markup languages, including Hypertext Markup Language (“HTML”), Extensible Markup Language (“XML”) and Extensible Hypertext Markup Language (“XHTML”), these documents can be presented, stored and distributed almost universally across a network of client terminals and servers, such as the public Internet or a private intranet.
As a web page is being developed, certain rules must be followed. A project manager may include compliance testing as part of the modular programming stage of a project, when each programmer has a section of code to write and debug. Alternatively, a project manager may choose to include compliance testing in the integration stage of a project, when code sections are integrated into a complete document. However, it is difficult and time-consuming for project managers to track down compliance problems at the integration stage of the development process. The origin of a problem may be embedded in a section of code or may be a result of the integration of the various sections. Further, even if a web page is rules compliant at the time of development, the rules may change after the web page has been developed.
For example, a company or a department may change its name, or the name of a product. In this case it would be desirable to determine which web pages contain the former name, and where on each of those web pages it appears.
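The name-change check described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class `TextLocator`, the function `find_former_name`, the sample pages and the name "Acme Widgets" are all hypothetical, and the pages are held in memory rather than fetched from a server. The sketch searches visible text only; a fuller check would presumably also examine attribute values such as image descriptions and titles.

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Records every text node that contains a search term, with its position."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.hits = []  # (line, column, text) for each matching text node

    def handle_data(self, data):
        # Case-insensitive match against the text between tags.
        if self.term in data.lower():
            line, col = self.getpos()
            self.hits.append((line, col, data.strip()))


def find_former_name(pages, former_name):
    """Map each page URL to the places where former_name appears in its text."""
    results = {}
    for url, html in pages.items():
        locator = TextLocator(former_name)
        locator.feed(html)
        locator.close()
        if locator.hits:
            results[url] = locator.hits
    return results


# Hypothetical pages, held in memory so the sketch needs no network access.
pages = {
    "/index.html": "<html><body>\n<h1>Acme Widgets</h1>\n<p>Welcome.</p>\n</body></html>",
    "/about.html": "<html><body>\n<p>About us.</p>\n</body></html>",
}
hits = find_former_name(pages, "Acme Widgets")
for url, places in hits.items():
    for line, col, text in places:
        print(f"{url}: line {line}, column {col}: {text}")
```

The result identifies both the pages that contain the former name and the line and column at which each occurrence begins, which corresponds to the two questions posed above.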
As another example, a hypertext document should be in a format useful for presentation to all users. In particular, standardized syntax is often necessary to ensure the correct performance of assistive I/O software and devices, such as a web page reader, that are intended to allow certain disabled users access to web page content. Thus, it is often desirable to standardize hypertext documents for compliance with selected criteria or rules, which may be new rules, or which may be modifications to older rules. These rules may include code validation standards to which a document should adhere for uniform presentation, universal network compatibility, or compliance with government regulations. To that end, there has been a recent effort by governments and non-governmental agencies to establish accessibility rules that provide a standard for developers to follow when creating hypertext documents. For example, Section 508 of the U.S. Rehabilitation Act of 1973 was enacted to improve access to mainstream technology and provide standards for assistive technologies for disabled users.
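As an illustration of what one such accessibility rule might look like in practice, the sketch below checks that every image carries a text alternative, which a web page reader can speak in place of the picture. The choice of this particular rule, and the names `MissingAltChecker` and `check_alt_text`, are assumptions made for the example and are not taken from any specific regulation text. An empty `alt=""` is treated as acceptable here, on the assumption that it marks a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Flags <img> tags that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.violations = []  # (line, column, src) for each offending <img>

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                line, col = self.getpos()
                self.violations.append((line, col, attr_map.get("src")))


def check_alt_text(html):
    """Return the list of images in html that lack a text alternative."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


# Hypothetical fragment: the first image complies, the second does not.
sample = '<p><img src="logo.png" alt="Company logo"><img src="chart.png"></p>'
violations = check_alt_text(sample)
for line, col, src in violations:
    print(f"line {line}, column {col}: <img src={src!r}> has no alt attribute")
```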
Once rules, such as a name change or those in Section 508, have been developed and become effective, documents should be checked to ensure that they are rules compliant.
Programs are available wherein the user can specify a web page and the rule or rules to be tested, and the program will access that web page, analyze that web page for compliance with the specified rules, and provide a report. However, these programs are limited to analysis of a single web page, and only provide a report, not a copy of the web page, so each web page must be separately specified and tested. Also, someone must compile a list of the web pages to be analyzed and input this information, one web page at a time. This is tedious, time-consuming and very prone to errors. For example, in sites where there are hundreds or even thousands of web pages, some pages will be missed, and other pages will be unnecessarily tested several times. Further, the prior art does not provide information about a collection of tested web pages. For example, the prior art does not enable one to determine which rules are most problematic and where the violations occur.
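To make concrete the kind of collection-level information that is described above as missing, the following sketch aggregates per-page reports into site-wide totals. It is illustrative only: the function `summarize`, the report format (a page URL mapped to a list of rule identifiers, one per violation) and the rule identifiers `img-alt` and `old-name` are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(reports):
    """Aggregate per-page reports into site-wide counts.

    reports maps a page URL to a list of rule identifiers, one entry per
    violation found on that page.
    """
    by_rule = Counter()               # rule -> total number of violations
    pages_by_rule = defaultdict(set)  # rule -> pages on which it is violated
    for url, rule_ids in reports.items():
        for rule_id in rule_ids:
            by_rule[rule_id] += 1
            pages_by_rule[rule_id].add(url)
    return by_rule, pages_by_rule


# Hypothetical per-page results, as a single-page checker might report them.
reports = {
    "/index.html": ["img-alt", "img-alt", "old-name"],
    "/about.html": ["img-alt"],
    "/contact.html": [],
}
by_rule, pages_by_rule = summarize(reports)
worst_rule, worst_count = by_rule.most_common(1)[0]
print(worst_rule, worst_count, sorted(pages_by_rule[worst_rule]))
```

With results in this form, one can see both which rule is violated most often and on which pages the violations occur.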
Web crawler programs are also available. A web crawler program starts with a specified web page, copies and stores it, locates links within that web page to other web pages, accesses, copies and stores each of those web pages, and repeats the process for the links found in those pages. However, although web crawlers may eventually capture and store all of the web pages of an entire site, they do not analyze those web pages.
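The crawling procedure just described can be sketched as a breadth-first traversal. This is a simplified illustration under stated assumptions, not a description of any particular crawler: the names `LinkExtractor` and `crawl` are hypothetical, the `fetch` argument is a caller-supplied function that returns the HTML for a URL (or `None` if the page is unavailable), and the example site under `example.test` is an in-memory stand-in so that the sketch requires no network access. Only links to the starting host are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch):
    """Breadth-first crawl; returns {url: html} for every same-host page reached."""
    host = urlparse(start_url).netloc
    stored = {}
    seen = {start_url}  # guards against fetching the same page twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        stored[url] = html  # copy and store the page
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            # Resolve relative links and drop any "#fragment" suffix.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return stored


# Hypothetical three-page site with one off-site link.
site = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="http://other.test/">x</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "http://example.test/b.html": "<p>leaf</p>",
}
stored = crawl("http://example.test/", site.get)
print(sorted(stored))
```

Consistent with the limitation noted above, the sketch only captures and stores pages; it performs no analysis of their content.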