This invention relates generally to the field of computer systems. More particularly, a system and methods are provided for evaluating the integrity of a set of electronically stored files that contain hypertext links.
Traditionally, documentation for software and other computer-related products has been produced and distributed in hard-copy form, such as books and manuals. However, more and more documentation and product help systems are being delivered in electronic form. As a result of the popularity of the Internet, and the widespread availability of software tools such as browsers that allow a person to read and navigate sets of electronically stored, hypertext-linked files, HTML (HyperText Markup Language) has emerged as one of the de facto standards for delivering electronic information.
HTML, like other markup languages derived from the Standard Generalized Markup Language (SGML), such as XML (Extensible Markup Language), provides a hardware-independent and operating system-independent scheme for defining the various components and cross-references found in a set of electronically stored hypertext-linked files, such as those that make up a software help system or a World Wide Web site.
One of the chief benefits of a markup language such as HTML lies in its ability to link text and/or an image—in a document file currently being viewed—to the content of another document file, or to content found within the current document file. Many computer users are familiar with browser software that highlights the link text or image with color and/or an underline to indicate a hypertext link (often shortened to “hyperlink” or just “link”).
The amount of information available in electronically stored hypertext link file systems has grown enormously. Help systems and World Wide Web pages can encompass thousands of individual files, and each individual file may contain hundreds of links.
A large hypertext system, or set of web pages, often contains problems that detract from one's use or enjoyment of the system. Such problems include broken links, extra files and missing files. Linkage errors that prevent hypertext systems from being portable across different types of computer platforms and systems also arise. Linkage errors include errors in the format or syntax of a hypertext link. In addition, when the content of a help system or Web page is translated from one language to another, existing problems are propagated and additional hypertext link errors are often introduced.
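By way of illustration only, the categories of problems described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any particular tool: the function name `check_file_set`, the in-memory dictionary of files, and the two portability heuristics (backslash path separators and letter-case mismatches) are hypothetical choices made here for the example. "Missing files" appear as broken links, and "extra files" are taken to be files that cannot be reached from a chosen entry page.

```python
from html.parser import HTMLParser
import posixpath
from urllib.parse import urldefrag, urlsplit


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes found in one HTML file."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def check_file_set(files, start):
    """Return (broken, nonportable, extra) for a set of HTML files.

    files: dict mapping a relative, forward-slash path to HTML text
           (a hypothetical input shape chosen for this sketch).
    start: path of the entry page from which reachability is measured.
    """
    broken, nonportable = [], []
    reachable, queue = {start}, [start]
    lower_names = {name.lower(): name for name in files}
    while queue:
        source = queue.pop()
        collector = LinkCollector()
        collector.feed(files[source])
        for raw in collector.targets:
            if urlsplit(raw).scheme or raw.startswith("//"):
                continue  # external links are out of scope for this sketch
            path, _fragment = urldefrag(raw)
            if not path:
                continue  # link to an anchor inside the same file
            # Assumed heuristic 1: backslash separators only work on some platforms.
            portable = "\\" not in path
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(source), path.replace("\\", "/"))
            )
            # Assumed heuristic 2: a case-only mismatch resolves solely on
            # case-insensitive file systems.
            actual = target if target in files else lower_names.get(target.lower())
            if actual is None:
                broken.append((source, raw))
                continue
            if not portable or actual != target:
                nonportable.append((source, raw))
            if actual not in reachable:
                reachable.add(actual)
                queue.append(actual)
    extra = sorted(set(files) - reachable)
    return broken, nonportable, extra
```

Each reported problem is paired with the file in which the offending link occurs, so that an author can locate it. The sketch assumes every entry in the file set is HTML text; images and other binary targets would need to be tracked by name only.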
Efforts by hypertext system authors to eliminate errors in hypertext systems before they are published have usually involved manually verifying the integrity of the hypertext links in the systems. This method of verification can be very time-consuming and prone to error in all but the smallest of hypertext systems.
Software tools for performing hypertext system integrity checks often fail to operate properly on larger hypertext systems, requiring the user to break a large set of hypertext-linked files into multiple smaller sets so that the tool can check the system. An unfortunate side effect of breaking up the set of files is that hypertext links in the divided file sets often become invalid as a result of the limitations of the tool used to verify the system. In addition, current tools fail to provide comprehensive checks for all types of linkage errors.
Further, the available tools are limited in that they are unable to run validity checks on hypertext-linked file sets stored in some of the formats used in help systems, such as Java Help and Oracle® Help for Java. The tools also frequently fail to identify problems that prevent a hypertext system from being used on different computing platforms or operating systems. Moreover, existing tools are unable to analyze compressed files (e.g., files stored in a .JAR or .ZIP file). Such files must be manually decompressed before a tool can analyze them.
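As an illustration of what analyzing compressed files without manual decompression could look like, the following Python sketch reads the HTML entries of a .ZIP or .JAR archive directly into memory using the standard `zipfile` module. The function name `load_html_from_archive`, the choice of file suffixes, and the UTF-8 decoding are assumptions made for this example and are not drawn from any existing tool.

```python
import io
import zipfile


def load_html_from_archive(archive, suffixes=(".html", ".htm")):
    """Return {entry name: decoded text} for the HTML entries of a ZIP or JAR archive.

    archive: a file path or a binary file-like object accepted by zipfile.ZipFile.
    """
    pages = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffixes):
                continue
            # Assumption: entries are UTF-8; undecodable bytes are replaced, not fatal.
            pages[info.filename] = zf.read(info).decode("utf-8", errors="replace")
    return pages


# Usage: build a small archive in memory, then read it back without unpacking to disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("help/index.html", '<a href="topic.html">Topic</a>')
    zf.writestr("help/topic.html", "<p>Topic</p>")
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
pages = load_html_from_archive(buffer)
```

Because a .JAR file uses the ZIP container format, the same call handles both. The resulting dictionary maps archive entry names to text, so a link checker can operate on it exactly as it would on files read from a directory.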
Still further, many of the user interfaces provided with the tools require time- and labor-intensive interactive operation, and provide no means of automating the verification process.
The analysis and reporting capabilities of current hypertext link verification tools are generally limited to the production of simple text files that list broken links. This type of reporting provides little assistance to the system author in identifying and repairing incorrect or invalid hypertext links in the system, and is often cumbersome to use.
Thus, there is a need in the art for a comprehensive, automated method of evaluating a set of files containing hypertext links to ensure that they are published without linkage errors. Such a method would allow virtually any type of hypertext-linked or markup-language-based file set to be evaluated. A need also exists for a method of evaluation that will help ensure that hypertext-linked file systems are portable across hardware and operating system platforms without modification.