1. Field of the Invention
The present invention relates to file/document conversion in a computer system, and deals more particularly with a method, system, and computer program for converting HTML files/documents created from SGML source files to a format readable and usable by a subsystem that utilizes an XML format.
2. Description of the Related Art
The recent explosion in the use of the World Wide Web (hereinafter "the web") has created numerous opportunities for programmers to create and make available software usable in the web environment.
Frequently, web-based programs consist of several smaller programs that interact with each other to perform the various functions of a particular web page. For example, a single web page may include hyperlinks which, when activated, launch a "plug-in", a Java applet, a "help" menu, or any of a myriad of other programs that enhance the use of the web page.
As web pages grow more complex, web page designers often find it necessary to make some form of instructional, or help, information optionally available to the user. Typically, displaying help information for a web page requires loading a new HTML page into the browser and, when the user has finished reading the help information, loading the original HTML page back into the browser. Recently, Sun Microsystems, Inc. of Palo Alto, Calif. introduced a program called JavaHelp™, a platform-independent help system that enables developers and authors to incorporate on-line help in applets, components, applications, operating systems, and devices. Authors can also use the JavaHelp software to deliver on-line documentation for the web and corporate intranets.
Many programs, such as JavaHelp, require data formatted in Extensible Markup Language (XML) to function properly. XML is flexible enough to be adapted to many different "domains" (i.e., user-defined sets of information), such as a mathematics domain or a Java domain, and its use is therefore increasing. Meanwhile, software developers writing text (such as help documents) for use with JavaHelp or other XML-based programs generally choose to deliver their documentation in HTML format, based on documentation originally authored in SGML source code. The SGML source files are converted to the format of choice (e.g., HTML, PDF, PostScript, RTF, etc.) before the final product is delivered for use.
Authoring this documentation in SGML format offers many advantages, including the flexibility to convert the SGML code to, and deliver the final product in, a variety of different formats (e.g., HTML, PostScript, RTF, and/or PDF); the ability to share information across all of the documents in the particular application being created; and the ability to perform the previously mentioned functions while maintaining a single set of source files. Using a single set of SGML source files means the programmers need learn only one set of tagging codes to create the source files, which can then be used to produce final documentation in whatever format the end-user prefers. Further, the SGML source files can be shared among programmers to avoid duplication of effort, even when two programmers deliver the end-product to their respective customers in different formats.
Although programs such as JavaHelp can display HTML-formatted documents, the filenames produced when transforming the documentation from the SGML source files into HTML documents are generated dynamically. As a result, information contained in a file named "HTML009.HTML" in one version of the HTML output might, for example, be contained in file "HTML012.HTM" in a future version of the same document. Because programs such as JavaHelp rely on static (fixed) file names, they may be unable to access the correct help file when the user attempts to retrieve it. In addition, the conversion from the SGML source files to the HTML document files assumes that the resulting document will be used in a browser-type environment and therefore provides an HTML version of the Table of Contents (TOC). This TOC likewise relies on the dynamic file names and hotlinks generated during the conversion process. Programs such as JavaHelp, however, require an XML version of the Table of Contents based on fixed identifiers that associate each help file with the corresponding portion of the GUI. Accordingly, a conflict exists between the generated HTML files and the requirements of the XML-based program.
Typically, to resolve this conflict, developers manually create a set of help source files, map files, and a TOC that satisfy the XML-based program's requirements. This forces the developer either to maintain two separate sets of source files with identical content (an SGML set and an HTML set) or, if the source is converted from SGML to HTML and maintained only in HTML, to give up all of the previously mentioned advantages of an SGML source base.
Thus, a need exists for a technique by which a software developer working from SGML source files can seamlessly and automatically present the HTML files generated from those source files to an XML environment, without maintaining multiple sets of content-identical source files.