FIG. 1 illustrates an exemplary web page 100. The web page 100 is displayed in a browser window 110 having a title bar 120. The web page 100 comprises four text sections 132, 134, 136 and 138, in various styles and formats, a graphic image 140, a horizontal line 150 and a table 160 containing several hyperlinks 170.
FIG. 2 illustrates an HTML (hypertext mark-up language) document 200 corresponding to the web page 100. A browser program generates the web page 100 using the HTML document 200 as input. The basic building blocks of an HTML document are tags. Each tag is enclosed in angle brackets (“<” and “>”). As an example, the first four tags in the HTML document 200 are “<HTML>,” “<HEAD>,” “<TITLE>” and “</TITLE>.” Tags come in two types: opening tags and closing tags. A closing tag has a forward slash (“/”) after the left angle bracket (“<”) and is otherwise the same as its matching opening tag. In the HTML document 200, the first example of a matched pair of opening and closing tags is “<TITLE>” and “</TITLE>” on the third line. Between these opening and closing tags is an argument (in this case, “OO Objects/Classes/Instances”), which shows up in the title bar 120 of the browser window 110. The HTML document 200 includes many other examples of matched pairs of opening and closing tags, like “<B>” and “</B>” for bold, “<I>” and “</I>” for italics, “<FONT . . . >” and “</FONT>” for font selection, “<CENTER>” and “</CENTER>” for horizontal centering, and “<TABLE>” and “</TABLE>” for a table. In fact, the entire HTML document 200 spans from the opening tag “<HTML>” to the closing tag “</HTML>.” Not all opening tags have a closing tag. Examples include “<BR>” for a line break, “<IMG>” for an image, “<HR>” for a horizontal rule or line, and “<!-- -->” for comments.
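The distinction between opening and closing tags can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the HTML document 200; the names TAG_PATTERN and classify_tags are hypothetical, and the pattern is deliberately simplified (it ignores comments and attribute values that contain angle brackets).

```python
import re

# Simplified, hypothetical pattern: "<", an optional "/", a tag name,
# any attributes, then ">". Comments are not recognized.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def classify_tags(markup):
    """Return (kind, name) pairs for each tag, in document order."""
    result = []
    for match in TAG_PATTERN.finditer(markup):
        # A slash right after "<" marks a closing tag.
        kind = "closing" if match.group(1) else "opening"
        result.append((kind, match.group(2).upper()))
    return result


sample = "<HTML><HEAD><TITLE>OO Objects/Classes/Instances</TITLE>"
for kind, name in classify_tags(sample):
    print(kind, name)
```

Applied to the beginning of a document like the one described above, the sketch reports three opening tags followed by one closing tag, the slashes inside the title text being ignored because they do not follow a left angle bracket.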
Note that the HTML document 200 is illustrated in FIG. 2 as having uppercase tags. One may choose to use lowercase tags instead. In fact, XHTML (extensible HTML) requires that tags be in lowercase. XHTML also requires that an opening tag for which there is no closing tag be in the form “<TAG/>.”
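The two XHTML conventions mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete HTML-to-XHTML converter: the names to_xhtml_style and VOID_TAGS are hypothetical, only the three unpaired tags named earlier are treated as having no closing tag, and attribute names and values are left untouched.

```python
import re

# Simplified, hypothetical pattern capturing the slash, tag name and attributes.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

# Assumption: only these tags are treated as having no closing tag.
VOID_TAGS = {"br", "img", "hr"}


def to_xhtml_style(markup):
    """Lowercase tag names and write unpaired tags in the "<tag/>" form."""

    def rewrite(match):
        slash = match.group(1)
        name = match.group(2).lower()
        rest = match.group(3)
        if name in VOID_TAGS and not slash:
            # Drop any existing trailing slash, then add exactly one.
            rest = rest.rstrip().rstrip("/").rstrip()
            return f"<{name}{rest}/>"
        return f"<{slash}{name}{rest}>"

    return TAG_PATTERN.sub(rewrite, markup)


print(to_xhtml_style("<B>Bold</B><BR><HR>"))
```

The sketch changes only tag names and the form of unpaired tags; a real conversion would also need to handle attribute quoting and case, which are outside what this illustration attempts.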
Sections of the HTML document 200 are labeled with reference numbers having the same last two digits as reference numbers used to label corresponding parts of the web page 100. For example, the HTML title section 220 gives the web page its title in the title bar 120. The HTML section 232 produces the top text section 132 (“Understanding Object Orientation Concepts”) in an arial font, in an augmented size and in a particular color. The HTML section 234 produces the next line of text 134 in a bold format. The HTML section 236 produces the next paragraph of text 136, including several italicized words. The HTML section 238 produces the text section 138. The HTML section 240 produces the graphic image 140, centered horizontally on the web page 100, by referencing a graphics file (.gif). The HTML statement 250 produces the horizontal line 150. The HTML table section 260 produces the table 160, having three entries in a row. Each entry is a hyperlink 170, produced by the anchor statements 270 in the HTML document 200. The HTML document 200 may also contain comments (not shown) that do not appear in the browser window 110.
The conceptually simplest method for creating a markup language document is to type it manually using a text editor or word processor. However, manual preparation of documents is extremely labor intensive and error prone. Even if additional labor is expended checking the document for errors or poor style, malformed documents can (and do) still result. Common errors are omission of required closing tags and having closing tags in the wrong order. Examples of poor style include not enclosing values in quotation marks (e.g., “arial,” “+2” or “red” in the font tag in the HTML section 232), typing special characters directly rather than their escape codes (e.g., “&amp;” in the HTML section 234 is the escape code for the ampersand character “&”), and inadequate commenting. Errors and/or poor style in an HTML document can produce unpredictable results on different browsers.
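The two common errors named above, an omitted closing tag and closing tags in the wrong order, can be detected with a stack, as the following Python sketch suggests. It is a simplified illustration rather than a full validator: find_tag_errors, TAG_PATTERN and VOID_TAGS are hypothetical names, the tag pattern ignores comments, and only “<BR>,” “<IMG>” and “<HR>” are assumed to have no closing tag. The last lines use the standard library function html.escape to show an ampersand being replaced by its escape code; the sample string is invented for the illustration.

```python
import re
from html import escape

# Simplified, hypothetical tag pattern; comments are not recognized.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

# Assumption: only these tags have no closing tag.
VOID_TAGS = {"BR", "IMG", "HR"}


def find_tag_errors(markup):
    """Report missing or misordered closing tags using a stack of open tags."""
    errors = []
    stack = []
    for match in TAG_PATTERN.finditer(markup):
        is_closing = bool(match.group(1))
        name = match.group(2).upper()
        if name in VOID_TAGS:
            continue
        if not is_closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # properly nested pair
        elif name in stack:
            # An inner tag is still open when an outer tag is closed.
            errors.append(f"</{name}> closes before </{stack[-1]}>")
            while stack and stack.pop() != name:
                pass
        else:
            errors.append(f"unexpected </{name}>")
    # Anything left on the stack was never closed.
    errors.extend(f"missing </{name}>" for name in reversed(stack))
    return errors


print(find_tag_errors("<B><I>text</B></I>"))
print(find_tag_errors("<CENTER><B>text</B>"))

# Escaping a special character instead of typing it directly.
print(escape("Objects & Classes"))
```

A check of this kind catches structural mistakes only; it says nothing about style issues such as unquoted values or missing comments.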
Good practice demands that manually prepared HTML documents be validated. Although HTML validation programs exist that can read an HTML file and report any errors or poor style, use of such validation programs requires extra time and effort. Furthermore, case-by-case validation processes are neither scalable nor extensible. As a result, it is difficult to generate a large number of consistent documents manually.
Tools are available to automate, to some degree, the generation of web pages. These tools are programs whose output is a markup language document. Examples of these tools are HTML editors and the automatic web page generator disclosed in U.S. Pat. No. 5,940,834. Page-based HTML editors typically present a browser view of a web page on which a user can enter and graphically manipulate items. Code-based HTML editors are essentially text editors enhanced with pull-down menus, dialog boxes, shortcuts or other commands for entering tags in a quicker or more user-friendly manner. Though they simplify HTML document creation for some authors, HTML editors fall short of providing complete automation and are not well suited for high-volume production.
The automatic web page generator disclosed in U.S. Pat. No. 5,940,834 is a software program that presents a user with menus by which the user can add, delete or modify information about individuals in an organization (e.g., employees in a company). The output of the software program is a set of HTML documents that produce a web-based personnel directory (e.g., employee telephone directory). Authoring software such as that software program (or an HTML editor) is a time-consuming endeavor that requires specialized skills and knowledge of markup languages. Like any good software, HTML generating programs ideally produce error-free, easy-to-read output; are extensible, scalable and robust; have intuitive appeal; and are themselves easy to read. Such an ideal is difficult to achieve.
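To make the notion of an HTML generating program concrete, the following Python sketch builds a small directory page from data. It is a hypothetical illustration only and does not represent the generator of U.S. Pat. No. 5,940,834 or any particular product; the function names element and directory_page and the sample entry are invented. Because every element is emitted by a single helper, each opening tag is always paired with its closing tag, attribute values are always quoted, and text is always escaped.

```python
from html import escape


def element(tag, content, **attributes):
    """Return markup for one element with a matching closing tag."""
    attr_text = "".join(
        f' {name}="{escape(str(value))}"' for name, value in attributes.items()
    )
    return f"<{tag}{attr_text}>{content}</{tag}>"


def directory_page(title, people):
    """Build a page containing one table row per (name, phone) pair."""
    rows = "".join(
        element("tr", element("td", escape(name)) + element("td", escape(phone)))
        for name, phone in people
    )
    head = element("head", element("title", escape(title)))
    body = element("body", element("table", rows))
    return element("html", head + body)


# Hypothetical sample data.
page = directory_page("Staff & Contacts", [("A. Example", "555-0100")])
print(page)
```

Even a sketch this small hints at the difficulty noted above: the output is well formed, but it is a single unbroken line that is not easy to read, and extending the program to new page layouts requires further programming.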