An on-line information system typically includes at least one computer system (the server) that makes information available so that one or more other computer systems (the clients) can access the information. The server manages access to the information by the client. Communication between the server and client is via messages conforming to a communication protocol and sent over a communication channel such as a computer network, a dial-up connection, or other type of link.
Information sources managed by the server may include files, databases, and applications that are stored on the server system or on a different computer system. The information that the server provides may be converted from other formats manually or automatically, may be computed on the server in response to a client request, may be derived from data and applications on the server or other machines, or may be derived by any combination of these techniques.
On-line services are available on the World Wide Web (WWW), which operates over the global Internet network. The Internet interconnects a large number of otherwise unrelated computers or sites. Client services analogous to those on the WWW can also be made available on private networks called "Intranets" that may not be connected to the Internet, and which typically run over local area networks (LANs) or wide area networks (WANs). The WWW and similar private architectures provide a "web" of interconnected document objects. On the WWW, these document objects are located at various sites on the global Internet, but can generally be accessed by a client computer that is connected to the Internet at any point. A more complete description of the WWW is provided in "The World-Wide Web," by T. Berners-Lee, R. Cailliau, A. Luotonen, H. F. Nielsen, and A. Secret, Communications of the ACM, 37 (8), pp. 76-82, August 1994, and in "World Wide Web: The Information Universe," by T. Berners-Lee et al., in Electronic Networking: Research, Applications and Policy, Vol. 1, No. 2, Meckler, Westport, Conn., Spring 1992.
Among the types of objects accessed via an on-line service are documents and scripts. Documents that are published on the WWW are written in HTML. This language is described in HyperText Markup Language Specification--2.0, by T. Berners-Lee and D. Connolly, RFC 1866, proposed standard, November 1995, and in "World Wide Web & HTML," by Douglas C. McArthur, in Dr. Dobb's Journal, December 1994, pp. 18-20, 22, 24, 26, and 86. Many companies are developing enhancements to HTML that extend the capabilities of the original specification. HTML documents can be created using programs such as Web page editors that are specifically designed for that purpose, or by executing a script file.
The HTML language is used for preparing hypertext documents, which are more formally referred to as Standard Generalized Markup Language (SGML) documents conforming to a particular Document Type Definition (DTD), for access by others over a network. An HTML document includes a hierarchical set of markup elements; most elements have a start tag, followed by content, followed by an end tag. The content is a combination of text and nested markup elements. Tags, which are enclosed in angle brackets (`<` and `>`), indicate how the hypertext document is structured and how to display the document, as well as destinations and labels for hypertext links. There are tags for markup elements such as titles and headers, text attributes such as bold and italic, lists, paragraph boundaries, links to other documents or other parts of the same document, in-line graphic images, and for many other features.
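The hierarchical structure described above can be made concrete with a short sketch. The following Python fragment uses the standard library's `html.parser` module to record each start tag together with its nesting depth. The class name `OutlineParser`, the helper `outline`, and the sample markup are illustrative assumptions, and the sketch assumes that every start tag in the input has a matching end tag.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record each start tag together with its nesting depth.

    Illustrative only: assumes every start tag has a matching end tag.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag name) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(html_text):
    """Return the (depth, tag) pairs for the elements in html_text."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


# Hypothetical sample document with fully paired tags.
SAMPLE = "<HTML><BODY><H1>Title</H1><P>Some <B>bold</B> text.</P></BODY></HTML>"

for depth, tag in outline(SAMPLE):
    print("  " * depth + tag)
```

Each level of indentation in the printed outline corresponds to one level of element nesting, with text content omitted.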
The following lines of HTML briefly illustrate how the language is used:
    Some words are <B>bold</B>, others are <I>italic</I>.
    Here we start a new paragraph.<P>
    Here's a link to the <A HREF="http://www.microsoft.com">Microsoft Corporation</A> homepage.

These exemplary lines of HTML would be considered part of a hypertext document because they contain a hypertext "link" to another document (in the line that includes "HREF="). The format of this link is described below. A hypertext document may also have a link to other parts of the same document. Linked documents may generally be located anywhere on the Internet. When a user is viewing the document using a client program called a Web browser (described below), the links are displayed as highlighted words or phrases, or represented as graphic images. When viewed with a Web browser, a Web page containing the sample lines of HTML presented above might be displayed on the user's screen as follows:

    Some words are bold, others are italic. Here we start a new paragraph.
    Here's a link to the Microsoft Corporation homepage.
In a Web browser, the hypertext link is selected by clicking on the highlighted area with a mouse, causing the hypertext link to be activated. Typically, the screen cursor changes when positioned on a hypertext link. Selecting and activating a hypertext link will cause the associated or referenced document to be displayed. Thus, clicking on the highlighted text "Microsoft Corporation" would fetch and display the associated homepage for that entity.
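To illustrate how a client program might identify the destination and label of a hypertext link, the following Python sketch collects each anchor's HREF value and highlighted text using the standard library's `html.parser` module. The class name `LinkCollector` and the helper `collect_links` are hypothetical, and the sketch is not intended to represent how any particular Web browser is implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (destination, label) pairs for each anchor element."""

    def __init__(self):
        super().__init__()
        self.links = []     # completed (href, label) pairs
        self._href = None   # destination of the anchor being read, if any
        self._label = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Attribute names are reported in lower case.
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._label).strip()))
            self._href = None


def collect_links(html_text):
    """Return the list of (destination, label) pairs found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


sample = ('Here\'s a link to the <A HREF="http://www.microsoft.com">'
          'Microsoft Corporation</A> homepage.')
print(collect_links(sample))
```

The label is the text that a browser would highlight, and the destination is the document that would be fetched when the link is activated.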
The HTML language provides a mechanism (the image or "IMG" element) enabling an HTML document to include an image that is stored as a separate file. When the end user views the HTML document, the included image is displayed as part of the document, at the point where a reference to the image element occurs in the document.
Another kind of document object is a script. A script is an executable program, or a set of commands stored in a file, that can be run by a Web server to produce an HTML document that is then returned to the Web browser on the client computer. Typical script actions include running library routines or other applications to fetch information from a file or a database, or initiating a request to obtain information from another machine, or retrieving a document corresponding to a selected hypertext link. A script may be run on the Web server when, for example, the end user selects a particular hypertext link in the Web browser, or submits an HTML form request. Scripts are usually written by a service developer in an interpreted language such as Basic, or in a compiled language such as C++.
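As a minimal sketch of a script that produces an HTML document, the following Python function builds a page from a title and a list of result rows. The function name `render_results_page` and the sample data are assumptions made for illustration; an actual script would typically obtain its rows from a file, a database, or another machine, as described above.

```python
from html import escape


def render_results_page(title, rows):
    """Return an HTML document listing the given rows.

    Hypothetical helper: the rows stand in for data fetched by the script.
    """
    # Escape the data so that characters such as "<" and "&" are displayed
    # as text rather than being interpreted as markup.
    items = "\n".join(f"<LI>{escape(str(row))}</LI>" for row in rows)
    return (
        "<HTML>\n"
        f"<HEAD><TITLE>{escape(title)}</TITLE></HEAD>\n"
        "<BODY>\n"
        f"<H1>{escape(title)}</H1>\n"
        f"<UL>\n{items}\n</UL>\n"
        "</BODY>\n"
        "</HTML>\n"
    )


page = render_results_page("Search results", ["first match", "A & B <draft>"])
print(page)
```

The returned string is the document that the Web server would send back to the Web browser in its response.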
A site at which documents are made available to network users is called a Web site and must run a Web server program to provide clients access to documents referenced on any Web pages at that site. A Web server program is a computer program that allows a computer on the network to make documents available to the rest of the WWW or a private network. The documents are often hypertext documents in the HTML language, and may include images, audio, and/or video information. The information that is managed by the Web server includes hypertext documents that are stored on the server or are dynamically generated by scripts on the Web server.
A user who wishes to access documents available on the network at a Web site must run a Web browser program on the client computer. The Web browser program allows the user to retrieve and display documents from Web servers over the network. Popular Web browser programs include Internet Explorer™ from Microsoft Corporation.
To retrieve a document, a Web browser may request the document by reference to its uniform resource locator (URL). The Web server then retrieves the document and returns it in a response message to the Web browser. If the returned document has hypertext links, the user may select one of the links to request that a new document be retrieved and displayed. A user may also fill in a form requesting a database search. In response, the Web browser will send a request message to the Web server including the name of the database to be searched, the search parameters, and the URL of the search script.
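The search request described above can be sketched as a URL that combines the address of the search script with the database name and the search parameters. In the following Python fragment, the function name `build_search_url`, the script address, and the parameter names `database` and `q` are all hypothetical; the encoding itself relies on the standard library's `urllib.parse` module.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(script_url, database, **search_params):
    """Combine the search script URL, database name, and search parameters.

    The parameter name "database" is an assumption for illustration.
    """
    query = urlencode({"database": database, **search_params})
    return f"{script_url}?{query}"


# Hypothetical search script address and parameters.
url = build_search_url(
    "http://www.example.com/cgi-bin/search",
    "catalog",
    q="web browser",
)
print(url)

# The server-side script can recover the same values from the query string.
print(parse_qs(urlsplit(url).query))
```

Encoding the parameters this way lets values containing spaces or other reserved characters travel safely inside the request.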
HTML is extensible, and new features are added with each new generation of Web browsers that is released. Each of these new features is typically referenced by an appropriate HTML tag in a Web page document. A new HTML tag is given a name and its attributes are defined so that a Web browser can provide support for it, i.e., provide some form of visual representation of the new tag and facilitate the functionality it may provide. Thus, new tags can readily be implemented in a new version of a Web browser that is programmed to handle the new tags, but are simply ignored in any version of the Web browser that is not programmed to handle the new HTML tag.
Although a text-based HTML editor that only displays text representing each HTML tag can easily handle new tags, a "what you see is what you get" (WYSIWYG) HTML editor generally cannot. Furthermore, WYSIWYG HTML editors usually lag behind the release of new Web browsers, creating a problem for those who want to create documents that include the new HTML tags that are implemented in the newest browsers. Another aspect of this problem arises because it may be desirable for a specific HTML editor to be able to store parameters, authoring state data, or other data that will not and should not be recognized as valid HTML code by other editors--and more importantly, by Web browsers. Such data may be needed when it is necessary to edit an HTML document that was originally created using that editor. Moreover, the WYSIWYG HTML editor should be able to display something acceptable to the user when encountering lines in the document that would not be considered valid HTML or which were not known in advance to the editor.
One solution to this problem is simply to include new tags or other content that would not be recognized as valid HTML in the file for a document with the expectation that a Web browser will simply ignore or skip over tags that are not recognized. For example, a Web browser provided with the following tags:
______________________________________
<HTML>
<BODY>
Hello there!<p>
<my_new_tag attribute="foo" other="bar">
Goodbye! <p>
</BODY>
</HTML>
______________________________________
should display:

    Hello there!
    Goodbye!

and the tag that includes the phrase "my_new_tag" should simply be ignored, if the Web browser has not been programmed to recognize this tag. However, some Web browsers do not properly handle unrecognized HTML. Specifically, differing amounts of vertical white space (lines) can be displayed when the page contains an unknown HTML tag, and something other than a blank line can be generated on the page where the unknown HTML is encountered. Creation of new HTML tags is thus discouraged, since it will be necessary to continually update parse code in a browser (or editor) to handle the new tags. Accordingly, a better solution to this problem is required.
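The intended behavior, in which an unrecognized tag is skipped without affecting the displayed text, can be sketched as follows. This Python fragment is an illustration only: the set `KNOWN_TAGS`, the class `TolerantRenderer`, and the helper `render` are assumptions, and the fragment does not model the inconsistent white-space handling that some browsers exhibit.

```python
from html.parser import HTMLParser

# Tags this hypothetical renderer has been "programmed to handle".
KNOWN_TAGS = {"html", "body", "p", "b", "i", "a"}


class TolerantRenderer(HTMLParser):
    """Collect displayed text by paragraph, skipping unrecognized tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = [[]]  # text fragments grouped by paragraph
        self.ignored = []       # names of tags that were not recognized

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # Unknown tag: note it, but produce no output and no blank line.
            self.ignored.append(tag)
        elif tag == "p":
            self.paragraphs.append([])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paragraphs[-1].append(text)


def render(html_text):
    """Return (displayed lines, ignored tag names) for html_text."""
    renderer = TolerantRenderer()
    renderer.feed(html_text)
    renderer.close()
    lines = [" ".join(p) for p in renderer.paragraphs if p]
    return lines, renderer.ignored


page = """<HTML>
<BODY>
Hello there!<p>
<my_new_tag attribute="foo" other="bar">
Goodbye! <p>
</BODY>
</HTML>"""

lines, ignored = render(page)
print("\n".join(lines))
print("ignored:", ignored)
```

Because the unknown tag contributes nothing to the paragraph list, the displayed text is the same whether or not the tag is present.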