An on-line information system typically includes one computer system (the server) that makes information available so that other computer systems (the clients) can access the information. The server manages access to the information, which can be structured as a set of independent on-line services. The server and client communicate via messages that conform to a communication protocol and are sent over a communication channel such as a computer network or a dial-up connection.
Typical uses for on-line services include document viewing, electronic commerce, directory lookup, on-line classified advertisements, reference services, electronic bulletin boards, document retrieval, electronic publishing, keyword searching of documents, technical support for products, and directories of on-line services. The service may make the information available free of charge, or for a fee, and may be on publicly accessible or private computer systems.
Information sources managed by the server may include files, databases, and applications on the server system or on an external computer system. The information that the server provides may simply be stored on the server, may be converted from other formats manually or automatically, may be computed on the server in response to a client request, may be derived from data and applications on the server or other machines, or may be derived by any combination of these techniques.
The user of an on-line service uses a program on the client system to access the information managed by the on-line service. Possible user capabilities include viewing, searching, downloading, printing, editing, and filing the information managed by the server. The user may also price, purchase, rent, or reserve services or goods offered through the on-line service.
An on-line service for catalog shopping, which is an exemplary application of this technology, might work as follows. A user running a program on a client system requests a connection to the catalog shopping service using a service name that either is well known or can be found in a directory. The request is received by the server employed by the catalog shopping service, and the server returns an introductory document that asks for an identifier and password. The client program displays this document, the user fills in an identifier and password that were assigned by the service in a previous visit, and the information is sent to the server. The server verifies the identifier and password against an authorization database, and returns a menu document that is then presented to the user. Each time the user selects a menu item, the selection is sent to the server, and the server responds with the appropriate new page of information, possibly including item descriptions or prices that are retrieved from a catalog database. By selecting a series of menu items, the user navigates to the desired item in the catalog and requests that the item be ordered. The server receives the order request, and returns a form to be completed by the user to provide information about shipping and billing. The user response is returned to the server, and the server enters the order information into an order database.
On-line services are available on the World Wide Web (WWW), which operates over the global Internet. The Internet interconnects a large number of otherwise unrelated computers or sites. Similar services are available on private networks called "Intranets" that may not be connected to the Internet, and through local area networks (LANs). The WWW and similar private architectures provide a "web" of interconnected document objects. On the WWW, these document objects are located at various sites on the global Internet. A more complete description of the WWW is provided in "The World-Wide Web," by T. Berners-Lee, R. Cailliau, A. Luotonen, H. F. Nielsen, and A. Secret, Communications of the ACM, 37 (8), pp. 76-82, August 1994, and in "World Wide Web: The Information Universe," by T. Berners-Lee et al., in Electronic Networking: Research, Applications and Policy, Vol. 1, No. 2, Meckler, Westport, Conn., Spring 1992.
Among the types of document objects in an on-line service are documents and scripts. Documents that are published on the WWW are written in the Hypertext Markup Language (HTML). This language is described in HyperText Markup Language Specification--2.0, by T. Berners-Lee and D. Connolly, RFC 1866, proposed standard, November 1995, and in "World Wide Web & HTML," by Douglas C. McArthur, in Dr. Dobb's Journal, December 1994, pp. 18-20, 22, 24, 26 and 86. Many companies also are developing their own enhancements to HTML. HTML documents are generally static, that is, their contents do not change over time unless modified by a service developer. HTML documents can be created using programs specifically designed for that purpose or by executing a script file.
The HTML language is used for writing hypertext documents, which are more formally referred to as Standard Generalized Markup Language (SGML) documents that conform to a particular Document Type Definition (DTD). An HTML document includes a hierarchical set of markup elements; most elements have a start tag, followed by content, followed by an end tag. The content is a combination of text and nested markup elements. Tags, which are enclosed in angle brackets (`<` and `>`), indicate how the document is structured and how to display the document, as well as destinations and labels for hypertext links. There are tags for markup elements such as titles and headers, text attributes such as bold and italic, lists, paragraph boundaries, links to other documents or other parts of the same document, in-line graphic images, and for many other features.
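The start tag, content, end tag structure described above can be made visible with a short Python sketch that walks a fragment of HTML and records each tag and text run in document order. The class name `OutlineParser`, the helper `outline`, and the sample fragment are illustrative assumptions; only the standard library `html.parser` module is used, and it reports tag and attribute names in lower case.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record start tags, end tags, and text runs in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        # Skip runs that contain only whitespace.
        if data.strip():
            self.events.append(("text", data.strip(), {}))


def outline(html_text):
    """Return the list of events found in html_text."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.events


# Hypothetical fragment with a header, a link, and nested bold text.
SAMPLE = '<H1>Catalog</H1><P>See the <A HREF="items.html">item <B>list</B></A>.'

if __name__ == "__main__":
    for event in outline(SAMPLE):
        print(event)
```

The nested B element appears between the start and end events of the A element, which mirrors the hierarchical structure of markup elements.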
The following lines of HTML briefly illustrate how the language is used:
Some words are <B>bold</B>, others are <I>italic</I>. Here we start a new paragraph.<P>
Here's a link to the <A HREF="http://www.microsoft.com">Microsoft Corporation</A> homepage.
This sample document is a hypertext document because it contains a hypertext "link" to another document, in the line that includes "HREF=." The format of this link is described below. A hypertext document may also have a link to other parts of the same document. Linked documents may generally be located anywhere on the Internet. When a user is viewing the document using a client program called a Web browser (described below), the links are displayed as highlighted words or phrases. For example, using a Web browser, the sample document above might be displayed on the user's screen as follows:
Some words are bold, others are italic. Here we start a new paragraph.
Here's a link to the Microsoft Corporation homepage.
In the Web browser, the link may be selected, for example, by clicking on the highlighted area with a mouse. Typically, the screen cursor changes when positioned on a hypertext link. Selecting a link will cause the associated document to be displayed. Thus, clicking on the highlighted text "Microsoft Corporation" would fetch and display the associated homepage for that entity.
The HTML language also provides a mechanism (the image or "IMG" element) enabling an HTML document to include an image that is stored as a separate file. When the end user views the HTML document, the included image is displayed as part of the document, at the point where the image element occurred in the document.
Another kind of document object in a web is a script. A script is an executable program, or a set of commands stored in a file, that can be run by a server program called a Web server (described below) to produce an HTML document that is then returned to the Web browser. Typical script actions include running library routines or other applications to fetch information from a file or a database, or initiating a request to obtain information from another machine, or retrieving a document corresponding to a selected hypertext link. A script may be run on the Web server when, for example, the end user selects a particular hypertext link in the Web browser, or submits an HTML form request. Scripts are usually written by a service developer in an interpreted language such as Basic, Practical Extraction and Report Language (Perl) or Tool Control Language (Tcl) or one of the Unix operating system shell languages, but they also may be written in more complex programming languages such as "C" and then compiled to produce an executable program. Programming in Tcl is described in more detail in Tcl and the Tk Toolkit, by John K. Ousterhout, Addison-Wesley, Reading, Mass., USA, 1994. Perl is described in more detail in Programming in Perl, by Larry Wall and Randal L. Schwartz, O'Reilly & Associates, Inc., Sebastopol, Calif., USA, 1992.
Each document object in a web has an identifier called a Universal Resource Identifier (URI). These identifiers are described in more detail in T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web," RFC 1630, CERN, June 1994; and T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource Locators (URL)," RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994. A URI allows any object on the Internet to be referred to by name or address, such as in a link in an HTML document as shown above. There are two types of URIs: a Universal Resource Name (URN), and a Uniform Resource Locator (URL). A URN references an object by name within a given name space. The Internet community has not yet defined the syntax of URNs. A URL references an object by defining an access algorithm using network protocols. An example of a URL is "http://www.microsoft.com". A URL has the syntax "scheme://host:port/path?search" where
"scheme" identifies the access protocol (such as HTTP, FTP or GOPHER);
"host" is the Internet domain name of the machine that supports the protocol;
"port" is the transmission control protocol (TCP) port number of the appropriate server (if different from the default);
"path" is a scheme-specific identification of the object; and
"search" contains optional parameters for querying the content of the object.
URLs are also used by web servers and browsers on private computer systems, Intranets, or networks, and not just for the WWW.
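As a hedged illustration of the "scheme://host:port/path?search" syntax, the following Python sketch splits a URL into the five components named above using the standard library function `urllib.parse.urlsplit`. The helper name `url_parts` and the example.com address are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the components scheme, host, port, path, and search."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # port is None when the URL relies on the default port of the scheme.
        "port": parts.port,
        "path": parts.path,
        "search": parts.query,
    }


if __name__ == "__main__":
    # Hypothetical URL that exercises every component.
    print(url_parts("http://www.example.com:8080/catalog/items?color=red"))
    # A URL with only a scheme and a host.
    print(url_parts("http://www.microsoft.com"))
```

When the port is omitted, as in the second URL, the client is expected to fall back to the default port for the scheme.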
A site at which documents are made available to network users is called a "Web site" and must run a "Web server" program to provide access to the documents. A Web server program is a computer program that allows a computer on the network to make documents available to the rest of the WWW or a private network. The documents are often hypertext documents in the HTML language, but may be other types of document objects as well, and may include images, audio, and/or video information. The information that is managed by the Web server includes hypertext documents that are stored on the server or are dynamically generated by scripts on the Web server. Several Web server software packages exist, such as the Conseil Européen pour la Recherche Nucléaire (CERN, the European Laboratory for Particle Physics) server or the National Center for Supercomputing Applications (NCSA) server. Web servers have been implemented for several different platforms, including the Sun Sparc II™ workstation running the Unix operating system, and personal computers with the Intel PENTIUM™ processor running the Microsoft MS-DOS™ operating system and the Microsoft Windows™ operating environment.
Web servers also have a standard interface for running external programs, called the Common Gateway Interface (CGI). CGI is described in more detail in How to Set Up and Maintain a Web Site, by Lincoln D. Stein, Addison-Wesley, August 1995. A gateway is a program that handles incoming information requests and returns the appropriate document or generates a document dynamically. For example, a gateway might receive queries, look up the answer in a database to provide a response, and translate the response into a page of HTML so that the server can send the response to the client. A gateway program may be written in a language such as "C" or in a scripting language such as Perl or Tcl or one of the Unix operating system shell languages. The CGI standard specifies how the script or application receives input and parameters, and specifies how output should be formatted and returned to the server.
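A gateway of the kind described above might look like the following minimal Python sketch. It assumes the common CGI convention that the server passes the query portion of the URL in the QUERY_STRING environment variable and expects a header block, a blank line, and then the document on standard output. The `PRICES` table stands in for a database and, like the function names, is purely hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical stand-in for a catalog database.
PRICES = {"widget": "4.95", "gadget": "12.50"}


def build_response(query_string):
    """Look up the requested item and return a CGI response as one string."""
    params = parse_qs(query_string)
    item = params.get("item", [""])[0]
    price = PRICES.get(item)
    if price is None:
        body = "<P>No such item: %s</P>" % html.escape(item)
    else:
        body = "<P>%s costs $%s</P>" % (html.escape(item), price)
    # Header block, blank line, then the generated HTML page.
    return "Content-Type: text/html\n\n<HTML><BODY>%s</BODY></HTML>\n" % body


def main():
    # The server is assumed to set QUERY_STRING before running the gateway.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))


if __name__ == "__main__":
    main()
```

Escaping the item name before placing it in the page keeps user input from being interpreted as markup.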
For security reasons, a Web server machine may limit access to files. To control access to files on the Web server, the Web server program running on the server machine may provide an extra layer of security above and beyond the normal file system and login security procedures of the operating system on the server machine. The Web server program may add further security rules such as: (a) optionally requiring input of a user name and password, completely independent of the normal user name and passwords that the operating system may maintain on user accounts; (b) allowing groups of users to be identified for security purposes, independent of any user group definitions of the operating system; (c) controlling access to each document object so that only specified users (with optional passwords) or groups of users are allowed access to an object, or so that access is only allowed for clients at specific network addresses, or some combination of these rules; (d) allowing access to the document objects only through a specified subset of the possible HTTP methods; and (e) allowing some document objects to be marked as HTML documents, others to be marked as executable scripts that will generate HTML documents, and others to be marked as other types of objects such as images. Access to the on-line service document objects via a network file system would not conform to the security features of the Web server program and would provide a way to access documents outside of the security provided by the Web server. The Web server program also typically maps document object names that are known to the client to file names on the server file system. This mapping may be arbitrarily complex, and any author or program that tries to access documents on the Web server directly would need to understand this name mapping.
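Rules (a) through (d) can be pictured with the following Python sketch of a per-object access check. The data layout, the names `AccessRule` and `is_allowed`, and the sample accounts are all hypothetical and do not reflect any particular Web server; a real server would also store password hashes rather than the plain text used here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRule:
    """Hypothetical per-object rule; empty sets mean 'no restriction'."""
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)
    allowed_addresses: set = field(default_factory=set)
    allowed_methods: set = field(default_factory=lambda: {"GET", "POST"})


# Accounts and groups kept by the server, independent of the operating system.
USERS = {"alice": "s3cret", "bob": "hunter2"}
GROUPS = {"staff": {"alice"}}
RULES = {
    "/catalog/prices.html": AccessRule(allowed_groups={"staff"},
                                       allowed_methods={"GET"}),
}


def is_allowed(path, method, user, password, address):
    """Return True if the request passes the rule attached to path."""
    rule = RULES.get(path)
    if rule is None:
        return True  # no rule attached: the object is unprotected
    if method not in rule.allowed_methods:
        return False  # rule (d): restricted subset of methods
    if rule.allowed_addresses and address not in rule.allowed_addresses:
        return False  # rule (c): restricted client addresses
    if not rule.allowed_users and not rule.allowed_groups:
        return True  # no user restriction on this object
    if user not in USERS or USERS[user] != password:
        return False  # rule (a): server-level user name and password
    if user in rule.allowed_users:
        return True
    # rule (b): membership in a server-defined group
    return any(user in GROUPS.get(group, set()) for group in rule.allowed_groups)
```

Because the check lives in the server program, a client that reached the same files through a network file system would bypass it entirely, which is the concern raised above.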
A user (typically using a machine other than the machine used by the Web server) who wishes to access documents available on the network at a Web site must run a client program called a "Web browser." The Web browser program allows the user to retrieve and display documents from Web servers. Some of the popular Web browser programs are: Navigator™ browser from Netscape Communications Corp., of Mountain View, Calif.; Mosaic™ browser from the National Center for Supercomputing Applications (NCSA); WinWeb™ browser, from Microelectronics and Computer Technology Corp. of Austin, Tex.; and Internet Explorer™ from Microsoft Corporation of Redmond, Wash. Web browsers have been developed to run on different platforms, including personal computers with the Intel Corporation PENTIUM™ processor running Microsoft Corporation's MS-DOS™ operating system and Microsoft Corporation's Windows™ environment, and Apple Corporation's Macintosh™ personal computers.
The Web server and the Web browser communicate using the Hypertext Transfer Protocol (HTTP) message protocol and the underlying transmission control protocol/Internet protocol (TCP/IP) data transport protocol of the Internet. HTTP is described in Hypertext Transfer Protocol--HTTP/1.0, by T. Berners-Lee, R. T. Fielding, H. Frystyk Nielsen, Internet Draft Document, Oct. 14, 1995, and is currently in the standardization process. In HTTP, the Web browser establishes a connection to a Web server and sends an HTTP request message to the server. In response to an HTTP request message, the Web server checks for authorization, performs any requested action, and returns an HTTP response message containing an HTML document in accord with the requested action, or an error message. The returned HTML document may simply be a file stored on the Web server, or may be created dynamically using a script called in response to the HTTP request message. For instance, to retrieve a document, a Web browser may send an HTTP request message to the indicated Web server, requesting a document by reference to the URL of the document. The Web server then retrieves the document and returns it in an HTTP response message to the Web browser. If the document has hypertext links, then the user may again select one of the links to request that a new document be retrieved and displayed. As another example, a user may fill in a form requesting a database search. In response, the Web browser will send an HTTP request message to the Web server including the name of the database to be searched, the search parameters, and the URL of the search script. The Web server calls a search program, passing in the search parameters. The program examines the parameters and attempts to answer the query, perhaps by sending the query to a database interface. 
When the program receives the results of the query, it constructs an HTML document that is returned to the Web server, which then sends it to the Web browser in an HTTP response message.
Request messages in HTTP contain a "method name" indicating the type of action to be performed by the server, a URL indicating a target object (either document or script) on the Web server, and other control information. Response messages contain a status line, server information, and possible data content. The Multipurpose Internet Mail Extensions (MIME) specification defines a standardized protocol for describing the content of messages that are passed over a network. HTTP request and response messages use MIME header lines to indicate the format of the message. MIME is described in more detail in MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies, Internet RFC 1341, June 1992.
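The message layout described above (a method name and URL in the request, a status line followed by MIME-style header lines and optional content in the response) can be sketched in Python as follows. This is a simplified illustration in the style of HTTP/1.0: the function names are hypothetical, the parser assumes the status line carries a reason phrase, and folded headers are not handled.

```python
def build_request(method, path, headers=None, body=""):
    """Assemble a request: request line, header lines, blank line, body."""
    lines = ["%s %s HTTP/1.0" % (method, path)]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(message):
    """Split a response into status line fields, headers, and body."""
    head, _, body = message.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "status": int(code),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    print(repr(build_request("GET", "/index.html", {"Accept": "text/html"})))
    reply = ("HTTP/1.0 200 OK\r\nServer: Example/1.0\r\n"
             "Content-Type: text/html\r\n\r\n<HTML></HTML>")
    print(parse_response(reply))
```

The Content-Type header line is where the MIME type of the returned object, such as text/html, is declared.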
The request methods defined in the current version of the HTTP protocol include GET, POST, PUT, HEAD, DELETE, LINK, and UNLINK. HEAD, DELETE, LINK and UNLINK are less commonly used and are described in more detail in the HTTP/1.0 draft specification cited above. The GET method causes the server to retrieve the object indicated by the given URL and send it back to the client. If the URL refers to a document, then the server responds by sending back the document. If the URL refers to an executable script, then the server executes the script and returns the data produced by the execution of the script. Web browser programs normally use the GET method to send request messages to the Web server to retrieve HTML documents, which the Web browser then displays on the screen at the client computer.
The PUT method, according to the HTTP specification, specifies that the object contained in the request should be stored on the server at the location indicated by a URL. However, most current server implementations do not follow this specification; instead, they simply handle all PUT requests through a single PUT script, which is generally undefined, and must be created by a service author. Web browsers generally do not use the PUT method.
The POST method sends data, usually the user input parameters from an HTML form, to the server. The POST request also contains the URL of a script to be run on the server. The server runs the script, passing the parameters given in the request, and the script generates an HTML output that is returned in the response to the client. In order for a client program to send arbitrary data to the Web server using the current HTTP protocol, the client program must use either the PUT method or the POST method, as these are the only two methods that allow such data transfer to the Web server. Web browsers generally use only the POST method and generally only for the purpose of sending data in connection with forms to be processed.
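To make the POST exchange concrete, the following Python sketch encodes a set of form fields into a request body and shows how a script on the server side might recover them. It assumes the usual form encoding (application/x-www-form-urlencoded); the script path /cgi-bin/search, the field names, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_post(script_path, form_fields):
    """Return a POST request whose body carries the encoded form fields."""
    body = urlencode(form_fields)
    head = (
        "POST %s HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "\r\n"
    ) % (script_path, len(body.encode("ascii")))
    return head + body


def read_form(message):
    """Recover the form fields from the body of a POST request."""
    _, _, body = message.partition("\r\n\r\n")
    # parse_qs returns a list per field; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(body).items()}


if __name__ == "__main__":
    message = build_post("/cgi-bin/search",
                         {"database": "catalog", "query": "red shoes"})
    print(message)
    print(read_form(message))
```

The URL in the request line names the script to run, while the user's input travels in the body rather than in the URL.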
The combination of the Web server and Web browser communicating using an HTTP protocol over a computer network, as described above, is referred to herein as a web architecture. The web architecture described above is suitable for use in private LANs or on the Internet. A typical on-line service for use on a web architecture will now be described. This type of on-line service includes a Web server program running on a Web server machine, and a set of service files that characterize the on-line service and which are stored on the Web server. The service files include HTML documents, executable scripts or programs to dynamically produce HTML documents, and other files of service information that can be referenced and updated by the scripts and programs. The actual data and scripts that comprise a particular on-line service, including HTML documents and script programs, are generally stored on the server in a separate area designated for that service. Global information about the service is also stored, including data such as the name of the service, the name of the author, revision history, comments about the service, and authorization information. The end user of the on-line service uses a Web browser program on the client machine to send requests to the on-line service and to receive responses from the on-line service. All access by an end user of the on-line service to the service files is managed and controlled by the Web server program. For example, an on-line service might consist of a corporate homepage HTML document, with a link to a second document that is a form for searching the store catalog. The search form may have a "submit" button that causes a script to be run on the Web server, generating a list of product descriptions with prices that is then returned to the Web browser as an HTML document. Each of the HTML documents may have a link to a second script that collects and displays the items that have been ordered. 
The service also has configuration information, such as a list of authorized users of the service and their passwords.
FIG. 1 shows the steps for using an on-line service, as seen by the end user of the on-line service on the client computer. The end user starts a Web browser program in a step 10, and the program determines the URL of an initial document to display in a step 12. The initial document URL may be determined from a configuration file, or may be programmed into the Web browser, or entered by the user. The browser then sends an HTTP GET request to the Web server in a step 14, giving the URL of the desired document. The browser then waits for a response from the Web server in a step 16. In a step 18, the browser tests the response to determine if it indicates an error message. If the response message from the Web server indicates an error, e.g., if the requested document is not found, then the browser reports the error to the end user in a step 22. Otherwise the response message from the Web server contains the requested document, and the Web browser formats and displays the document on the screen in a step 20, according to the HTML language conventions. In either case, the browser waits for the user to enter the next command in a step 24. For example, the user may request to view a new document either by selecting a hypertext link to the document, by requesting the document from a list of previously visited documents, or by entering the URL of the document that was obtained by the user through some other means. The browser tests the user command to determine if the user is requesting a new document in a step 26. If so, processing continues at step 14, as noted above. If the user is not requesting a new document, the browser tests the command in a step 30 to determine if it is a request to exit the program. If so, processing stops. Otherwise the command is a local command that is handled by the browser in a step 28, without sending an HTTP request. 
The end user may use local viewing commands, such as commands to scroll through the document, or commands to search for a particular text string in the document. After the browser handles the local command, the browser again waits for the next user command in step 24, as already discussed.
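The control flow of FIG. 1 can be summarized in a short Python sketch. To keep it self-contained, the network is replaced by a `fetch` function supplied by the caller and the user by a prepared list of commands; these names, the command vocabulary ("open", "exit", anything else treated as local), and the sample pages are assumptions made for illustration. Comments give the corresponding step numbers from the description.

```python
def run_browser(initial_url, fetch, commands):
    """Simulate the browser loop; return a log of what was shown to the user.

    fetch(url) returns (ok, content); commands yields (kind, argument) pairs.
    """
    log = []
    url = initial_url                                   # step 12
    commands = iter(commands)
    while True:
        ok, content = fetch(url)                        # steps 14 and 16
        if ok:                                          # step 18
            log.append(("display", content))            # step 20
        else:
            log.append(("error", content))              # step 22
        while True:
            # Running out of commands is treated as a request to exit.
            kind, arg = next(commands, ("exit", None))  # step 24
            if kind == "open":                          # step 26
                url = arg
                break                                   # back to step 14
            if kind == "exit":                          # step 30
                return log
            log.append(("local", kind))                 # step 28


# Hypothetical documents standing in for a Web server.
PAGES = {"/home": "<H1>Home</H1>"}


def fake_fetch(url):
    """Return the page if it exists, otherwise an error report."""
    if url in PAGES:
        return True, PAGES[url]
    return False, "not found: " + url


if __name__ == "__main__":
    session = [("scroll", None), ("open", "/missing"), ("exit", None)]
    for entry in run_browser("/home", fake_fetch, session):
        print(entry)
```

Only the "open" command causes a new request; scrolling and similar commands stay inside the inner loop, as in step 28.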
FIG. 2 shows the operation of an on-line service as seen by the Web server program. When the server is started, it runs continuously, waiting to receive a command over the network connection from a client Web browser program in a step 40. The server tests the received command in a step 44 to determine if it is a GET request. If so, the server examines the URL contained in the request in a step 52 to determine if the URL indicates an HTML document that is stored on the server. If the URL does refer to a document, then that document is returned to the Web browser via an HTTP response in a step 58. Otherwise, the URL indicates a script stored on the server, and the Web server runs the script to produce an HTML document in a step 56, and the HTML document is returned to the Web browser as noted with regard to step 58. If the test of step 44 determines that the command is not a GET request, the server tests the command in a step 48 to determine if it is a POST request. If so, the server retrieves the parameters from the POST request in a step 54, which include the URL and parameters for the script. The server then runs the indicated script in step 56 to generate an HTML document, which is returned to the Web browser as described above in connection with step 58. After an HTML document is returned to the Web browser, processing continues at step 40. If the test of step 48 determines that the command is not a POST request, the server returns an error message to the Web browser in a step 50, formatted as an HTML document. The processing continues at step 40, and the server again waits for the next request to repeat the process.
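The dispatch logic of FIG. 2 might be sketched in Python as shown below, with step numbers in the comments. The document table, the script table, and the choice of status codes are illustrative assumptions. The not-found branch is an addition that FIG. 2 does not show, since the figure treats every GET URL that is not a stored document as a script.

```python
import html

# Hypothetical stored documents and scripts of an on-line service.
DOCUMENTS = {"/index.html": "<HTML><BODY>Welcome</BODY></HTML>"}


def search_script(params):
    """Example script: produce an HTML page from the request parameters."""
    query = html.escape(params.get("query", ""))
    return "<HTML><BODY>Results for %s</BODY></HTML>" % query


SCRIPTS = {"/cgi-bin/search": search_script}


def error_page(text):
    """Format an error message as an HTML document."""
    return "<HTML><BODY>%s</BODY></HTML>" % html.escape(text)


def handle_request(method, url, params=None):
    """Return (status, html) for one request received in step 40."""
    params = params or {}
    if method == "GET":                               # step 44
        if url in DOCUMENTS:                          # step 52
            return 200, DOCUMENTS[url]                # step 58
    elif method != "POST":                            # step 48
        return 501, error_page("Unsupported method")  # step 50
    # GET of a script, or POST with parameters retrieved (step 54).
    script = SCRIPTS.get(url)
    if script is None:
        return 404, error_page("Not found")           # not part of FIG. 2
    return 200, script(params)                        # steps 56 and 58
```

A real server would call `handle_request` inside the endless wait loop of step 40; the loop is omitted here so that the dispatch can be exercised directly.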
On-line services such as those described above are in high demand. Unfortunately, the task of developing an on-line service is currently one that almost always requires extensive programming skill and much specialized knowledge. Thus, there exists a great need for tools to simplify the process of building an on-line service so that the process can be accomplished in less time, with fewer errors, and by a non-programmer. In some cases, software tools exist to help convert the data content for a service from a native format to the format required by the server, but these tools only address the conversion of data files. Many other facets of the process are not undertaken by tools currently available.
In order to construct an on-line service for the World Wide Web, an author must perform a combination of tasks, including creating a new HTML document for hypertext employed by the on-line service, creating a new script used by the on-line service, retrieving and modifying an existing HTML document from the Web server machine, retrieving and modifying an existing script from the Web server machine, and storing an HTML document or script on the Web server machine so that the Web server program will have access to it.
There are several approaches known in the prior art for constructing documents and scripts usable by an on-line service, and performing the tasks noted above. During performance of the tasks discussed above, an author may need to view the connectivity of documents linked together by hypertext links. The document author may further need to navigate among linked documents while using an editing program.
One conventional method for navigating among linked documents is to simply make the links active, even during editing. Thus, a link may be followed simply by selecting it. However, this approach does not enable the entire network of links to be viewed simultaneously.
According to a second conventional method, a collection of linked documents may be represented by a web-like network of document icons connected by links. This method breaks down when the network becomes highly connected or even self-referential, because the density of information that must be presented becomes too great.
A third conventional method improves somewhat on the second method. Under this third method, only a predetermined number of levels are shown. While an improvement over the second conventional method, the third method still does not solve the problem of representing the web in such a manner that links may readily be followed either forward or back from a document.
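The underlying data can be pictured with a small Python sketch: a set of links is indexed in both the forward and the backward direction, and a breadth-first walk collects only the documents within a predetermined number of levels of a starting document. This is an illustrative model of the problem, not a reconstruction of any of the conventional tools; the function names and the sample links are hypothetical.

```python
from collections import defaultdict, deque


def build_link_maps(links):
    """Index (source, target) pairs by source and by target."""
    forward, back = defaultdict(set), defaultdict(set)
    for source, target in links:
        forward[source].add(target)
        back[target].add(source)
    return forward, back


def within_levels(start, neighbours, max_levels):
    """Return {document: level} for documents at most max_levels away."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        if seen[doc] == max_levels:
            continue  # do not expand beyond the level limit
        for nxt in sorted(neighbours.get(doc, ())):
            if nxt not in seen:  # guards against self-referential webs
                seen[nxt] = seen[doc] + 1
                queue.append(nxt)
    return seen


# Hypothetical web in which item1 links back to the home page.
LINKS = [("home", "catalog"), ("catalog", "item1"),
         ("item1", "home"), ("catalog", "order")]

if __name__ == "__main__":
    forward, back = build_link_maps(LINKS)
    print(within_levels("home", forward, 1))  # documents that home links to
    print(within_levels("home", back, 1))     # documents that link to home
```

Keeping a separate backward index is what makes "which documents link here" as cheap to answer as "where does this document link to".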