This invention is related to a method and system for providing coherency between files in a group of files retrieved over an Internet connection.
In the last few years, the number of users accessing information over the Internet and the number of users providing it have both grown exponentially. In a typical example, an information provider generates or supports a set of information files and places these files on an Internet HTTP server. A Uniform Resource Locator ("URL") identifies the physical location of the document, i.e., the server on which it resides and its path and file name. Read-only access to Internet documents is provided to a client via the HTTP protocol.
To keep the HTTP protocol simple and lightweight, it has been designed to be stateless. However, recent work in the standards committee has introduced a state management mechanism based on "cookies." Cookies are small data structures used by a web server to deliver state data to a web client and request that the client store the information. The HTTP server supplying the cookie also adds information about the domain and the subset of URLs for which the cookie is applicable. The client stores cookie data in one or more flat files on its local hard drive. When a client makes a request to an HTTP server for a document in the set and domain identified by a cookie, the client software sends the cookie back along with the request. In this manner, web sites can "remember" information from one request to the next and simulate a continuous connection to that site. Cookies are conventionally used to record user preferences, passwords, and the like, so that such information need not be entered by the user every time.
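The cookie exchange described above can be sketched as follows. This is a minimal illustration, not part of the described system: the names `make_set_cookie` and `ClientCookieStore` are hypothetical, the store is kept in memory rather than in flat files, and the domain and path matching is a simplification of what real client software does.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, domain, path):
    """Server side: build a Set-Cookie header value scoped to a domain and path."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = path
    return cookie[name].OutputString()


class ClientCookieStore:
    """Client side: keeps cookies and returns those that apply to a request."""

    def __init__(self):
        # Each entry is (name, value, domain, path). Held in memory here (assumption).
        self._cookies = []

    def store(self, set_cookie_value):
        """Parse a Set-Cookie header value and remember the cookie and its scope."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        for name, morsel in parsed.items():
            self._cookies.append(
                (name, morsel.value, morsel["domain"], morsel["path"] or "/")
            )

    def cookie_header(self, host, request_path):
        """Return the Cookie header value for a request, or "" if no cookie applies."""
        pairs = []
        for name, value, domain, path in self._cookies:
            # Simplified matching: exact host or subdomain, and a path prefix.
            domain_ok = host == domain or host.endswith("." + domain)
            prefix = path.rstrip("/")
            path_ok = request_path == prefix or request_path.startswith(prefix + "/")
            if domain_ok and path_ok:
                pairs.append(f"{name}={value}")
        return "; ".join(pairs)
```

A client that stored a cookie scoped to `example.com` and `/book` would send it back with a request for `/book/ch1.html` on `www.example.com`, but not with a request for a path outside `/book` or to a different domain.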
At many Internet web sites, there are sets of documents which are updated on an ongoing basis. Often, the documents are part of a set of logically related documents, i.e., a "document group". Such groupings are defined by the information provider. Independent updates and accesses of such related documents can cause clients to receive inconsistent information, especially if they are accessing the documents while the update process is in progress. The problem of consistency arises when a group of related documents, such as chapters in a book with each chapter represented as a link in the Table of Contents page, are updated individually. If no preventative measures are taken, a client accessing the group may receive some files which are old and some which are new, resulting in an inconsistency in the information provided to the clients.
Various approaches have been taken to address this problem. In the first approach, no control over updates is exerted and the information provider accepts that some clients may get inconsistent information. In a second approach, the service is made unavailable for the period of time when the update is being done.
These approaches are acceptable in most informal situations where the updates are very infrequent and where clients will accept breaks in service during updates. However, for some applications, it is essential that consistency of data seen by the client be maintained without disrupting the service. Thus, there is a need to perform on-line updates of documents such that the update guarantees consistency of data as seen by the client during a given logical session.
One newly developed technique relies on the notion of group consistency within a persistent HTTP connection. Under this consistency model, when a client accesses a group of interrelated documents within a single persistent HTTP connection, it receives a consistent version of all documents in the group, even if some of the documents are updated during the access interval. Access to the correct version of a file is provided by selectively updating and reloading the file server's request Redirect data table. This technique is discussed in more detail in S. Rangarajan, S. Yajnik, and P. Jalote, "WCP - A tool for consistent on-line update of documents in a WWW server", Proceedings of the Conference on the World-Wide Web (WWW7), April 1998, Brisbane, Australia.
Although adequate in some situations, this technique is restrictive in practice. Even if a server is made aware of which documents are logically related, it cannot prohibit a client from opening a new persistent HTTP connection to retrieve documents that belong to a group already being accessed through another active persistent connection, and it cannot control when the client closes a persistent HTTP connection. For example, a client may access some documents that belong to a logical group, close the connection, and open a new connection to retrieve the remaining documents in the group. Further, the information provider is limited to a single definition of a logical session.
Accordingly, it would be advantageous to provide a system and method for consistent update and retrieval of documents from an Internet server which supports a flexible definition of a logical session and which is not limited to consistent access only during a single persistent HTTP connection.
An HTTP cookie-based State Management Server ("SMS") is used to provide for consistent update and retrieval of documents from groups of related documents available through an Internet web server. Each group of documents contains one or more files which are maintained by an information provider. Types of file groups include components of a software package available for download, chapters in a book, etc. Each group has a set of index paths, which are referenced by users seeking access to files in the group. The index paths for a group are mapped to the physical locations of the files which form the various versions of the group. The index paths and the file paths for various versions of a group are maintained in a registration table.
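One way to picture the registration table is as a nested mapping from group, to version, to index path, to physical file path. The sketch below is illustrative only: the class name `RegistrationTable`, its methods, and the example paths are assumptions, and it further assumes that versions are registered in chronological order so that the last one added is the most recent.

```python
class RegistrationTable:
    """Maps the index paths of each group to the files of each group version."""

    def __init__(self):
        # group name -> {version label -> {index path -> physical file path}}
        self._groups = {}
        # index path -> group name, for quick lookup of the owning group
        self._group_of = {}

    def add_version(self, group, version, files):
        """Register one version of a group; `files` maps index path -> file path."""
        self._groups.setdefault(group, {})[version] = dict(files)
        for index_path in files:
            self._group_of[index_path] = group

    def group_of(self, index_path):
        """Return the group that owns an index path, or None if unregistered."""
        return self._group_of.get(index_path)

    def latest_version(self, group):
        """Return the most recently registered version (dicts keep insertion order)."""
        return list(self._groups[group])[-1]

    def resolve(self, index_path, version):
        """Return the physical file path for an index path in a given version."""
        group = self._group_of[index_path]
        return self._groups[group][version][index_path]
```

With this layout, adding a new version of a group never disturbs the files of an older version, so clients that began with the older version can continue to be served from it.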
Whenever a group is to be created or a new version of a group is to be added to an Internet server, the information provider updates the registration table to indicate the various versions of the group and the names and locations of the files which are members of each version of the group. This is preferably done by way of a Group Specification User Interface ("GSUI") program. The SMS is configured to retain a copy of the current registration table in memory, i.e., by reading the registration table in response to an interrupt from the GSUI indicating that the table has been updated.
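The in-memory copy and its refresh on notification might look like the sketch below. Everything here is an assumption made for illustration: the text does not specify the on-disk format of the registration table (JSON is used as a stand-in), and the names `load_json_table`, `RegistrationCache`, and `on_table_updated` are hypothetical. How the notification is delivered (for example, as an operating-system signal) is left outside the sketch.

```python
import json


def load_json_table(path):
    """Read a registration table stored as JSON (hypothetical on-disk format)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


class RegistrationCache:
    """Holds the in-memory copy of the registration table."""

    def __init__(self, loader):
        # `loader` is a zero-argument callable that returns the current table.
        self._loader = loader
        self.table = loader()

    def on_table_updated(self):
        """Called when the update notification arrives; re-reads the table."""
        # Load first, then swap, so a failed read leaves the old copy in place.
        new_table = self._loader()
        self.table = new_table
```

A loader for a file-backed table could be written as `lambda: load_json_table(table_path)`, where `table_path` is wherever the deployment keeps the table. Requests served between an update and the notification simply see the previous table.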
Client access to the system is provided through a conventional HTTP server which invokes a program, such as a Common Gateway Interface ("CGI") script, that interacts with the SMS. A client requests a file by accessing an HTTP server and identifying the desired document by a URL containing the document's index path. According to one aspect of the invention, the requested URL does not reference a file in the group directly, but instead references the file by its index path. If the client has previously accessed a document in the group, a cookie generated by the SMS, associated with the referenced URL and containing state information related to the prior access, will have been provided to the client and stored locally. Through the normal operations of the client software, this cookie will also be provided to the server.
Upon receiving a client request, the HTTP server calls the CGI program and passes as parameters the URL of the requested document and the accompanying cookie, if provided by the client. The CGI program establishes a connection with the SMS if one has not already been established, by a mechanism such as a socket connection or a region of shared memory, and forwards the path of the requested URL and the cookie that it received from the HTTP server. In response, the SMS examines the registration table and the data contained in the cookie to determine the proper group of documents to be mapped to the received file request. The path identifying the location of the correct document is then returned to the CGI program. The SMS also returns a modified cookie which contains state information reflecting the present access and group version for the existing logical session. If no cookie was provided, the SMS defaults to mapping the request to the most recent version of the group and generates and returns a new state cookie reflecting this access. The CGI program in turn prepares a partial header with the updated cookie information and sends this header, along with the mapped path of the document which is to be returned to the client, back to the HTTP server. The server then completes the header for the reply, includes the mapped document, and sends the reply, with the document and updated or new cookie, to the client.
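The mapping step performed by the SMS can be sketched as a single function. This is a simplified illustration under stated assumptions: the cookie is reduced to a bare version label, the table layout and example paths are hypothetical, and the fallback to the most recent version when the cookie names a version that is no longer registered is a choice made for this sketch, not something the text specifies.

```python
# Hypothetical registration table: group -> version -> index path -> file path.
# Versions are assumed to be listed oldest first.
EXAMPLE_TABLE = {
    "book": {
        "v1": {
            "/book/toc.html": "/data/book/v1/toc.html",
            "/book/ch1.html": "/data/book/v1/ch1.html",
        },
        "v2": {
            "/book/toc.html": "/data/book/v2/toc.html",
            "/book/ch1.html": "/data/book/v2/ch1.html",
        },
    }
}


def find_group(table, index_path):
    """Return the name of the group that contains an index path, or None."""
    for group, versions in table.items():
        if any(index_path in files for files in versions.values()):
            return group
    return None


def map_request(table, index_path, cookie_version=None):
    """Map a requested index path to (file path, version to record in the cookie)."""
    group = find_group(table, index_path)
    if group is None:
        raise KeyError(f"no group registered for {index_path}")
    versions = table[group]
    latest = list(versions)[-1]
    # Stay on the session's version when the cookie names one that is still
    # registered; otherwise (no cookie, or an unknown version) use the latest.
    version = cookie_version if cookie_version in versions else latest
    return versions[version][index_path], version
```

A first request without a cookie is served from the latest version and receives a cookie recording it. Later requests carrying that cookie keep resolving to the same version even after a newer one is registered, which is what keeps the documents seen within one logical session consistent.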