1. Technical Field
The present invention relates in general to a method and system for improving an Internet server's handling of client state objects. More particularly, the present invention relates to a system and method for reducing the amount of unnecessary state object, or “cookie,” traffic between a client computer and the server computer.
2. Description of the Related Art
Some Internet web sites (i.e., servers) store client state information in a small text file, sometimes called a “cookie,” on the client's (i.e., user's) hard drive or in memory located on the client computer. Internet browsers, such as Microsoft's Internet Explorer™ and Netscape's Navigator™, are often set up to allow the creation of these state objects. The user, however, can specify that a prompt be provided before a web site puts a state object on the user's hard disk or memory. In this manner, the user can choose to accept or reject state objects. The user can also configure the browser to prevent the acceptance of any state objects.
State objects are small data structures used by a web site to deliver data to a web client and store the data on the client's hard drive or memory. In certain circumstances, the client returns the information to the web site. Web sites can thus “remember” information about users to facilitate their preferences for a particular site. The web site may deliver one or more state objects to the client which are stored as flat files on the client's local hard drive or memory.
State objects contain information about the user and his or her preferences. For example, if the user inquires about a flight schedule at an airline's Web site, the site might create a state object (i.e., a cookie) that contains the user's itinerary. Or it might only contain a record of which pages within the site the user visited, in order to help the site customize the view for the user during subsequent visits to the web site.
Only the information provided by the user or choices made by the user while visiting a Web site can be stored in a state object. For example, the web site cannot determine the user's e-mail name unless the user provides it. Allowing a Web site to create a state object, or cookie, on the client's computer does not give the web site, or any other web site, access to the rest of the client computer. In addition, only the site that created the state object is able to read it.
State objects are a general mechanism which server side connections (i.e., web sites) can use to both store and retrieve information on the client (i.e., user) side of the connection. The addition of a simple, persistent, client-side state significantly extends the capabilities of Web-based client/server applications. Web sites use state objects to simulate a continuous connection to the web site. This makes it more convenient for users by allowing them to visit pages within a site without having to reintroduce themselves with each mouse click.
HyperText Transfer Protocol (HTTP) is the underlying protocol used by the World Wide Web. HTTP defines how messages are formatted and transmitted, and what actions Web servers and browsers take in response to various commands. For example, when a user enters a URL (Uniform Resource Locator, the global address of documents and other resources on the World Wide Web) in a browser, an HTTP command is sent to the Web server directing it to fetch and transmit the requested Web page. The current HTTP protocol is “stateless,” meaning that the server does not store any information about a particular HTTP transaction; each connection between a client and a server is “fresh” and has no knowledge of any previous HTTP transactions. “State” information is information about a communication between a client and a server. In some cases it is useful to maintain state information about the user across multiple HTTP transactions.
When returning an HTTP object or other network information to a client, a server may include a piece of state information which is stored by the client. Included in that state object is a description of the range of URLs for which that state is valid. Any future requests made by the client which fall in that URL range will include a transmittal of the current value of the state object from the client back to the server. As described above, the state object is often called a “cookie,” for no compelling reason.
This simple mechanism provides a powerful tool which enables a host of applications to be written for web-based environments. Shopping applications can store information about currently selected items, for-fee services can send back registration information and free the client from retyping a user-id on subsequent connections, and web sites can store per-user preferences on the client computer. These preferences can be automatically supplied by the client computer when the client subsequently connects to the server.
A cookie is introduced to the client by including a “Set-Cookie” header as part of an HTTP response; often this will be generated by a CGI script. CGI stands for “Common Gateway Interface,” a specification for transferring information between a World Wide Web server and a CGI program. A CGI program is any program designed to accept and return data that conforms to the CGI specification. The program could be written in any programming language, including C, Perl, Java, or Visual Basic.
Syntax of the Set-Cookie HTTP Response Header
This is the format a CGI script would use to add to the HTTP headers a new piece of data which is to be stored by the client for later retrieval.
Set-Cookie: NAME=VALUE; expires=DATE;
path=PATH; domain=DOMAIN_NAME; secure
Multiple Set-Cookie headers can be issued in a single server response.
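The header format above can be illustrated with a minimal Python sketch. The function name format_set_cookie and its parameters are hypothetical, and the date layout used for the expires attribute is an assumption, since the text above does not define the DATE string.

```python
from datetime import datetime, timezone

def format_set_cookie(name, value, expires=None, path=None, domain=None, secure=False):
    """Build one Set-Cookie header line; only NAME=VALUE is required."""
    # NAME and VALUE exclude semi-colon, comma and white space.
    if any(ch.isspace() or ch in ";," for ch in name + value):
        raise ValueError("NAME and VALUE may not contain ';', ',' or white space")
    parts = [f"{name}={value}"]
    if expires is not None:
        # Assumed layout: "Wdy, DD-Mon-YYYY HH:MM:SS GMT"; expires must be timezone-aware.
        stamp = expires.astimezone(timezone.utc).strftime("%a, %d-%b-%Y %H:%M:%S GMT")
        parts.append(f"expires={stamp}")
    if path is not None:
        parts.append(f"path={path}")
    if domain is not None:
        parts.append(f"domain={domain}")
    if secure:
        parts.append("secure")
    return "Set-Cookie: " + "; ".join(parts)
```

A server that wishes to set several cookies would call such a function once per cookie and emit each resulting line as its own header.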
NAME=VALUE
This string is a sequence of characters excluding semi-colon, comma and white space. This is the only required attribute on the Set-Cookie header.
expires=DATE
The expires attribute specifies a date string that defines the valid life time of that cookie. Once the expiration date has been reached, the cookie will no longer be stored or given out. Expires is an optional attribute. If not specified, the cookie will expire when the user's session ends.
The expires attribute lets the client know when it is safe to purge the mapping but the client is not required to do so. A client may also delete a cookie before its expiration date arrives, for example if the number of cookies exceeds its internal limits.
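As a rough illustration of the expiration rule, the following Python sketch filters a client's stored cookies. The function name live_cookies and the dictionary layout (cookie name mapped to a value and an optional expiration time) are assumptions made for this example only.

```python
from datetime import datetime, timezone

def live_cookies(cookies, now, session_ended=False):
    """Return the cookies that may still be stored and sent.

    cookies maps a cookie name to (value, expires); expires is None
    when the attribute was not specified (a session cookie).
    """
    kept = {}
    for name, (value, expires) in cookies.items():
        if expires is None:
            # No expires attribute: the cookie lasts until the session ends.
            if not session_ended:
                kept[name] = (value, expires)
        elif now < expires:
            # The expiration date has not yet been reached.
            kept[name] = (value, expires)
    return kept
```

A client is free to purge more aggressively than this, for example when its internal limits are exceeded.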
domain=DOMAIN_NAME
When searching the cookie list for valid cookies, a comparison of the domain attributes of the cookie is made with the Internet domain name of the host from which the URL will be fetched. If there is a tail match, then the cookie will go through “path matching” to see if it should be sent (see description of “path,” below). “Tail matching” means that the domain attribute is matched against the tail of the fully qualified domain name of the host. A domain attribute of “acme.com” would therefore match host names “anvil.acme.com” as well as “shipping.crate.acme.com”. The default value of domain is the host name of the server which generated the cookie response.
Only hosts within the specified domain can set a cookie for a domain. Domains that store cookies have at least two (2) or three (3) periods in them to prevent domains of the form “.com”, “.edu”, and “va.us” from storing overly-broad cookies. Any domain that falls within one of the special top level domains (e.g., “.COM”, “.EDU”, “.NET”, “.ORG”, “.GOV”, “.MIL”, and “.INT”) requires at least two periods. Any other domain requires at least three periods.
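The tail-matching and period-counting rules can be sketched in Python as follows. The function names are hypothetical, and the tail match is implemented as a literal comparison against the end of the host name, following the description above rather than any particular browser's behavior.

```python
# Special top level domains named in the text above.
SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}

def domain_tail_matches(cookie_domain, host):
    """True when cookie_domain matches the tail of the host's domain name."""
    return host.lower().endswith(cookie_domain.lower())

def has_enough_periods(cookie_domain):
    """Apply the two-period / three-period rule to a domain attribute."""
    tld = cookie_domain.rstrip(".").rsplit(".", 1)[-1].lower()
    required = 2 if tld in SPECIAL_TLDS else 3
    return cookie_domain.count(".") >= required
```

Under this sketch a domain attribute of ".acme.com" passes the period check, while ".com" and "va.us" are rejected as overly broad.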
path=PATH
The path attribute is used to specify the subset of URLs in a domain for which the cookie is valid. If a cookie has already passed domain matching, then path matching takes place wherein the pathname component of the URL is compared with the path attribute. If there is a prefix match, the cookie is considered valid and is sent along with the URL request. The path “/foo” would match “/foobar” and “/foo/bar.html”. The path “/” is the most general path and matches any path within the domain.
If the path is not specified, it is assumed to be the same path as the document being described by the header which contains the cookie. Setting the path to a higher-level value does not override other more specific path mappings. If there are multiple matches for a given cookie name, but with separate paths, all the matching cookies will be sent (see examples below). Instances of the same path and name will overwrite each other, with the latest instance taking precedence. Instances of the same path but different names will add additional mappings. When sending cookies to a server, all cookies with a more specific path mapping should be sent before cookies with less specific path mappings. For example, a cookie “name1=foo” with a path mapping of “/” should be sent after a cookie “name1=foo2” with a path mapping of “/bar” if they are both to be sent.
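The path rules above might be sketched in Python as shown below. All names are hypothetical, and treating a longer path string as “more specific” is an assumption adopted for this illustration.

```python
def path_matches(cookie_path, request_path):
    """Prefix match: '/foo' matches '/foobar' and '/foo/bar.html'."""
    return request_path.startswith(cookie_path)

def store_cookie(jar, name, value, path):
    """Same path and name overwrite; a different name or path adds a mapping."""
    jar[(name, path)] = value

def cookies_to_send(jar, request_path):
    """Return matching (name, value) pairs, most specific path first."""
    matches = [(path, name, value)
               for (name, path), value in jar.items()
               if path_matches(path, request_path)]
    # Assumption: a longer path is the more specific mapping.
    matches.sort(key=lambda m: len(m[0]), reverse=True)
    return [(name, value) for _, name, value in matches]
```

With a jar holding “name1=foo” at “/” and “name1=foo2” at “/bar”, a request for “/bar/index.html” yields foo2 first and foo second, while a request for “/other” yields only foo.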
secure
If a cookie is marked secure, it will only be transmitted if the communications channel with the host is a secure one. Currently this means that secure cookies will only be sent to HTTPS (HTTP over SSL) servers. If secure is not specified, a cookie is considered safe to be sent in the clear over unsecured channels.
Syntax of the Cookie HTTP Request Header
When requesting a URL from an HTTP server, the browser will match the URL against all cookies and, if any of them match, a line containing the name/value pairs of all matching cookies will be included in the HTTP request. Here is the format of that line:
Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...
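Putting the domain, path, and secure checks together, a client's construction of the Cookie request line could look roughly like the following Python sketch. The Cookie class and the cookie_header function are hypothetical names, and the matching logic is a simplified reading of the rules described above.

```python
from dataclasses import dataclass

@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    secure: bool = False

def cookie_header(cookies, host, path, is_secure_channel):
    """Return the Cookie request line for a URL, or None if nothing matches."""
    matching = [
        c for c in cookies
        if host.lower().endswith(c.domain.lower())   # domain tail match
        and path.startswith(c.path)                  # path prefix match
        and (is_secure_channel or not c.secure)      # secure cookies need a secure channel
    ]
    if not matching:
        return None
    # More specific (assumed: longer) paths are sent first.
    matching.sort(key=lambda c: len(c.path), reverse=True)
    return "Cookie: " + "; ".join(f"{c.name}={c.value}" for c in matching)
```

Note that every matching cookie is returned on every matching request, whether or not the server needs it for that particular page.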
There are limitations on the number of cookies that a client can store at any one time.
If a CGI script wishes to delete a cookie, it can do so by returning a cookie with the same name, and an expires time which is in the past. The path and name should match exactly in order for the expiring cookie to replace the valid cookie. This requirement makes it difficult for anyone but the originator of a cookie to delete a cookie.
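The deletion rule can be illustrated from the client's side with a short Python sketch. The function apply_set_cookie and the jar layout keyed by name and path are assumptions made for this example.

```python
from datetime import datetime, timezone

def apply_set_cookie(jar, name, value, path, expires, now):
    """Store a received cookie, or delete it when its expires time is past.

    jar maps (name, path) to (value, expires). Only an exact match on
    both name and path replaces or removes an existing entry.
    """
    key = (name, path)
    if expires is not None and expires <= now:
        # An already-expired replacement removes the stored cookie.
        jar.pop(key, None)
    else:
        jar[key] = (value, expires)
```

Because the key includes the path, an expired cookie sent with a different path leaves the original entry in place.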
When caching HTTP, as a proxy server might do, the Set-Cookie response header should never be cached. If a proxy server receives a response which contains a Set-Cookie header, it should propagate the Set-Cookie header to the client, regardless of whether the response was 304 (Not Modified) or 200 (OK). Similarly, if a client request contains a Cookie: header, it should be forwarded through a proxy, even if the conditional If-Modified-Since request is being made.
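One way a caching proxy might honor this rule is sketched below in Python. The function name split_for_cache and the representation of headers as a list of name/value pairs are hypothetical.

```python
def split_for_cache(response_headers):
    """Separate the headers forwarded to the client from those stored in cache.

    response_headers is a list of (name, value) pairs. The client receives
    every header, including Set-Cookie; the cached copy omits Set-Cookie.
    """
    to_client = list(response_headers)
    to_cache = [(name, value) for name, value in response_headers
                if name.lower() != "set-cookie"]
    return to_client, to_cache
```

The comparison is case-insensitive so that variant capitalizations of the header name are treated alike.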