In 1989, the Conseil Européen pour la Recherche Nucléaire (CERN) started the Internet revolution with the creation of a hypertext system that let CERN nuclear physicists easily transfer documents containing both pictorial and textual data. One of CERN's primary goals was to unify existing data transfer protocols under a consistent interface for accessing different document types across diverse network media. In 1991, CERN publicly released its hypertext system, and after further review and improvement, nascent Internet browsers were developed in 1993. These browsers brought Internet connectivity to the general public, and there has since been a near-exponential explosion of software and hardware development providing all manner of browser-based data transfer.
Fundamental to the operation of browsers (as of all World Wide Web applications) is a discrete client-server model. A client, such as a web browser, sends a request to a server, such as a Hypertext Transfer Protocol (HTTP) web server, and the server responds to the client's request. The system is discrete in that it uses a stateless protocol. Rather than opening and maintaining a persistent connection to the server, the client encapsulates all relevant data in the request, opens a connection to the server, and sends the request; the server returns the results of the request over that connection, and the connection is then closed. The discrete nature of requests and responses allows efficient data transfer and efficient multiplexing of many client requests.
However, the efficiency of this stateless system quickly became a disadvantage for developers wishing to provide richer capabilities, such as sending customized web pages to web browsers. It is advantageous for a server to send customized web pages in response to a client browser's connection, but this is difficult with a stateless protocol, because nothing in a new request inherently identifies the client or its earlier activity. Various methods have therefore been developed to simulate a stateful process over a stateless connection.
One such method is to require the client to log in, allowing the server to look up the client in a server database. Each web page subsequently sent to the client is then formatted on the fly so that its hypertext links include an identifier indicating who is contacting the server. When the client selects a link, the server receives a request with the embedded identifying information, allowing the server to track the client's actions.
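The link-rewriting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular server: the query-parameter name `cid` and the helper names `add_client_id` and `client_id_from` are hypothetical, and the standard-library `urllib.parse` module does the URL manipulation.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_client_id(url: str, client_id: str, param: str = "cid") -> str:
    """Return `url` with the client identifier added as a query parameter.

    `param` is a hypothetical parameter name chosen for illustration.
    """
    parts = urlsplit(url)
    # Keep existing query parameters, replacing any stale identifier.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, client_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def client_id_from(url: str, param: str = "cid") -> Optional[str]:
    """Recover the identifier from a link the client selected, if present."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)


# Each link on a page formatted for client "u42" carries the identifier.
link = add_client_id("/news?section=sports", "u42")
print(link)                  # /news?section=sports&cid=u42
print(client_id_from(link))  # u42
```

Every link must be rewritten on every page, which is why this approach ties page generation so closely to client tracking.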
However, it is inconvenient to require a client to log in each time. An alternative solution is to have the client retain an updatable token associated with a server's network location (e.g., a web site), so that the token can be transmitted in place of repeated logins. This token is commonly referred to as a "cookie," and it can be passed between a client and server to allow the server to track client activity.
A cookie is small, with a maximum size of about 4 KB. Cookies are designed to be placed on and retrieved from a client computer transparently. In the context of a browser, every time a client contacts a particular network address, the browser automatically transmits any cookies associated with that address. When responding to the client, the server can update existing cookies or set new ones for the client to maintain. This allows the server to track client activity.
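The round trip described above can be illustrated with Python's standard `http.cookies` module: the server emits a `Set-Cookie` header, and on the next request it parses the `Cookie` header the browser sends back. This is only a sketch; the helper names and the cookie names and values are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str, path: str = "/") -> str:
    """Server side: build the Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # browser returns the cookie for URLs under `path`
    return cookie.output(header="Set-Cookie:")


def read_cookies(cookie_header: str) -> dict:
    """Server side: parse the Cookie header a browser sent with a request."""
    parsed = SimpleCookie(cookie_header)
    return {name: morsel.value for name, morsel in parsed.items()}


print(make_set_cookie("prefs", "05!sea"))
# A later request from the same browser carries every cookie for the address.
print(read_cookies("prefs=05!sea; theme=dark"))
```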
Unfortunately, cookie usage suffers from several significant drawbacks. One is that cookies are limited in size, and only a limited number of cookies can be associated with any given network address; thus, the client can retain only a small, finite amount of state information. Another is that all cookie data is transferred to the server, even when it is irrelevant to a particular client transaction. This happens because cookie transmission is based on the address contacted: for any given address, all client cookies related to that address are automatically sent to the server. This creates significant unnecessary overhead. For example, if a server sets 20 cookies, each 4 KB in size, the client incurs an 80 KB overhead in every communication with the server. This is compounded by a likely 80 KB overhead in the server's responses, since if the client maintains its state in cookies, the server will send updated cookies.
Although the storage limit could be overcome by increasing cookie size, client-server communications would then suffer an even greater transaction cost. Here, the term "cost" generally denotes the time and data required to track client-server interaction states; high cost is synonymous with high overhead in communicating with a server. Such overhead is especially disadvantageous when clients communicate over saturated links.
To avoid the overhead of large cookie transmissions, an alternative approach is for the server to generate a unique (preferably short) client identifier and embed this id in a cookie set in its response to the client's initial contact. On subsequent contacts, the server receives the cookie with the embedded id, allowing the server to look up the client in a local clients database. As in the previous configuration, each client HTTP request to the server includes the client's relevant cookies, but the overhead is greatly diminished because only an id is transmitted rather than the client's entire state information. The server can thus track what the client has been doing.
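A minimal sketch of this id-based scheme follows, assuming an in-memory dictionary stands in for the server's clients database and that the cookie is named `cid` (both hypothetical choices, as is the `ClientStore` class). The standard-library `secrets.token_hex` generates the short random identifier.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional


class ClientStore:
    """Server-side state keyed by a short client id (illustrative only)."""

    def __init__(self) -> None:
        # Stand-in for the server's local clients database.
        self._db: Dict[str, dict] = {}

    def register(self) -> str:
        """On first contact, create an id and an empty state record."""
        client_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
        self._db[client_id] = {}
        return client_id

    @staticmethod
    def set_cookie_header(client_id: str) -> str:
        """Header sent in the response so the browser stores only the id."""
        cookie = SimpleCookie()
        cookie["cid"] = client_id
        return cookie.output(header="Set-Cookie:")

    def state_for(self, cookie_header: str) -> Optional[dict]:
        """On later contacts, map the id in the Cookie header to stored state."""
        morsel = SimpleCookie(cookie_header).get("cid")
        if morsel is None:
            return None
        return self._db.get(morsel.value)


store = ClientStore()
cid = store.register()
store.state_for(f"cid={cid}")["pages_viewed"] = 1  # server records activity
print(store.set_cookie_header(cid))
print(store.state_for(f"cid={cid}; theme=dark"))    # {'pages_viewed': 1}
```

The cookie stays at 16 characters however much state accumulates; the cost moves to the look-up in `state_for`, which runs on every request.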
A problem with this method, however, is that the server is now entirely responsible for tracking the client's state. This may appear to be a relatively minor burden, since database technologies afford rapid access to client data. But because network connections allow many clients to contact a server simultaneously, and each request now requires a database look-up, the server can quickly be overwhelmed with service requests.
(Further information about Internet cookies can be found in HTML & CGI Unleashed, Sams.Net Publ. (1995); Dynamic HTML Unleashed, Sams.Net Publ. (1998); and at the Internet sites http://developer.netscape.com/library/documentation/communicator/jsguide4/cookies.htm and http://developer.netscape.com/news/viewsource/archive/goodman_cookies.html.)
Thus, none of the prior solutions is particularly advantageous, since each entrusts state-tracking responsibility either entirely to resource-limited client storage or to a possibly overwhelmed server.
In accordance with an embodiment of the present invention, the foregoing drawbacks are overcome by caching, in a client-stored cookie, a globally unique client id (GUID) together with a core set of user data (such as preferences) generally applicable to the user's interaction with a server. To reduce the overhead of transferring the cookie, preferences can be combined and compressed. Because the unique id is stored, a server can look up and track all of the client's data in a local database, and can retrieve the data specific to a particular client request. And because core data such as page formatting and content preferences (hereafter "personalization settings") are stored, the server can tailor the client's experience without incurring the cost of database look-ups. Additionally, a cookie-version number can be embedded to indicate how a server should interpret an incoming cookie. These features improve site performance and stability.
In such an embodiment, when a client visits the front (entry) page of a site, entry code on that page checks whether the client's connection software (e.g., a web browser) provided a cookie containing the client's personalization settings. If no such cookie was provided to the server (or its version number is too old), but the user has been assigned a GUID, the entry code calls a generation routine to generate a cookie. If there is no cookie, browser serial number, or other identifier for the client, the client may be prompted to enter identification information in one or more dialog boxes. (It is understood that references to calling a routine include any method of generating such a cookie, such as by a procedure, function, ASP file, ActiveX control, DCOM, Java, JScript, DHTML, plug-in, etc.) The generation routine retrieves the relevant set of personalization settings from the database, builds the compressed cookie, and passes it to the user's browser. Similarly, when the user completes a personalization session, or when the server wants to update the client state, a save-settings routine is invoked to update the client cookie.
If the client does not yet have any personalization settings, a cookie is created with the value "0" (or some other value indicating an empty cookie). The server can then immediately redirect the client to a personalization routine so that a non-empty cookie can be defined, or personalization can be deferred. If it is deferred, the empty cookie is transferred on the client's next contact with the server, indicating to the server that the new client still needs to be personalized.
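The entry-page decision described in the two paragraphs above can be sketched as follows. Everything here is a hypothetical stand-in: `CURRENT_VERSION`, the dictionary used as the settings database, the action strings returned, and the simplified `build_cookie` generation routine (which writes only a name field and a city field). The sketch is meant to show the order of the checks, not a production handler.

```python
from typing import Dict, Optional, Tuple

CURRENT_VERSION = 5   # hypothetical current cookie-format version
EMPTY_COOKIE = "0"    # value marking "no personalization settings yet"


def cookie_version(cookie: str) -> Optional[int]:
    """Read the leading version field, or None if it is missing or malformed."""
    head = cookie.split("!", 1)[0]
    return int(head) if head.isascii() and head.isdigit() else None


def build_cookie(settings: Optional[dict]) -> str:
    """Hypothetical generation routine: settings record -> cookie string."""
    if not settings:
        return EMPTY_COOKIE
    return "!".join([f"{CURRENT_VERSION:02d}",
                     settings.get("name", ""), settings.get("city", "")])


def entry_page(cookie: Optional[str], guid: Optional[str],
               db: Dict[str, dict]) -> Tuple[str, Optional[str]]:
    """Return (next action, cookie to set in the response or None)."""
    if cookie == EMPTY_COOKIE:
        # A deferred, empty cookie came back: the client still needs settings.
        return ("personalize", None)
    if cookie is not None:
        version = cookie_version(cookie)
        if version is not None and version >= CURRENT_VERSION:
            return ("render", None)  # usable cookie; no database look-up needed
    if guid is None:
        return ("prompt_for_identification", None)
    new_cookie = build_cookie(db.get(guid))  # one look-up to regenerate
    if new_cookie == EMPTY_COOKIE:
        # Redirect now (this sketch's choice); the cookie also supports deferral.
        return ("personalize", new_cookie)
    return ("render", new_cookie)


db = {"g1": {"name": "John Doe's", "city": "sea"}}
print(entry_page(None, "g1", db))   # ('render', "05!John Doe's!sea")
```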
If the client does have personalization settings stored in a server database, a compressed cookie can be created by retrieving the core settings and writing them into a cookie for storage by the client.
A cookie can be formatted as a string containing multiple substrings, each delimited by a separation character (e.g., "!" or another character). These substrings can hold data in different formats, such as simple strings, comma-delimited string lists, compact Boolean arrays, and encoded numeric arrays. Formatting choices are made to minimize both data size and decoding time on the server; in some cases, this means the code stores data in non-canonical formats. ("Decoding time" refers to the time necessary to decompress stored data.)
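Decoding such a cookie can be sketched as splitting on the separator and applying a per-field decoder. The field layout in `SCHEMA` (a version, a display name, a comma-delimited stock list, and an N+48-encoded numeric array) is a hypothetical example, not a format defined in this description. Missing or empty fields fall back to a default raw value, so a shortened cookie still decodes.

```python
from typing import Any, Callable, List, Tuple

SEP = "!"


def as_str(raw: str) -> str:
    return raw


def as_list(raw: str) -> List[str]:
    """Comma-delimited string list; an empty field means an empty list."""
    return raw.split(",") if raw else []


def as_nums(raw: str) -> List[int]:
    """One character per value, each encoded as chr(N + 48)."""
    return [ord(ch) - 48 for ch in raw]


# Hypothetical layout: (field name, decoder, raw default for absent fields).
SCHEMA: List[Tuple[str, Callable[[str], Any], str]] = [
    ("version", as_str, "00"),
    ("display_name", as_str, ""),
    ("stocks", as_list, ""),
    ("layout", as_nums, ""),
]


def parse_cookie(value: str) -> dict:
    raw_fields = value.split(SEP)
    result = {}
    for i, (name, decode, default) in enumerate(SCHEMA):
        raw = raw_fields[i] if i < len(raw_fields) and raw_fields[i] else default
        result[name] = decode(raw)  # decoding the default gives a fresh object
    return result


print(parse_cookie("05!John Doe's!MSFT,INTC!1a"))
```

Keeping defaults as raw strings means every field goes through the same decoder, so no mutable default object is shared between parsed cookies.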
For example, an entry page may be personalized to display a user's name. This data can be stored in a cookie as [First Name]{[space][Last Name]}[possessive suffix] (e.g., "John Doe's"), where the braces mark the optional part. The work of determining whether there is a last name, and which possessive suffix to use, is therefore done only once, at cookie creation time. (Although plain text is shown in this example, the text can be compression-encoded into a shorter, though unreadable, ASCII sequence.) Similarly, a web page may provide other information (e.g., stocks, news, sports) that the client has requested be displayed on entry to the page. Preferences such as which stocks to display can be stored as a comma-delimited list, or in some other format suitable for direct submission to a third-party information-retrieval system.
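Doing the name formatting once, at cookie creation time, might look like the following sketch. The possessive rule used here (append only an apostrophe when the name ends in "s", otherwise "'s") is an assumption, since style guides differ, and `possessive_display_name` is a hypothetical helper name.

```python
def possessive_display_name(first: str, last: str = "") -> str:
    """Build '[First Name]{ [Last Name]}[possessive suffix]' once, at cookie creation."""
    name = first.strip()
    if last.strip():  # the last name is optional
        name += " " + last.strip()
    # Assumed rule: names ending in "s" take a bare apostrophe.
    suffix = "'" if name.lower().endswith("s") else "'s"
    return name + suffix


print(possessive_display_name("John", "Doe"))  # John Doe's
print(possessive_display_name("Chris"))        # Chris'
```

The entry page then inserts the stored string verbatim, with no per-request string logic.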
To reduce cookie size, the cookie creation routine encodes each numeric value N in the range 0-63 as a single character: the ASCII character with code N+48. Base64 or other schemes can also be used to encode numeric data. However, N+48 is preferable because it can be encoded and decoded quickly, and it need not be protected against character transformation (i.e., the characters are interpreted on the same kind of system that generated them).
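The N+48 scheme maps 0-63 onto the characters "0" (ASCII 48) through "o" (ASCII 111). A minimal sketch follows; the function names are hypothetical. One caveat worth checking against whatever cookie specification applies: RFC 6265 excludes ";" and "\" from cookie values, and those are the characters for N=11 and N=44, so an implementation following that RFC strictly would need to escape or avoid those two values.

```python
from typing import List


def encode_small(n: int) -> str:
    """Encode an integer in 0..63 as the single character chr(n + 48)."""
    if not 0 <= n <= 63:
        raise ValueError(f"{n} is outside the encodable range 0-63")
    return chr(n + 48)


def decode_small(ch: str) -> int:
    """Invert encode_small: one character back to its integer value."""
    n = ord(ch) - 48
    if not 0 <= n <= 63:
        raise ValueError(f"{ch!r} is not an N+48 character")
    return n


def encode_array(values: List[int]) -> str:
    return "".join(encode_small(v) for v in values)


def decode_array(text: str) -> List[int]:
    return [decode_small(ch) for ch in text]


print(encode_array([0, 5, 51, 63]))  # 05co
```

Encoding and decoding are each a single addition or subtraction per character, which is the speed advantage the text claims over Base64's table look-ups and padding.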
Cookie size is further reduced by representing configuration information as a compressed series of Boolean (yes/no) values. For example, a single bit can track whether a client prefers to receive news listings at the top or the bottom of a personalized page. Other personalization choices can be concatenated into a long bit sequence, and the entire sequence then compressed: the Boolean values are grouped into 6-bit runs, and each run is N+48 encoded. For example, (True, True, False, False, True, True) maps to binary 110011, which is 51 in decimal; 51+48=99, which is the ASCII character `c`. Storing Boolean information in this manner is significantly more compact than storing strings such as "Sports=on" or a Boolean string array such as "110011".
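The 6-bit packing can be sketched as follows. The sketch assumes, consistently with the worked example, that the first flag becomes the most significant bit of each group, and it assumes an incomplete final group is padded with False; both assumptions and the function names are illustrative. Because the decoder also pads missing groups with False, a field whose trailing all-False characters were dropped still decodes to the same flags.

```python
from typing import List, Sequence


def pack_bools(flags: Sequence[bool]) -> str:
    """Pack flags 6 at a time into N+48 characters (first flag = high bit)."""
    chars = []
    for start in range(0, len(flags), 6):
        group = list(flags[start:start + 6])
        group += [False] * (6 - len(group))  # pad a short final group
        n = 0
        for bit in group:
            n = (n << 1) | int(bit)
        chars.append(chr(n + 48))
    return "".join(chars)


def unpack_bools(text: str, count: int) -> List[bool]:
    """Recover `count` flags; missing trailing groups decode as False."""
    flags: List[bool] = []
    for ch in text:
        n = ord(ch) - 48
        for shift in range(5, -1, -1):
            flags.append(bool((n >> shift) & 1))
    flags += [False] * max(0, count - len(flags))
    return flags[:count]


print(pack_bools([True, True, False, False, True, True]))  # c
```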
A further optimization is to truncate superfluous delimiters corresponding to empty values. Thus, rather than creating a cookie containing "05!John Doe's!sea!01000!!KING!!!00000!00000!00000!*!*!*!*!*", only "05!John Doe's!sea!01!!KING" is stored; as the example shows, trailing default characters within a field (here "01000" becoming "01") can be dropped as well. Such a reduction is possible when the cookie interpretation code assumes default values for unspecified information. The leading "05" is a version number: the cookie creation routine adds it to the beginning of the cookie so that the cookie interpretation code can determine which format the cookie is in. This allows the server to use different format/compression schemes depending on the types of data it expects to present to the client.
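The truncation can be sketched as below, together with the matching expansion on the interpretation side. The sketch assumes a hypothetical layout in which fields 3, 8, 9, and 10 hold N+48-packed Boolean arrays (where the character "0" is an all-False group) and "*" marks a field left at its default; under those assumptions it reproduces the reduction in the example above. The leading version field is never dropped.

```python
from typing import List, Set

SEP = "!"
BOOL_FIELDS: Set[int] = {3, 8, 9, 10}  # hypothetical Boolean-array fields
DEFAULT_MARKERS = {"", "*"}            # values meaning "use the default"


def compact(fields: List[str]) -> str:
    """Drop default-valued tails so the interpreter can fill them back in."""
    out = list(fields)
    for i in BOOL_FIELDS:
        if i < len(out):
            # Trailing "0" characters are all-False groups; decoders pad them back.
            out[i] = out[i].rstrip("0")
    # Remove trailing fields left at their defaults, but keep the version field.
    while len(out) > 1 and out[-1] in DEFAULT_MARKERS:
        out.pop()
    return SEP.join(out)


def expand(value: str, total_fields: int) -> List[str]:
    """Interpretation side: restore omitted trailing fields as empty (default)."""
    fields = value.split(SEP)
    return fields + [""] * max(0, total_fields - len(fields))


full = "05!John Doe's!sea!01000!!KING!!!00000!00000!00000!*!*!*!*!*"
short = compact(full.split(SEP))
print(short)  # 05!John Doe's!sea!01!!KING
```

Interior empty fields (such as the one between "01" and "KING") must stay, because the interpreter locates fields by position.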
Other features and advantages of the invention will be apparent from the detailed description and accompanying drawings.