HyperText Transfer Protocol (HTTP) is used by the World Wide Web to define how messages are formatted and transmitted, and to direct the actions of web servers and browsers in response to various commands. For example, when a user enters a Uniform Resource Locator (URL) into a browser, an HTTP command is sent to the web server directing it to fetch and transmit the requested web page.
HTTP uses a client-server model. An HTTP client, such as a web browser, opens a connection and sends a request message to an HTTP server, such as a web server within a source web site, which then returns a response message, usually containing the resource that was requested. In itself, HTTP is a “stateless” protocol, i.e., it does not provide for maintaining a “session” as a user requests and interacts with various resources. Each HTTP request for a web page is generally independent of other requests. After delivering the response, the web server closes the connection and does not retain transaction information. Each client-server connection is fresh, containing no knowledge of any previous HTTP transaction.
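The stateless request-response exchange described above can be illustrated with a minimal sketch. The resource table, the path, and the function name handle_request are assumptions made for illustration; the sketch handles only simple GET requests and is not a complete HTTP implementation.

```python
# Hypothetical in-memory resources; the path and content are illustrative only.
RESOURCES = {"/index.html": "<html><body>Home</body></html>"}


def handle_request(request_text: str) -> str:
    """Build a response from the request message alone, keeping no state."""
    request_line = request_text.split("\r\n", 1)[0]
    parts = request_line.split(" ")
    if len(parts) != 3:
        return "HTTP/1.0 400 Bad Request\r\n\r\n"
    method, path, _version = parts
    if method != "GET":
        return "HTTP/1.0 405 Method Not Allowed\r\n\r\n"
    body = RESOURCES.get(path)
    if body is None:
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    return (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n"
        f"{body}"
    )


request = "GET /index.html HTTP/1.0\r\nHost: example.com\r\n\r\n"
first = handle_request(request)
second = handle_request(request)
# The handler stores nothing, so the first exchange cannot influence the second.
print(first == second)
```

Because the handler keeps no record of earlier requests, it cannot tell whether two requests came from the same user, which is the gap that session management techniques are meant to fill.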
Internet protocols and standards provide some support for “state” information, which is information that associates individual data packets with clients and with prior network activity, assigned priority information, service class levels, and the like. “State” refers to configuration, attributes, condition, or information content. The state of a system is usually temporary and volatile, as it changes with time and will be lost or reset to some initial state if the system is switched off. One standard supporting state information specifies a limited mechanism for the exchange of state information in which two HTTP headers, called “set-cookie” and “cookie,” carry the state information between the server and the client. Browser software that recognizes these headers is enabled to extract the state information and save it in a local data structure referred to as a “cookie.” Depending on the site architecture, session ID information could also be passed through the web server using various other data structures, including the URL or form fields.
“Cookies” are the most common session management method. Cookies can contain any information the server chooses to put in them and are used to maintain state between HTTP transactions, which are otherwise stateless. Cookies are small data files in which information sent from a web site is recorded on storage hardware, such as a disk drive, in the client system. At the beginning of a session, the web site issues identification information, such as a session ID, to the client, and the browser at the client end records the identification information in a cookie. When the client again accesses the web page that issued the cookie, the information saved in the cookie is sent to the web site. This enables the web site at the server end to implement session management or customization for individual users by using the cookie information.
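The cookie round trip described above can be sketched with Python's standard http.cookies module. The cookie name session_id and the helper names are assumptions chosen for illustration, and the browser side is only simulated by copying the name=value pair back into a request header value.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional


def issue_session_header(session_id: str) -> str:
    """Server side: build a Set-Cookie header line carrying the session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id  # "session_id" is a hypothetical cookie name
    cookie["session_id"]["path"] = "/"
    return cookie.output()  # a line of the form "Set-Cookie: session_id=...; Path=/"


def read_session_id(cookie_header_value: str) -> Optional[str]:
    """Server side: recover the session ID from a later Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None


# Start of session: the site issues an ID and sends it in a Set-Cookie header.
session_id = secrets.token_hex(16)
set_cookie_line = issue_session_header(session_id)

# Browser side (simulated): keep the name=value pair and send it back later.
stored_pair = set_cookie_line.split(":", 1)[1].split(";", 1)[0].strip()

# Later request: the server reads the ID back and can resume the session.
print(read_session_id(stored_pair) == session_id)
```

A real browser also applies the cookie's path, domain, and expiry rules before returning it; those rules are omitted here.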
Another known technique implements session management without cookies by passing information as part of the URL. An example is a method for passing on session information as a parameter. A session ID is generated at login, the user is redirected to a first page with the session ID attached as a parameter, and the session ID is retained as the user moves from one page to another. The server receives the session ID passed as a parameter, and a server-side program dynamically creates a page including a hyperlink with the embedded session ID. Because the hyperlink in the page includes the session ID as a parameter, the session ID is passed on as the user moves to another link. In this manner, a unique session ID is held along a series of link-to-link movements, which makes it possible to manage users by referring to the session ID whenever necessary.
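A minimal sketch of this URL-parameter technique follows. The parameter name sid and both function names are hypothetical, and the regular expression only handles simple double-quoted href attributes; a production system would use a proper HTML parser.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sid"  # hypothetical name for the session ID parameter


def add_session_id(url: str, session_id: str) -> str:
    """Return the URL with the session ID set as a query parameter."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Drop any existing session parameter so the ID is never duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    params.append((SESSION_PARAM, session_id))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


HREF_PATTERN = re.compile(r'href="([^"]*)"')


def embed_session_in_links(html: str, session_id: str) -> str:
    """Rewrite every double-quoted href so that it carries the session ID."""
    return HREF_PATTERN.sub(
        lambda match: 'href="' + add_session_id(match.group(1), session_id) + '"',
        html,
    )


page = '<a href="/cart?item=42">Cart</a> <a href="/help">Help</a>'
print(embed_session_in_links(page, "abc123"))
```

Each generated page hands the ID to the next request through its links, so the server can recognize the user without storing anything on the client.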
In another prior art method, when a browser sends a fresh request for a URL to a proxy server to access information on the web, the proxy server checks whether the browser is capable of handling cookies. The proxy server then fetches the requested URL and removes any cookies introduced by the web site. The removed cookies are stored for future use. The proxy server then appends the browser's session ID to all of the links in the responsive page, and sends the page to the browser. This method therefore removes cookies and adds the session ID to the URL to maintain the state connection.
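The response-handling part of this proxy method might look roughly like the sketch below. The function name proxy_response, the sid parameter, and the dictionary used as a cookie store are all assumptions; the cookie-capability check and the actual network fetch are left out, and URL fragments are not handled.

```python
import re
from typing import Dict, List, Tuple

Header = Tuple[str, str]


def proxy_response(
    session_id: str,
    headers: List[Header],
    body: str,
    cookie_store: Dict[str, List[str]],
) -> Tuple[List[Header], str]:
    """Strip cookies from a site response and tag its links with the session ID."""
    kept_headers: List[Header] = []
    for name, value in headers:
        if name.lower() == "set-cookie":
            # Hold the site's cookie at the proxy instead of forwarding it.
            cookie_store.setdefault(session_id, []).append(value)
        else:
            kept_headers.append((name, value))

    def tag_link(match: "re.Match[str]") -> str:
        url = match.group(1)
        separator = "&" if "?" in url else "?"
        return f'href="{url}{separator}sid={session_id}"'

    # Append the session ID to every double-quoted href in the page.
    rewritten_body = re.sub(r'href="([^"]*)"', tag_link, body)
    return kept_headers, rewritten_body


store: Dict[str, List[str]] = {}
site_headers = [("Content-Type", "text/html"), ("Set-Cookie", "cart=7; Path=/")]
site_body = '<a href="/a">A</a> <a href="/b?x=1">B</a>'
forwarded_headers, forwarded_body = proxy_response("abc123", site_headers, site_body, store)
print(forwarded_headers)
print(forwarded_body)
```

On a later request carrying the same session ID, the proxy could look up the stored cookies and present them to the web site on the browser's behalf.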
Mechanized search engines employ software agents (variously known as “robots”, “crawlers,” “spiders,” “bots,” “web wanderers,” or “automated site searchers”) to crawl (send HTTP requests) through web sites gathering URLs and other information such as the text of pages. The information gathered by the search engine agent is stored in the search engine's databases and indexed. Search engine “index servers” contain information similar to a book's index—a list of web pages that contain the words matching a particular user query.
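The book-index analogy corresponds to what is commonly called an inverted index. The sketch below assumes the page text has already been gathered by an agent; the sample URLs, the sample text, and the function names are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of page URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def lookup(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawl results: URL mapped to extracted page text.
crawled_pages = {
    "http://example.com/a": "Red shoes on sale",
    "http://example.com/b": "Blue shoes",
}
page_index = build_index(crawled_pages)
print(lookup(page_index, "red shoes"))
```

Note that the page URL serves as the page's identity in such an index, which matters when URLs carry per-session parameters.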
Most search engine agents do not accept any cookies. Furthermore, adding the session ID to the URL introduces two problems for search engines. First, since the search engine index server would include the session ID as part of the page identification, it treats the same page as a distinct page for each session visit, even though the content is not unique. Some search engine index servers may even tag the page as potential spam, since the content of each session page is identical or nearly identical. Second, the indexed search result would attempt to return each visitor to the site with the same session identification, causing the undesirable commingling of consumer data. Therefore, there is a need for a method and system to overcome these shortcomings. In particular, it is highly desirable to do so without requiring extensive reprogramming of the web site's applications.
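The first problem can be made concrete with a short sketch. The parameter name sid, the sample URLs, and the helper page_identity are assumptions for illustration; the helper only shows that the URLs refer to one page and is not presented as the remedy sought here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_identity(url: str, session_param: str = "sid") -> str:
    """Return the URL with the (hypothetical) session parameter removed."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key != session_param
    ]
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


# Two crawler visits to the same page, each given a different session ID.
visit_one = "http://example.com/catalog?page=2&sid=aaa111"
visit_two = "http://example.com/catalog?page=2&sid=bbb222"

# An index keyed on the full URL records two entries with identical content.
indexed_urls = {visit_one, visit_two}
print(len(indexed_urls))

# With the session parameter ignored, both entries identify a single page.
print(page_identity(visit_one) == page_identity(visit_two))
```

The second problem follows from the same entries: any searcher who clicks the indexed link for visit_one arrives at the site carrying the session ID aaa111, so unrelated visitors would share one session.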
Web architects and designers have developed methods for maintaining “state” information for the duration of user interactions with server resources. The architecture of many web servers requires the ability to retain information between requests, including while the systems are inactive. For dynamic web sites that customize a web page for individual users or contain a shopping cart function, it is especially critical to maintain state information about the user across multiple HTTP transactions.