By now, almost everyone is familiar with the Internet and the World Wide Web (the Web). The Internet is a collection of interconnected communication networks that span the globe. Information content on the Internet is presented via pages, each page comprising a file that is stored on (or dynamically built by) a computer server that is coupled to the Internet and assigned a Uniform Resource Locator (URL), which is essentially an address on the Internet.
Web browsers are computer programs that enable one to access and view Web pages via direct addressing (typing the address of a Web page in an address field of the browser) and/or by hyperlinking, as is well known in the art. Netscape Navigator and Microsoft Internet Explorer are two of the most common Web browsers in use today.
Hypertext transfer protocol (http) is the protocol used for transferring Web pages over the Internet. Servers are computers that form part of the Web and whose general purpose is to provide (or serve) information to other computers coupled to the Web. Those computers that are used to request and receive information via the Web from http servers are typically termed client machines or client computers.
On the Web, information is served in the form of Web pages written in HTML (HyperText Markup Language). Thus, for example, a retail Web site operator couples to the Internet via one or more http servers on which are stored a plurality of Web pages written in HTML. In practice, many Web pages are not stored in Web page format, but are dynamically constructed upon receipt of a request for the page.
The HTML code defines the manner of presentation of information on the client machine. The HTML code also typically includes the textual content of the page. Other types of content, such as images, audio, backgrounds, and multimedia, are contained in separate, supplemental files stored on the server, which are referenced within the HTML code by HTML tags.
In a common example, a customer accesses a Web retailer's Web site from a desktop computer using a Web browser. The customer's desktop computer utilizing the Web browser software would be considered a client machine.
The Web browser requests a particular Web page using http in a manner well known to those of skill in the art. Upon receipt of the request for a particular Web page, the server system corresponding to the URL of the requested page serves the HTML code for that page to the client machine via the Internet.
Http is a stateless (often loosely termed connectionless) transfer protocol. This means that each request for a Web page transmitted from a client to a server is completely freestanding and contains no information that relates that request to any other request. Thus, http itself has no provision for state information that would allow a server (or a client) to maintain historical information about a series of related http requests (e.g., consecutive requests for pages from a single Web site by a single client).
In many types of communication sessions between a particular client and a particular Web site, it may be desirable to associate http requests from a single client and maintain state information. For instance, at retail Web sites, which commonly use dynamically generated shopping cart pages to keep track of items being purchased by a particular client, maintaining state information is a necessity in order to keep track of the various products that the user adds to the shopping cart in different http requests. Countless other examples also exist. The term session will be used in this specification to refer to any group of requests for data from a network server system that one may wish to associate with each other. Typically, however, a session comprises requests from a single client machine to a single server system that are within a certain time period of each other. The concept of sessions is not limited to use on the Internet and http, but can be applied to any communication network using any protocol.
Accordingly, ways have been developed outside of the http protocol itself for maintaining such state (or session) information. One of the earliest ways developed for doing this was the use of cookies. Cookies are small pieces of data that a server sends to a client machine and that the client's Web browser knows to store in a designated cookie folder or in browser memory. Thereafter, when that client sends an http request for a Web page to that server, the client's Web browser software sends the cookies associated with that URL to the server. The cookie might contain any particular information that the Web site operator feels the need to have in order to better service its customers. As an example, many Web sites allow individual clients to customize Web pages, such as a daily electronic newspaper containing only those articles that meet certain criteria selected by the customer, which criteria are stored as part of a cookie. Cookies are a common way to allow the Web site operator to identify the particular client making a request so that the operator can then pull up the appropriate information associated with that client and deliver the customized Web page. Persons of skill in these arts will recognize that other mechanisms for storing state data and the like are known and used in the field. However, the use of cookies is probably the most ubiquitous of the various mechanisms in use today.
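By way of illustration only, the cookie exchange described above might be sketched in Java as follows. The sketch uses the standard java.net.HttpCookie class to parse a header as a browser would; the class name CookieRoundTrip, the cookie name sessionId, and its value are hypothetical and are not taken from any particular Web site.

```java
import java.net.HttpCookie;
import java.util.List;

// Illustrative sketch only: the cookie name "sessionId" and its value are made up.
class CookieRoundTrip {

    // Parse a Set-Cookie response header, as a browser would on receiving it.
    static HttpCookie parseSetCookie(String header) {
        List<HttpCookie> cookies = HttpCookie.parse(header);
        return cookies.get(0);
    }

    // Build the Cookie request header the browser would send on a later
    // request to the same server.
    static String toRequestHeader(HttpCookie cookie) {
        return "Cookie: " + cookie.getName() + "=" + cookie.getValue();
    }

    public static void main(String[] args) {
        HttpCookie cookie = parseSetCookie("Set-Cookie: sessionId=abc123; Path=/");
        System.out.println(toRequestHeader(cookie));
    }
}
```

The server can then use the returned value as a key to look up whatever information it has associated with that client.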
The javax.servlet.http.HttpSession object in the Java programming language (commonly called HttpSession) is a newer way of maintaining state information at the server side. The HttpSession object builds on cookies as well as some of the other means of tracking state data in a layer on top of the http layer. HttpSession is a portion of a Java servlet API (Application Program Interface).
Java is a programming language developed by Sun Microsystems, Inc. expressly for use in the distributed environment of the Internet. It can be used to create complete applications that may run on a single computer or be distributed among servers and clients in a network. It can also be used to build small application modules, known as applets, that are delivered to a Web browser as part of an HTML page. Web browsers that include a Java Virtual Machine (JVM) can run Java applets. The applet can execute at the client side to provide dynamic content and/or allow for interactivity. For example, a Java applet can allow a user at a client machine to enter data onto a form. Applets thus allow for dynamic Web pages and interaction between the user at the client machine and the downloaded Web page. Java and Java applets are platform independent.
An API is a specific method prescribed by a computer operating system or by another application program by which a programmer writing an application program can make requests of the operating system or other application.
A Java servlet essentially is a server-side equivalent of an applet. A Java servlet API provides Web developers with a simple, consistent mechanism for extending the functionality of an http server and for accessing existing business systems, i.e., the application program with which the HTML code interfaces. Servlets are server and platform independent. HttpSession essentially is an object of a Java servlet API that accumulates state data. It is built using cookies (and/or other existing state data tracking techniques) and associates http requests with those cookies (and/or the particular data pieces used in other data tracking techniques).
For further information concerning HttpSession, Java servlet APIs and the other matters discussed above, reference can be made to the Java Servlet Specification, version 2.2 (or later).
It is common for high traffic Web sites to divide the tasks of servicing requests into a three tier system with a different server or plurality of servers to handle each tier. The first, front end tier is the http server that processes the http aspects of a transaction. The second tier is termed the application server. The application server handles the content specific processing for the transactions. For instance, in a retail Web site, the application server would process the actual data for a purchase, such as creating an invoice, creating a bill of lading, checking inventory to determine if the ordered item is in stock, checking the customer's credit card information and confirming sufficient funds, record keeping, etc. The third tier comprises database servers that store the data needed to process requests. Such databases may include, for instance, a database of inventory and a database storing the content that is used to dynamically build Web pages.
Within each tier, a large volume Web site server system may have multiple, redundant servers. Particularly, any given server can only service so many requests in a given period. If the Web site expects more traffic than a single server can handle, it simply maintains multiple servers which can serve the same content. In such situations, since http is a connectionless protocol, one request from a particular client can be directed to one application server while the next request from the same client machine might be directed to a different application server. Accordingly, a means must be provided for the various servers to access the session data developed by another, redundant server.
A common way of enabling such sharing of http session data is by use of a database server that is accessible to the plurality of application servers for storing session data. Particularly, an application server will store session data in local memory, but will also write a copy of the session data to the session database. If a different server services a request from a client, that different server can go to the database and read out the session data for the corresponding session.
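By way of illustration only, the sharing arrangement described above might be sketched in Java as follows. The class names SharedSessionDatabase, AppServer, and SessionSharingDemo are hypothetical, and an in-memory map stands in for the session database server; a real system would use an actual database and would also need to deal with stale local copies, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the shared session database (an in-memory map here).
class SharedSessionDatabase {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private int writeCount = 0;

    void write(String sessionId, Map<String, String> sessionData) {
        rows.put(sessionId, new HashMap<>(sessionData)); // store a copy
        writeCount++;
    }

    Map<String, String> read(String sessionId) {
        Map<String, String> row = rows.get(sessionId);
        return row == null ? null : new HashMap<>(row);
    }

    int getWriteCount() {
        return writeCount;
    }
}

// Hypothetical application server: keeps session data in local memory and
// writes a copy to the shared database after each request that changes it.
class AppServer {
    private final Map<String, Map<String, String>> localMemory = new HashMap<>();
    private final SharedSessionDatabase database;

    AppServer(SharedSessionDatabase database) {
        this.database = database;
    }

    // Handle a request that sets one attribute of the session.
    void handleRequest(String sessionId, String key, String value) {
        Map<String, String> session = load(sessionId);
        session.put(key, value);
        database.write(sessionId, session);
    }

    String getAttribute(String sessionId, String key) {
        return load(sessionId).get(key);
    }

    // Use the local copy if present; otherwise read the session from the database.
    // Simplification: a local copy is never refreshed once it has been loaded.
    private Map<String, String> load(String sessionId) {
        Map<String, String> session = localMemory.get(sessionId);
        if (session == null) {
            Map<String, String> stored = database.read(sessionId);
            session = (stored != null) ? stored : new HashMap<>();
            localMemory.put(sessionId, session);
        }
        return session;
    }
}

class SessionSharingDemo {
    public static void main(String[] args) {
        SharedSessionDatabase database = new SharedSessionDatabase();
        AppServer serverA = new AppServer(database);
        AppServer serverB = new AppServer(database);

        serverA.handleRequest("s1", "cart", "book");             // first request lands on A
        System.out.println(serverB.getAttribute("s1", "cart"));  // next request lands on B
    }
}
```

In this sketch every request that changes the session costs one database write, which is the expense that motivates reducing database traffic.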
Typically, the session data is updated in both the local memory and the database each time a request causes a change in the data. Particularly, the server updates the http session data in its local memory and also writes that data to the database after each request. Another method that has been used is herein termed manual update. With manual update, the servlet operator can explicitly, within the code, direct the server to write its locally stored session data to the database.
Eventually, all sessions end. For instance, the individual at the client side finishes his or her business with the Web site and either goes on to visit another Web site or turns off his or her computer. The session data being maintained therefore must be invalidated at some point since it is stale data that is no longer of any value. The appropriate server-side application program may expressly make a determination as to when a session has ended. For example, a retail Web site might deem a session to have ended after a consumer checks out (and all of the business data processing needed to process the order has been completed). The appropriate application program may then expressly invalidate the session data stored in the database (among many other tasks not pertinent to the present invention that may be performed upon the closing of a session). Another common way for a session to end is for it to time out. Typically, the http server or the application server maintains a record of the time of the last http request in a session and, if the period since the last request exceeds a particular threshold (herein termed the time out interval), the session is closed. At a minimum this would involve invalidating the session data in the local memory and the database and may also involve other tasks.
Traditionally, while the servers are up and running (e.g., processing http requests from client machines and writing to and reading from the http session database), invalidation testing of the session data in the session database is run in parallel. Particularly, at specified intervals, an invalidation test program wakes up and polls all of the sessions stored in the session database to determine if they have timed out. For instance, the invalidation test simply may entail, for each session stored in the database, reading the last access time and the time out interval (either or both of which may be attributes of the session data itself), and comparing the time out interval to the difference between the current time and the last access time. If the time out interval is shorter than that difference, the session has timed out and the test program invalidates the corresponding session data in the database.
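By way of illustration only, the traditional invalidation test described above might be sketched in Java as follows. The class names SessionRecord and InvalidationSweep are hypothetical, times are assumed to be expressed in milliseconds, and an in-memory map stands in for the session database, so each map lookup or removal corresponds to a database read or write.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical record of the two attributes the invalidation test reads.
class SessionRecord {
    final long lastAccessMillis;
    final long timeOutIntervalMillis;

    SessionRecord(long lastAccessMillis, long timeOutIntervalMillis) {
        this.lastAccessMillis = lastAccessMillis;
        this.timeOutIntervalMillis = timeOutIntervalMillis;
    }
}

class InvalidationSweep {

    // A session has timed out if the time out interval is shorter than the
    // time elapsed since the last access.
    static boolean hasTimedOut(SessionRecord record, long nowMillis) {
        return record.timeOutIntervalMillis < nowMillis - record.lastAccessMillis;
    }

    // Poll every stored session and delete those that have timed out.
    // Returns the identifiers of the sessions that were invalidated.
    static List<String> sweep(Map<String, SessionRecord> sessionTable, long nowMillis) {
        List<String> invalidated = new ArrayList<>();
        Iterator<Map.Entry<String, SessionRecord>> it = sessionTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, SessionRecord> entry = it.next(); // one read per session
            if (hasTimedOut(entry.getValue(), nowMillis)) {
                invalidated.add(entry.getKey());
                it.remove();                                    // one write per stale session
            }
        }
        return invalidated;
    }
}
```

Because the sweep touches every stored session on every pass, its cost grows with the number of sessions in the database regardless of how few have actually timed out.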
The invalidation test for each session can involve at least one read from the session database and, if the session needs to be invalidated, at least one write (to flag the session data as invalid or delete it outright). Thus, invalidation testing entails a substantial amount of traffic at the database and substantially increases the load on the database. The additional traffic created by the invalidation testing of the session database can be particularly taxing on the system during those times of day when there already is high traffic in the server system due to a large volume of client machines accessing the server system.
Writing to the database is a particularly expensive process in terms of consumption of processing power and time. Accordingly, it is desirable to reduce the number of writes to a session database in order to conserve system resources.
It is an object of the present invention to provide an improved method and apparatus for invalidating http session data in a back-end database.
It is another object of the present invention to provide a method and apparatus for invalidating http session data in a back-end database that minimizes database traffic.
Further, it is an object of the present invention to provide a method and apparatus to avoid invalidating http session data in a back-end database during periods of high traffic.