The Internet, a vast public data and communications network, is becoming more and more important to both individuals and businesses. The Internet comprises a huge number of individual network nodes. Many of these nodes are consumers of information. Other nodes might also supply small amounts of information. Still other nodes are primarily suppliers or servers of information. The invention concerns the testing of such server systems.
Although there are many different Internet information providers, a few have become quite significant. For example, there are relatively few widely used search engines, and the available search engines process very high numbers of search requests. So-called “portals” comprise another category of information providers that have become prominent. Such providers deliver a variety of different information, often referring (or “linking”) users to other providers for the actual information.
The number of data requests or “hits” processed by these relatively few but prominent Internet nodes is staggering. As an example, the Microsoft family of Internet sites logs well over a billion hits daily.
The Internet uses a client/server model in which client software, known as a Web browser, runs on a local computer. Client machines effect transactions with Web servers using Hypertext Transfer Protocol (HTTP), which is a known application protocol providing users with access to files (e.g., text, graphics, images, sound, video, etc.). Returned data is formatted in accordance with pre-defined formats, such as HTML, XML data, XML schemas, etc.
In the Internet paradigm, a network path to a server is identified by a so-called Uniform Resource Locator (URL) having a special syntax for defining a network connection. Use of an HTTP-compatible browser (e.g., Microsoft Internet Explorer) at a client machine involves specifying a link via the URL. In response, the client makes a request to the server identified in the link and receives in return a document formatted according to HTML. The Web server is usually a standalone file server or a server farm that services various Web document requests.
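As a minimal illustration of how a URL identifies a network path to a server, the following Python sketch splits a URL into its parts using the standard library. The URL shown is a hypothetical example and does not come from this description.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/weather/forecast.html?city=Seattle"

parts = urlsplit(url)

scheme = parts.scheme    # application protocol used for the request
host = parts.hostname    # server that the client contacts
port = parts.port or 80  # no explicit port given, so assume the HTTP default
path = parts.path        # document requested from that server
query = parts.query      # user-supplied information sent with the request

print(scheme, host, port, path, query)
```

The client connects to the host and port, then requests the path (and query, if any) using the protocol named by the scheme.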
One of the problems that has arisen, and one which is particularly illustrated in the context of the highly prominent Internet sites mentioned above, is that information providers often need to scale their operations so that they are able to service high rates of requests without sacrificing reliability. One way of doing this is to incorporate multiple servers into a networked system. A collection of servers such as this is sometimes referred to as a server “farm.” Each of the individual servers operates identically, so that the same services and responses are rendered regardless of which server handles a given request.
It should be noted that many servers are doing much more than simply returning static text and graphics. Rather, many of the more popular Internet sites accept user-supplied information and dynamically compile responses based on the supplied information. This information often comes from sources other than the server itself, such as from a database server or even an image server that provides continuously updated images. Furthermore, requested data such as a weather forecast might change with time.
The reliability of both server software and hardware is very important in this environment. However, it is becoming increasingly difficult to effectively test highly scaled web server systems. A simple testing approach is to submit a few requests to the server system and manually verify that the responses are correct. However, it is now recognized that it is necessary to “stress test” server systems: to place them under a very heavy load of requests and to verify that the responses are correct when operating under this load.
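The general shape of a stress test can be sketched as follows. This is an illustrative Python sketch only, not a description of any particular test tool: the names `stress_test`, `send_request`, `is_correct`, and `fake_server` are hypothetical, and a stand-in function takes the place of a real server so that the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def stress_test(send_request: Callable[[int], str],
                is_correct: Callable[[int, str], bool],
                total_requests: int = 1000,
                concurrency: int = 50) -> int:
    """Issue many requests concurrently and return the number of incorrect responses."""

    def run_one(request_id: int) -> bool:
        try:
            return is_correct(request_id, send_request(request_id))
        except Exception:
            # A request that fails outright is counted as an incorrect response.
            return False

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_one, range(total_requests)))
    return results.count(False)


# Stand-in for a real server system (assumption, for illustration only).
def fake_server(request_id: int) -> str:
    return f"echo {request_id}"


def check(request_id: int, body: str) -> bool:
    return body == f"echo {request_id}"


failures = stress_test(fake_server, check, total_requests=200, concurrency=20)
print("incorrect responses:", failures)
```

In practice, `send_request` would issue a real request to the system under test, and the difficulty lies in deciding what `is_correct` should check.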
When attempting to perform server testing at this level, it becomes necessary to automatically check whether responses are correct rather than manually verifying the content of each response. In the prior art, this is done by checking the headers of the responses. Each header contains a numeric “reply status” that indicates whether the server believes it has provided a correct response. Thus, incorrect responses are detected in the prior art by checking their reply status indicators.
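The header-based check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the sample responses are hypothetical, and a reply status in the 200 to 299 range is treated as the server reporting success, in line with standard HTTP status codes.

```python
def reply_status(raw_response: bytes) -> int:
    """Extract the numeric reply status from the first line of an HTTP response."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)  # e.g. ["HTTP/1.1", "200", "OK"]
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def server_reports_success(raw_response: bytes) -> bool:
    """Return True when the reply status falls in the 2xx (success) range."""
    return 200 <= reply_status(raw_response) < 300


# Hypothetical sample responses, for illustration only.
ok = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>forecast</html>"
failed = b"HTTP/1.1 500 Internal Server Error\r\n\r\n"

print(server_reports_success(ok), server_reports_success(failed))
```

A check of this kind reflects only what the server reports about its own response; it does not examine the response body.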
The inventors have found shortcomings in this approach, and have devised more effective methods of verifying the correctness of responses during stress testing.