Digital computer networks, such as the Internet, are now used extensively in many aspects of commerce, education, research and entertainment. Because of the need to handle high volumes of traffic, many Internet sites are designed using several groups of server computers. An example of a site network system is shown in FIG. 1A.
In FIG. 1A, network system 10 includes four major tiers: communications tier 12, web tier 14, application tier 16 and database tier 18. Each tier represents an interface between groups of server computers or other processing, storage or communication systems, and each interface handles communication between two such groups. The tiers are significant in that they represent the communication protocols, routing, traffic control and other features relating to the transfer of information between the groups of server computers. As is known in the art, software and hardware are used to perform the communication function represented by each tier.
Server computers are illustrated by boxes such as 20. Database 22 and Internet 24 are represented symbolically and can contain any number of servers, processing systems or other devices. A server in a group typically communicates with one or more computers in adjacent groups as defined and controlled by the tier between the groups. For example, a request for information (e.g., records from a database) is received from the Internet and is directed to server computer 26 in the Web-Com Servers group. The communication takes place in communications tier 12.
Server computer 26 may require processing by multiple computers in the Application Servers group such as computers 20, 28 and 30. Such a request for processing is transferred over web tier 14. Next, the requested computers in the Application Servers group may invoke computers 32, 34, 36 and 38 in the Database Servers group via application tier 16. Finally, the invoked computers make requests of database 22 via database tier 18. The returned records are propagated back through the tiers and servers to Internet 24 to fulfill the request for information.
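By way of illustration only, the ordering of groups and tiers described above can be sketched as a simple data structure. The sketch below is a minimal Python model and is not part of FIG. 1A; the names GROUPS, TIERS, request_path and response_path are hypothetical and chosen solely for this example. It models the groups rather than the individual servers, since the pairing of particular servers is not fully specified above.

```python
# Groups in the order a request traverses them, following the description
# of FIG. 1A. The list and function names are illustrative assumptions.
GROUPS = [
    "Internet 24",
    "Web-Com Servers",
    "Application Servers",
    "Database Servers",
    "database 22",
]

# TIERS[i] is the interface between GROUPS[i] and GROUPS[i + 1].
TIERS = [
    "communications tier 12",
    "web tier 14",
    "application tier 16",
    "database tier 18",
]


def request_path():
    """Return (source group, tier, destination group) for each hop of a request."""
    return [(GROUPS[i], TIERS[i], GROUPS[i + 1]) for i in range(len(TIERS))]


def response_path():
    """Return the hops of the returned records, i.e. the request path reversed."""
    return [(dst, tier, src) for src, tier, dst in reversed(request_path())]


if __name__ == "__main__":
    for src, tier, dst in request_path() + response_path():
        print(f"{src} -> {dst} via {tier}")
```

Running the sketch lists the four hops of the request followed by the four hops of the returned records, with each hop crossing exactly one tier.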
Of particular concern in today's large and complex network systems is monitoring and optimizing system performance. One way that prior art approaches monitor system performance is to use processes at certain points in the network to report data back to a central location such as console 40. In FIG. 1A, the request for database records can be monitored by having a process at server 26 log the time and nature of the request. A process at server 20 then logs the time at which a request from server 26 is received. Similarly, server 32 (or whichever server receives the database request from server 20) logs its participation in the transaction. This “chain” of logged transactions is illustrated by bold arrows in FIG. 1A.
In this manner, the prior art monitoring system can determine how long it takes for a request for a record to propagate through the network. The transaction can also be tracked in the other direction to determine how long it takes to fulfill the request. Such data logging is complex because a server in one tier, or group, may ask multiple other servers for assistance or processing, and different servers can be asked at different points in time. The speed at which requests, processing and transactions occur can cause large amounts of data to be logged very rapidly. At some later time, the data is transferred to console 40, which resolves the data and produces meaningful results about system performance that can be analyzed by a human administrator.
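For purposes of illustration, the resolution step performed at the console might resemble the following minimal Python sketch. The record format, the LogRecord and resolve names, the transaction identifier and the timestamp values are all hypothetical and are not taken from any particular prior art system. The sketch also assumes that the clocks of the logging servers are synchronized, so that sorting by timestamp recovers the order of the chain.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One entry written by a logging process (hypothetical format)."""
    transaction_id: str
    server: int        # reference numeral of the logging server, e.g. 26
    timestamp: float   # seconds; assumes synchronized server clocks


def resolve(records):
    """Group records by transaction and compute per-hop and total delays."""
    by_txn = defaultdict(list)
    for rec in records:
        by_txn[rec.transaction_id].append(rec)

    results = {}
    for txn, recs in by_txn.items():
        # Reconstruct the chain by ordering the records in time.
        recs.sort(key=lambda r: r.timestamp)
        hops = [
            (a.server, b.server, b.timestamp - a.timestamp)
            for a, b in zip(recs, recs[1:])
        ]
        results[txn] = {
            "hops": hops,
            "total": recs[-1].timestamp - recs[0].timestamp,
        }
    return results


# Records as they might arrive at the console, in arbitrary order.
# The values are invented for the example.
dumped = [
    LogRecord("req-1", 32, 10.75),
    LogRecord("req-1", 26, 10.0),
    LogRecord("req-1", 20, 10.25),
]
report = resolve(dumped)

if __name__ == "__main__":
    for src, dst, delay in report["req-1"]["hops"]:
        print(f"server {src} -> server {dst}: {delay:.2f} s")
    print(f"total: {report['req-1']['total']:.2f} s")
```

Because the records are resolved only after they have been transferred, the delays computed in such a sketch describe past transactions rather than the current state of the network.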
A problem with the prior art approach is that the logging processes are segregated and communicate little, if at all, with each other. This means that complex dependencies among processes, servers, etc., are not accurately analyzed. The logging processes also tend to create high overhead in the host servers in which they execute. One approach uses the console to poll the processes, but frequent polling of many processes also creates excessive overhead. Optimization and performance improvement based on the prior art approach are hampered by the use of disparate platforms and the lack of more encompassing analysis. Having to dump data to the console at intervals, and then have the data resolved, ultimately means that monitoring is not performed in real time.
Thus, it is desirable to provide a system that improves upon one or more shortcomings in the prior art.