As the Internet has grown from a few servers controlled by the government and a few educational institutions into a vast, heterogeneous network of servers and clients, the demands on servers and a corresponding interest in computer security have grown as well. As a result, servers have become increasingly specialized, each providing limited functionality to a limited set of users, and networks have become more efficient. In the interests of computer security, the sensitive information stored on servers has also been moved farther away from those users, and requests to access or manipulate that data must often pass through specialized tiers of servers before reaching the machines that actually carry the data. These pipelined networks of servers allow efficient, non-duplicative access by multiple users, and ensure that users do not have prohibited direct access to important data.
One example of such a pipelined server network is used for the deployment of remote-access technologies. Typically, a group of functionally similar remote access servers (RAS) accepts dial-up or virtual private network (VPN) connections from users desiring to remotely access an intranet. Before granting the users access to the network, these remote access servers, comprising the first tier, use the Remote Authentication Dial-In User Service (RADIUS) protocol to communicate with RADIUS servers handling authentication and authorization requests. This second tier of RADIUS servers communicates with domain controllers (DCs) in the pipelined server network in order to verify the users' credentials, and uses the Lightweight Directory Access Protocol (LDAP) to retrieve user and group settings from the DCs.
To make matters more complex, these remote-access deployments include RADIUS proxy servers for performing load-balancing and fault avoidance functions between the RAS servers and the primary RADIUS servers. Thus, in these pipelined server networks, users are separated from credentialing settings by four tiers of servers: RAS servers, which communicate with RADIUS proxy servers, which communicate with RADIUS servers, which in turn communicate with DCs.
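The four-tier arrangement described above can be illustrated with a minimal sketch. The server names and the number of servers in each tier are purely hypothetical and are chosen only to show how quickly the number of distinct paths a request might follow grows as servers are added to each tier.

```python
from itertools import product

# Hypothetical server names for each tier, listed in pipeline order.
# The counts are illustrative assumptions, not taken from any deployment.
TIERS = {
    "RAS": ["ras1", "ras2", "ras3"],
    "RADIUS proxy": ["proxy1", "proxy2"],
    "RADIUS": ["radius1", "radius2"],
    "DC": ["dc1", "dc2"],
}


def possible_routes(tiers):
    """Return every path a request could take, using one server per tier."""
    return list(product(*tiers.values()))


routes = possible_routes(TIERS)
print(len(routes))  # 3 * 2 * 2 * 2 = 24 candidate pipelines
print(routes[0])    # ('ras1', 'proxy1', 'radius1', 'dc1')
```

Even in this small illustrative deployment, a single failed request could have followed any one of 24 pipelines, each of which would have to be considered when tracing the failure.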
The architectural complexity of these pipelined server networks makes it difficult to troubleshoot and diagnose errors. This difficulty is due to several factors, including the variety of systems involved in a typical transaction, the many possible routes that a given request may take, the many possible points of failure, and the sporadic and often irreproducible nature of the errors. Existing diagnostic tools are unable to adequately troubleshoot such complex server networks and, in particular, are limited in their ability to pinpoint the source of a typical problem. Thus, a system administrator faced with the task of troubleshooting a network failure confronts a tedious, time-consuming and often intractable problem.
In the remote access deployment described above, a system administrator attempts to diagnose a fault by following a series of manual troubleshooting steps, aided by tools that often cannot cooperate with one another without manual intervention. Error messages received by a user unable to authenticate through the RAS server give no troubleshooting information, nor do the event logs. The system administrator may be able to obtain some troubleshooting information if the administrator is manually monitoring network traffic. However, network monitoring is a clumsy tool: it often provides only a limited buffer within which to store data (making it necessary for the system administrator to investigate the error nearly simultaneously with its occurrence), and it makes it difficult to filter data effectively.
If the administrator cannot determine which RADIUS proxy server and RAS server formed the pipeline for processing a user inquiry that generated an error, the event logs and error logs on every RADIUS proxy server in the network must be manually reviewed. If the event or error logs contain information describing the pipeline connection between a RADIUS proxy server and a RADIUS server, the administrator need only check the event logs and error logs on the identified RADIUS server, but may further need to troubleshoot the accessed domain controller (DC). These local event and error logs, however, like the available network-based monitoring tools, suffer from over-inclusiveness and provide only limited buffering capabilities before old entries are overwritten.
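The limitation of bounded logs described above can be illustrated with a minimal sketch. The class name, log capacity, server names, and message format are all hypothetical assumptions introduced for illustration; they do not reflect the log format of any actual RADIUS product.

```python
from collections import deque


class BoundedLog:
    """Fixed-capacity event log: once full, each new entry evicts the oldest."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)

    def record(self, message):
        self.entries.append(message)

    def search(self, needle):
        """Return the retained entries that contain the given text."""
        return [m for m in self.entries if needle in m]


def review_all(logs, needle):
    """Mimic the manual review: search the log of every proxy server in turn."""
    return {name: log.search(needle) for name, log in logs.items()}


# Hypothetical proxy servers, each retaining only three log entries.
logs = {name: BoundedLog(capacity=3) for name in ("proxy1", "proxy2")}

# An error is logged on proxy1, then pushed out by three routine events.
logs["proxy1"].record("auth failure user=alice via ras2")
for i in range(3):
    logs["proxy1"].record(f"routine event {i}")

# An error on proxy2 is still retained because nothing has displaced it.
logs["proxy2"].record("auth failure user=bob via ras1")

hits = review_all(logs, "auth failure")
print(hits)  # {'proxy1': [], 'proxy2': ['auth failure user=bob via ras1']}
```

In this sketch the failure recorded on the first proxy has already been overwritten by the time the logs are reviewed, so an exhaustive search of every proxy still fails to find it unless the review happens soon after the error occurs.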
There is therefore a need for an automated error detection and diagnostic system that allows network errors in complex architectures such as pipelined server networks to be isolated. Such a system would free the system administrator from the tedious and unreliable task of manually tracking down network errors, and would eliminate the need for immediate administrator attention.