The worldwide network of computers commonly known as the "Internet" has seen explosive growth in the last several years. This growth has been fueled mainly by the introduction and widespread use of so-called "web" browsers, which allow simple graphical user interface (GUI)-based access to network services such as E-mail, news, file transfer protocol (FTP), and web pages. Many people contract with an Internet service provider (ISP) to obtain access to the Internet. Subscribers to an ISP typically use a personal computer and modem to connect to the ISP over the public switched telephone network. Once connected, the user may perform the desired functions.
In addition to providing a connection to the Internet, ISPs or other computer access providers (CAPs), such as corporate IT departments, often provide additional services that expand, enhance, or improve Internet functions. For example, many CAPs provide users with the ability to send and receive E-mail. As another example, a CAP may provide a local domain name server (DNS) to speed the resolution of the domain names the subscriber is trying to access, thereby improving overall access speed.
The customers of these services tend to view the quality of a service in simple terms: accessibility and performance (e.g., speed and responsiveness). Unfortunately, the accessibility and performance of a service may depend on many factors. First, there is the service itself and the servers that implement the service. These servers may comprise the server software, the hardware running the server software, the operating system running on the hardware, and the network hardware and software that support the implementation of that service. Second, the performance and accessibility of the server components may further depend on other services, hardware, and software. For example, the quality of E-mail service may first depend on the hardware and software running the E-mail program. This E-mail program may depend on a DNS server and a network router. The DNS server may be used to resolve domain names before the E-mail can be sent, and the router may be used to relay the E-mail from the CAP's local network to the Internet backbone. Finally, the performance of the DNS server may depend on the performance of a network file system (NFS) server and several other pieces of hardware, software, or services provided by the same or different hardware and software. The components that contribute to the performance of the E-mail service are interrelated and may be located on the same or different networks, run on the same or different hardware, and rely on the same or different software and operating systems.
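The chain of dependencies in the E-mail example can be pictured as a small directed graph. The following is only a minimal sketch under stated assumptions: the component names (`email`, `email_host`, `dns`, `router`, `nfs`) and both function names are hypothetical labels chosen for illustration and do not come from any particular system. It shows how a degradation in one low-level component (here, the NFS server) could be traced upward to the services whose quality it may affect.

```python
from collections import deque

# Hypothetical dependency map modeled on the E-mail example in the text.
# Keys are components; values are the components they depend on directly.
DEPENDS_ON = {
    "email": ["email_host", "dns", "router"],
    "dns": ["nfs"],
    "email_host": [],
    "router": [],
    "nfs": [],
}


def transitive_dependencies(service, depends_on):
    """Return every component that `service` relies on, directly or indirectly."""
    seen = set()
    queue = deque(depends_on.get(service, []))
    while queue:
        component = queue.popleft()
        if component in seen:
            continue  # already visited; also guards against cycles
        seen.add(component)
        queue.extend(depends_on.get(component, []))
    return seen


def affected_services(failed, depends_on):
    """Return the components whose quality may suffer if `failed` degrades."""
    return {
        name
        for name in depends_on
        if failed in transitive_dependencies(name, depends_on)
    }


if __name__ == "__main__":
    print(sorted(transitive_dependencies("email", DEPENDS_ON)))
    print(sorted(affected_services("nfs", DEPENDS_ON)))
```

In this sketch, a problem at the hypothetical `nfs` node is reported as potentially affecting both `dns` and `email`, which mirrors the indirect dependency described above. A real infrastructure would have a far larger map, and that map would differ from one CAP to another.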
It can be seen from the previous discussion that the simple quality of service measures of accessibility and performance may depend on the interrelationships of many hardware and software components arranged in a complex system infrastructure. It is also likely that each individual CAP will have an infrastructure composed of a unique arrangement of components and interrelationships. This makes it difficult to construct a "one size fits all" solution for conducting service and capacity planning and for detecting, isolating, and resolving faults and quality of service problems.
Many CAPs manage their networks and services on a rather ad hoc basis. Collections of management scripts available in the public domain, together with policies and procedures developed on the fly, provide what little proactive measurement and monitoring of the infrastructure there is. Detailed knowledge of the infrastructure, relationships, test and measurement techniques, policies, and procedures is often passed around the CAP's staff by word of mouth. Relationships among the infrastructural components are usually understood only by the most senior technical operations staff. Finally, changes in operational procedures and policies are usually initiated only after hard-won experience in dealing with failures and quality of service problems has been internalized by the operations staff. This period of internalization, with its associated failures and poor service, can adversely affect a CAP's reputation and cost a CAP customers, market share, and revenue.
Accordingly, there is a need in the art for a system that captures the knowledge and experience of the senior technical operations staff and makes that information available to a much wider audience. Such a system should be able to gather data from a variety of sources and tools that test infrastructure elements, collect data from SNMP MIBs and log files, and correlate that data into the information needed to enable less skilled members of the operations staff to detect, isolate, and resolve faults and quality of service problems. There is a need in the art for a system that detects potential problems before they cause a failure, are noticed by users, or result in a quality of service problem. There is a need in the art for a system that allows less skilled members of the operations staff to diagnose, isolate, and resolve failures and quality of service problems without consulting the senior technical operations staff. Finally, it would be desirable if such a system could configure itself automatically and deploy the tools and test elements necessary for problem detection, isolation, and resolution.