Computer networks are monitored to ensure continuous availability, optimum performance, and the efficient use of network resources. However, the monitoring of large networks is both labor-intensive and costly where the network nodes are configured, monitored, and updated on a node-by-node basis.
Monitoring computer networks generally involves building, installing, configuring, operating, and updating specially designed computer software programs that test various network-specific parameters such as available memory, response time, and operational status. The set of computer instructions that tests (e.g., monitors) a single parameter, such as available disk space, is called a check.
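By way of illustration only, a check of the kind described above might be sketched in Python as follows. The names `CheckResult` and `disk_space_check`, and the 10% default threshold, are assumptions introduced for this sketch and do not appear in the prior art described here.

```python
import shutil
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical structure for this sketch)."""
    name: str
    ok: bool
    value: float


def disk_space_check(path: str = ".", min_free_fraction: float = 0.10) -> CheckResult:
    """Test a single parameter: the fraction of disk space still free at `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    # Guard against a filesystem that reports zero total size.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return CheckResult("disk_space", free_fraction >= min_free_fraction, free_fraction)


result = disk_space_check()
print(f"{result.name}: ok={result.ok}, free={result.value:.1%}")
```

A monitoring program in this sense would bundle several such checks, each testing one parameter.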
The computer software programs mentioned above may contain one or more checks targeted to (or associated with) particular operating systems and/or particular software programs running within a particular operating system. For example, a Macintosh operating system includes parameters specific to that operating system that may be monitored. Thus, a software program that includes one or more checks targeted to a Macintosh operating system may be developed. Similarly, a software program that includes one or more checks targeted to a Microsoft Windows operating system may be developed. However, the checks targeted to the Macintosh operating system may be different from the checks targeted to the Microsoft Windows operating system.
Regardless of the operating system and/or software programs used, the prior art teaches building, configuring, and updating checks on a computer-by-computer basis. In a large network, or in instances where a large number of checks are to be applied, this approach is disadvantageous because building, configuring, and updating checks individually is costly, labor-intensive, and unnecessarily repetitive. The prior art does not leverage commonalities (e.g., an operating system and/or computer programs) shared by two or more network nodes and does not permit the simultaneous deployment of multiple checks to groups of network nodes.
FIG. 1 is a diagram of a computer network 100 illustrating the arrangement of servers 101-109 within the infrastructure's protocol, application, and database layers 110-112, respectively. A protocol is an agreed-upon format for transmitting data between two devices. The protocol determines the type of error checking to be used; the data compression method, if any; how the sending device will indicate that it has finished sending a message; and how the receiving device will indicate that it has received a message.
An exemplary protocol is TCP/IP (Transmission Control Protocol/Internet Protocol). TCP is one of the main protocols in TCP/IP networks. Whereas the IP protocol deals only with packets, TCP enables two network computers to establish a connection and exchange streams of data. TCP not only guarantees delivery of data, but also guarantees that packets will be delivered in the same order in which they were sent.
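The connection-oriented, ordered behavior of TCP described above can be illustrated with a minimal loopback sketch in Python. The helper names `echo_once` and `tcp_round_trip` are hypothetical, and the example assumes that the local loopback interface (127.0.0.1) is available.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and echo back every byte received."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # peer has finished sending
                break
            conn.sendall(data)


def tcp_round_trip(chunks: list[bytes]) -> bytes:
    """Send chunks over one TCP connection and return the echoed byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            for chunk in chunks:
                client.sendall(chunk)
            client.shutdown(socket.SHUT_WR)  # signal that sending is finished
            received = b""
            while True:
                data = client.recv(1024)
                if not data:
                    break
                received += data
        worker.join(timeout=5)
    return received


print(tcp_round_trip([b"first ", b"second ", b"third"]))
```

The sender signals completion by shutting down its write side, and the receiver reads until the stream ends. The bytes arrive in the order in which they were sent, although TCP does not preserve the boundaries between the individual sends.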
Database layer 112 illustratively includes three servers (network computers) 107, 108, and 109 that run database and database management software manufactured by Oracle Corporation of Redwood City, Calif., for example.
Database layer 112 includes information collected from a variety of servers and organized in such a way that servers 104-106 can quickly select a piece or pieces of data stored in one or more databases. The databases in database layer 112 may be traditional databases organized by fields, records, and files, or hypertext databases, in which any object (e.g., an item that can be selected and manipulated) can be linked to any other object. Objects can include shapes and pictures that appear on a display screen as well as less tangible software entities. In object-oriented programming, for example, an object is a self-contained entity that consists of both data and procedures to manipulate that data.
Application layer 111 includes servers (network computers) 104-106, which run a particular operating environment, for example, SOLARIS™, a Unix-based operating environment developed by Sun Microsystems of Mountain View, Calif. An operating environment is the state of a computer, usually determined by which programs are running, as well as by basic hardware and software characteristics. For example, saying that a program runs in a Unix environment means that the program is run on a computer that has the Unix operating system installed. Another term for environment in this sense is platform.
Servers 104-106 in application layer 111 also run middleware, which is software used to connect two otherwise separate applications. This allows users to request data from a database using forms displayed on a browser, such as Internet Explorer made by Microsoft Corporation, of Redmond, Wash. Use of middleware also enables the web server to return dynamic web pages based on the user's requests and profile. Servers 104-106 in application layer 111 may run Dynamo™, an application server middleware manufactured by Advanced Technology Corporation of Cambridge, Mass., that links web servers 101-103 with the database layer 112.
The specific architecture of network 100 may vary depending on a number of factors such as size, hardware constraints, software types, etc., but in an illustrative NT architecture, certain clusters (e.g., farms) of servers are allocated to performing certain tasks. There are several disadvantages associated with growing or scaling an NT architecture. First, a computer added to (or deleted from) a particular cluster must be individually configured to duplicate the same functionality as other computers in that cluster. Second, the computers within each cluster are not made part of a monitoring group that checks essentially the same things on every computer in that group.
Because users do not consider, before individual computers are configured, the types of computers available, the types and levels of clusters involved, and how each level (or portion thereof) and/or each cluster should be monitored, the checks are not defined at the cluster or operational level. Instead, as shown in FIG. 2, checks are defined at the individual computer level. Additionally, the checks are not simultaneously applied to all computers within that cluster or level. Instead, the checks are applied to each separate computer individually.
FIG. 2 is a flow diagram illustrating the conventional application of individual checks on a case-by-case, computer-by-computer basis, and will be described with reference back to FIG. 1.
At Time 1, server 101 is accessed, step 201. Then, LINUX™ checks 202 and APACHE™ checks 203 are built and applied, steps 204 and 205. LINUX™ checks 202 are individual checks specific to the LINUX™ operating system. Similarly, APACHE™ checks 203 are individual checks specific to the APACHE™ web server application program.
At Time 2, server 102 is accessed, step 206, and LINUX™ checks 207 and APACHE™ checks 208 are built and applied, steps 209 and 210.
At Time N−1, application server 106 is accessed, step 211, and SOLARIS™ checks 212 and DYNAMO™ checks 213 are built and applied, steps 214 and 215. N is an integer greater than 1.
At Time N, database server 107 is accessed, step 216, and ORACLE™ checks 217 are built and applied, step 218. ORACLE™ checks 217 are individual checks particular to the Oracle database system.
As may be appreciated, building individual checks on a case-by-case, computer-by-computer basis consumes time and money and unnecessarily duplicates checks previously built and applied to other computers.
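The duplication inherent in the conventional flow of FIG. 2 can be made concrete with a short Python sketch. The server numbers and software follow the example above, while `SERVER_SOFTWARE`, `build_checks`, and `apply_conventionally` are hypothetical names, and the string returned by `build_checks` is merely a stand-in for the manual effort of building a set of checks.

```python
from collections import Counter

# Servers from the FIG. 2 walk-through and the software each one runs.
SERVER_SOFTWARE = {
    101: ["linux", "apache"],
    102: ["linux", "apache"],
    106: ["solaris", "dynamo"],
    107: ["oracle"],
}


def build_checks(software: str) -> str:
    """Stand-in for manually building the checks specific to one product."""
    return f"{software}-checks"


def apply_conventionally(servers: dict[int, list[str]]) -> list[tuple[int, str]]:
    """Visit each server in turn and rebuild its checks from scratch."""
    build_log = []
    for server_id, software_list in servers.items():
        for software in software_list:
            checks = build_checks(software)  # rebuilt even if built before
            build_log.append((server_id, checks))
    return build_log


log = apply_conventionally(SERVER_SOFTWARE)
builds_per_check_set = Counter(checks for _, checks in log)
redundant_builds = sum(count - 1 for count in builds_per_check_set.values())
print(f"{len(log)} builds, {redundant_builds} of them redundant")
```

Even in this four-server sample, the LINUX™ and APACHE™ checks are each built twice. The number of redundant builds grows with every additional server that shares software with one already visited.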