A. Field of the Invention
This invention relates to methods and systems for monitoring and reporting performance characteristics of a remote server within a network and, more particularly, to methods and systems for monitoring and dynamically reporting a status of system activity for a remote server within a distributed computing network.
B. Description of the Related Art
In the past, computers were massive stand-alone machines, incapable of communicating with each other. They were used simply as fast calculating machines in limited applications. However, the computers of today are typically much smaller and orders of magnitude faster than those of yesterday. Moreover, the computers of today are typically networked together so that they may efficiently communicate, share information, and become useful in a larger variety of applications.
In the context of such a networked computing environment, the idea of distributed computing and client/server relationships arose. In general, distributed computing is based on a distributed computing network that processes, stores, and handles diverse operations by different computers or nodes within the network. In one type of distributed computing environment, one of the computers may be considered a client while another computer in the network may be considered a server to that client. For example, a person (commonly referred to as a user) may use one computer merely as an access vehicle to the information and computing resources of the network while another computer operates as a server to provide such information or computing resources to the first computer. In this situation, the first computer is considered a client because it is provided with the information or computing resources by a server, e.g., the second computer.
Servers are often classified by their function. A "file server" is a type of server in the network that is dedicated to managing information within files stored on a memory storage device, such as a hard disk drive. A "compute server" is another type of server capable of running particular software programs for another computer (i.e., a client) instead of running the programs on the other computer directly. In some network installations, a server may perform both file server and compute server functions depending upon the network's performance needs, hardware, and the costs involved.
Thus, it is known that using servers in computer networks is often more efficient and economical because it allows fewer but more powerful (and more expensive) computers to operate as servers and more numerous but less powerful (and less expensive) computers or terminals to operate as access vehicles or user nodes. In this manner, the computing resources of the network can be better and more efficiently utilized with servers.
However, the use of servers is not without problems. Their use often leads to large numbers of users depending upon the servers being constantly available for file access and software execution. If a server becomes undesirably busy or overloaded or otherwise encounters performance problems, a system administrator responsible for the server's network often quickly becomes the center of attention of users demanding correction of the situation. A server that is busy or overloaded, or that otherwise encounters problems, can also critically disrupt the operations of a business. This can result in lost business, lost worker productivity, and a great deal of aggravation for the end user. Thus, timely maintenance and rapid diagnostic analysis of servers within a distributed computing network have become increasingly important to both users and system administrators to avoid costly and frustrating server down-time.
To address this problem and successfully maintain and diagnose operations with servers in a networked environment, users typically depend upon a system administrator to analyze historical server data, more specifically referred to as system activity information, on each server in the network. Server data is generally defined as any data related to the performance of the system. For example, system activity information (a type of server data) may include, but is not limited to, information on CPU utilization, disk buffer activity, input/output (I/O) activity, system calls, and memory swapping activities. An analysis of such information on a particular server collected over a period of time may provide an indication of performance for that server. Thus, users typically rely upon the system administrator to perform such an analysis.
While such an analytical process may eventually produce results indicating the status of a server, gathering such data (e.g., server data) on a network's servers usually takes an undesirably long time. In response to end user complaints, a dedicated system administrator must be engaged to analyze the potential problem on one or more servers. This normally requires the system administrator to access each of the servers, collect data files on the network's servers, and assemble these files in a central repository. If the network is very large and geographically spread out, this task can be time consuming, frustrating to the user, and costly to the network owner. Once this vast amount of data is assembled together, the data must then be read and further analyzed in an attempt to give an indication of performance for a server. Accordingly, the time it takes the system administrator to gather and analyze the appropriate information can be undesirably long, leading to increased response time to users' performance requests on servers. This response time can be worse if the system administrator becomes inundated with numerous performance requests at the same time.
In addition to the undesirable response time usually associated with such a process, there are several other problems with such a reactive server maintenance and diagnostic analysis process. First, the process typically requires specialized training to gather the data, initiate any analysis, and interpret the results. System administrators must understand the nuances of many different operating systems, become fluent in networking protocols and have a firm understanding of the interaction with the server's hardware. Furthermore, the process may not allow a user to independently conduct and quickly view the testing results. It usually requires intervention by a designated system administrator or someone specially trained to maintain the network. If the designated person is busy or otherwise unavailable, the user is unfortunately left without an understanding of what is happening on the network and, in particular, what is occurring on the server.
Accordingly, there is a need for a system within a distributed computing environment that efficiently allows monitoring and dynamic reporting of server status to a system administrator. Additionally, there is a need for such a system for use by a user without the time associated with training technicians to gather and analyze server data and without the time associated with training users to interpret the data.
Methods and systems consistent with the present invention overcome the shortcomings of existing status reporting techniques by automatically collecting and downloading server data from each remote server in a network to a managing server so that a status output can be dynamically generated in response to a request.
Methods and systems consistent with the invention, as embodied and broadly described herein, describe a method for monitoring and dynamically reporting a status of a remote server. The method begins by downloading server data from the remote server to a managing server. The server data, such as system activity information associated with the remote server, indicates the status of the remote server and is typically collected on the remote server. The server data may be downloaded by periodically compiling system activity information associated with the remote server into a parameter file and downloading the parameter file as the server data. In more detail, the server data may be downloaded by collecting system activity information which is associated with at least one operational characteristic of the remote server. Periodically, the system activity information may be compiled into a parameter file representing the server data over a predefined time period. After the predefined time period, the parameter file may be downloaded to the managing server.
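By way of illustration only, the compiling step described above may be sketched as follows. The sketch assumes that the parameter file is comma-separated text and that each collected sample is a (timestamp, metric, value) tuple; the function name `compile_parameter_file`, the column names, and the sample values are hypothetical and are not taken from the description.

```python
import csv
import io


def compile_parameter_file(server_name, samples):
    """Compile collected samples into parameter-file text.

    The comma-separated layout is an assumption made for this sketch.
    Each sample is a (timestamp, metric, value) tuple.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row naming the (hypothetical) columns of the parameter file.
    writer.writerow(["server", "timestamp", "metric", "value"])
    for timestamp, metric, value in samples:
        writer.writerow([server_name, timestamp, metric, value])
    return buffer.getvalue()


# Hypothetical system activity samples collected on the remote server.
samples = [
    ("1999-01-01T00:00", "cpu_utilization", 12.5),
    ("1999-01-01T00:01", "cpu_utilization", 40.0),
]
parameter_file = compile_parameter_file("server-a", samples)
```

In such an arrangement, the resulting text would be written to a file on the remote server and downloaded by the managing server after the predefined time period; the transfer mechanism itself is not shown.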
A database entry is updated based upon the server data and in response to downloading the server data. This is typically accomplished by processing the downloaded server data into appropriate parts of the database.
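One possible way of processing downloaded server data into a database is sketched below, assuming an SQLite database and a comma-separated parameter file with `server`, `timestamp`, `metric`, and `value` columns. The table name `activity`, the schema, and the function name `update_database` are hypothetical choices made for this sketch only.

```python
import csv
import io
import sqlite3


def update_database(connection, parameter_file_text):
    """Process downloaded parameter-file text into the database.

    Returns the number of records read from the parameter file.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS activity ("
        "server TEXT, timestamp TEXT, metric TEXT, value REAL, "
        "PRIMARY KEY (server, timestamp, metric))"
    )
    rows = csv.DictReader(io.StringIO(parameter_file_text))
    records = [
        (row["server"], row["timestamp"], row["metric"], float(row["value"]))
        for row in rows
    ]
    # The primary key plus INSERT OR REPLACE keeps a repeated download
    # from creating duplicate entries.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO activity VALUES (?, ?, ?, ?)", records
        )
    return len(records)


connection = sqlite3.connect(":memory:")
downloaded = (
    "server,timestamp,metric,value\n"
    "server-a,1999-01-01T00:00,cpu_utilization,12.5\n"
    "server-a,1999-01-01T00:01,cpu_utilization,40.0\n"
)
stored = update_database(connection, downloaded)
```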
Once the database entry is updated, a request is received from a user node. The request may have one or more selections related to the remote server. Information is extracted from the database entry in response to receiving the request. The information is based upon the selections in the request. Once the information is extracted, an output, such as a graphical output file, is dynamically created from the information. The output provides the status of the remote server and is transmitted to the user node so that the status of the remote server is reported to the user node.
In more detail, the selections may be determined from the request. The determined selections identify the remote server from a group of network elements in a distributed computing network. The selections further identify a selected type of system activity information. Additionally, when extracting the information, the information is typically extracted because it relates to the remote server and the selected type of system activity information.
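The determination of selections and the corresponding extraction may be sketched, purely as an example, as shown below. The sketch assumes that the request arrives as a URL-style query string with `server` and `metric` fields and that the database contents are available as a list of tuples; these names and the record layout are hypothetical.

```python
from urllib.parse import parse_qs


def determine_selections(request_query):
    """Determine the selections carried by a request.

    Assumes a query string such as "server=...&metric=...".
    """
    fields = parse_qs(request_query)
    return {"server": fields["server"][0], "metric": fields["metric"][0]}


def extract_information(records, selections):
    """Keep only records for the selected server and activity type."""
    return [
        (timestamp, value)
        for server, timestamp, metric, value in records
        if server == selections["server"] and metric == selections["metric"]
    ]


# Hypothetical database contents covering two servers and two metrics.
records = [
    ("server-a", "00:00", "cpu_utilization", 12.5),
    ("server-a", "00:00", "io_activity", 3.0),
    ("server-b", "00:00", "cpu_utilization", 80.0),
    ("server-a", "00:01", "cpu_utilization", 40.0),
]
selections = determine_selections("server=server-a&metric=cpu_utilization")
information = extract_information(records, selections)
```

Here the first selection identifies the remote server among the network elements, and the second identifies the selected type of system activity information.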
Furthermore, the extracted information is typically analyzed to determine the status of the remote server based upon the selected type of system activity information. This may be done to determine the status of the remote server over a selected time interval. Based upon this determined status, the output is dynamically generated, preferably as a graphical output file, representing the status of the remote server and preferably including trends related to the remote server.
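As a non-limiting sketch of such an analysis, the example below summarizes a series of extracted values and renders them as a small graphic. It assumes that the values are percentages between 0 and 100 sampled at equal intervals, that a least-squares slope is an acceptable stand-in for a trend, and that SVG is an acceptable graphical format; none of these choices, nor the function names, come from the description.

```python
def summarize_status(values):
    """Return mean, peak, and a least-squares trend for the values."""
    if not values:
        raise ValueError("at least one sample is required")
    n = len(values)
    mean = sum(values) / n
    # Slope of the best-fit line against the sample index 0..n-1.
    x_mean = (n - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(n))
    numer = sum((x - x_mean) * (v - mean) for x, v in enumerate(values))
    slope = numer / denom if denom else 0.0
    return {"mean": mean, "peak": max(values), "trend_per_sample": slope}


def render_svg(values, width=200, height=100):
    """Render the values as an SVG polyline (0 at bottom, 100 at top)."""
    step = width / max(len(values) - 1, 1)
    points = " ".join(
        f"{i * step:.1f},{height - v * height / 100:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )


# Hypothetical CPU utilization samples over the selected time interval.
cpu = [10.0, 20.0, 30.0, 40.0]
status = summarize_status(cpu)
graphic = render_svg(cpu)
```

A positive `trend_per_sample` in this sketch would suggest rising utilization over the selected interval, which is one simple form that trend information in the output might take.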
In accordance with another aspect of the invention as embodied and broadly described herein, a system is described for monitoring and dynamically reporting a server status within a distributed computing network. The system includes a managing server, a remote server in communication with the managing server through the distributed computing network, and a user node also in communication with the managing server through the distributed computing network. Additionally, the managing server is coupled to a memory storage device having a database associated with the remote server. The remote server is operative to collect system activity information associated with the server status of the remote server. The managing server is operative to download the system activity information from the remote server over the distributed computing network and update the database stored in the memory storage device to reflect the downloaded system activity information. The user node is capable of generating a performance request related to the remote server while the managing server is able to receive the performance request from the user node. The managing server is also able to extract information from the database based upon a set of parameters of the performance request, dynamically create an output file in response to the performance request using the extracted information, and transmit the output file to the user node over the distributed computing network so that the server status of the remote server is reported to the user node.
In more detail, the remote server is typically operative to collect the system activity information at predetermined points during a defined time period, such as every minute during a day. In this situation, the system activity information is associated with at least one operational characteristic of the remote server, such as CPU utilization. The managing server is typically operative to download the system activity information from the remote server after the defined time period, such as the end of the day. Furthermore, the remote server may also be operative to periodically compile the system activity information into a summary file, which may be downloaded from the remote server by the managing server and then processed into the database on the memory storage device.
Upon receiving a performance request, the managing server may also be operative to determine the parameters of the performance request. These parameters, more generally known as selections, are portions of the performance request identifying the remote server from a group of network elements in communication with the managing server over the distributed computing network. These parameters also identify a selected type of system activity information.
The managing server may also extract information related to the remote server and the selected type of system activity information from the database, analyze the extracted information to determine the server status of the remote server, and generate a graphical output file as the output representing the server status. Furthermore, the managing server may be operative to generate trend information within the output. The trend information usually indicates performance trends related to the remote server.