The present invention relates generally to the field of computer hardware performance monitoring and optimization, and more particularly to an automated rules-based system and method for detecting and predicting server hardware bottlenecks and recommending hardware upgrades to resolve them.
In the early days of computing, computer systems were standalone processors to which peripheral devices such as displays, printers, and input devices were coupled. Each computer system was independent and ran its own independent application programs. There was very little communication between computer systems.
Today it is well known to interconnect computer systems in complex computer networks, to share data, services and resources associated with the numerous computer systems that have access to the distributed computing environment. Networks range from small local area networks (LANs), to large wide area networks (WANs), to vast interconnected networks such as intranets and the Internet itself. In today's computing environment, the work of computing is distributed between various machines in a client/server architecture. Clients provide a user interface and perform a portion of the work of the system. Servers respond to requests from clients for data, files, and actual computing services.
Servers are implemented in systems that include various hardware resources, including central processing units (CPUs), memory, disk drives, and communications interfaces or adapters. Server hardware resources are relatively expensive, and configuring the correct amount of CPU power, memory, disk drives, and communications throughput on a network system is a complex task. Typically, the proper power and amount of server hardware resources are determined on a hit-or-miss basis. As the number of users increases or the workload of the server changes over time, bottlenecks are created within the server hardware resources, resulting in slow performance. Since server systems are complex and their hardware resources are interrelated, it is not easy to determine the actual source or cause of a bottleneck.
Most modern operating systems provide server hardware resource utilization measurements. However, the system administrator must be skilled in interpreting this resource utilization information to identify system hardware bottlenecks. This task is complex and error-prone because utilization information is dynamic, and longer-term trends are more important for accurately diagnosing system hardware bottlenecks than short-term peaks and fluctuations.
System bottlenecks can be caused by improper software design, improper software configuration, or excessive usage by one or more users. These problems usually create one or more hardware bottlenecks. Diagnosing the cause of a particular bottleneck is complex and often beyond the capabilities of the average system administrator. Often software redesign is painful and time-consuming. Occasionally bottlenecks can be alleviated by simple software configuration changes. However, in most cases, a hardware upgrade is the least difficult and least costly way to alleviate system bottlenecks.
The present invention provides a method of optimizing server hardware performance and predicting server hardware bottlenecks. The method monitors a server hardware utilization parameter and computes the average of the measurements of the utilization parameter over a selected time period. The method then compares the computed average to a threshold value for the measured utilization parameter. If the computed average is equal to or greater than the threshold, the method reports a performance bottleneck and provides a recommended solution for the bottleneck.
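The average-and-compare step described above can be illustrated with a minimal Python sketch. The function name `check_bottleneck`, the result fields, and the sample values are assumptions made for illustration only; the source does not specify an implementation.

```python
from statistics import mean


def check_bottleneck(samples, threshold, recommendation):
    """Average the utilization samples collected over one time period and
    compare the average to a threshold (hypothetical helper)."""
    if not samples:
        raise ValueError("at least one measurement is required")
    average = mean(samples)
    # A bottleneck is reported when the average is equal to or greater
    # than the threshold, as stated in the summary.
    is_bottleneck = average >= threshold
    return {
        "bottleneck": is_bottleneck,
        "average": average,
        "recommendation": recommendation if is_bottleneck else None,
    }


# Illustrative values only: CPU utilization in percent.
report = check_bottleneck([70, 80, 90], threshold=80,
                          recommendation="Upgrade or add a CPU")
```

Averaging over the whole period, rather than testing each sample, is what keeps short-lived peaks from being reported as bottlenecks.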
Preferably, the method of the present invention monitors a plurality of hardware utilization parameters, including, but not limited to, CPU utilization, memory utilization, disk queue depth or disk utilization, LAN byte throughput and LAN packet throughput. The method is preferably implemented with a rules base. The method applies a set of rules to utilization parameter averages to detect and report performance bottlenecks and make recommendations.
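One way such a rules base might be organized is sketched below in Python. The parameter names, threshold values, and recommendation texts are hypothetical placeholders, not values taken from the source; each rule simply pairs a utilization parameter with a threshold and a recommendation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rule:
    parameter: str        # name of the monitored utilization parameter
    threshold: float      # average at or above this value is a bottleneck
    recommendation: str   # text reported with the bottleneck


# Hypothetical rules; the thresholds are illustrative assumptions.
RULES = [
    Rule("cpu_utilization_pct", 75.0, "Add or upgrade CPUs"),
    Rule("memory_utilization_pct", 80.0, "Add memory"),
    Rule("disk_queue_depth", 2.0, "Add disk drives or a faster disk subsystem"),
    Rule("lan_utilization_pct", 60.0, "Add or upgrade LAN adapters"),
]


def apply_rules(samples_by_parameter, rules=RULES):
    """Apply every rule to the average of its parameter's samples and
    return (parameter, average, recommendation) for each bottleneck."""
    findings = []
    for rule in rules:
        samples = samples_by_parameter.get(rule.parameter)
        if not samples:
            continue  # no measurements for this parameter in this period
        average = mean(samples)
        if average >= rule.threshold:
            findings.append((rule.parameter, average, rule.recommendation))
    return findings


findings = apply_rules({
    "cpu_utilization_pct": [82.0, 78.0, 80.0],
    "memory_utilization_pct": [40.0, 45.0, 50.0],
})
```

Keeping the rules as data rather than as branching logic means a parameter or threshold can be added or tuned without changing the evaluation loop.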
The method predicts a server hardware performance bottleneck by computing running averages of the measured server utilization parameter over selected time periods. The method uses a linear regression analysis to determine a trend in the running averages and compares the trend to a threshold value for the server utilization parameter to predict the occurrence of a performance bottleneck.
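The prediction step can be sketched as follows, under stated assumptions: averages are taken over consecutive fixed-size periods, the trend is an ordinary least-squares line fitted to those averages, and the prediction is the number of periods until that line reaches the threshold. The function names and the choice of period indices as the time axis are illustrative and are not specified by the source.

```python
def period_averages(samples, period):
    """Average the samples over consecutive periods of `period` samples each
    (assumed interpretation of the running averages)."""
    return [
        sum(samples[i:i + period]) / period
        for i in range(0, len(samples) - period + 1, period)
    ]


def fit_trend(averages):
    """Least-squares line through (0, a0), (1, a1), ...; returns (slope, intercept)."""
    n = len(averages)
    if n < 2:
        raise ValueError("at least two averages are needed to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(averages) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(averages))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until_bottleneck(averages, threshold):
    """Return how many periods after the latest one the trend line reaches
    the threshold: 0.0 if already there, None if the trend is not rising."""
    slope, intercept = fit_trend(averages)
    latest = len(averages) - 1
    if slope * latest + intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None  # flat or falling trend: no bottleneck predicted
    return (threshold - intercept) / slope - latest


# Illustrative values only: averages rising 5 points per period.
eta = periods_until_bottleneck([50, 55, 60, 65], threshold=80)
```

Fitting the line to period averages instead of raw samples reflects the emphasis on longer-term trends over short-term peaks and fluctuations.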