The present invention relates to the design, evaluation, and sizing of computer systems.
The performance of a computer system in executing a particular application is determined by the complex interactions of its component parts. An application is a set of instructions executed in a particular sequence (but not always the same sequence), referencing data in a variable pattern. The particular hardware on which the application runs provides a variety of components that the application accesses in variable and complex manners to form the whole system.
Therefore, in order to evaluate the performance of a computer system as a whole, designers typically focus on individual subsystems independently, understanding the performance of each subsystem and then evaluating its interactions with other subsystems. Any one subsystem that is receiving excessive use can cause a bottleneck for the entire computer system, thereby creating unacceptable user response times. Accordingly, the goal of computer system design and sizing is to create a computer system whereby all of the individual subsystems meet or exceed the capacity necessary to provide acceptable user response times. General categories of computer subsystems include: processor/cache; memory; input/output ("I/O"); operating system data structures; networking; etc.
The necessary subsystem capacity for a particular application has typically been determined based on empirical data derived from observing steady-state behavior of the system. Steady state refers to when the computer system has generally reached equilibrium in terms of the number of application users. Computer systems, however, do not always operate at steady state. Rather, computer systems must first "ramp-up" to a steady-state number of users. For example, the computer system at a bank may host a thousand users during the majority of the working day, but the number of users may increase from zero to one thousand from 8:00 a.m. to 8:30 a.m., as employees arrive for work. The period between zero (or low) usage and steady-state usage is referred to as the "ramp-up" period.
Sizing computer systems based only on steady-state information can result in serious errors because it rests on the assumption that the maximum utilization for the computer system (and its individual subsystems) occurs during steady-state operation. That assumption is correct only if per-user utilization of the computer system during ramp-up is the same as or less than per-user utilization during steady-state operation. For many applications, however, per-user utilization of the computer system (and/or individual subsystems) is greater during ramp-up than during steady-state operation.
For example, as shown in FIG. 1, the I/O queue length (number of instructions waiting to be executed by the I/O subsystem) increases exponentially during ramp-up for many applications. This phenomenon is caused by increased I/O activity per user when users first activate an application (e.g., when users first access a web browser, significant I/O activity is created because the application must download outside libraries, specific data structures, etc.). As shown in FIG. 1, the increase in per-user I/O activity causes the peak utilization of the I/O subsystem to occur during ramp-up 100, not during steady-state operation 110. Accordingly, an I/O subsystem sized for steady-state operation will cause unacceptable customer response times during ramp-up due to an I/O bottleneck.
This problem is often ignored by salespeople and engineers charged with designing a custom computer system for a particular application, thereby resulting in computer systems that are undersized and perform poorly during ramp-up. Alternatively, those computer system designers who have become aware of the problem will sometimes use a "fudge factor" to estimate the additional system capacity necessary during ramp-up for certain applications. For example, some system designers will configure a system based on steady-state data and then increase the capacity of particular subsystems by 50% to account for increased utilization during ramp-up. This "fudge factor" method, of course, can be extremely inaccurate, often resulting in oversized computer systems that are a waste of the customer's money or undersized computer systems that perform poorly during ramp-up.
No method or system for accurately compensating for increased utilization during ramp-up has previously been developed. The reasons for this are apparent: time and money. Computer systems (and subsystems) have typically been sized based on empirical data derived for a particular configuration running a particular application. That empirical data is derived using a benchmark analysis, wherein the application is run (or modeled by computer simulation) for each potential configuration assuming a certain number of users during steady-state operation. For example, in order to determine the steady-state I/O utilization in a certain configuration for a particular application, a system designer performs a benchmark analysis. The benchmark analysis indicates the amount of I/O activity per user during steady state (i.e., (I/O activity during steady-state)÷(number of users in benchmark analysis)). The steady-state I/O requirements for a different number of users than used in the benchmark analysis can then be calculated linearly for each configuration (i.e., (I/O activity per user)×(number of users)).
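The conventional steady-state calculation described above, together with the flat 50% margin some designers add, can be sketched as follows. This is only an illustrative sketch and not part of the disclosed method; the function names and the benchmark figures (12,000 I/O operations per second observed with 400 users) are hypothetical.

```python
def per_user_io(benchmark_io_activity: float, benchmark_users: int) -> float:
    """(I/O activity during steady state) / (number of users in benchmark)."""
    if benchmark_users <= 0:
        raise ValueError("benchmark_users must be positive")
    return benchmark_io_activity / benchmark_users


def steady_state_io(io_per_user: float, target_users: int) -> float:
    """(I/O activity per user) x (number of users), a purely linear estimate."""
    return io_per_user * target_users


def with_fudge_factor(capacity: float, margin: float = 0.5) -> float:
    """Inflate a steady-state estimate by a flat margin (50% by default)."""
    return capacity * (1.0 + margin)


# Hypothetical benchmark: 12,000 I/O operations per second with 400 users.
io_per_user = per_user_io(12000.0, 400)
required = steady_state_io(io_per_user, 1000)   # scaled to 1,000 users
padded = with_fudge_factor(required)            # flat ramp-up allowance
```

Neither function takes the ramp-up period as an input, so the padded figure is the same whether the users arrive over thirty minutes or over three hours.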
Taking into account the ramp-up period makes this empirical approach much more difficult because, as discussed, per-user utilization is often different during ramp-up than during steady-state. That fact makes a linear extrapolation from a single calculation of per-user utilization impossible. Moreover, different customers have varying ramp-up periods, which can also significantly affect per-user utilization of system resources. In general, shorter ramp-up periods cause certain subsystem queue lengths to grow more rapidly. Accordingly, in order to size a computer system accurately using the typical benchmark-analysis approach, one would need to test, for each application, all possible ramp-up periods against all possible system (and subsystem) configurations using varying numbers of users. The resulting test matrix would be immense and impractical, especially considering that benchmark analyses often take hours to run.
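To give a sense of how quickly such a test matrix grows, the count of benchmark runs for a single application can be sketched as a simple product. All of the numbers below (10 ramp-up periods, 20 configurations, 15 user counts, 2 hours per run) are hypothetical values chosen only for illustration.

```python
def benchmark_runs(ramp_up_periods: int, configurations: int, user_counts: int) -> int:
    """Runs needed to test every combination for a single application."""
    return ramp_up_periods * configurations * user_counts


# Hypothetical matrix dimensions for one application.
runs = benchmark_runs(ramp_up_periods=10, configurations=20, user_counts=15)
total_hours = runs * 2.0  # assuming each benchmark run takes 2 hours
```

Even these modest assumed dimensions call for thousands of runs for one application, and the total is multiplied again by every additional application to be sized.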
Perhaps for this reason, ramp-up periods have often been ignored in computer-system sizing. However, ramp-up can significantly affect a customer's perception of the system's performance. Accordingly, what is needed is a system and method for accurately sizing computer systems that does not require massive numbers of benchmark analyses and avoids the severe undersizing and oversizing errors inherent in the previous systems.
The present invention provides a system and method for accurately sizing a computer system and its component subsystems. The present invention requires only limited benchmark analysis to design varying configurations, having varying numbers of users, and accounting for varying ramp-up periods. In addition, the severe undersizing and oversizing errors of previous systems are avoided without requiring repeated trial-and-error benchmark analyses.
According to a preferred embodiment of the present invention, diagnostic data for a known subsystem configuration B is derived from a single benchmark analysis. From that diagnostic data a "terminal queue length," QBL, for configuration B can be derived. Terminal queue length, as used herein, refers to the maximum subsystem queue length that can be permitted during ramp-up, beyond which application-user response time is unacceptable. Once QBL is empirically determined from the diagnostic data, it can be used to determine, without further benchmark analysis, the necessary throughput for a "target configuration," A. Target configuration A is the subsystem configuration being sized and can be designed for a different number of users and/or a different ramp-up period than configuration B.
The basic method of the present invention requires: (1) obtaining empirically derived diagnostic data for a known configuration, B, of a computer subsystem, the configuration B accommodating a number of users, NB, and a ramp-up period, RB; (2) calculating, based on the diagnostic data for known configuration, B, the necessary throughput for target configuration, A, the target configuration A accommodating a number of users, NA, and a ramp-up period, RA, and providing an acceptable application-user response time during RA; and (3) configuring the target configuration A.
The basic system of the present invention includes: a storage device and a processor operatively connected to the storage device, wherein: (1) the storage device: (a) stores a program for controlling the processor; and (b) receives empirically derived diagnostic data for a known configuration, B, of a computer subsystem, the configuration B accommodating a number of users, NB, and a ramp-up period, RB; and (2) the processor is operative with the program to: (a) calculate, based on the diagnostic data for known configuration, B, the necessary throughput for target configuration, A, of the computer subsystem, the target configuration A accommodating a number of users, NA, and a ramp-up period, RA, and providing an acceptable application-user response time during RA; and (b) configure the target configuration A.
The invention is described in greater detail with regard to the following drawings.