1. Technical Field of the Invention
The present invention generally relates to multiprocessor computer systems. More particularly, and not by way of any limitation, the present invention is directed to a system and method for automatically tuning a multiprocessor (MP) computer system in order to increase its performance.
2. Description of Related Art
Conventionally, a multiprocessing system is a computer system that has more than one processor and that is typically designed for high-end workstation or file server usage. Such a system may include a high-performance bus, large amounts of error-correcting memory, a redundant array of inexpensive disks (RAID) drive system, advanced system architectures that reduce bottlenecks, and redundant features such as multiple power supplies.
In the most general sense, multiprocessing may be defined as the use of multiple processors to perform computing tasks. The term could apply to a set of networked computers in different locations, or to a single system containing several processors. As is well known, however, the term is most often used to describe an architecture where two or more linked processors are contained in a single or partitioned enclosure. Further, multiprocessing does not occur just because multiple processors are present. For example, having a stack of personal computers in a rack is not multiprocessing. Similarly, a server with one or more “standby” processors is not multiprocessing, either. The term “multiprocessing” is typically applied, therefore, only to architectures where two or more processors are designed to work in a cooperative fashion on a task or set of tasks.
There exist numerous variations on the basic theme of multiprocessing. In general, these variations relate to how independently the processors operate and how the workload among these processors is distributed. In loosely-coupled multiprocessing architectures, the processors perform related tasks but they do so as if they were standalone processors. Each processor is typically provided with its own private memory and may have its own mass storage and input/output (I/O). Further, each loosely-coupled processor runs its own copy of an operating system (OS), and communicates with the other processor or processors through a message-passing scheme, much like devices communicating over a local area network. Loosely-coupled multiprocessing has been widely used in mainframes and minicomputers, but the processing software's architecture is closely tied to the hardware design. For this reason, among others, it has not gained the support of software vendors and is not widely used in today's high performance server systems.
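The message-passing scheme described above can be illustrated with a minimal sketch. Here threads stand in for the independent processors and queues stand in for the interconnect (both are stand-ins chosen for illustration; in a real loosely-coupled system each node would have its own private memory and its own copy of the OS, and only the explicit messages would cross between them):

```python
import queue
import threading

# Sketch of a loosely-coupled message-passing scheme. Each "node" runs as if
# it were a standalone processor: it touches no shared state and coordinates
# with its peers only through messages on its inbox/outbox queues.
def node(inbox, outbox):
    while True:
        msg = inbox.get()
        if msg is None:                    # sentinel message: shut the node down
            break
        outbox.put(("done", msg * msg))    # reply with a result message

inbox, outbox = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=node, args=(inbox, outbox)) for _ in range(2)]
for w in workers:
    w.start()
for task in range(4):
    inbox.put(task)                        # all coordination is via explicit messages
for _ in workers:
    inbox.put(None)                        # one shutdown sentinel per node
results = sorted(outbox.get()[1] for _ in range(4))
for w in workers:
    w.join()
print(results)  # [0, 1, 4, 9]
```

The point of the sketch is the communication discipline, not the arithmetic: no node reads another node's data directly, which is what distinguishes this style from the tightly-coupled, shared-memory arrangement described next.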
In tightly-coupled multiprocessing, on the other hand, operation of the processors is more closely integrated. They typically share main memory, and may even have a shared cache. The processors need not be identical to one another, and may or may not perform similar tasks. However, they typically share other system resources such as mass storage and I/O. Additionally, instead of a separate copy of the OS for each processor, they run a single copy, with the OS handling the coordination of tasks between the processors. The sharing of system resources makes tightly-coupled multiprocessing platforms somewhat less expensive, and it is the dominant multiprocessor architecture in the business-class servers currently deployed.
Hardware architectures for tightly-coupled MP platforms can be further divided into two broad categories. In symmetrical MP (SMP) systems, system resources such as memory, disk storage and I/O are shared by all the microprocessors in the system. The workload is distributed evenly to available processors so that one does not sit idle while another is heavily loaded with a specific task. Further, the SMP architecture is highly scalable, i.e., the performance of SMP systems increases, at least theoretically, as more processor units are added.
In asymmetrical MP (AMP) systems, tasks and resources are managed by different processor units. For example, one processor unit may handle I/O and another may handle network OS (NOS)-related tasks. Thus, it should be apparent that an asymmetrical MP system may not balance the workload and, accordingly, it is possible that a processor unit handling one task can be overworked while another unit sits idle.
SMP systems are further subdivided into two types, depending on how cache memory is implemented. “Shared-cache” platforms, in which the off-chip (i.e., Level 2, or L2) cache is shared among the processors, generally offer lower performance because the processors contend for the shared cache. In “dedicated-cache” systems, every processor unit is provided with its own dedicated L2 cache, in addition to its on-chip (Level 1, or L1) cache memory. The dedicated L2 cache arrangement accelerates processor-memory interactions in the multiprocessing environment and, moreover, facilitates higher scalability.
Regardless of the various architectural variations discussed in the foregoing, the performance of an MP computer system is significantly dependent on how the various processors are loaded with respect to servicing the I/O devices associated with the system. As is well known, the processors service the I/O devices by executing the interrupt service routines (ISRs) corresponding to the devices. Accordingly, how the ISRs are distributed among the processors impacts the overall system performance.
Typically, there are more I/O devices than processors in an MP system. As a consequence, the I/O interrupts (and the ISRs associated therewith) must be assigned in some manner to the various processors so that they are serviced. Currently, only static assignment methods are available, where the assignment is made at boot time and is not altered thereafter in any significant fashion. However, a static assignment method cannot handle the addition and deletion of I/O devices and their corresponding ISRs in an optimal manner. Moreover, even if the initial assignment of interrupts is done in accordance with a technique that guarantees optimal performance, the overall performance will eventually degrade for a number of reasons. First, each type of device has a different interrupt frequency profile, which cannot be ascertained beforehand but is necessary for optimizing performance. A gigabit Ethernet card, for example, is likely to generate far more interrupts per second than a serial port. Even devices that are very similar from the OS kernel's perspective can have different performance requirements; for instance, different Small Computer System Interface (SCSI) cards can support different data rates. Accordingly, neither the kernel nor the device drivers have enough information at boot time about how much traffic a device will generate to assign the interrupts to processors optimally.
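A minimal sketch of such a static, boot-time assignment illustrates the problem. The device names and interrupt rates below are hypothetical, chosen only to echo the gigabit-Ethernet-versus-serial-port example above; the essential point is that a round-robin dealt at boot time cannot see the rates, so the resulting per-processor loads can be badly imbalanced:

```python
from itertools import cycle

# Hypothetical devices with interrupts-per-second rates. The rates are not
# known at boot time, which is exactly when the static assignment is made.
devices = {
    "gigabit_eth": 50_000,   # high interrupt frequency
    "scsi_fast":    8_000,
    "scsi_slow":    1_500,
    "serial_port":     20,   # far fewer interrupts than the NIC
}

def static_round_robin(device_names, num_cpus):
    """Boot-time assignment: deal devices to CPUs in order, rates unseen."""
    assignment = {cpu: [] for cpu in range(num_cpus)}
    for cpu, name in zip(cycle(range(num_cpus)), device_names):
        assignment[cpu].append(name)
    return assignment

assignment = static_round_robin(list(devices), num_cpus=2)
# Only after the fact can the per-CPU interrupt load be computed:
load = {cpu: sum(devices[d] for d in devs) for cpu, devs in assignment.items()}
print(assignment)  # {0: ['gigabit_eth', 'scsi_slow'], 1: ['scsi_fast', 'serial_port']}
print(load)        # {0: 51500, 1: 8020} -- heavily imbalanced
```

Even an assignment smarter than round-robin faces the same limitation: any boot-time policy must commit before the interrupt frequency profiles are observable.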
In addition, a static assignment technique will become sub-optimal because the interrupt loads vary over time. An interrupt distribution that is optimal for network traffic may be very poor for media backup operations, or when the system switches from transaction processing to batch processing. Moreover, the hardware and system topology itself can change: with various high availability (HA) features and hot-pluggable processors and I/O devices being required of today's MP systems, the interrupt profiles will change constantly.
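Continuing the same kind of illustration (again with hypothetical devices and interrupt rates), a distribution that perfectly balances one workload phase can be severely imbalanced in another, even though nothing about the assignment itself has changed:

```python
# Hypothetical interrupt rates (interrupts/sec) under two workload phases.
profiles = {
    "network_phase": {"eth0": 30_000, "eth1": 30_000, "disk0":  1_000, "disk1": 1_000},
    "backup_phase":  {"eth0":  1_000, "eth1":  1_000, "disk0": 40_000, "disk1": 2_000},
}

# A static assignment chosen to balance the network phase exactly.
assignment = {0: ["eth0", "disk0"], 1: ["eth1", "disk1"]}

def imbalance(profile, assignment):
    """Fraction of total interrupt load carried by the busiest CPU (0.5 = balanced)."""
    loads = [sum(profile[d] for d in devs) for devs in assignment.values()]
    return max(loads) / sum(loads)

for phase, rates in profiles.items():
    print(phase, round(imbalance(rates, assignment), 2))
# network_phase 0.5   -- perfectly balanced for the workload it was tuned for
# backup_phase  0.93  -- the same assignment leaves one CPU nearly idle
```

This is the degradation the static methods cannot correct: the assignment is fixed, but the interrupt profiles that determine its quality are not.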