The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Interrupts are generated by devices in a computer other than the main central processing unit (CPU) and are to be processed by the CPU. As used herein, the term “interrupt” means an asynchronous event that requests that the CPU stop normal processing activities to handle the asynchronous event. Defining an interrupt as an asynchronous event means that the interrupt occurs without regard to what is currently being processed by a CPU. In contrast, synchronous events, such as traps, occur synchronously with the processing of an application by the CPU, such as a trap that occurs when a program tries to divide by zero or when an application tries to access memory that does not exist.
In general, any device besides the CPU that is part of the computing system can generate an interrupt. For example, devices that generate interrupts include, but are not limited to, the following: disk drives, keyboards, cursor control devices such as mice and trackballs, printers, USB ports, and network controllers. When a computing system is first powered up, the operating system interrogates each device to discover what interrupts each device can generate. In general, each device can generate any number of different interrupts.
The interrupts described herein are generated by hardware devices, as compared to other types of interrupts that are software constructs and that act to interrupt a program or application. For example, software interrupts are generated by the CPU itself and are sent to that same CPU for processing. In contrast, the term “interrupt” as used herein refers to interrupts that are generated by devices other than the CPU, and thus excludes such software interrupts. Although both software interrupts and hardware interrupts are handled similarly once received by a CPU, the software interrupts do not go to another device, such as a central hub that distributes interrupts among CPUs, since the software interrupts are generated by the same CPU that processes the software interrupt.
When devices generate interrupts in a single CPU computing system, a central hub receives the interrupts and then sends them to the CPU for processing. The central hub may also be referred to as an “interrupt concentrator.” Upon receiving an interrupt, the central hub pre-digests the interrupt to put the interrupt into a standard representation before sending the interrupt to the CPU for processing. Upon receipt of an interrupt by the CPU, the CPU stops the normal processing being performed for applications to process the interrupt, because interrupts are given higher priority for processing by CPUs as compared to applications.
In a multi-CPU computing system, the multiple CPUs may be included in the same computing device, such as a multiprocessor server, included in separate devices such that each device includes a single CPU, or a combination of single processor devices and multiprocessor devices. With a multi-CPU system, the central hub not only pre-digests the interrupts, but the central hub also determines to which CPU to send a particular interrupt for processing. With multiple CPUs, the central hub uses a mapping of the different interrupts to CPUs to determine to which CPU to send each interrupt from a particular source. Based on the interrogation of the system's devices at startup, the central hub knows all the possible interrupts that could be received from the system's different devices. Using that list of all possible interrupts, the central hub maps each interrupt from each source device to a CPU of the system. The assignment of interrupts to CPUs is described in more detail herein.
As a result of the approach used to assign interrupts to CPUs, not all interrupts for a particular device will be handled by the same CPU, nor will all interrupts of a particular type from multiple devices of the same type go to the same CPU. However, once a particular interrupt from a particular device is mapped to a CPU, all instances of the particular interrupt from the particular device are handled by the same CPU, unless and until the mapping of interrupts to CPUs is changed, which is discussed further herein.
For example, in a dual CPU system, with the two CPUs designated by the identifiers “cpu0” and “cpu1,” a disk drive generates interrupts “dd_irq0” and “dd_irq1,” while each of two network controllers, designated as network controllers A and B, generates interrupts “nc_irq0” and “nc_irq1.” The central hub uses the following mapping of the interrupts to the two CPUs to determine which CPU is to be sent which interrupt: interrupt “dd_irq0” to “cpu0,” interrupt “dd_irq1” to “cpu1,” interrupt “nc_irq0” from network controller A to “cpu0,” interrupt “nc_irq1” from network controller A to “cpu1,” and both interrupts “nc_irq0” and “nc_irq1” from network controller B to “cpu1.” As a result, “cpu0” processes one disk drive interrupt (e.g., interrupt “dd_irq0”) and one network controller interrupt from network controller A (e.g., interrupt “nc_irq0”), while “cpu1” processes all the other interrupts.
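The mapping in the preceding example can be pictured as a simple lookup table. The following is a minimal sketch only; the device labels, the `INTERRUPT_MAP` name, and the helper functions are hypothetical and are not part of any particular central hub implementation.

```python
# Hypothetical lookup table for the dual CPU example above.
# Keys are (source device, interrupt name); values are the assigned CPU.
INTERRUPT_MAP = {
    ("disk_drive", "dd_irq0"): "cpu0",
    ("disk_drive", "dd_irq1"): "cpu1",
    ("network_controller_A", "nc_irq0"): "cpu0",
    ("network_controller_A", "nc_irq1"): "cpu1",
    ("network_controller_B", "nc_irq0"): "cpu1",
    ("network_controller_B", "nc_irq1"): "cpu1",
}


def route(device, interrupt):
    """Return the CPU to which this interrupt from this device is sent."""
    return INTERRUPT_MAP[(device, interrupt)]


def interrupts_for(cpu):
    """List every (device, interrupt) pair currently mapped to the given CPU."""
    return sorted(key for key, target in INTERRUPT_MAP.items() if target == cpu)
```

Keying the table on both the source device and the interrupt name reflects the point that the same interrupt name from two different network controllers can be mapped to different CPUs.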
In a multiple CPU computing system, one approach for assigning interrupts to CPUs is to assign all interrupts to a single CPU. However, this approach may result in unacceptable performance if the CPU is overwhelmed by the interrupts or a high priority interrupt monopolizes the CPU at the expense of lower priority interrupts. Another approach is to use a round robin scheme to distribute the interrupts among the CPUs. For example, in a dual CPU system, after interrogating the devices to determine which interrupts can be generated, the central hub assigns the first interrupt in a list of the possible interrupts to “cpu0,” the second interrupt on the list to “cpu1,” the third interrupt to “cpu0,” the fourth interrupt to “cpu1,” and so on, alternating between the two CPUs. If more than two CPUs are included, the interrupts are assigned to the CPUs in order, and when the last CPU is reached, the central hub starts over with “cpu0.”
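The round robin scheme described above can be sketched in a few lines. This is an illustrative sketch under the assumption that the list of possible interrupts and the list of CPUs are already known; the function name `assign_round_robin` is hypothetical.

```python
def assign_round_robin(interrupts, cpus):
    """Assign interrupts to CPUs in list order, wrapping after the last CPU."""
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(interrupts)}


# Example with the dual CPU system: assignments alternate between the CPUs.
mapping = assign_round_robin(
    ["irq_a", "irq_b", "irq_c", "irq_d", "irq_e"], ["cpu0", "cpu1"]
)
```

Note that the assignment depends only on the position of each interrupt in the list, not on how much processing each interrupt requires.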
The round robin approach is better than assigning all of the interrupts to a single CPU, such as “cpu0.” However, because some interrupts are more processing intensive and take up a larger portion of the CPU's processing resources, some CPUs may spend very little time processing interrupts, while other CPUs may at times be processing only interrupts without any CPU time being made available to applications.
One technique for minimizing the impact of processing interrupts on the processing of applications is to designate some CPUs as “eligible” for handling interrupts while other CPUs are designated as “ineligible” for handling interrupts. Then the ineligible CPUs can be the preferred choices for running applications, since those CPUs would not be adversely affected by the handling of interrupts that are concentrated among the interrupt eligible CPUs.
One problem with this approach is that some interrupt eligible CPUs will have much higher interrupt loads than others, which can adversely impact the processing of applications or even the other interrupts assigned to the high interrupt load CPUs. Another problem is that users may not know or be able to control which CPUs run which applications, so some applications may still be processed by interrupt eligible CPUs. In some cases, an application may be handled by a CPU with a large interrupt load, thereby adversely impacting application performance.
While most interrupts are not very processing intensive, some specific types of interrupts can potentially require a significant amount of a CPU's processing resources. For example, network controller interrupts, especially for some modern high capacity networks such as 10 Gigabit networks that receive a large amount of packet traffic, potentially can require a very significant amount of a CPU's processing resources. At times, the network traffic can be sufficiently high, either from legitimate uses or from a malicious attack on the network, that the CPU handling a particular interrupt for that network controller can be spending 100% of the CPU's processing time handling the particular network controller interrupt from that network controller. Such a CPU can be described as having an interrupt load of 100% because all of the CPU's processing resources are dedicated to processing the network controller interrupt from that high traffic network controller.
If the interrupts are assigned to the eligible CPUs in a round robin approach, any applications that are being executed on a 100% network controller interrupt loaded CPU will not be able to be processed at all by the CPU until the traffic on the network controller goes down sufficiently so that the CPU no longer has a 100% interrupt load. Even if the interrupt load is less than 100%, the amount of the CPU's processing resources that are available for use by the applications may result in unacceptable performance of the applications.
The problem of a CPU being completely loaded and overwhelmed by interrupt processing can be particularly troublesome when interrupts are prioritized. For example, with the Solaris operating system from Sun Microsystems, Inc., network controller interrupts are typically given a higher priority than other device interrupts, such as from disk drives. As a specific example, in Solaris 10, a priority interrupt level (PIL) is associated with each interrupt, such as a PIL of 6 for network controller interrupts and a PIL of 4 for disk drive interrupts.
If a CPU is assigned to handle both a disk drive interrupt and a network controller interrupt, there can be some time periods during which the network controller interrupt is taking up all the processing resources of the CPU. When this occurs, the CPU never processes the interrupt from the disk drive, such as during time periods of heavy network traffic. This can be a very significant problem in a computing system that has a single file system that is made up of hundreds of individual disk drives. Because the CPU sees the file system as a single device, once a single disk drive in the file system generates an interrupt for the CPU being dominated by the network controller interrupt, all the disk drives are essentially prevented from operating because the CPU never is able to process that first interrupt from the single disk drive, thereby preventing any other interrupts from any of the other disk drives from being processed as well.
One improvement on the round robin approach is to weight interrupts, so that the number of other interrupts that are assigned to the same CPU as a processing intensive interrupt is minimized. For example, in Solaris 10, network controller interrupts are given a much larger weight than other interrupts, so that once a network controller interrupt is assigned to a particular CPU, many more non-network controller interrupts would be assigned to the other interrupt eligible CPUs before another interrupt is assigned to the same CPU as the network controller interrupt. By using sufficiently large weights for such resource intensive interrupts, some CPUs can effectively be assigned only a single resource intensive interrupt.
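The effect of weighting can be illustrated with a simple greedy model in which each interrupt goes to the CPU with the smallest accumulated weight. This is a sketch under stated assumptions and is not presented as the algorithm used by any particular operating system; the function name `assign_weighted` and the weight values are hypothetical.

```python
def assign_weighted(interrupts, cpus):
    """Assign each (name, weight) interrupt to the CPU with the smallest
    accumulated weight so far. Ties go to the earliest CPU in the list."""
    totals = {cpu: 0 for cpu in cpus}
    mapping = {}
    for name, weight in interrupts:
        target = min(cpus, key=lambda cpu: totals[cpu])
        mapping[name] = target
        totals[target] += weight
    return mapping


# Hypothetical weights: the network controller interrupt is weighted heavily.
mapping = assign_weighted(
    [("nc_irq0", 10), ("dd_irq0", 1), ("dd_irq1", 1), ("kbd_irq", 1), ("usb_irq", 1)],
    ["cpu0", "cpu1"],
)
```

With these assumed weights, “cpu0” receives only the network controller interrupt and every other interrupt lands on “cpu1,” which matches the observation that a sufficiently large weight can effectively dedicate a CPU to a single interrupt. The weights are fixed in advance, so the result does not change with the actual traffic on the network controller.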
However, one problem with the weighting approach is that while some interrupts have the potential to at times command 100% of the CPU's processing time, at other times, there may be little or no interrupt load from that interrupt on the CPU, depending on the network traffic conditions at a particular time. Thus, the round robin and weighting approaches can result in some CPUs consistently having much higher interrupt loads while other CPUs consistently have much lower interrupt loads, resulting in an unbalanced situation. In particular, with either the round robin or weighting approaches of assigning interrupts, there will typically be times when the system is unbalanced because some CPUs have little or no interrupt load, whereas at other times, some CPUs have interrupt loads at or near 100%.
The round robin and weighting approaches can be described as static interrupt assignment approaches because the interrupts remain assigned to the same CPUs, unless a special event occurs that triggers a reassignment of all of the interrupts. For example, the static interrupt assignment approaches typically only reassign interrupts when CPUs are added or removed from the computing system, provided that the system is capable of handling such additions and deletions of CPUs without being restarted. As another example, the static interrupt assignment approaches may reassign all the interrupts when changes are made regarding which CPUs are either eligible or ineligible to process interrupts. In other systems, changes to the available CPUs or the interrupt eligible CPUs may require a restart of the system so that the interrupts can be reassigned.
In contrast to the static interrupt assignment approaches described above, a dynamic interrupt assignment approach can be used that takes into account the actual interrupt loads on the CPUs and then reassigns an interrupt from one CPU to another to better distribute the total interrupt load for the system among the interrupt eligible CPUs. For example, in Solaris 8 for x86 processors, an interrupt assignment approach is used that considers all the CPUs processing interrupts and identifies both the CPU with the biggest interrupt load and the CPU with the smallest interrupt load. The approach is then to try to move one interrupt from the high load CPU to the low load CPU in an attempt to establish a better balance of the interrupts for the system. But this simplistic approach is still unable to handle pathological situations, such as with a network controller interrupt that is taking up 100% of the CPU, because moving that interrupt to another CPU does not change the fact that one CPU will be dominated by that network controller interrupt. Also, this dynamic approach only looks at the highest and lowest loaded CPUs and only tries to move one interrupt at a time between that pair of CPUs. In some situations, repeated reassignments result in the same interrupt being moved back and forth between the same two CPUs, without any overall improvement in the system's performance. In fact, the repeated reassignment of the same interrupt impacts the system's performance because the system is expending resources to move that interrupt back and forth repeatedly.
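The back-and-forth behavior described above can be reproduced with a small model of the "move one interrupt from the busiest CPU to the least busy CPU" idea. This is a hedged sketch only: the rule for choosing which interrupt to move (the heaviest one on the busiest CPU) is an assumption made for illustration, and the function names and load figures are hypothetical.

```python
def cpu_loads(mapping, irq_load, cpus):
    """Sum the per-interrupt loads assigned to each CPU."""
    loads = {cpu: 0 for cpu in cpus}
    for irq, cpu in mapping.items():
        loads[cpu] += irq_load[irq]
    return loads


def rebalance_once(mapping, irq_load, cpus):
    """Move one interrupt from the most loaded CPU to the least loaded CPU.
    Assumption: the heaviest interrupt on the busiest CPU is the one moved.
    Returns the name of the moved interrupt, or None if nothing was moved."""
    loads = cpu_loads(mapping, irq_load, cpus)
    busiest = max(cpus, key=lambda cpu: loads[cpu])
    idlest = min(cpus, key=lambda cpu: loads[cpu])
    if busiest == idlest:
        return None
    candidates = [irq for irq, cpu in mapping.items() if cpu == busiest]
    if not candidates:
        return None
    moved = max(candidates, key=lambda irq: irq_load[irq])
    mapping[moved] = idlest
    return moved


# Hypothetical demand figures, in percent of one CPU.
cpus = ["cpu0", "cpu1"]
irq_load = {"nc_irq0": 100, "dd_irq0": 5}
mapping = {"nc_irq0": "cpu0", "dd_irq0": "cpu1"}
first = rebalance_once(mapping, irq_load, cpus)   # nc_irq0 moves to cpu1
second = rebalance_once(mapping, irq_load, cpus)  # nc_irq0 moves back to cpu0
```

After two passes the mapping is identical to the starting mapping, so the model has spent two reassignments without improving the balance, and one CPU is dominated by the network controller interrupt throughout.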
Note that when describing the moving of an interrupt between CPUs, a particular instance of an interrupt that is being processed by a CPU remains on that CPU until processing of that instance of the interrupt is complete. However, when the interrupt is moved from one CPU to another CPU, the mapping of interrupts to CPUs used by the central hub is updated so that when another instance of the same interrupt is later received by the central hub, the new instance of the interrupt is sent to the newly assigned CPU instead of the originally assigned CPU.
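The distinction between an in-progress instance and later instances can be sketched as follows. This is a simplified illustration, assuming the central hub keeps its mapping in a table that is consulted only when an instance arrives; the `CentralHub` class and its method names are hypothetical.

```python
class CentralHub:
    """Toy model of a central hub that routes interrupt instances to CPUs."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        # Record of (interrupt, cpu) for every instance delivered so far.
        self.delivered = []

    def deliver(self, irq):
        """Route a new instance using the mapping as it stands right now."""
        cpu = self.mapping[irq]
        self.delivered.append((irq, cpu))
        return cpu

    def move(self, irq, new_cpu):
        """Update the mapping only; instances already delivered are untouched."""
        self.mapping[irq] = new_cpu


hub = CentralHub({"nc_irq0": "cpu0"})
before = hub.deliver("nc_irq0")  # instance handled on cpu0
hub.move("nc_irq0", "cpu1")      # reassignment affects later instances only
after = hub.deliver("nc_irq0")   # next instance handled on cpu1
```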
Another dynamic interrupt assignment approach is incorporated into the “irqbalance” daemon that is part of Linux. A daemon is an application that runs in the background and is generally not visible to the user because the daemon does not generate any windows or other effects that the user normally sees via the user interface. With “irqbalance,” a simple analysis of the interrupt loads on the eligible CPUs is made every ten seconds, and based on that interrupt load information, interrupts are reassigned among the eligible CPUs. This approach is better than the simple dynamic approach described above because multiple CPUs are considered and multiple interrupts can be moved. However, “irqbalance” has several drawbacks.
One problem with the “irqbalance” approach is that there is a performance impact from executing “irqbalance” every 10 seconds. Because the performance measurement and reassignment activities require some processing time on the CPU on which the “irqbalance” daemon is executing, there are fewer processing resources available for executing other applications on that CPU.
Another problem with “irqbalance” is that by frequently moving interrupts between CPUs, there is a performance impact based on the use of “warm” caches. A “warm” cache is a cache that already includes some or all of the information needed to handle a particular interrupt. Each time an interrupt is moved to another CPU, the new CPU has a “cold” cache because that interrupt was not previously handled on that CPU. When the first instance of that interrupt is processed by the new CPU, the information needed by the CPU to process the interrupt gets loaded into the CPU's cache since that information was not previously included in the cache. While subsequent instances of that particular interrupt on the CPU may be able to use a “warm” cache, the cache may only be warm for the 10 second interval before the interrupt is yet again moved to another CPU.
Yet another problem with “irqbalance” is that a 10 second sleep interval is used, but otherwise “irqbalance” does not keep track of the time while executing. Therefore, if during execution, “irqbalance” is interrupted for a period of time, say half of a second because the CPU is processing an interrupt, the interrupt load information may be inconsistent because the load information is taken over a relatively long time period that includes the half-second delay in collecting the interrupt load information. In particular, the interrupt load of a particular CPU may be very different after that half-second delay, due to the normal variation in interrupt loads. This can result in the moving of interrupts that otherwise would not be moved if the load information were collected over a shorter time period so that the interrupt load information was more representative of the different CPUs' interrupt loads at the same point in time.
Finally, another problem is that “irqbalance” is designed for typical implementations of Linux on computing systems with a small number of CPUs, usually only two or four CPUs. As a result, there is no provision in “irqbalance” for dynamic provisioning of CPUs, such as the addition or removal of a CPU from the system without restarting the system. Also, “irqbalance” is unable to address the changing of the designations for CPUs as to whether a particular CPU is eligible or not eligible to process interrupts. In a computing system with a small number of CPUs, such changes are likely to be infrequent, but in larger computing systems with dozens or even hundreds of CPUs, the ability to handle the addition and removal of CPUs without having to restart the entire system can be very important. Therefore, “irqbalance” is unable to properly accommodate CPU provisioning in computer systems with more than a handful of CPUs.
In summary, while the dynamic approaches for assigning interrupts are generally better than static interrupt assignment approaches, the dynamic approaches described above still have significant drawbacks. As a result, it is desirable to provide improved techniques for distributing multiple interrupts among multiple CPUs. It is also desirable to have improved techniques for handling situations in which a single interrupt can dominate a particular CPU.