Computer systems such as desktop computers, laptop computers, cell phones, smartphones, tablets, personal data assistants, wearables, or other computer-based technology can include various devices or components such as multiple central processing units (CPUs), different types of storage, controllers, peripheral devices, or fixed-function logic blocks (configured on a single chip). These devices may use interrupt requests to notify an appropriate CPU that an event has occurred or that data is to be transferred between these devices. An interrupt request is a signal that prompts a CPU to temporarily stop executing its current instruction routine and switch to a special routine that services the request. In other words, the interrupt request interrupts the operation of a CPU so that the CPU can process the task associated with the interrupt request. Interrupt requests thus provide a tool for prioritizing tasks and for integrating the operation of different components (software or hardware) on a computer system.
In a simple example, a desktop computer was often designed to include expansion slots that allow peripheral cards for various customized applications to be added to the computer system. A peripheral card may have its own on-board processor, but it also needs to interact with or rely on the CPU or memory of the desktop computer. Interrupt requests are sometimes used to signal the CPU to address an event that occurred on the peripheral card.
The nature and volume of interrupt requests, and their management, have grown more complex over time due, for example, to the increased complexity of computer systems. As a consequence of the rapidly evolving computer industry, more CPUs and devices are added to individual computer systems. As a result, the management and operation of interrupt requests have also become more burdensome. Also, in a real-time system, because many of the interrupts have to be processed to meet strict real-time requirements, it becomes critical to service a large number of interrupts efficiently to reduce interrupt latency and power consumption.
One conventional method of handling interrupt requests is to monitor the current interrupt load and reassign an interrupt request from one CPU to another to better distribute the total load among the CPUs. In one technique, the system identifies both the CPU with the largest load of interrupt requests and the CPU with the smallest load. The method then tries to move interrupt requests from the CPU with the largest load to the CPU with the smallest load in an attempt to establish a better balance in the computer system. However, this simplistic approach is unable to handle situations such as when a single interrupt is taking up 100% of a CPU (e.g., a network controller interrupt), because moving that interrupt to another CPU does not change the fact that one CPU will be dominated by that interrupt. Also, this approach may look only at the highest and lowest loads and try to move only one interrupt at a time between that pair of CPUs. In some situations, repeated reassignments result in the same interrupt request being moved back and forth between the same two CPUs, without any overall improvement in the performance of the computer system. In fact, the repeated reassignment of the same interrupt degrades the computer system's performance because the computer system expends resources to move that interrupt back and forth repeatedly.
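The back-and-forth behavior described above can be illustrated with a minimal sketch. The data layout (a mapping from CPU to per-interrupt load in percent) and the names `rebalance_once` and `run_rounds` are assumptions made for illustration; they are not taken from any particular operating system.

```python
def rebalance_once(loads):
    """Move one interrupt from the busiest CPU to the least busy CPU.

    loads maps cpu id -> {interrupt name: load in percent} (hypothetical layout).
    Returns (interrupt, source_cpu, destination_cpu), or None if nothing moves.
    """
    totals = {cpu: sum(irqs.values()) for cpu, irqs in loads.items()}
    busiest = max(totals, key=totals.get)
    idlest = min(totals, key=totals.get)
    if busiest == idlest or not loads[busiest]:
        return None
    # Pick the heaviest interrupt on the busiest CPU and reassign it.
    irq = max(loads[busiest], key=loads[busiest].get)
    loads[idlest][irq] = loads[busiest].pop(irq)
    return irq, busiest, idlest


def run_rounds(loads, rounds):
    """Apply rebalance_once repeatedly and record every move."""
    return [rebalance_once(loads) for _ in range(rounds)]
```

With one interrupt consuming an entire CPU, for example `{0: {"net": 100}, 1: {"timer": 5}}`, every round of this sketch moves the same `"net"` interrupt to the other CPU, and the load distribution after an even number of rounds is identical to the starting distribution.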
Although there are many methods that seek to improve the management and handling of interrupt requests, almost all of them still focus on distributing interrupts based on the interrupt loads on the CPUs. Very few methods have considered issues associated with interrupt migration.
The Linux kernel performs a certain form of interrupt migration. When a CPU is about to be unplugged, the system checks every interrupt in the overall interrupt list and determines whether each interrupt in the list is targeted to the CPU about to be unplugged. If some interrupts are found to be targeted to that CPU, the system uses an affinity mask to find another CPU to service those interrupts. If there is an online CPU that the affinity mask permits, the system migrates those interrupts to that CPU. Otherwise, the algorithm routes those interrupts to the first online CPU.
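The steps in the preceding paragraph can be sketched as follows. This is a simplified model and not kernel code: the `Interrupt` class, the representation of the affinity mask as a set of CPU numbers, and the function name `migrate_on_unplug` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Interrupt:
    number: int
    target: int                                  # CPU currently servicing it
    affinity: set = field(default_factory=set)   # CPUs allowed to service it


def migrate_on_unplug(irq_list, dying_cpu, online_cpus):
    """Retarget interrupts away from dying_cpu; return how many entries were examined."""
    survivors = sorted(c for c in online_cpus if c != dying_cpu)
    if not survivors:
        raise ValueError("no online CPU left to take over interrupts")
    examined = 0
    for irq in irq_list:                 # every interrupt in the list is checked
        examined += 1
        if irq.target != dying_cpu:
            continue                     # not affected by this unplug
        # Prefer an online CPU permitted by the affinity mask.
        allowed = [c for c in survivors if c in irq.affinity]
        # Otherwise fall back to the first online CPU.
        irq.target = allowed[0] if allowed else survivors[0]
    return examined
```

Note that the returned count always equals the length of the whole list, regardless of how few interrupts actually target the CPU being unplugged.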
This process, however, is inefficient because the algorithm needs to search through the entire interrupt list, which contains all the interrupts, to find the interrupts associated with the unplugged CPU. The number of interrupts for the unplugged CPU is much smaller than the total number of interrupts in the interrupt list, and examining interrupts not targeted to the unplugged CPU is unnecessary. Therefore, such a process wastes the computer system's resources and delays interrupt migration. This problem is even more pronounced as the number of unplugged CPUs increases. For example, when a computer system has M CPUs, M−1 of which are to be unplugged, and the total number of interrupts in the list is N, the total migration time or latency is proportional to (M−1)×N, because each unplug operation scans all N entries. As M increases, so does the latency.
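The (M−1)×N figure can be checked with a small counting sketch. It assumes, for illustration only, that CPU 0 stays online, that CPUs are unplugged one at a time, and that affected interrupts are simply rerouted to CPU 0; the function name `full_scan_cost` is hypothetical.

```python
def full_scan_cost(num_cpus, irq_targets):
    """Count list entries examined when all CPUs except CPU 0 are unplugged.

    irq_targets[i] is the CPU that interrupt i is currently targeted to.
    Each unplug performs a full scan of the interrupt list.
    """
    targets = list(irq_targets)
    examined = 0
    for dying in range(num_cpus - 1, 0, -1):   # unplug CPUs M-1, ..., 1
        for i, cpu in enumerate(targets):
            examined += 1                      # every entry is checked
            if cpu == dying:
                targets[i] = 0                 # reroute to the remaining CPU
    return examined
```

Under these assumptions, a system with M = 4 CPUs and N = 100 interrupts examines 3 × 100 = 300 list entries in total, even though each individual unplug only needs to move the interrupts targeted to one CPU.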
Accordingly, there is a need for improved methods and systems for migrating interrupts.