1. Field
The present disclosure relates to computer systems and methods. More particularly, the disclosure pertains to processor scheduling mechanisms.
2. Description of the Prior Art
By way of background, scheduling-clock interrupts have long been used in operating systems to implement time slicing, preventing a CPU-bound process from starving other processes. However, scheduling-clock interrupts are not free, particularly when running real-time applications or high-performance-computing (HPC) applications. For these types of applications, OS jitter resulting from the scheduling-clock interrupts can greatly degrade performance, which has prompted considerable efforts to reduce OS jitter. The Linux® community has been working to address this problem by removing scheduling-clock interrupts where they are not needed.
This work checks state when a given CPU exits from the kernel to userspace execution. If there is only one runnable task on this CPU, the kernel turns off the scheduling-clock tick and also informs any kernel subsystems that need to know about this, including RCU (Read-Copy Update). RCU handles a CPU running in userspace without a scheduling-clock tick in the same way that it handles a CPU that is idle without a scheduling-clock tick. RCU continues to track the number of reasons that each CPU is non-idle, as discussed in Section 2.1 below, using an integer whose value is zero when the corresponding CPU is idle.
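The bookkeeping described above can be illustrated with a minimal sketch in C. It is not the Linux® kernel's implementation: the structure `cpu_state`, its fields, and the functions `nonidle_enter()`, `nonidle_exit()`, `cpu_is_idle()`, and `exit_to_user()` are hypothetical names chosen for illustration, and informing RCU is modeled simply as dropping one reason for being non-idle.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical small system, for illustration only */

/* Hypothetical per-CPU state; these are not actual kernel structures. */
struct cpu_state {
    int  nonidle_nesting;  /* number of reasons this CPU is non-idle; 0 means idle */
    int  nr_runnable;      /* runnable tasks on this CPU */
    bool tick_enabled;     /* is the scheduling-clock tick armed? */
};

static struct cpu_state cpus[NR_CPUS];

/* A new reason for being non-idle appears (for example, kernel entry). */
static void nonidle_enter(int cpu)
{
    cpus[cpu].nonidle_nesting++;
}

/* One reason for being non-idle goes away. */
static void nonidle_exit(int cpu)
{
    assert(cpus[cpu].nonidle_nesting > 0);
    cpus[cpu].nonidle_nesting--;
}

/* The integer is zero exactly when the CPU is accounted as idle. */
static bool cpu_is_idle(int cpu)
{
    return cpus[cpu].nonidle_nesting == 0;
}

/*
 * Check state on exit from the kernel to userspace. With exactly one
 * runnable task there is nothing to time-slice, so the tick is turned
 * off and the CPU is accounted like an idle CPU. Returns true if the
 * tick was stopped.
 */
static bool exit_to_user(int cpu)
{
    struct cpu_state *c = &cpus[cpu];

    if (c->nr_runnable != 1)
        return false;       /* time slicing still needed; keep the tick */

    c->tick_enabled = false;
    nonidle_exit(cpu);      /* assumption: models "informing RCU" */
    return true;
}
```

In this sketch, a CPU with a single runnable task that calls `nonidle_enter()` on kernel entry and `exit_to_user()` on return ends up with its tick stopped and its counter back at zero, while a CPU with two runnable tasks keeps both its tick and its non-idle accounting.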
However, one challenge is the general-purpose nature of the Linux® kernel, which requires that timekeeping be maintained whenever at least one CPU is running either non-idle-loop kernel code or user-mode code. The variant of adaptive ticks in the Linux® kernel handles this need by keeping the scheduling-clock interrupt turned on for at least one CPU (designated the timekeeping CPU) at all times. This works well because this CPU can be designated a housekeeping CPU on which the OS-jitter-sensitive application never runs.
Unfortunately, this approach prevents all CPUs from going idle for extended periods, because one of the CPUs will continue to receive scheduling-clock interrupts. This situation needlessly wastes energy, so an improved approach would be quite useful.
There are a number of straightforward approaches, each with equally straightforward drawbacks:
1. Maintain a global count of the number of non-idle CPUs, shutting off the timekeeping CPU's scheduling-clock interrupt when all CPUs are idle. This works well for small systems, but results in scalability problems for large systems due to excessive memory contention on the variable containing the global count, especially for workloads that cause large numbers of CPUs to enter and exit idle extremely frequently. This approach also requires careful coordination with the CPU hotplug system.
2. Run a small computational kernel on the non-housekeeping CPUs. This is the approach used by many commercial HPC systems, including those from IBM and Cray, but it has the drawback of severely constraining the application's design. These constraints are due to the need to communicate with special I/O nodes to handle normal system calls, and to the inability of computational kernels to support more than one thread per CPU.
3. Within the Linux® kernel, take the non-housekeeping CPUs offline and run the application on these “offline” CPUs within the context of the Linux® kernel. This is a variant of the computational-kernel approach, and suffers all the drawbacks of that approach, but also requires the difficult task of debugging within the unforgiving Linux® kernel software environment. This approach also voids the warranty provided by most organizations providing commercial support for the Linux® kernel.
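To make the scalability drawback of the first approach above concrete, the following is a minimal sketch in C11 of a global non-idle-CPU count. The names `nr_nonidle_cpus`, `cpu_exit_idle()`, `cpu_enter_idle()`, and `timekeeping_tick_needed()` are hypothetical and are not kernel symbols, and CPU hotplug coordination is deliberately omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical global count of non-idle CPUs. Every idle entry and
 * idle exit on every CPU performs an atomic read-modify-write on this
 * one variable, which is the source of memory contention on large
 * systems.
 */
static atomic_int nr_nonidle_cpus;

/* Called by a CPU as it leaves idle. */
static void cpu_exit_idle(void)
{
    atomic_fetch_add(&nr_nonidle_cpus, 1);
}

/* Called by a CPU as it enters idle. */
static void cpu_enter_idle(void)
{
    int prev = atomic_fetch_sub(&nr_nonidle_cpus, 1);

    assert(prev > 0);  /* an idle entry must pair with an earlier idle exit */
    (void)prev;        /* avoid an unused-variable warning under NDEBUG */
}

/*
 * The timekeeping CPU may shut off its scheduling-clock interrupt only
 * when no CPU is non-idle. Deriving the answer from the count itself,
 * rather than caching it in a separate flag, avoids a race between the
 * last CPU going idle and another CPU leaving idle.
 */
static bool timekeeping_tick_needed(void)
{
    return atomic_load(&nr_nonidle_cpus) > 0;
}
```

The logic is simple, which is what makes the approach attractive on small systems. The cost lies in the shared variable: with thousands of CPUs entering and exiting idle frequently, each call contends for the same cache line.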
What is needed is an approach that allows the scheduling-clock interrupt to be shut down when a given CPU is executing user-mode code, that also allows all CPUs to simultaneously dispense with scheduling-clock interrupts when the system is fully idle, that performs and scales well (even on systems with thousands of CPUs), and that does not entail the application restrictions required by the various computational-kernel approaches.