Operating system interference, caused primarily by the scheduling of daemon processes and the handling of asynchronous events such as interrupts, constitutes “noise” or “jitter” (henceforth referred to as OS Jitter). OS Jitter has debilitating effects on large scale high performance computing (HPC). Traditionally, HPC systems have avoided OS Jitter by making use of specialized lightweight operating systems on compute nodes. However, this approach is not very useful, as most applications written for commercial operating systems are rendered incompatible. For compatibility reasons, lightweight versions of commodity operating systems such as Linux™ have been created which can be used on compute nodes of large scale HPC systems. The creation of a lightweight version of a commodity operating system requires that a detailed study be carried out to identify the sources of OS Jitter and to quantitatively measure their impact on that operating system. To date, studies of OS Jitter have proved insufficient, as they have concentrated either on measuring the overall OS Jitter experienced by an application or on estimating the effect of OS Jitter on the scaling of parallel applications, and have not addressed the issue of determining the biggest contributors to OS Jitter.
Apart from the known adverse effects of operating system clock ticks or timer interrupts, there is little data available about the system daemons and interrupts that contribute to OS Jitter. Furthermore, tuning an ‘out of the box’ commodity operating system is only the first step towards mitigating the effects of OS Jitter. In the absence of any quantitative information about the OS Jitter caused by various system daemons and interrupts, system administrators have to resort to their established knowledge and other ad-hoc methods to tune a system for HPC applications. This process not only requires highly knowledgeable system administrators, but is also error prone, given that new versions of these commodity operating systems are released at fairly regular intervals and new sources of OS Jitter are introduced in these releases.
Identification of all possible sources of OS Jitter and measurement of their impact on an application requires a detailed trace of the OS activity. Existing general purpose OS profiling tools, such as OProfile or the Linux kernel scheduler stats, provide only a coarse measure in terms of time spent in each kernel function or process and do not uniquely measure the OS Jitter perceived by an application due to each OS Jitter source. Another tool for tracing events in Linux is the Linux Trace Toolkit (LTT), which, however, cannot record all interrupts and processes in a given time period without modification to the LTT.
Benchmarks have also been developed specifically for studying OS Jitter, such as the selfish detour benchmark, which can be used to measure OS Jitter on a wide range of platforms and to study its effect on parallel program performance. Such benchmarks rely on the technique of sampling the timestamp register at a relatively high rate in a loop based on the fixed work quantum principle. However, these benchmarks do not provide any information about which daemons and interrupts contribute to OS Jitter and by how much.
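By way of illustration only, the sampling technique relied upon by such benchmarks may be sketched as follows. This is a minimal sketch and not the code of the selfish detour benchmark or of any other published benchmark. It assumes that Python's `time.perf_counter_ns` is an acceptable stand-in for the hardware timestamp register, and the names `sample_clock`, `find_detours`, `measure_detours` and the parameter `factor` are hypothetical. The shortest gap observed between two consecutive samples is taken as the undisturbed cost of one loop iteration (the fixed work quantum), and any much larger gap is recorded as a detour.

```python
import time


def sample_clock(num_samples):
    """Read the clock in a tight loop and return the raw timestamps (ns)."""
    clock = time.perf_counter_ns  # assumption: stand-in for the timestamp register
    samples = [0] * num_samples   # preallocate so the loop does no list growth
    for i in range(num_samples):
        samples[i] = clock()
    return samples


def find_detours(timestamps, threshold_ns):
    """Return (start_ns, gap_ns) for each consecutive gap above threshold_ns."""
    detours = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        gap = curr - prev
        if gap > threshold_ns:
            detours.append((prev, gap))
    return detours


def measure_detours(num_samples=100_000, factor=10):
    """Sample the clock and flag gaps larger than factor times the shortest gap."""
    samples = sample_clock(num_samples)
    gaps = [b - a for a, b in zip(samples, samples[1:])]
    # The shortest gap approximates one undisturbed iteration; guard against
    # a zero gap on platforms with a coarse clock.
    baseline_ns = max(1, min(gaps))
    return baseline_ns, find_detours(samples, factor * baseline_ns)
```

Calling `measure_detours()` returns the baseline iteration time together with a list of detours. In an interpreted language, interpreter overhead and garbage collection also appear as detours, so the sketch conveys the principle rather than a usable measurement. As noted above, such a list reveals when the loop was interrupted and for how long, but not which daemon or interrupt was responsible.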
OS noise, and more specifically the impact of OS timer interrupts on parallel application performance, has been studied in the prior art (“System Noise, OS Clock Ticks, and Fine-grained Parallel Applications”, D. Tsafrir, Y. Etsion, D. G. Feitelson, and S. Kirkpatrick, in Proceedings of ICS, 2005). A methodology for determining the OS Jitter component was provided by micro-benchmarking the kernel through the use of accurate timers. An in-kernel logging mechanism, called KLogger, was devised to trace fine-grain events. However, this approach could not identify all sources of OS Jitter and measure their impact, nor could it compare various configurations of a system to detect new sources of OS Jitter that are introduced during software installation.
A need therefore exists for a tool that can identify the various sources of operating system jitter, measure their impact and provide a solution. A further need exists for a tool that can compare various configurations of a system to detect new sources of OS Jitter that are introduced during software installation.