The present application relates generally to an improved data processing apparatus and method and more specifically to a closed-loop feedback mechanism for achieving optimum performance in a consolidated workload environment.
Modern computing devices are built upon multiple processing core architectures. Some computing devices provide multi-threaded cores, where a thread is a sequence of instructions that may be executed in parallel with other threads. With multi-threaded cores, there is a complexity in how the threads in the core are designed. In some architectures, such as the Power® architecture available from International Business Machines (IBM) Corporation of Armonk, N.Y., which uses Simultaneous MultiThreading (SMT) technology, such as SMT4 (4 simultaneously executing threads), SMT2 (2 simultaneously executing threads), or any number of SMT threads, i.e. SMTn, the threads in a core have equal capacity and capability if all the threads run concurrently. However, if only one thread is running, then that thread gets a boost in performance since the thread has more capacity, i.e. it can make use of more computing device resources, than it would have if all the SMT threads were running on the core.
The complexity in how threads in a core are designed increases when physical cores are virtualized and shared among multiple virtual processors, either as whole cores or through time-sliced virtualization. In a virtualization environment, the multiple virtual processors can be spread across multiple virtual machines (VMs) or logical partitions (LPARs), with each LPAR mostly operating in isolation. It should be appreciated that an LPAR may comprise one or more VMs and each VM may comprise one or more virtual processors which operate using one or more physical processors and physical resources of the computing device.
Operating systems, such as the Advanced Interactive Executive (AIX) operating system available from IBM Corporation, running on the LPAR/VM adopt intelligent scheduling, leveraging knowledge of the hardware capabilities. In such a case, knowing that SMT threads get different capacity/capabilities depending on the running state of the other threads on a core, the AIX scheduler schedules work on the primary (first) thread of each virtual processor in an LPAR/VM to get the best performance when the workload does not have enough tasks to run on each of the hardware threads of all the virtual processors in the LPAR/VM.
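By way of illustration only, the primary-thread-first placement described above may be sketched as follows. This is a minimal sketch and not the AIX scheduler itself; the function name `place_tasks` and its parameters are hypothetical, and the sketch assumes every virtual processor has the same SMT width.

```python
def place_tasks(num_tasks, num_vps, smt_width):
    """Assign tasks to (virtual_processor, hardware_thread) slots.

    Hypothetical illustration: thread 0 (the primary thread) of every
    virtual processor is filled before any secondary thread is used.
    Tasks beyond the total slot count are left unplaced.
    """
    capacity = num_vps * smt_width
    placed = min(num_tasks, capacity)
    # Slot i goes to VP (i mod num_vps) on hardware thread (i div num_vps),
    # so a full pass over the VPs completes before the thread index rises.
    return [(i % num_vps, i // num_vps) for i in range(placed)]


if __name__ == "__main__":
    # Three tasks on four SMT4 virtual processors: only primary threads run.
    print(place_tasks(3, 4, 4))
    # Six tasks on four SMT2 virtual processors: secondary threads are used
    # only after every primary thread is occupied.
    print(place_tasks(6, 4, 2))
```

With fewer tasks than virtual processors, each task in this sketch runs alone on its core's primary thread, which corresponds to the single-thread performance boost noted above.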
In a virtualization environment, LPARs, VMs, or virtual processors may over-provision capacity on a system, e.g., each LPAR/VM can ask for all the resources in the system, in terms of virtual processors, if they are available. Therefore, in this over-provisioned configuration, in modern architectures up to 10× virtual processors, for example, can be in operation, and the capacity and resources of the physical cores are time sliced across these virtual processors (if over-provisioned and all virtual processors are dispatched by the guest operating systems in each LPAR). The AIX operating system, even though using only one thread in each virtual processor, will schedule work on all the virtual processors in each LPAR, leading to a situation where physical cores are time sliced across a large number of virtual processors. That is, rather than dispatching a single virtual processor with multiple threads associated with the single virtual processor, or fewer than all the virtual processors in the LPAR with threads spread across the subset of virtual processors, the AIX operating system will schedule work on all the virtual processors in each LPAR and spread the threads across all of the virtual processors, with time slicing being performed with regard to the physical resources, e.g., physical cores, memory, caches, etc., shared by these virtual processors. Time slicing increases context switches between threads and results in thrashing of caches, leading to relatively lower performance.
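By way of illustration only, the effect of spreading work across all virtual processors in an over-provisioned system may be approximated with the following arithmetic sketch. The function names and the example figures (10 LPARs, 8 virtual processors per LPAR, 8 physical cores, SMT4, 8 runnable tasks per LPAR) are hypothetical and chosen only to reproduce a 10× ratio; the sketch does not model any actual hypervisor dispatch policy.

```python
import math


def vps_dispatched_spread(tasks, vps):
    """Virtual processors dispatched when each task gets its own VP
    (one thread per VP), up to the number of VPs in the LPAR."""
    return min(tasks, vps)


def vps_dispatched_packed(tasks, vps, smt_width):
    """Virtual processors dispatched when tasks are packed onto all
    hardware threads of a VP before another VP is used."""
    return min(vps, math.ceil(tasks / smt_width))


def overcommit_ratio(dispatched_per_lpar, physical_cores):
    """Dispatched virtual processors per physical core; values above 1
    imply the physical cores must be time sliced."""
    return sum(dispatched_per_lpar) / physical_cores


if __name__ == "__main__":
    lpars, vps, cores, smt, tasks = 10, 8, 8, 4, 8  # hypothetical figures
    spread = [vps_dispatched_spread(tasks, vps)] * lpars
    packed = [vps_dispatched_packed(tasks, vps, smt)] * lpars
    print(overcommit_ratio(spread, cores))
    print(overcommit_ratio(packed, cores))
```

Under these assumed figures, spreading dispatches every virtual processor in every LPAR, whereas packing dispatches only a subset, so fewer virtual processors contend for each physical core.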