The present disclosure generally relates to a method of cache preloading on a partition switch or a context switch, and to a system for implementing the same.
Virtualization has become a “magic bullet”, providing a means to increase utilization, improve security, lower costs, and reduce management overheads. In some scenarios, the number of virtual machines consolidated onto a single processor has grown even faster than the number of hardware threads provided by modern processors. Multiprogrammed virtualization allows multiple virtual machines to time-share a single processor core; however, this fine-grained sharing comes at a cost. Each time a virtual machine is scheduled by a hypervisor, it effectively begins with a “cold” cache, since any cache blocks it accessed in the past have likely been evicted by other virtual machines executing on the same processor.
Server consolidation, virtual desktop infrastructure (VDI) environments, and cloud computing trends dominate the landscape of new server purchases. The growth of these trends has led not only to much wider adoption of hardware virtualization, but also to an increasing number of virtual instances, or partitions, being consolidated onto each physical system. For example, International Business Machines Corporation (IBM) has reported a case study of consolidating 3,900 servers onto only 30 mainframe systems, and a number of virtualization software case studies have reported consolidation ratios from 4:1 to 15:1. As another example, B. Botelho, “Virtual machines per server, a viable metric for hardware selection?”, available at http://itknowledgeexchange.techtarget.com/server-farm/virtual-machines-per-server-a-viable-metric-for-hardwareselection, has suggested that in VDI environments a good rule of thumb is to combine six to eight virtual desktop instances per processor core. In the future, the number of partitions on each machine is expected to continue to increase.
Consolidating many partitions onto a single system generally requires some form of multiprogrammed virtualization in which multiple partitions time-share a single hardware thread. To meet quality of service (QoS) constraints and provide real-time interactive response times, the execution interval for each partition is kept relatively short. For instance, the PowerVM partition manager available from IBM allocates some portion of a 10 ms dispatch window to each active partition, such that a system with ten partitions might execute each partition for only 1 ms at a time within the 10 ms window. Several virtualization software case studies illustrate why short response times are important in VDI environments: they describe VDI deployments in hospitals, including urgent care departments, and in other “mission critical” applications.
Multiprogrammed virtualization incurs overheads each time the hypervisor switches partitions, with much of this slowdown coming from the loss of microarchitectural state in the processor. While a partition is switched out, other partitions pollute the processor's caches, branch predictors, and translation lookaside buffers (TLBs). By the time the switched-out partition is scheduled for its next execution interval, the intervening partitions may have evicted all of its state, leaving it with an almost cold cache. These effects could be amortized by executing each partition for longer periods of time, but the need to maintain fast response times limits the applicability of that approach.
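The cache-pollution effect described above can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions, not part of the disclosed method: it assumes a single fully associative cache with least-recently-used (LRU) replacement, partitions with disjoint working sets, and each partition touching its whole working set once, in order, per execution interval. The names `LRUCache`, `run_interval`, and `revisit_hit_rate`, and the default sizes of 64 and 24 blocks, are hypothetical. Real processor caches are set-associative and use prefetching, so the model shows only the trend.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        # Keys are (partition, block) pairs; order runs from least to most recently used.
        self.blocks = OrderedDict()

    def access(self, owner: str, block: int) -> bool:
        """Touch one block; return True on a hit, False on a miss (which fills the block)."""
        key = (owner, block)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[key] = None
        return False


def run_interval(cache: LRUCache, owner: str, working_set: int) -> float:
    """One execution interval: the partition touches its working set once; return its hit rate."""
    hits = sum(cache.access(owner, b) for b in range(working_set))
    return hits / working_set


def revisit_hit_rate(num_other_partitions: int, capacity: int = 64, working_set: int = 24) -> float:
    """Hit rate of partition A when it is rescheduled after other partitions have run."""
    cache = LRUCache(capacity)
    run_interval(cache, "A", working_set)  # partition A warms the cache
    for i in range(num_other_partitions):  # intervening partitions pollute it
        run_interval(cache, f"P{i}", working_set)
    return run_interval(cache, "A", working_set)  # A's next execution interval


if __name__ == "__main__":
    for n in range(4):
        print(f"{n} intervening partitions -> hit rate {revisit_hit_rate(n):.2f}")
```

With these assumed sizes, partition A keeps all of its state when one other partition runs in between, because both working sets fit in the cache together. With two or more intervening partitions it finds a fully cold cache: once some of A's blocks are evicted, each miss on A's sequential second pass evicts one of A's own surviving blocks, which is the oldest in LRU order.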