As the term is used in this document, a “virtual machine” is a set of computer code that emulates a computer system in a manner that: (i) is based on computer architectures of a computer (for example, a physical computer including substantial hardware), and (ii) provides the functionality of a computer (for example, a physical computer including substantial hardware). Implementations of virtual machines (sometimes herein referred to as “instances” or “instantiations”) may involve specialized hardware, software or a combination of hardware and software.
Containers are similar to virtual machines but, being simply processes, containers are much quicker to start up and initialize. In some cases, however, use of containers may require extensive initialization. For example, creating a set of containers to work together requires configuration, setup, and validation. A pooled item would then comprise a set of cooperating containers. In some situations, significant setup time is required, such as when a private virtual network needs to be set up in conjunction with the containers. In that case, network resources would need to be a part of the pooling trade-off, since limited resources are involved. Thus, pooling may involve pooling a cooperative set of virtual machines, a cooperative set of container instances, or any of various combinations thereof.
Some virtual machines (VMs) include the following: (i) system virtual machines (also termed full virtualization VMs); (ii) process virtual machines designed to execute computer programs in a platform-independent environment; and (iii) VMs designed to also emulate different architectures and allow execution of software applications and operating systems written for another CPU or architecture. VMs are sometimes pre-allocated into instance pools. What this means is that images for an identifiable set (called a “pool”) of virtual machines are instantiated and initialized. Then, one or more of the virtual machine images are started and allocated to perform computational workloads from a single source, acting as a unit.
In some virtual machines, such as Java™ virtual machines (JVMs), one or more methods are compiled at run time by a Just-In-Time (JIT) compiler which prepares compiled bodies. The compiled bodies are stored in a repository which may be referred to as a code cache. The code cache is allowed to expand as needed, but for practical purposes, JVMs impose limits on such growth. In some cases, this code cache limit is dictated by hard constraints related to the availability of physical memory. This limit may come into play in embedded environments where memory resources are scarce. In other cases, such as in cloud computing environments, the limit is derived from a desired design objective to have a high density of applications running concurrently on the same machine.
In yet other cases, the code cache limit may be imposed by pecuniary decisions. For instance, in a platform-as-a-service (PaaS) environment, customers may pay for the memory that they use. Moreover, some Java™ applications are so large that a set of methods that the JVM needs to compile will exceed the code cache limit. When the code cache limit is reached, the JVM stops compiling and the methods that would otherwise be compiled are forced to execute interpreted. In turn, this can have a significant negative impact on performance because executing a compiled method is typically at least ten times faster than executing an interpreted method.
One potential solution to the code cache limit problem is code pitching, where the entire code cache is flushed once the code cache limit has been reached. However, in the aftermath of the code cache flush, performance is likely to suffer a great deal because many methods that are still in use will need to be recompiled. It is possible to avoid this drop in performance by evicting code fragments from the code cache in a First-In, First-Out (FIFO) manner. The code cache is conceptualized as a big circular buffer, and eviction takes place on demand when a new method needs to be stored in the code cache. However, this approach does not perform an intelligent selection of which methods are retained and which methods are selected for eviction.
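The FIFO eviction scheme described above can be illustrated with a minimal Java sketch. The class name FifoCodeCache, the method names, and the byte-based accounting are assumptions made for illustration only; an actual code cache would hold native code rather than method names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (hypothetical) of a code cache that evicts compiled bodies in FIFO order. */
class FifoCodeCache {
    /** A compiled body, reduced here to a method name and a size in bytes. */
    private static final class Body {
        final String method;
        final int sizeBytes;
        Body(String method, int sizeBytes) {
            this.method = method;
            this.sizeBytes = sizeBytes;
        }
    }

    private final int capacityBytes;
    private int usedBytes = 0;
    private final Deque<Body> bodies = new ArrayDeque<>(); // head of the deque is the oldest body

    FifoCodeCache(int capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Stores a body, evicting the oldest bodies on demand; returns the evicted method names. */
    List<String> store(String method, int sizeBytes) {
        if (sizeBytes > capacityBytes) {
            throw new IllegalArgumentException("body is larger than the whole code cache");
        }
        List<String> evicted = new ArrayList<>();
        while (usedBytes + sizeBytes > capacityBytes) {
            Body oldest = bodies.removeFirst(); // FIFO: no regard for how often the method runs
            usedBytes -= oldest.sizeBytes;
            evicted.add(oldest.method);
        }
        bodies.addLast(new Body(method, sizeBytes));
        usedBytes += sizeBytes;
        return evicted;
    }

    /** Reports whether a compiled body for the given method is still cached. */
    boolean contains(String method) {
        for (Body b : bodies) {
            if (b.method.equals(method)) {
                return true;
            }
        }
        return false;
    }
}
```

With a capacity of 100 bytes, storing three 40-byte bodies for methods "a", "b", and "c" in that order evicts "a" when "c" arrives, even if "a" is the most frequently invoked of the three. This is the lack of intelligent selection noted above.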
One possible idea is to instrument native code to keep statistics on method invocations to determine which methods are invoked frequently, and which methods are invoked less frequently. Although this idea sounds appealing, counting the invocations requires a set of counting instructions that increases the length of the corresponding code path, thereby adversely impacting performance. In order to ameliorate this performance impact, after the counting instructions are executed, these instructions could be patched into a No Operation (NOP) instruction. Basically, the NOP instruction is an assembly language instruction, programming language statement, or computer protocol command that does nothing. Replacing the counting instructions with NOP instructions is a palliative measure for at least two reasons. First, NOP instructions still increase path length, although the NOP instructions are relatively cheap to execute. Second, patching at run time needs to be performed atomically, and therefore it is necessary to provide proper instruction alignment, with the result that even larger NOP sequences may need to be employed for padding.
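The counting-then-patching idea can be modeled at a high level in Java, as in the following sketch. It is only an analogy: real patching rewrites machine instructions in place, whereas here a replaceable Runnable stands in for the counting prologue. The class name CountedMethod, the threshold parameter, and the doubling method body are all assumptions introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model (hypothetical) of a compiled method whose counting prologue is later patched to a no-op. */
class CountedMethod {
    /** Stand-in for a NOP sequence: it does nothing, but it is still executed on every call. */
    private static final Runnable NOP = () -> {};

    private final AtomicLong invocations = new AtomicLong();
    private final long threshold; // assumed: stop counting once this many calls are recorded
    private volatile Runnable prologue;

    CountedMethod(long threshold) {
        this.threshold = threshold;
        this.prologue = this::count;
    }

    /** The counting instructions: they lengthen the path of every invocation while active. */
    private void count() {
        if (invocations.incrementAndGet() >= threshold) {
            prologue = NOP; // models patching the counting instructions into NOPs
        }
    }

    /** Invokes the method: the prologue runs first, then the stand-in method body. */
    int invoke(int x) {
        prologue.run(); // remains on the code path even after patching
        return x * 2;   // placeholder for the real compiled body
    }

    long recordedInvocations() {
        return invocations.get();
    }

    boolean isPatched() {
        return prologue == NOP;
    }
}
```

Even after the prologue has been replaced, each invocation still pays for the call to the no-op, which mirrors the first objection above. The volatile field write only loosely mirrors the second objection, namely that the patch must become visible atomically to all threads.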
Another possible approach is to use a sampling technique to identify frequently executed methods. However, this approach is too coarse to be useful in practice. For applications that have relatively flat execution profiles where there is no clear hot-spot involving two or three methods, this approach does not work. The primary problem is that an adequate number of samples from many relatively important methods will not be obtained in a timely manner.
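A back-of-the-envelope calculation shows why flat profiles defeat sampling. The sketch below assumes a perfectly flat profile in which samples are spread evenly across all methods; the class name, the method name, and the figures used afterwards are hypothetical and are not taken from any measurement.

```java
/** Rough estimate (hypothetical) of how long sampling takes to characterize each method. */
class SamplingEstimate {
    /**
     * Expected seconds until each method has received samplesNeeded samples,
     * assuming samples are spread evenly over all methods (a perfectly flat profile).
     */
    static double secondsToCollect(int methods, double samplesPerSecond, int samplesNeeded) {
        if (methods <= 0 || samplesPerSecond <= 0 || samplesNeeded < 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        // Each method receives samplesPerSecond / methods samples per second on average.
        return samplesNeeded * (double) methods / samplesPerSecond;
    }
}
```

Under these assumptions, with 100 samples per second and 10 samples needed per method, a hot-spot of 3 methods is characterized in about 0.3 seconds, whereas a flat profile of 1,000 methods needs about 100 seconds.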
Yet another approach logically splits the code cache into three regions: a nursery region, a probation cache, and a persistent cache. The nursery region is managed as a FIFO buffer. Newly-compiled code enters the nursery region. When the nursery region becomes full, the oldest methods are pushed into the probation cache. The main role of the probation cache is to detect infrequently executed code. If space needs to be created in the probation cache, any frequently executed methods that are currently residing in the probation cache are evicted into the persistent cache. At this point, any infrequently executed methods are subject to being permanently discarded from the persistent cache.
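The flow of methods through the three regions can be sketched in Java as follows. This is a simplified model under several assumptions that are not stated in the description above: capacities are counted in methods rather than bytes, a method counts as frequently executed once it has been invoked at least hotThreshold times while on probation, the persistent cache is unbounded, and methods found to be infrequently executed when the probation cache is swept are the ones discarded. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical) of a code cache split into nursery, probation, and persistent regions. */
class ThreeRegionCodeCache {
    private final int nurseryCapacity;
    private final int probationCapacity;
    private final int hotThreshold; // assumed definition of "frequently executed"

    private final Deque<String> nursery = new ArrayDeque<>(); // managed as a FIFO buffer
    private final Deque<String> probation = new ArrayDeque<>();
    private final Set<String> persistent = new HashSet<>();
    private final Map<String, Integer> probationHits = new HashMap<>();

    ThreeRegionCodeCache(int nurseryCapacity, int probationCapacity, int hotThreshold) {
        if (nurseryCapacity < 1 || probationCapacity < 1) {
            throw new IllegalArgumentException("capacities must be at least 1");
        }
        this.nurseryCapacity = nurseryCapacity;
        this.probationCapacity = probationCapacity;
        this.hotThreshold = hotThreshold;
    }

    /** Newly compiled code enters the nursery; overflow pushes the oldest method into probation. */
    void addCompiled(String method) {
        nursery.addLast(method);
        if (nursery.size() > nurseryCapacity) {
            moveToProbation(nursery.removeFirst());
        }
    }

    private void moveToProbation(String method) {
        if (probation.size() == probationCapacity) {
            sweepProbation(); // space needs to be created in the probation cache
        }
        probation.addLast(method);
        probationHits.put(method, 0);
    }

    /** Promotes frequently executed methods to the persistent cache and discards the rest. */
    private void sweepProbation() {
        for (String method : probation) {
            if (probationHits.get(method) >= hotThreshold) {
                persistent.add(method);
            }
        }
        probation.clear();
        probationHits.clear();
    }

    /** The counting mechanism: only invocations of methods on probation are recorded. */
    void recordInvocation(String method) {
        probationHits.computeIfPresent(method, (m, hits) -> hits + 1);
    }

    /** Returns "nursery", "probation", "persistent", or "absent" for the given method. */
    String regionOf(String method) {
        if (nursery.contains(method)) return "nursery";
        if (probation.contains(method)) return "probation";
        if (persistent.contains(method)) return "persistent";
        return "absent";
    }
}
```

The sketch moves only method names between regions, so it does not show the cost of relocating compiled code or of fixing up direct calls. It also makes the dependence on the three sizing parameters visible: all three must be chosen up front.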
This three-region approach has some shortcomings. First, the detection of the infrequently executed methods requires some form of executing or accessing a counting mechanism, thereby adding computational overhead. Next, moving the compiled code from one region to another requires that the code be in a relocatable form. Code that is in a relocatable form is generally slower than code that is not configured for relocation. Moreover, the JVM needs to perform some bookkeeping to fix and update all direct calls whenever a compiled body is moved. Additionally, an appropriate sizing of the three regions is critical in order for this technique to function properly. However, such sizing may be dependent upon the specifics of a given application, and it may not be possible to select sizing parameters that function across a wide variety of applications. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.