The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Large enterprises such as Google and Facebook build and maintain large datacenters known as Warehouse Scale Computers (WSCs) dedicated to hosting popular user-facing web services. These datacenters are expensive and resource-intensive. The scale is so large that these datacenters now require dedicated power plants for energy.
Maximizing the efficiency of computer resources in modern WSCs is a challenge rooted in finding ways to consistently maximize server utilization to minimize cost. One strategy for maximizing server utilization has been to co-locate multiple applications on a single server. However, a significant challenge that emerges from the unpredictable dynamism in WSCs is the threat that such solutions will violate quality of service (QoS) needs for user-facing applications, which are known to be latency sensitive. Sources of this dynamism include (1) fluctuating user demand (load) for user-facing applications, (2) highly variable co-locations between user-facing and batch applications on a given machine, and (3) constant turnaround on each server; when an application completes, new applications are mapped to the server.
Despite this dynamism, a capability missing in the WSC system software stack is the ability to dynamically transform and re-transform executing application code. That void limits the design space when designing solutions to deal with the dynamism found in WSCs and leads to missed optimization opportunities. An example of such a missed optimization is the ability to apply software non-temporal memory access hints to application code to reduce its cache allocation and protect the QoS of its user-facing latency-sensitive co-runners. Modern ISAs, such as x86 and ARMv8, include prefetch instructions that hint to the processor that a subsequent memory access should not be cached. These instructions provide a mechanism that may cause an application to occupy more or less shared cache, and thus can enable higher throughput co-locations while protecting the QoS of high-priority co-runners. However, it is difficult to leverage these hints effectively without a mechanism to dynamically add and remove them in response to changing conditions on the server.
‘Napping’ mechanisms, used to reduce pressure on shared resources, have also motivated the need for a mechanism to dynamically add and remove instructions. ReQoS, for example, is a static compiler-enabled dynamic approach that throttles low-priority applications to allow them to be safely co-located with high-priority co-runners, guaranteeing the QoS of the high-priority co-runners and improving server utilization. However, due to the inability to transform application code online, these approaches are limited to the heavy-handed approach of putting the batch application to sleep, i.e., napping, to reduce pressure on shared resources.
In short, while the advantages of a mechanism for online code transformation are apparent, designing such a mechanism that is deployable in production environments has proved challenging. This has sorely limited adoption of dynamic compilation, particularly in production and commercial domains. Several challenges have prevented the realization of deployable dynamic compilation:
Overhead: It has been reported that companies such as Google tolerate no more than 1% to 2% degradation in performance to support dynamic monitoring approaches in production. The high overhead that is common in traditional dynamic compilation frameworks has served as a barrier to adoption in these performance-critical datacenter environments.
Generality and Low Complexity: To avoid hardware lock-in and overly complex software maintenance, a deployable dynamic compilation system should impose little or no burden on application developers and should require no specialized hardware support.
Transformation Power: Traditional dynamic optimizers raise native machine code to an intermediate representation before applying transformations. This approach limits the power of the transformations due to loss of source-level information. Having the ability to apply transformations online that are as powerful as static compilation significantly impacts the flexibility of the dynamic compiler.
Continuous Extrospection: In a highly dynamic environment where multiple applications co-run, specializing code to runtime conditions should be done both introspectively, based on a host program's behavior, and extrospectively, based on external applications that are co-located on the same machine. To accomplish this, a runtime code transformation system must be aware of changing conditions for both itself and its neighbors, applying or undoing transformations accordingly.