A portion of the disclosure of this patent document may contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice shall apply to this document: Copyright © 2000, Microsoft, Inc.
The invention relates generally to data optimization techniques. More particularly, the invention relates to the organization of data structures in cache memory.
Application performance, e.g., the speed at which a software application runs, depends on several factors. One of these factors is the speed with which data is transferred into and out of the system's memory. To improve data transfer speeds, many applications use data caching techniques to provide fast access to frequently used data. Cache memory includes fast static random access memory (SRAM) devices that move data to and from the CPU more rapidly than the main memory. Using cache memory reduces the amount of time the CPU is idle while it waits for data.
Because memory latency continues to grow relative to processor speed, the cache performance of programs has become an important consideration for system designers, compiler designers, and application programmers alike. Cache memory is organized into units known as lines. A line is the unit of memory that is moved into or out of the cache as the result of a memory access. Typically, lines are 32 or 64 bytes, so several data items can be placed on the same line. One way to reduce cache misses and thereby improve cache performance is to place data items that are often accessed together on the same line. If one such item is accessed shortly after another and neither is in the cache, then two cache misses can be reduced to one if the data are relocated to the same line. In the C and C++ programming languages, compilers lay out the members of structures in the order specified in the declaration, respecting the alignment constraints of the members. If a structure spans more than one line, performance may be improved by reordering the members of the structure so that those members that are accessed closely in time are located on the same line.
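The effect of declaration order on line placement can be sketched with a small layout calculator. This is an illustrative model, not part of the disclosure: the function `layout`, the member names, and their sizes and alignments are hypothetical, the structure is assumed to start on a line boundary, and each member is attributed to the line that holds its first byte.

```python
from typing import Dict, List, Tuple

Member = Tuple[str, int, int]  # (name, size in bytes, alignment in bytes)

def layout(members: List[Member], line_size: int = 64) -> Dict[str, Tuple[int, int]]:
    """Map each member to (offset, index of the cache line holding its first byte)."""
    placement = {}
    offset = 0
    for name, size, align in members:
        # Round the offset up to the member's alignment, as a C compiler would.
        offset = (offset + align - 1) // align * align
        placement[name] = (offset, offset // line_size)
        offset += size
    return placement

# Hypothetical structure in which 'key' and 'next' are assumed to be accessed together.
declared = [("key", 4, 4), ("payload", 64, 8), ("next", 8, 8)]
reordered = [("key", 4, 4), ("next", 8, 8), ("payload", 64, 8)]

print(layout(declared))   # key at (0, 0), payload at (8, 0), next at (72, 1)
print(layout(reordered))  # key at (0, 0), next at (8, 0), payload at (16, 0)
```

In the declared order, `key` and `next` start on different lines, so accessing one after the other can cost two misses. In the reordered layout both start on line 0.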
There have been some approaches to reordering the members of structures. Some such approaches use an analysis technique known as a field affinity graph to describe how often two members or fields of a structure are accessed closely in time. Certain other approaches use an analysis technique known as a temporal relationship graph, which has a similar objective to a field affinity graph. In both of these approaches, the edges of the graph are weighted with nonnegative numbers, where a large weight indicates that the members at the ends of the edge are often accessed closely in time. As a result, an ordering of the members of the structure corresponds to a clustering of the nodes of the graph where the members in each cluster are placed on the same line. The clusters are constrained so that the members of each cluster can actually fit on the same line. A good ordering, then, corresponds to a clustering where the sum of the weights of edges within clusters is maximized or, equivalently, the sum of the weights of edges between clusters is minimized. The field affinity graph and the temporal relationship graph differ in how the weights are determined. Further, the clustering algorithms used in these methods differ. In particular, those approaches that use the field affinity graph tend to use a greedy bottom-up approach to clustering, while those that use the temporal relationship graph tend to use a top-down global optimization approach. While both types of approaches yield good results, neither optimally solves the problem of reordering the members of structures to improve cache performance.
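The clustering view described above can be made concrete with a generic greedy bottom-up sketch. It is not the algorithm of any particular prior approach: the functions `greedy_clusters` and `within_weight`, the member names, sizes, and edge weights are all hypothetical, every edge is assumed to join two distinct members, and alignment padding is ignored when checking whether a cluster fits on a line.

```python
from typing import Dict, FrozenSet, List

def greedy_clusters(sizes: Dict[str, int],
                    weights: Dict[FrozenSet[str], float],
                    line_size: int = 64) -> List[List[str]]:
    """Merge clusters along the heaviest edges while each cluster still fits on one line."""
    cluster_of = {m: [m] for m in sizes}  # member -> the (shared) list for its cluster
    # Visit edges from heaviest to lightest; ties are broken by member name.
    for edge, _w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already in the same cluster
        if sum(sizes[m] for m in ca + cb) <= line_size:
            ca.extend(cb)
            for m in cb:
                cluster_of[m] = ca
    # Collect each distinct cluster once, in order of first appearance.
    seen, result = set(), []
    for m in sizes:
        c = cluster_of[m]
        if id(c) not in seen:
            seen.add(id(c))
            result.append(c)
    return result

def within_weight(clusters: List[List[str]],
                  weights: Dict[FrozenSet[str], float]) -> float:
    """Sum the weights of edges whose two endpoints share a cluster."""
    where = {m: i for i, c in enumerate(clusters) for m in c}
    return sum(w for e, w in weights.items() if len({where[m] for m in e}) == 1)

sizes = {"a": 32, "b": 32, "c": 32, "d": 32}
weights = {frozenset("ab"): 10, frozenset("bc"): 8,
           frozenset("cd"): 9, frozenset("ad"): 1}
clusters = greedy_clusters(sizes, weights)
print(clusters)                          # [['a', 'b'], ['c', 'd']]
print(within_weight(clusters, weights))  # 19
```

Here the total edge weight is 28, so a within-cluster weight of 19 is the same as a between-cluster weight of 9, which illustrates the equivalence of the two objectives.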
To improve the cache performance of programs, an analytical model known as a member transition graph is used to model access behavior to structures to determine whether and how members of the structures should be reordered. The member transition graph models the behavior of the program's accesses to members of structures and the way in which the order of these members affects the number of cache misses. Trace data is used to calculate transition probabilities and cache line survival probabilities for each pair of members in the member transition graph. The member transition graph is then defined as a Markov-type model using these calculated probabilities. In addition, the member transition graph is used to simulate the effects of hypothetical reorderings of structure members on cache performance to determine the potential benefit of such reorderings. In one particular implementation, the model is used to define the cache miss rate for a hypothetical reordering by solving a system of linear equations. By finding an ordering that minimizes the cache miss rate, cache performance can be optimized.
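One plausible way to obtain transition probabilities from trace data is to count consecutive accesses and normalize each row, as sketched below. This is an assumption made for illustration: the function `transition_probabilities` and the sample trace are hypothetical, and cache line survival probabilities are omitted because this passage does not say how they are computed.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def transition_probabilities(trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next member = j | current member = i) from consecutive accesses."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }

# Hypothetical trace of member accesses, already mapped from addresses to member names.
trace = ["key", "next", "key", "next", "key", "payload", "key", "next"]
probs = transition_probabilities(trace)
print(probs["key"])   # {'next': 0.75, 'payload': 0.25}
print(probs["next"])  # {'key': 1.0}
```

Each row of the result sums to 1, which is the property a Markov-type model needs from its transition probabilities.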
According to one particular implementation of the present invention, cache access behavior of an application is characterized by collecting memory access information relating to the memory locations accessed within a cache memory while the application is being executed. The memory access information also relates to data objects that correspond to the accessed memory locations. Transition probabilities and survival probabilities are then determined based on the collected memory access information.
Another implementation is directed to a method for selecting an ordering of members of a data structure residing on cache lines in a cache. The method includes collecting trace data while an application is being executed. This trace data includes addresses corresponding to locations accessed within the cache during execution of the application. The collected addresses are associated with the members of the data structure. Transition and cache line survival probabilities are determined at least in part as a function of the plurality of collected addresses. A member transition model is then constructed as a function of the transition probabilities and the cache line survival probabilities. The member transition model is made of nodes connected by edges. Each node represents a distinct member of the data structure. The member transition model is used to select an ordering of the members from a plurality of possible orderings of the members.
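The final selection step can be sketched as a search over candidate orderings scored by a cost function. The cost used below is a deliberately simple stand-in, not the miss-rate model of the member transition graph: it counts the fraction of consecutive accesses in a trace that move to a different cache line. The functions `line_of`, `crossing_rate`, and `best_ordering`, the member names, and the trace are hypothetical, and the structure is assumed to start on a line boundary.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Member = Tuple[str, int, int]  # (name, size in bytes, alignment in bytes)

def line_of(order: Sequence[Member], line_size: int) -> Dict[str, int]:
    """Map each member to the cache line holding its first byte."""
    lines, offset = {}, 0
    for name, size, align in order:
        offset = (offset + align - 1) // align * align  # honor alignment
        lines[name] = offset // line_size
        offset += size
    return lines

def crossing_rate(order: Sequence[Member], trace: List[str], line_size: int) -> float:
    """Stand-in cost: fraction of consecutive accesses that change cache line."""
    lines = line_of(order, line_size)
    pairs = list(zip(trace, trace[1:]))
    return sum(1 for a, b in pairs if lines[a] != lines[b]) / len(pairs)

def best_ordering(members: Sequence[Member], trace: List[str],
                  line_size: int = 64) -> Tuple[Member, ...]:
    """Exhaustively pick the ordering with the lowest stand-in cost."""
    return min(permutations(members),
               key=lambda order: crossing_rate(order, trace, line_size))

members = [("head", 32, 8), ("count", 32, 8), ("tail", 32, 8), ("flags", 32, 8)]
trace = ["head", "tail", "head", "tail", "count", "flags", "count", "flags"]
best = best_ordering(members, trace)
print([name for name, _, _ in best])  # ['head', 'tail', 'count', 'flags']
```

Exhaustive enumeration grows factorially with the number of members, so it is only practical for small structures. Larger structures would need a heuristic search in its place.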
Still other implementations include computer-readable media and apparatuses for performing these methods. The above summary of the present invention is not intended to describe every implementation of the present invention. The figures and the detailed description that follow more particularly exemplify these implementations.