1. Field of the Invention
This invention relates generally to network computer systems and, more particularly, to efficient transfer of data from client computers to server computers.
2. Description of the Related Art
Computer systems can perform no faster than the speed with which they are supplied data. Most computer networks include many server computers and client computers that exchange data with each other across network links. The clients and servers may share memory and data storage. For example, many computer systems have a hierarchical memory store with primary storage that may include disk libraries and RAID (Redundant Arrays of Inexpensive Disks) implementations. Because disk libraries and RAID systems are slow compared to storage such as dynamic random access memory (DRAM), data that are believed likely to be used more often are temporarily placed into faster memory, called cache memory. Much of the opportunity for improving the performance of such computer systems lies with efficient control of the memory hierarchy.
Efficiency of a memory hierarchy, such as a cache, is characterized by the cache miss ratio, which is the percentage of time that a data reference is requested and not found in the cache. Generally, a larger cache reduces the cache miss ratio and improves system performance, because a cache miss necessitates retrieval of the data from slower and cheaper data store, such as disk memory, and therefore slows down system data operations as processes wait for the data. Cache is typically constructed with semiconductor memory and can be costly, so cache is usually limited in size and designed to a targeted miss ratio. The system management task is to determine the most efficient cache size for the available resources and the tolerable miss ratio. In addition, cache may be partitioned, and further data store tuning may be achieved by the allocation of cache partitions to workload streams. The system management task becomes a workload balancing problem that is addressed by cache sizing and cache allocation.
Given a fixed workload stream of data references, the cache miss ratio can be plotted as a function of cache size. The cache miss ratio curve thus plotted can indicate when increasing cache size reaches the point of diminishing returns, and therefore the cache miss ratio curve can be used to make decisions about cache size and cache allocation to workload. Experimentally determining the cache miss ratio curve would necessitate preparing a workload and operating a variety of cache sizes with the workload, all the while keeping statistics on the miss ratio being experienced. This can be very difficult and time consuming, because the range of cache sizes can be vast. For example, cache size can easily vary from 10 megabytes (MB) of data to 10 gigabytes (GB) of data.
It is possible to determine the cache miss ratio for a cache of a given size and given workload through an off-line simulation of cache operation. The simulation is achieved by using a predetermined workload to generate a system trace, which is a historical record of every data reference requested in a computer system. The system trace, for example, may be the result of a workload comprising all data pages (blocks of 4K bytes of data) referenced over a time period of interest. The actual system outcome of each data reference is not important for the simulation, as the size of the cache being simulated will determine whether there is a hit or miss. The simulation program receives the trace, processes each request in the trace, and determines whether each data reference in the trace would have resulted in a cache hit or miss, in accordance with the size of the simulated cache. The system traces used in such simulations can be industry-standard bench marks, such as the well-known trace benchmarks referred to as "TPC-C" and "TPC-D".
Thus, with the system trace and off-line simulation, a cache miss ratio curve can be derived without constructing and operating a range of cache sizes. Unfortunately, the system trace can be a rather large file and the simulation can take a long time to perform. To ensure valid simulation results, the trace is typically at least ten times the size of the largest cache being simulated. Therefore, techniques for simulating the operation of cache and deriving the cache miss ratio curve have been developed. One way of deriving the cache miss ratio is to simulate cache operations as a column of references, or stack. Such techniques are referred to as stack algorithms. One such cache analysis technique involves a cache organized as a Least Recently Used (LRU) cache, in which data references are kept in a stack, with new data references placed at the top of the stack and older data references pushed down the stack. Each data reference will be associated with a stack distance, which represents the distance from the stack top where a cache entry is located. For each data reference in a system trace, the stack distance determines whether there is a cache hit or a cache miss.
Any cache can be represented as a data stack, and a system trace can be applied to determine cache hits and misses for the particular size cache. For example, FIG. 1 is a stack representation of cache operation that shows multiple columns that correspond to a sequence of data references in a system trace for an LRU stack. Proceeding from left to right for an initially empty stack, the left-most column of FIG. 1 indicates that the first data reference is a request for datum "A", which can be, for example, a request for a particular 4K data block, or page. Because A is not initially in the stack, the request for A is a cache miss. Thus, A would be entered into the cache and therefore is placed at the top of the stack, at cache location "1". Moving to the next column 104, the next data reference is a request for "B", which also is not initially in the cache and results in a cache miss. Therefore, B would be entered into the cache, and so A is pushed down in the stack to stack location "2" and B is placed at the top. Similarly, the third column 106 of FIG. 1 indicates that the next data reference is to "C", which is another cache miss, so that C would be entered into the cache and therefore is placed at the top of the stack.
Those skilled in the art will recognize that the fourth column 108 of FIG. 1 indicates that the next data reference is to the B page, which already is in the cache. In this LRU stack, the "stack distance" for B upon receiving this data reference is said to be two, because in the previous column 106 B is in the second stack position, one down from the top. In an actual cache operation, the reference to data page B would result in the cache being searched for B to resolve whether B is in the cache or not. This is referred to as a reference interval for resolution of the B page reference. Simulating the cache as a stack, the reference to B results in the B page moving to the top of the stack, pushing down C at the current stack interval 108. It should be apparent that if a cache has at least two page memory locations, then B will be in the cache at 106 and the data reference to B at that time will result in a cache hit at 108. The next column 110 of FIG. 1 indicates a reference to "D", which is not in the cache. This results in a cache miss and placement of D at the top of the stack.
The sixth column 112 indicates the occurrence of another data reference to C, which is in the cache, and therefore C is moved to the top of the stack. In the previous column 110, C had a stack distance of three. Therefore, upon receiving the request for C at 110, a cache hit will occur in a cache being simulated if the cache has at least three page memory locations, and will result in a cache miss otherwise. Finally, the seventh column 114 of FIG. 1 indicates another reference to A, which goes to the top of the stack. In the prior column 112, the stack distance for A was four. Accordingly, it should be apparent that a cache size of four data pages or greater will result in a cache hit for the data reference to A at 112, and that otherwise there will be a cache miss.
Thus, to determine the cache miss ratio curve, it is necessary to determine the cache distance for a data reference and determine if there was a hit or miss, and it will be necessary to search through the entire stack to find the location of a reference in the stack. The amount of searching required by this aspect of the simulation can consume a great amount of system resources. Recently, efforts to improve the derivation of the cache miss ratio curve have involved stack analysis techniques.
For example, U.S. Pat. No. 5,627,994 describes a method for allocating request streams and memory resources to a cache architecture, and derives a stack distance distribution through stack analysis. Such techniques for stack analysis can be used to address the cache sizing problem and improve system performance.
Cache storage system vendors and memory management program vendors have provided a multitude of opportunities to tune cache parameters. An optimal technique would be to permit dynamic self-tuning of cache parameters, to automatically determine the effect of cache partitions and reference stream allocations. It should be apparent that deriving the cache miss ratio curve for a variety of operating parameters is necessary to make informed tuning of cache parameters, whether such tuning is performed manually or automatically. It also should be apparent that dynamic self-tuning techniques must derive the cache miss ratio curve very efficiently to avoid slowing down the system operation.
From the discussion above, it should be apparent that there is a need for a stack analysis technique that can efficiently derive the cache miss ratio for a broad range of cache sizes. The present invention fulfills this need.