The invention relates generally to cache memory in computer systems, and more specifically to a cache management system providing pre-load and pre-own functionality to enhance cache efficiency in shared memory distributed cache multiprocessor computer systems.
Multiprocessor computer systems are commonly used in high-performance applications because they can offer higher performance than single-processor systems. Dividing computing tasks among multiple processors, none of which could individually match the performance of the combined system, reduces the amount of work each processor must do to complete a given task. In addition, more than one task can be performed at a time when each task or thread executes on a separate processor or group of processors, enabling multiprocessor systems to serve multiple functions concurrently. Multiprocessor systems use many methods of allocating processor resources to the tasks or threads they execute, all of which are designed to take advantage of the ability of such systems to perform computations on more than one processor at a time.
Early multiprocessor systems were typically large mainframes or supercomputers consisting of several processors mounted in the same physical unit. More recently, multiprocessor systems have evolved to include arrays or networks of interconnected computers or workstations that divide large tasks among themselves in a way similar to the division of tasks in traditional multiprocessor systems, and that can achieve similarly impressive results. A variety of multiprocessor system architectures have evolved that combine these attributes, such as a network of interconnected multiprocessor workstations that divides tasks both among the processors in each workstation and among the interconnected workstations.
With multiple processors working on a task in any configuration, a mechanism must exist for the processors to share access to data and to share the results of their computations. One solution is to use a centralized shared memory, which is a single memory that any processor can access. Other systems have distributed or independent memory for each processor or group of processors, providing faster access to the memory that is local to each processor or group of processors than is typically possible in a centralized memory architecture. In such systems, processors can access memory local to other processors or groups of processors, but doing so takes somewhat longer than accessing local memory.
The memory, whether centralized or distributed, can further be either shared-address or multiple-address memory. Shared-address memory can be accessed by any processor, whether the memory is distributed or centralized, to facilitate communication of data with other processors. Multiple-address memory provides separate memory for each processor or group of processors, and does not allow other processors or groups of processors to access this memory directly. Multiple-address systems must therefore rely on messages to share data between processors.
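The distinction between the two addressing models can be illustrated with a minimal sketch. The class names `SharedAddressSystem` and `MessagePassingNode`, the address `0x40`, and the value `99` are hypothetical and chosen only for illustration; the sketch models data visibility, not timing or hardware.

```python
from queue import Queue


class SharedAddressSystem:
    """Hypothetical model: every processor loads and stores one common address space."""

    def __init__(self):
        self.memory = {}

    def store(self, address, value):
        self.memory[address] = value

    def load(self, address):
        return self.memory[address]


class MessagePassingNode:
    """Hypothetical model: a processor with private memory and an inbox for messages."""

    def __init__(self):
        self.private_memory = {}
        self.inbox = Queue()

    def send(self, other, address, value):
        # Data leaves this node only as an explicit message.
        other.inbox.put((address, value))

    def receive(self):
        # The receiver copies the message payload into its own private memory.
        address, value = self.inbox.get()
        self.private_memory[address] = value


# Shared address: a value stored by one processor is directly loadable by another.
shared = SharedAddressSystem()
shared.store(0x40, 99)
seen_by_other_processor = shared.load(0x40)

# Multiple address: node_b cannot read node_a's memory and must be sent the value.
node_a, node_b = MessagePassingNode(), MessagePassingNode()
node_a.private_memory[0x40] = 99
node_a.send(node_b, 0x40, node_a.private_memory[0x40])
node_b.receive()
```

In the shared-address case the second processor simply loads the address, whereas in the multiple-address case the value reaches `node_b` only after an explicit send and receive.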
Cache memory can be used in any of these memory configurations to provide faster access to data that the processors are likely to need, and to reduce the number of requests for commonly used data that must be transmitted over the system bus. Storing data in cache provides faster access to the data because cache memory is typically a more expensive but substantially faster memory type than that used for general system memory. The cache associated with each processor or group of processors in a distributed shared memory system typically maintains local copies of data that resides in memory local to other processors, and so also reduces the need to retrieve such data over the system bus.
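The reduction in bus traffic described above can be sketched as follows. This is a simplified illustration under stated assumptions: `SystemMemory` and `SimpleCache` are hypothetical names, the cache has unlimited capacity, and writes and coherency are ignored.

```python
class SystemMemory:
    """Hypothetical backing store that counts how many requests cross the system bus."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.bus_requests = 0

    def read(self, address):
        self.bus_requests += 1
        return self.contents[address]


class SimpleCache:
    """Hypothetical per-processor cache; capacity limits and replacement are ignored."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, address):
        if address not in self.lines:
            # Miss: the data must be fetched over the bus and is then kept locally.
            self.lines[address] = self.memory.read(address)
        # Hit (or freshly filled line): served from the local copy.
        return self.lines[address]


memory = SystemMemory({0x10: "a", 0x20: "b"})
cache = SimpleCache(memory)
values = [cache.read(address) for address in (0x10, 0x10, 0x20, 0x10)]
```

Four reads are issued here, but only the first access to each distinct address reaches `SystemMemory`, so `memory.bus_requests` ends at 2.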
Information about each block of memory is usually stored in a directory, which indicates which caches have copies of the memory block, whether each copy is valid, invalid, or modified (dirty), and other such data. The directory is used to ensure cache coherency, that is, to ensure that the system can determine whether the data in each cache is valid. The directory also tracks which caches hold data that is to be written back to memory, and facilitates granting exclusive write access to one processor to update the memory. After the memory is updated, all other cached copies of the memory are no longer current and are marked invalid.
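The bookkeeping role of such a directory can be sketched as below. This is only an illustrative model, not the directory format of any particular system: the `Directory` class, its `sharers` and `owner` fields, and the cache identifiers "A", "B", and "C" are assumptions made for the example, and write-back of dirty data is not modeled.

```python
class Directory:
    """Hypothetical directory with one entry per memory block."""

    def __init__(self):
        self.sharers = {}  # block -> set of cache ids holding a valid copy
        self.owner = {}    # block -> cache id currently granted exclusive write access

    def read(self, block, cache_id):
        """Record that cache_id now holds a valid copy of block."""
        if self.owner.get(block) not in (None, cache_id):
            # Another cache is reading, so the previous owner loses exclusivity
            # but keeps its copy as an ordinary sharer.
            del self.owner[block]
        self.sharers.setdefault(block, set()).add(cache_id)

    def request_exclusive(self, block, cache_id):
        """Grant exclusive write access and return the caches whose copies become invalid."""
        invalidated = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        self.owner[block] = cache_id
        return invalidated


directory = Directory()
for cache_id in ("A", "B", "C"):
    directory.read(7, cache_id)

# Cache "B" asks to write block 7; the copies in "A" and "C" must be marked invalid.
invalidated = directory.request_exclusive(7, "B")
```

After the exclusive grant, `invalidated` contains "A" and "C", and the directory lists "B" as the only holder of a valid copy of block 7.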
In this type of cache system, it is not uncommon for one processor to request exclusive access to, or to write to, a specific cache line, invalidating all other copies of that line in other caches. In systems with large caches, most cache lines are invalidated for such reasons rather than replaced due to age, making invalidation of cache lines a critical factor in cache performance. What is needed is a method to reduce the impact of cache line invalidation caused by the granting of exclusive write access to another processor or by modification of the line by another processor.
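The observation that large caches lose lines mainly to invalidation rather than to age can be illustrated with a small model. The sketch below is a toy example under stated assumptions: the `LRUCache` class, the least-recently-used replacement policy, the capacities of 4 and 1024 lines, and the access pattern are all hypothetical and are not taken from any measured system.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache that counts lines lost to age versus lost to invalidation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.aged_out = 0
        self.invalidated = 0

    def touch(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)      # refresh recency on a hit
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # replace the least recently used line
            self.aged_out += 1
        self.lines[line] = True

    def invalidate(self, line):
        if self.lines.pop(line, None) is not None:
            self.invalidated += 1


def write(writer, others, line):
    """Model a write: every other cached copy of the line is invalidated first."""
    for cache in others:
        cache.invalidate(line)
    writer.touch(line)


small, large, writer = LRUCache(4), LRUCache(1024), LRUCache(1024)
for line in range(8):
    small.touch(line)
    large.touch(line)

# Another processor writes lines 0 through 3.
for line in range(4):
    write(writer, [small, large], line)
```

In this toy run the small cache has already replaced lines 0 through 3 due to age before the writes arrive, so it records four age replacements and no invalidations, while the large cache still holds those lines and loses all four to invalidation.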