The present invention relates to computer systems, and more particularly, to multiprocessor computer systems in which a plurality of processors share the same memory.
One class of computer system has two or more main processor modules for executing software running on the system and a shared main memory that is used by all of the processors. The main memory is generally coupled to a bus through a main memory controller. Typically, each processor also has a cache memory, which stores recently used data values for quick access by the processor.
Ordinarily, a cache memory stores both the frequently used data and the addresses where these data items are stored in the main memory. When the processor seeks data from an address in memory, it requests that data from the cache memory using the address associated with the data. The cache memory checks to see whether it holds data associated with that address. If so, the cache memory returns the requested data directly to the processor. If the cache memory does not contain the desired information (i.e., a "cache miss" occurs), the cache requests the data from main memory and stalls the processor while it is waiting for the data. Since cache memory is faster than main memory, this strategy results in improved system performance.
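The hit and miss behavior described above can be illustrated with a minimal software sketch. The class name SimpleCache, the dictionary standing in for main memory, and the example addresses are hypothetical and serve only to illustrate the lookup; a hardware cache would also bound its capacity and evict lines, which is omitted here.

```python
class SimpleCache:
    """Toy model of a cache keyed by main-memory address (illustrative only)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value (stand-in for main memory)
        self.lines = {}                 # cached copies: address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: the cache holds data associated with this address.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Miss: fetch from main memory; a real processor would stall here.
        self.misses += 1
        value = self.main_memory[address]
        self.lines[address] = value
        return value


memory = {0x10: 7, 0x20: 9}   # hypothetical contents
cache = SimpleCache(memory)
first = cache.read(0x10)      # first access misses and fills the cache
second = cache.read(0x10)     # second access hits
```

In this sketch the second read of the same address is served from the cache without consulting the main-memory dictionary.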
Cached copies of data are kept coherent via coherence protocols. The memory system typically has a directory that contains one entry for each memory line in the main memory. Each entry stores information specifying the state of the memory line and the processors whose cache memories have copies of the line. If the line is dirty, only one processor has a valid copy of the line.
When a cache miss occurs, the processor suffering the miss requests the data from the main memory. The memory controller examines the directory to determine if the requested cache line is dirty or clean. If the cache line is clean, the main memory supplies the data directly to the processor. If the cache line is dirty, the main memory causes the cache having the current valid copy to supply the cache line to the requesting processor.
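The directory decision described above can be sketched as follows. The names DirectoryEntry and source_for_miss are hypothetical, and the sketch only selects the source of the data; it does not model how the directory state is updated after the transfer, which the text above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """State of one memory line as recorded in the directory (illustrative)."""
    dirty: bool = False
    holders: set = field(default_factory=set)  # ids of processors caching the line


def source_for_miss(entry):
    """Return where the requested line should come from on a cache miss."""
    if entry.dirty:
        # A dirty line has exactly one valid copy, held by one processor's cache.
        if len(entry.holders) != 1:
            raise ValueError("a dirty line must have exactly one holder")
        (owner,) = entry.holders
        return ("cache", owner)
    # A clean line can be supplied directly by the main memory.
    return ("main_memory", None)


clean_source = source_for_miss(DirectoryEntry(dirty=False, holders={0, 1}))
dirty_source = source_for_miss(DirectoryEntry(dirty=True, holders={2}))
```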
One class of data stored in the main memory may be used by more than one processor in the system. Special precautions must be taken to protect the integrity of this data during operations that modify that data. Simultaneous load and store operations from multiple processors can cause data-race conditions that destroy the coherency of the data. Hence, when one processor performs a "read, modify, write" to a shared variable, all other processors must be prevented from using the data during the time the first processor is performing the operation. This protection is typically provided by a locking mechanism. When a processor wishes to perform an operation that alters the shared data, the processor first requests a memory lock from the memory controller. Once the memory controller grants the lock, the processor proceeds with its operations. At the end of the operations, the processor sends a message to the memory controller that unlocks the data. During the locked period, no other processor can gain access to the locked data.
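By way of a software analogy only, the lock-protected "read, modify, write" sequence can be sketched with threads in place of processors. Here threading.Lock stands in for the lock granted by the memory controller, and the shared dictionary, the function name add_one_many_times, and the thread and iteration counts are hypothetical.

```python
import threading

lock = threading.Lock()      # stand-in for the lock granted by the memory controller
shared = {"counter": 0}      # stand-in for a shared variable in main memory


def add_one_many_times(n):
    for _ in range(n):
        with lock:                         # request the lock; released on exit
            value = shared["counter"]      # read
            value = value + 1              # modify
            shared["counter"] = value      # write


# Four threads play the role of four processors updating the same variable.
threads = [threading.Thread(target=add_one_many_times, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every update is made while the lock is held, no update is lost and the counter ends at the number of threads multiplied by the iterations per thread.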
The time delays inherent in this type of locking scheme can be very large. Obtaining a lock requires sending a message to the main memory controller on the bus and receiving a response. If the main memory is already locked, the process must be repeated until a lock is granted to the processor. The processor must then read the data that is to be modified. This read operation often generates a cache miss. Hence, the processor must stall while the data is obtained from main memory or some other processor in the system. If the data is in the main memory, the latency time for the read is about the same as that required to obtain the lock. If the data is stored in the cache of one of the other processors, the latency time is at least twice the lock latency time. This additional latency time substantially decreases the efficiency of the entire computer system, since it increases the time of the "read, modify, write" in the processor in question, as well as increasing the time during which other processors will be locked out of the main memory.
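The latency relationships stated above can be put into a rough estimate. The function name, the retries parameter, and the figure of 100 time units are hypothetical; the estimate ignores the time spent modifying and writing back the data, and the remote-cache case is only a lower bound because the text states "at least twice" the lock latency.

```python
def lock_and_read_latency(lock_latency, data_in_remote_cache, retries=0):
    """Estimate the delay before the locked data is available to the processor.

    lock_latency: time for one request/response exchange with the memory controller.
    data_in_remote_cache: True if another processor's cache holds the data.
    retries: number of lock requests refused before the lock was granted.
    """
    # Each refused request costs one exchange; the granted request costs one more.
    acquire = (retries + 1) * lock_latency
    # A read from main memory costs about one lock latency; a read from
    # another processor's cache costs at least two (lower bound used here).
    read = 2 * lock_latency if data_in_remote_cache else lock_latency
    return acquire + read


from_memory = lock_and_read_latency(100, data_in_remote_cache=False)
from_remote = lock_and_read_latency(100, data_in_remote_cache=True)
contended = lock_and_read_latency(100, data_in_remote_cache=True, retries=2)
```

Under these assumptions, the read that follows the lock accounts for half or more of the delay in the uncontended case, and that read is the portion a prefetch issued at lock-grant time could overlap.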
Broadly, it is the object of the present invention to provide an improved memory system.
It is a further object of the present invention to provide a memory system that has reduced latency times during lock operations.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.
The present invention is a method for operating a shared memory computer system to reduce the latency times associated with lock/unlock code sequences. The computer system includes a shared memory and a plurality of processors. When one of the processors wishes to modify a shared variable stored in the shared memory, the processor must first request and receive a lock from the shared memory. The lock prevents any other processor in the computer system from modifying data in the shared memory during the locked period. In the present invention, a list of variables in the shared memory that are shared by two or more of the processors is generated. When one of the processors is granted a lock, a prefetch instruction is executed for each variable in the list. Each prefetch instruction specifies the processor receiving the lock as the destination of the data specified in that prefetch instruction. The list may be generated by a compiler during the compilation of a program that is to run on one of the processors. Alternatively, the list can be generated while the program is running either with test data or during the normal execution of the program. The list generation and prefetch instruction executions may be carried out by modifying the program and/or shared memory controller code or via special purpose hardware that monitors the memory bus.
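The following is a minimal sketch of the prefetch-on-lock-grant behavior summarized above, written as a software model and not as the claimed implementation. The class MemoryController, its method names, the per-processor cache dictionaries, and the variable names "x", "y" and "z" are all hypothetical; the list of shared variables is simply supplied by hand here, whereas the text above contemplates generating it by a compiler or by monitoring the running program.

```python
class MemoryController:
    """Toy model: grants a single lock and prefetches listed shared variables."""

    def __init__(self, memory, shared_variables):
        self.memory = memory                          # dict: address -> value
        self.shared_variables = list(shared_variables)
        self.lock_holder = None
        self.caches = {}                              # processor id -> {address: value}

    def request_lock(self, processor):
        if self.lock_holder is not None:
            return False                              # already locked; caller must retry
        self.lock_holder = processor
        # On grant, issue one prefetch per listed variable, naming the
        # processor that received the lock as the destination.
        for address in self.shared_variables:
            self.prefetch(address, destination=processor)
        return True

    def prefetch(self, address, destination):
        cache = self.caches.setdefault(destination, {})
        cache[address] = self.memory[address]

    def unlock(self, processor):
        if self.lock_holder != processor:
            raise RuntimeError("only the lock holder may unlock")
        self.lock_holder = None


memory = {"x": 1, "y": 2, "z": 3}
controller = MemoryController(memory, shared_variables=["x", "y"])
granted = controller.request_lock(0)   # processor 0 receives the lock
denied = controller.request_lock(1)    # processor 1 is refused while locked
```

In this model, processor 0 finds the listed variables already in its cache when it begins its "read, modify, write" sequence, while the unlisted variable "z" is not prefetched.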