In many multiprocessor systems, all of the processors access a common memory, referred to as main memory. Typically, main memory is not capable of supplying data and instructions to multiple processors at adequate speeds. To compensate for the speed deficiencies of main memory, caches are usually incorporated. Caches are small, high-speed memories located between main memory and the processor that are updated to contain recently accessed contents of main memory. A cached copy of the contents of main memory can be accessed at a much higher speed than the copy held in main memory.
In multiprocessor systems, caches are usually attached to each processor. Thus, multiple copies of a particular data item may reside within multiple caches at any given time. In this situation, a memory coherency protocol must be used to ensure that each processing device such as an Instruction Processor (IP) always operates from the same, most recent, copy of the data. This type of protocol allows data to be shared among many devices for read-only purposes. Before a device can modify the data, it must gain exclusive access to the data. In this case, all other cached copies of the data are marked as unusable, or “invalidated”. After a device gains exclusive access to data, the device may, but is not required to, modify the data. When a device relinquishes exclusive access rights, any updated copy of the data must be stored in the main memory, or may be provided to another cache within the system.
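The sharing rules described above can be illustrated with a minimal Python sketch. The class `CoherentLine` and its method names are hypothetical and are introduced only for illustration. The sketch models a single data item, assumes that a relinquished copy is always written back to main memory (the alternative of forwarding it to another cache is omitted), and is not drawn from any particular system.

```python
class CoherentLine:
    """Hypothetical model of one data item and the caches that hold copies of it."""

    def __init__(self, value=0):
        self.memory_value = value   # copy held in main memory
        self.sharers = set()        # caches holding a read-only copy
        self.owner = None           # cache holding exclusive access, if any
        self.owner_value = None     # the owner's possibly modified copy

    def _release_owner(self):
        # When exclusive access is relinquished, any updated copy is
        # stored back in main memory (assumption: always write back).
        if self.owner is not None:
            self.memory_value = self.owner_value
            self.owner = None
            self.owner_value = None

    def read_shared(self, cache_id):
        # Any number of caches may hold the data for read-only purposes.
        self._release_owner()
        self.sharers.add(cache_id)
        return self.memory_value

    def acquire_exclusive(self, cache_id):
        if self.owner == cache_id:
            return
        self._release_owner()
        self.sharers.clear()        # all other cached copies are invalidated
        self.owner = cache_id
        self.owner_value = self.memory_value

    def write(self, cache_id, value):
        # A device must gain exclusive access before it can modify the data.
        if self.owner != cache_id:
            raise PermissionError("exclusive access required before modifying")
        self.owner_value = value
```

In this sketch, a cache that holds exclusive access may call `write` or may simply hold the data unmodified; either way, the next reader receives the most recent copy.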
One cache coherency mechanism of the type described above is the MESI (Modified Exclusive Shared Invalid) protocol. Single bus multiprocessor systems using a MESI protocol give the devices on the bus the ability to snoop the bus and thereby observe all data requests on the bus. In a MESI protocol, a cache will snoop the request bus to ascertain the status of a data unit that it is requesting. If it is determined that no other device, with the exception of main memory, is currently holding a copy of that data unit, then the cache can obtain that data unit in the exclusive state, and thus write to that data unit without having to go back to the bus to acquire ownership or exclusive rights.
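The snooping decision described above may be sketched as follows. This is a simplified model under stated assumptions: the names `State` and `SnoopBus` are hypothetical, data values and the write-back of modified data are omitted, and only the state transitions and the count of bus requests are tracked.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopBus:
    """Hypothetical single-bus system in which every cache sees every request."""

    def __init__(self, num_caches):
        # One dictionary per cache, mapping address -> State.
        self.states = [dict() for _ in range(num_caches)]
        self.bus_requests = 0

    def _state(self, cache, addr):
        return self.states[cache].get(addr, State.INVALID)

    def read(self, cache, addr):
        if self._state(cache, addr) is not State.INVALID:
            return self._state(cache, addr)          # cache hit, no bus request
        self.bus_requests += 1
        holders = [c for c in range(len(self.states))
                   if c != cache and self._state(c, addr) is not State.INVALID]
        if holders:
            for c in holders:
                self.states[c][addr] = State.SHARED  # other holders downgrade
            self.states[cache][addr] = State.SHARED
        else:
            # No other cache holds the data unit: grant it exclusively.
            self.states[cache][addr] = State.EXCLUSIVE
        return self.states[cache][addr]

    def write(self, cache, addr):
        if self._state(cache, addr) in (State.SHARED, State.INVALID):
            self.bus_requests += 1                   # must acquire ownership
            for c in range(len(self.states)):
                if c != cache:
                    self.states[c][addr] = State.INVALID
        self.states[cache][addr] = State.MODIFIED
        return State.MODIFIED
```

A cache that obtained the data unit in the exclusive state writes without a second bus request, whereas a cache holding a shared copy must return to the bus first.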
In a multiprocessor system comprising a hierarchy of buses, the ability to snoop the bus to determine the status of a data unit on all devices in the system no longer exists. When a processor requests read access to a data unit, the system must now decide whether to give the data to the processor in a shared state or in an exclusive state. In conventional systems, the general technique used is to provide the data in the shared state. If the processor later desires write access to the data, the processor must make another request to obtain the data in the exclusive state. This additional request consumes the bandwidth of the system interfaces as well as that of the main memory, and forces the processor to wait for the return of data before the write operation can complete.
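The cost of the additional request can be made concrete with a short sketch. The function `count_requests` is hypothetical; it assumes a single processor, a single data unit, and no intervening invalidation by other processors.

```python
def count_requests(trace, grant_on_read):
    """Count the memory requests one processor issues for one data unit.

    trace is a list of "read" and "write" operations; grant_on_read is the
    state ("shared" or "exclusive") in which a read request returns the data.
    """
    held = None          # access rights currently held by the processor
    requests = 0
    for op in trace:
        if op == "read":
            if held is None:
                requests += 1            # initial fetch of the data unit
                held = grant_on_read
        elif op == "write":
            if held != "exclusive":
                requests += 1            # extra request to obtain exclusive state
                held = "exclusive"
        else:
            raise ValueError(f"unknown operation: {op}")
    return requests
```

For a read followed by a write, `count_requests(["read", "write"], "shared")` returns 2, while the same trace with an exclusive grant returns 1; for a trace containing only reads, both grants cost a single request.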
Some prior art systems seek to minimize the above-described problems by predetermining that certain access rights will always be granted based on the type of data being requested. For example, if a request is made to access an area of memory storing instructions, the main memory will return the data in a shared state, since it is unlikely that this data will be modified during runtime. Similarly, if the data was written to memory by an input/output operation, as would occur during system initialization, and has not since been updated by an instruction processor, the data is likely to be read-only data. It is therefore returned to the requester in a shared state. In contrast, if the memory data has been updated by an instruction processor after initialization was completed, the data is considered to be read/write data, and is therefore returned to the requester in the exclusive state.
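The predetermined policy described above amounts to a small decision function. The following sketch is illustrative only: the `LineHistory` fields and the function `prior_art_grant` are hypothetical names, and the final fallback to the shared state is an assumption made here, following the conventional default, rather than something the prior art systems are stated to do.

```python
from dataclasses import dataclass


@dataclass
class LineHistory:
    """Hypothetical per-location record of how the data came to be in memory."""
    holds_instructions: bool = False
    written_by_io: bool = False
    updated_by_ip_after_init: bool = False


def prior_art_grant(history):
    """Return the access rights granted for a read request."""
    if history.holds_instructions:
        return "shared"       # instructions are unlikely to be modified at runtime
    if history.updated_by_ip_after_init:
        return "exclusive"    # treated as read/write data
    if history.written_by_io:
        return "shared"       # loaded by I/O and never updated: likely read-only
    return "shared"           # assumption: fall back to the conventional default
```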
Although the foregoing approach provides performance gains, it is relatively simplistic, and thereby causes some inefficiencies. For example, this mechanism does not take into account the use of static variables. Static variables are generally stored within main memory by an instruction processor some time after memory initialization is completed. Thereafter, these variables are referenced for read-only purposes. This is common, for instance, during the initialization of an operating system when one or more instruction processors store configuration variables to one or more tables within the main memory. These variables are thereafter referenced solely as read-only data. If prior art mechanisms are applied to static variables, exclusive access will be granted when requests are made for these variables. This is because these variables were updated after the original initialization of main memory. As a result, only a single processor will be able to access these variables at once. In the case of frequently referenced system data, this is a highly inefficient situation that will impose a high degree of latency on the processors within the system.
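The inefficiency for static variables can be illustrated by counting how often a read-only location must be moved between caches under each type of grant. The function `line_transfers` is a hypothetical sketch; it assumes one memory location, unlimited cache capacity, and a known sequence of reading processors.

```python
def line_transfers(read_trace, grant):
    """Count how many times a location must be delivered to a cache.

    read_trace is a sequence of processor identifiers issuing read requests;
    grant is "exclusive" or "shared".
    """
    holders = set()      # caches currently holding a valid copy
    transfers = 0
    for proc in read_trace:
        if proc in holders:
            continue                 # cache hit, nothing to transfer
        transfers += 1
        if grant == "exclusive":
            holders = {proc}         # every other copy is invalidated
        else:
            holders.add(proc)        # copies coexist for read-only use
    return transfers
```

With four processors reading the same configuration variable in rotation 100 times each, the exclusive grant requires 400 transfers in this model, whereas the shared grant requires only 4.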
Commonly assigned U.S. Pat. No. 6,052,760 to Bauman et al. provides another mechanism for predetermining the type of access rights that should be granted for particular types of data. Although the '760 patent is not directed towards solving the problems related to static variables, it does address the efficient management of “software locks”. A software lock is a memory location, or “cell”, that is being used to control access to an associated data structure. Before a software process may access the data structure, it must first acquire the lock. This is accomplished by writing the lock cell to a predetermined state. A lock cell can only be acquired if it has not already been acquired by another software process, as is indicated by the state of the lock cell. Therefore, software locks are generally obtained using an indivisible test-and-set operation whereby the lock cell is tested, and if available, is set in a single, unified sequence of events.
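A test-and-set operation of the kind described above may be sketched as follows. The class `LockCell` and the helper `run_workers` are hypothetical. Because Python offers no hardware test-and-set instruction, an internal `threading.Lock` is assumed here to stand in for the indivisibility that the hardware would provide.

```python
import threading
import time


class LockCell:
    """Hypothetical lock cell; FREE and HELD are the two predetermined states."""
    FREE, HELD = 0, 1

    def __init__(self):
        self._value = self.FREE
        self._atomic = threading.Lock()   # stand-in for hardware indivisibility

    def test_and_set(self):
        """Test the cell and, if available, set it. Return True if acquired."""
        with self._atomic:
            if self._value == self.FREE:
                self._value = self.HELD
                return True
            return False

    def release(self):
        with self._atomic:
            self._value = self.FREE


def run_workers(num_workers, increments):
    """Have several threads update a shared counter guarded by a LockCell."""
    cell = LockCell()
    total = [0]                           # the associated data structure

    def worker():
        for _ in range(increments):
            while not cell.test_and_set():
                time.sleep(0)             # yield, then test the cell again
            total[0] += 1                 # access permitted only while holding the lock
            cell.release()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

A second `test_and_set` on a cell that is already held fails until `release` is called, which is why waiting processes repeat the test in a loop.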
The system described in the '760 patent is designed to identify potential software locks. This is accomplished by detecting memory locations that are read, and then written, during successive memory operations, as would be the case during test-and-set operations. After a memory location has been identified as a potential software lock, any subsequent request for that location will be satisfied by returning the contents of the location to the requester in the exclusive state. Exclusive access is provided so that a requester may immediately obtain the lock, if it is available, without making an additional request to obtain exclusive access rights for the lock cell.
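The detection rule summarized above can be sketched in a few lines. This is not the implementation of the '760 patent; the class `LockDetector` and its methods are hypothetical, the sketch observes a single stream of memory operations, and the shared default for locations not flagged as locks is an assumption.

```python
class LockDetector:
    """Hypothetical detector that flags locations read and then written in succession."""

    def __init__(self):
        self.last_op = None            # (kind, address) of the most recent operation
        self.potential_locks = set()   # addresses identified as potential locks

    def observe(self, kind, address):
        # A write that immediately follows a read of the same address looks
        # like a test-and-set operation.
        if kind == "write" and self.last_op == ("read", address):
            self.potential_locks.add(address)
        self.last_op = (kind, address)

    def grant_for(self, address):
        # Potential locks are returned in the exclusive state so that the
        # requester can set the cell without making a second request.
        if address in self.potential_locks:
            return "exclusive"
        return "shared"                # assumption: conventional default otherwise
```

A read and a write separated by an operation on another address are not successive, so that location is not flagged in this sketch.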
Several observations may be made regarding the system of the '760 patent. First, although this system provides for the efficient acquisition of software locks, it does not necessarily make the subsequent use of the lock cell more efficient. This can be appreciated by considering the typical use of a lock cell. After a first processor acquires the lock using a write operation, one or more additional processors may continue to test the value of the lock cell to determine when the lock is released by the first processor such that it may be acquired by one of the additional processors. Each time a processor performs a test operation, exclusive access to the lock cell is provided. Assuming at least two processors are testing the lock cell during the same time period, obtaining exclusive access rights to the lock cell likely involves invalidating an exclusive copy of a lock cell in another processor's cache, then obtaining a copy of the lock cell from main memory or from that other processor. Because testing of a lock cell is generally performed within tight software loops, the process of continually invalidating, then copying, the lock cell may be repeated thousands of times while waiting for the lock to be released. This memory thrashing places a significant amount of traffic on the memory interfaces. It also consumes a significant portion of the bandwidth of the main memory by repeatedly filling the memory input queues with requests for the lock cell. Finally, it increases the latency associated with releasing the lock, since the first processor will not obtain access to the lock cell until all preceding requests from the other processors have been satisfied.
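The thrashing and the delayed release described above can be approximated with a simple queue model. The function `simulate_spin` is a hypothetical sketch: it assumes processor 0 holds the lock, every test is granted exclusive access, each waiting processor has one request outstanding at a time, and requests are served strictly in arrival order.

```python
from collections import deque


def simulate_spin(num_spinners, rounds_before_release):
    """Return (transfers, waited) for processors spinning on one lock cell.

    transfers counts how often the lock cell is invalidated in one cache and
    copied to another; waited counts the test requests served ahead of the
    lock holder's release request.
    """
    owner = 0                  # processor 0 holds the cell after acquiring the lock
    transfers = 0
    queue = deque()

    def serve(proc):
        nonlocal owner, transfers
        if owner != proc:
            transfers += 1     # invalidate the previous copy, then copy the cell
            owner = proc

    # Spinners 1..num_spinners repeatedly test the cell while the lock is held.
    for _ in range(rounds_before_release):
        queue.extend(range(1, num_spinners + 1))
        while queue:
            serve(queue.popleft())

    # The holder's release request joins the queue behind one pending test
    # from each spinner.
    queue.extend(range(1, num_spinners + 1))
    queue.append(0)
    waited = 0
    while queue:
        proc = queue.popleft()
        serve(proc)
        if proc == 0:
            break
        waited += 1
    return transfers, waited
```

Under these assumptions, two spinners testing 1000 times each cause 2003 transfers, and the release waits behind 2 test requests. A single spinner causes only 2 transfers, which is consistent with the observation that the thrashing arises when at least two processors test the lock cell during the same period.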
What is needed, therefore, is a system and method for addressing the foregoing problems in a manner that optimizes the granting of access rights to shared memory data.