1. Field of the Invention
The present invention relates to a method for increasing efficiency in a multi-processor system and a multi-processor system with increased efficiency.
2. Description of Related Art
The processors in conventional multi-processor systems include inclusive caches, which means that the higher level caches store the same cache lines stored in the lower level caches as well as cache lines not stored in the lower level caches.
For instance, in a processor having a level one or L1 cache and an external level two or L2 cache, the L1 cache, by design, is disposed closer to the execution units of the processor and has a lower storage capacity than the L2 cache such that the L1 cache has a lower access time. The L2 cache, however, stores a larger number of cache lines and includes all the cache lines stored in the L1 cache. Because of this inclusivity, only the L2 cache needs to monitor commands on the system bus, which provides communication between the processors, and generate responses thereto. This monitoring of commands on the system bus is referred to as snooping.
Commands on the system bus are snooped as part of a memory coherency protocol. Typically, such protocols require the caches in each processor to associate a memory coherency image state with each cache line. The memory coherency image state of a cache line indicates the status of the cache line. Through snooping, the caches of a processor continually update the memory coherency image state for each cache line stored therein.
A cache snoops a command by determining whether the real address associated with the snooped command matches the real address of a cache line stored therein. If a match is found, the cache updates the memory coherency image state for the cache line in a well-known manner, and outputs a snoop response based on the updated memory coherency image state in a well-known manner. If no match is found, the cache outputs a snoop response indicating no match found.
Using inclusive caches, however, requires higher level caches (1) to track the cache lines stored in the lower level caches, and (2) to constantly update the cache lines stored therein based on changes in the cache lines stored by the lower level caches. By using non-inclusive caches, both the tracking and updating functions can be eliminated. Because the caches are non-inclusive, each cache must be considered when snooping a command. Outputting multiple responses to one snooped command from a single processor, however, increases the complexity of maintaining data integrity.
Whether inclusive or non-inclusive caches are used, conventional multi-processor systems suffer from various inefficiencies when processing commands and updating memory coherency image states. For the purpose of discussing these inefficiencies, the MESI memory coherency image state protocol will be used.
As is well known, the MESI states are Modified, Exclusive, Shared, Invalid, Shared Owner, Fill Pending, and various error states. Modified means that the associated cache line includes modified data. Exclusive means that the cache storing the cache line has exclusive ownership of the cache line. Shared means that the associated cache line is also stored in another cache. Invalid means that the cache does not hold a valid copy of the cache line. Shared Owner means that the associated cache line is stored in another cache, but that the cache storing the cache line has had the last access to the cache line. Fill Pending means that a command associated with the cache line has not received a system response thereto.
In response to instructions from the associated processor, caches often generate commands. These commands are placed on a system bus providing communication between the processors of the multi-processor system. The processors connected to the system bus snoop these commands, or more properly the cache or caches associated with the processors snoop these commands, and generate snoop responses thereto. The snoop responses are generated based on the MESI states of cache lines having the same real address as the real address associated with the snooped command or the lack of such cache lines. The possible snoop responses include, in order of priority: retry, indicating that the snooped command should be retried at a later point in time; the MESI state Modified; the MESI state Shared; and null, indicating no real address match.
The caches also update the MESI state of the cache lines stored therein based on the type of command snooped. Generally, commands are either exclusive or non-exclusive. Exclusive means that the processor issuing the command intends to store, flush or invalidate the cache line in some way, while non-exclusive means that this cache line could be shared.
An arbiter connected to the system bus collects the snoop response from each processor and generates a system response. Typically, the system response is the highest priority snoop response among those output by the processors. The system response notifies the processors either that the command cannot be completed at this time because of, for example, a collision with another command (the retry response), or that the command can complete, together with its effect on the other processors (any response other than retry, referred to as not retry).
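The arbitration rule described above can be sketched as follows. This is a minimal illustration, not the arbiter of any particular system: the response names follow the priority order given in the text, while the function name `system_response` and the handling of an empty collection are assumptions made for the example.

```python
# Snoop responses listed from highest to lowest priority, as in the text above.
PRIORITY = ["retry", "modified", "shared", "null"]


def system_response(snoop_responses):
    """Return the highest-priority snoop response among those collected.

    An empty collection yields "null"; that default is an assumption
    of this sketch, not something the text specifies.
    """
    return min(snoop_responses, key=PRIORITY.index, default="null")


print(system_response(["null", "shared", "modified"]))  # modified
print(system_response(["shared", "retry", "null"]))     # retry
```

A single retry from any processor therefore dominates every other response, which is why retries are costly for the system as a whole.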
A collision occurs, for instance, when a first processor issues a first command associated with a cache line and before that first command completes, a second processor issues a second command associated with the same cache line. In this event, the first processor will maintain memory coherency by always outputting a retry snoop response to the second command. Accordingly, having to process the retry response and reissue the second command at a later time reduces the efficiency of the multi-processor system.
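The conventional collision behavior described above can be sketched as follows. The class `Processor` and its method names are hypothetical, and the sketch deliberately ignores the contents of the caches: it answers "null" whenever there is no command in flight, which is a simplification.

```python
class Processor:
    """Tracks commands this processor has issued that have not yet completed."""

    def __init__(self):
        self.outstanding = set()  # real addresses with a command in flight

    def issue(self, real_address):
        self.outstanding.add(real_address)

    def complete(self, real_address):
        self.outstanding.discard(real_address)

    def snoop(self, real_address):
        # A command in flight for the same cache line forces a retry.
        if real_address in self.outstanding:
            return "retry"
        return "null"  # simplification: cache contents are ignored here


first = Processor()
first.issue(0x1000)
print(first.snoop(0x1000))  # retry
first.complete(0x1000)
print(first.snoop(0x1000))  # null
```

Every snooped command that arrives while the first command is outstanding is retried, regardless of how the first command eventually resolves.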
An unwanted number of retries and a resulting decrease in efficiency also occur when a processor issues a kill command. A kill command is one that requests each cache storing the associated cache line to invalidate that cache line regardless of the MESI state therefor. But, when the MESI state for the cache line in the cache of a processor is Modified, that processor, to preserve the modified data, will output the cache line for storage in the main memory of the multi-processor system. This is referred to as performing a castback of the cache line. As a result of needing to cast back the cache line, the processor will output a retry response to the kill command. Again, the reason for responding to the kill command in this fashion is memory coherency. The castback causes the modified data to be stored in a main memory of the system, thus preserving the modifications.
One object of the present invention is to provide a method for maintaining multi-level cache coherency in a processor with non-inclusive caches.
Another object of the present invention is to provide a processor with non-inclusive caches which maintains coherency therebetween.
A further object of the present invention is to provide a method and higher level cache which prevent collisions between two cache queries, one of which is the result of a snooped command.
Also an object of the present invention is to provide a method of increasing efficiency in a multi-processor system and a multi-processor system having increased efficiency.
Another object of the present invention is to provide a method of increasing efficiency in a multi-processor system and a multi-processor system having increased efficiency which reduce the number of retry responses.
A further object of the present invention is to provide a method of increasing efficiency in a multi-processor system and a multi-processor system having increased efficiency which update memory coherency image states more efficiently.
A still further object of the present invention is to provide a method of increasing efficiency in a multi-processor system and a multi-processor system having increased efficiency which prevent unwanted and undesirable invalidation of a cache line throughout the multi-processor system.
These and other objectives are achieved by providing a processor which includes at least a lower and a higher level non-inclusive cache, and a system bus controller. The system bus controller snoops commands on the system bus, and supplies the snooped commands to each level of cache. Additionally, the system bus controller receives the snoop responses to the snooped command from each level of cache, and generates a combined response thereto.
When generating responses to the snooped command, each lower level cache supplies its response to the next higher level cache. Higher level caches generate their response to the snooped command based in part upon the response of the lower level caches. Also, high level caches determine whether or not the cache address, to which the real address of the snooped command maps, matches the cache address of at least one previous high level cache query. If a match is found by a high level cache, then the high level cache generates a retry response to the snooped command, which indicates that the snooped command should be resent at a later point in time, in order to prevent a collision between cache queries.
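The address-match check performed by a high level cache can be sketched as follows. This is only an illustration under stated assumptions: the line size, the number of cache addresses, the direct-mapped indexing in `cache_address`, and all function and parameter names are hypothetical. The parameter `own_response` stands for the response the high level cache would otherwise give, which per the text depends in part on the lower level response.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 512   # number of cache addresses in the high level cache (assumed)


def cache_address(real_address):
    """Map a real address to a cache address (assumed direct-mapped indexing)."""
    return (real_address // LINE_SIZE) % NUM_SETS


def high_level_snoop_response(real_address, pending_query_addresses, own_response):
    """Return "retry" if the snooped command maps to the cache address
    of an earlier query that is still pending; otherwise pass through
    the response the cache would have given anyway."""
    if cache_address(real_address) in pending_query_addresses:
        return "retry"
    return own_response


pending = {cache_address(0x4000)}
print(high_level_snoop_response(0xC000, pending, "shared"))  # retry
print(high_level_snoop_response(0x4040, pending, "shared"))  # shared
```

In this sketch, 0x4000 and 0xC000 are different real addresses that map to the same cache address, so the second one is retried even though the real addresses differ.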
The objectives are further achieved by providing a multi-processor system including at least first and second processors, a system bus providing communication between the first and second processors, and a bus arbiter generating system responses to commands on the system bus. The first processor generates a first command associated with a real address, and the second processor generates a second command associated with a real address. When the first processor snoops the second command on the system bus, the first processor delays generating a snoop response to the second command until the system response to the first command is received. Based on the system response to the first command, the first processor generates a snoop response to the second command.
The objectives are additionally achieved by providing a multi-processor system including at least first and second processors, a system bus providing communication between the first and second processors, and a bus arbiter generating system responses to commands on the system bus. The first processor has at least one level of cache associated therewith, a system bus controller controlling communication between the first processor and the system bus, and a transition cache serving as an interface between each level of cache and the system bus controller. When the first processor snoops a first command on the system bus requesting invalidation of a cache line, each level of cache associated with the first processor invalidates the cache line prior to the first processor snooping a system response to the first command. Additionally, if the memory coherency image state for the cache line in one of the caches indicates that modified data is included therein, then prior to invalidating the cache line, the cache generates a castback command and transfers the castback command and a copy of the cache line to the transition cache. If the system response to the first command is a retry, then the transition cache converts the castback command to a second command which requests that the cache line be stored in a main memory of the multi-processor system. If a non-retry system response to the first command is received, then the castback command is discarded by the transition cache.
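The castback handling described above can be sketched as follows. This is a simplified model with a single cache level; the class `TransitionCache`, the function `snoop_kill`, the dictionary layout of the cache, and the tuple used to represent the second command are all hypothetical choices made for the example.

```python
class TransitionCache:
    """Holds castbacks until the system response to the kill command arrives."""

    def __init__(self):
        self.held = {}  # real address -> copy of a modified cache line

    def hold_castback(self, real_address, data):
        self.held[real_address] = data

    def on_system_response(self, real_address, response):
        data = self.held.pop(real_address, None)
        if data is not None and response == "retry":
            # Convert the castback into a command that stores the line in memory.
            return ("write_to_memory", real_address, data)
        return None  # nothing was held, or the castback is discarded


def snoop_kill(cache, transition_cache, real_address):
    """Invalidate the line at once; cache maps real address -> (state, data)."""
    entry = cache.pop(real_address, None)
    if entry is not None and entry[0] == "modified":
        transition_cache.hold_castback(real_address, entry[1])


tc = TransitionCache()
l1 = {0x80: ("modified", b"new data")}
snoop_kill(l1, tc, 0x80)
print(0x80 in l1)                          # False
print(tc.on_system_response(0x80, "retry"))  # ('write_to_memory', 128, b'new data')
```

The cache never has to answer the kill command with a retry of its own: the modified data is kept aside and written to memory only if some other participant causes the kill command to be retried.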
The objects are also achieved by providing a multi-processor system having at least first and second processors, a system bus providing communication between the first and second processors, and a bus arbiter generating system responses to commands on the system bus. The first processor includes at least a level one cache, a system bus controller controlling communication between the first processor and the system bus, and a transition cache controlling and tracking communication between each level of cache and the system bus controller. The system bus controller checks a reservation of a first command, which requires a reservation, generated by the level one cache prior to placing the first command on the system bus. If the reservation has been lost, then the system bus controller converts the first command into a second command, which does not require a reservation, and places the second command on the system bus.
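The reservation check described above can be sketched as follows. The dictionary representation of a command, the function name `command_for_bus`, and the command type names used in the example are hypothetical; the text does not name the specific commands involved.

```python
def command_for_bus(command, reservation_valid, fallback_type):
    """Return the command that should actually be placed on the system bus.

    If the command requires a reservation that has been lost, it is
    converted into a command of fallback_type that needs no reservation.
    """
    if command["needs_reservation"] and not reservation_valid:
        return {
            "type": fallback_type,
            "address": command["address"],
            "needs_reservation": False,
        }
    return command


first = {"type": "cmd_with_reservation", "address": 0x100, "needs_reservation": True}
second = command_for_bus(first, reservation_valid=False,
                         fallback_type="cmd_without_reservation")
print(second["type"])  # cmd_without_reservation
```

Performing the check immediately before the command reaches the bus avoids broadcasting a command that can no longer succeed.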
The objects of the present invention are still further achieved by providing a multi-processor system including at least first and second processors, a system bus providing communication between the first and second processors, and a bus arbiter generating system responses to commands on the system bus. The first processor has at least one level of cache associated therewith, a system bus controller controlling communication between the first processor and the system bus, and a transition cache controlling and tracking communication between each level of cache and the system bus controller. The transition cache determines whether data has started to arrive at the transition cache in response to a non-exclusive command when the first processor snoops a command on the system bus associated with the same real address as the non-exclusive command. Based on whether data has started to arrive at the transition cache, the transition cache generates a snoop response to the snooped command.
Additional objects of the present invention are achieved by providing a multi-processor system including at least first and second processors, a system bus providing communication between the first and second processors, and a bus arbiter generating system responses to commands on the system bus. The first processor has at least one cache associated therewith, a system bus controller controlling communication between the first processor and the system bus, and a transition cache controlling and tracking communication between each cache and the system bus controller. When the first processor receives a first command on the system bus requesting a cache line, one of the caches associated with the first processor that stores the requested cache line copies the requested cache line to the transition cache as part of a response to the first command. Each cache associated with the first processor that stores the requested cache line then updates the memory coherency image state associated with the requested cache line prior to snooping a system response to the first command. Then, once the first processor snoops the system response on the system bus to the first command, the requested cache line is processed at the first processor based on the system response.
Other objects, features, and characteristics of the present invention; methods, operation, and functions of the related elements of the structure; combination of parts; and economies of manufacture will become apparent from the following detailed description of the preferred embodiments and accompanying drawings, all of which form a part of the specification, wherein like reference numerals designate corresponding parts in the various figures.