This invention relates generally to computing systems with multiple shared memory resources and, more particularly, to methods for increasing the exchange rate between main memories and cache memories in computing systems having multiple central processing units (CPUs).
As is known in the art, complex computing systems, in particular systems which have multiple CPUs, may have some of what are known as “dirty” cache entries. A cache entry, or piece of data, is known as a “dirty” entry if it has been modified since the time it was fetched from the main memory to the cache. This means that the “dirty” cache entry has a different value from the original data fetched from the main memory, due to an action in the CPU, typically an arithmetic operation. Thus the information in the cache memory has been modified by some step in the computation process, and the original data entry in the main memory is no longer compatible with the newly calculated value for this particular piece of data in the cache, i.e., what is known as a “dirty” cache value. By contrast, a “clean” cache value is a memory value that has not been modified by the CPU, typically an instruction or a reference data value. For a “dirty” entry, the original data value in the main memory therefore needs to be updated to equal the modified cache entry. This is typically done by a “write-back” command, also known as “retiring the victim”. In a “write-back”, the modified or “dirty” cache entry is written back into the main memory location from which it was initially fetched, thus updating the data value. After the “dirty” cache entry has been rewritten back into the main memory (i.e., the main memory location for that particular value has been updated to the new value), the memory is said to be “coherent” (i.e., there are not multiple versions of the same data value in the computer system), and the computer system memory is said to be maintaining its “coherency”.
During the time period in which this “dirty” cache entry value is waiting to be written back to the main memory, it is necessary to prevent the CPU from “writing over” the “dirty” cache entry with a different piece of information from a different location in the main memory. Such a different piece of data may be required to continue the progress of the program being run by the CPU. For example, if a new piece of information is fetched from some portion of the main memory and placed into the particular cache memory entry location that currently has the “dirty” information, this is known as being “run over” by an “impending fill”. Note that the “run over” “dirty” data can no longer be used to properly update the main memory. A CPU cache block that is displaced or “run over” by an “impending fill” is known as a “victim”. Another way of looking at this is to note that the “impending fill” is the new data that will be stored at the cache memory location, and the “victim” is the old data that was previously stored at the cache location and needs to be rewritten into the main memory location from which it was originally fetched in order to keep the data in the memory up to date.
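The relationship between a “dirty” entry, an “impending fill”, and a “victim” can be illustrated with a minimal Python sketch. It models a single cache location backed by a dictionary standing in for main memory; the names `CacheLine` and `OneLineCache`, the addresses, and the values are all hypothetical and chosen only for illustration. The sketch retires the victim before the fill overwrites it, which is one simple way to avoid losing the modified value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheLine:
    address: int          # main memory address the entry was fetched from
    value: int            # current (possibly modified) value
    dirty: bool = False   # True once the CPU has modified the value


class OneLineCache:
    """Hypothetical single cache location backed by a dict acting as main memory."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.line: Optional[CacheLine] = None

    def write(self, value: int) -> None:
        """CPU modifies the cached value, making the entry dirty."""
        if self.line is None:
            raise RuntimeError("no entry cached")
        self.line.value = value
        self.line.dirty = True

    def fill(self, address: int) -> None:
        """Bring in new data; the current entry becomes the victim."""
        victim = self.line
        if victim is not None and victim.dirty:
            # Write-back ("retiring the victim") before it is run over.
            self.memory[victim.address] = victim.value
        self.line = CacheLine(address, self.memory[address])


memory = {0x100: 1, 0x200: 2}
cache = OneLineCache(memory)
cache.fill(0x100)
cache.write(41)                      # entry for 0x100 is now dirty
value_before_writeback = memory[0x100]
cache.fill(0x200)                    # impending fill displaces the victim
value_after_writeback = memory[0x100]
```

In this sketch main memory holds the old value until the fill triggers the write-back, after which the cache and main memory agree again.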
As is known in the art, any potential “victim” may be exchanged with the “impending fill” data on the bus system, with the “victim” data then directed back to the original portion of the main memory from which it was initially fetched, thus rewriting the corrected data back into the original main memory location. This system of exchanging data works well in computing systems using bus lines to connect the CPU or CPUs to the main memory or memory modules.
A problem with exchanging data in computing systems that use bus lines is that the period of time required to wait for the main memory (generally composed of Dynamic Random Access Memories, i.e., DRAMs) to access the correct memory location and to ship the exchanged information on the bus, known as the “latency” period, reduces the operational speed of the system. Thus, the sequence of events on a typical bus system might be:
1. The exchange command, possessing the address of the “fill” data in the main memory and the address of the “victim” cache data which will be run over, is sent out over the bus line;
2. The “victim”, i.e., the “dirty” data, and its address, show up on the bus, which then writes the main portion of the exchange transfer back into the memory address;
3. The main memory provides the new “fill” data.
The above sequence of events slows down the overall functional speed of the system. In other words, the “latency” period is increased. This situation of high exchange “latency” cannot be avoided because, as noted above, it is important to maintain “data coherency”, i.e., not have multiple versions of the “same” data value in a computer.
The above-mentioned situation with system coherency becomes even more serious, as compared to the bus type system discussed above, in what is known as a “crossbar switch” type system. A “crossbar switch” is a circuit which connects any of a series of CPUs or other data users (known as commanders) to any of a series of memory resources in an arbitrary fashion or in a fashion dictated by the program. Any one of the data users can attach at any time to any one of the memory resources. This type of arrangement is faster than the bus system used in the prior art, because each CPU and each memory has what is known as a “hard link” with each of the other units. Data values are not simply dumped onto a bus with hopes that they arrive at the desired location without a collision with another data value from another one of the CPUs. Rather, the commander is connected directly to the specific memory resource containing the data value needed, and no other commander may have access to that memory resource during the time of interconnection. This results in what is known as having a wider data transmission bandwidth. With a crossbar switch, the data transmission bandwidth may be the sum of all the individual parts. In other words, in a four processor computer system, a crossbar switch may be four times faster than the individual serial-port bandwidth of an equivalent bus. All four CPUs may be connected to a different one of the memory resources at the same time.
A problem with a bus type computing system, as noted above, is that two or more memory data user elements (commanders), or memory resource elements, may be trying to “write” data onto the same bus at the same time. This results in what are known as “contentions” or “collisions” between the multiple commanders and memory units, as each of these users and memories competes for access to what is in essence a single communication resource. The need to detect “collisions”, and to use an arbiter chip to resolve the “collisions” and “contentions”, contributes to the lack of speed in bus systems, particularly bus systems that have large numbers of commanders (or data users) and bystanders (or data resources) connected to them. Typically, arbiters resolve collisions by notifying each of the two contending commanders or bystanders that there was a “collision”, i.e., that the data did not get to its intended location, and ordering each of the contenders to step back and wait for a random period of time before attempting to access the bus again. Clearly, time is lost when the data does not get to its intended destination, and the random waiting period required to decrease the probability of another collision between the same two contenders also represents lost time.
Another way of looking at this problem is to say that a bus type system is limited to some maximum serial bandwidth, whereas a crossbar switch has the ability to move data in a parallel fashion, and its bandwidth is thus the sum of the individual serial port bandwidths of all of its multiple data paths.
For example, in a computer system having four CPUs, or commanders, four main memory modules, and a crossbar switch, there would exist only a one-in-four chance that the cache “victim” data's original memory address and the new “fill” data's memory address happen to be from the same main memory module. Recalling that with a crossbar switch one specific CPU is hardwired to the specific memory module from which it was receiving data, the problem is clear: the “fill” data (i.e., the new data coming in from the main memory and likely to run over the “dirty” cache data) is likely (i.e., with a 75% chance) to be from a different physical memory module than the “dirty” data was from. Thus, since the crossbar switch has now attached a specific CPU to a specific memory module that is likely not to be the memory module from which the “victim” data came, there exists a problem in writing the “victim” data back to the correct memory location, and a memory-coherency problem may result. Therefore, in a typical crossbar switch system, there must be some possibility of a delay in the “fill” command until the potential “victim” data can be written to the main memory.
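The one-in-four figure can be checked with a short Python sketch that enumerates every pairing of a victim module and a fill module. The sketch assumes, as the example implicitly does, that the victim and fill addresses fall on each of the four modules independently and with equal likelihood; the function name `same_module_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def same_module_probability(num_modules: int) -> Fraction:
    """Chance that victim and fill map to the same module.

    Assumes both addresses are spread uniformly and independently
    over the modules (an assumption made for this illustration).
    """
    pairs = list(product(range(num_modules), repeat=2))
    same = sum(1 for victim_module, fill_module in pairs
               if victim_module == fill_module)
    return Fraction(same, len(pairs))


p_same = same_module_probability(4)
p_different = 1 - p_same
print(p_same, p_different)  # prints "1/4 3/4"
```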
Another known problem with a crossbar switch type of computing system is that the main memory module to which the “dirty” data or “victim” data is to be rewritten may be in the process of being accessed at that same time by another CPU, and thus the memory access port will be busy with other “fills” to other CPUs. In this case the possibility exists that the “victim” data may remain buffered in the crossbar switch until such time as the memory module port to which the “victim” memory is addressed is no longer busy.
The two above-enumerated crossbar switch problems may lead to catastrophic data coherency problems. If another CPU accesses the outmoded (i.e., incoherent) data from the memory before the rewrite of the “victim” data can occur, then that CPU receives “stale” data. This results in incorrect data being used, or even system crashes. Improperly updated data being used by another portion of a multiple CPU system, due to a long “latency” period for rewriting “victim” data into the main memory resource, is a major performance and reliability problem for multiple processor computer systems.
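The stale-data hazard can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: main memory and the buffer inside the crossbar switch are modeled as dictionaries, and the names `read_from_memory` and `drain_victim_buffer`, the address, and the values are hypothetical.

```python
from typing import Dict

# Hypothetical state: main memory still holds the old value, while the
# modified ("victim") value waits in a buffer inside the crossbar switch.
main_memory: Dict[int, int] = {0x40: 10}
victim_buffer: Dict[int, int] = {0x40: 11}


def read_from_memory(address: int) -> int:
    """A second CPU reads main memory directly, ignoring buffered victims."""
    return main_memory[address]


def drain_victim_buffer() -> None:
    """The memory port becomes free and the buffered write-backs complete."""
    main_memory.update(victim_buffer)
    victim_buffer.clear()


stale_read = read_from_memory(0x40)   # read issued before the write-back
drain_victim_buffer()
fresh_read = read_from_memory(0x40)   # read issued after the write-back
```

The first read returns the outmoded value because the write-back had not yet reached main memory; the second read returns the updated value.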
In accordance with the invention, a computing system includes multiple central processing units (CPUs) and multiple memory resources connected by a crossbar switch, wherein the interleaving bits and the row address in the main memory address of the “victim” data and the “fill” data are common to one another. This is accomplished by mapping data bits such that the “index” portions of the data match, thus assuring that any new “fill” to a particular CPU cache location goes to a cache element having data from the same memory resource as the new “fill” data. Performing this mapping operation guarantees that both the “fill” data and the “victim” data are from the same memory module, and thus there is no “latency” between the “fill” command and the “write-back” command for the “victim” data, since the “fill” command establishes a hardwire connection between the CPU and the memory module. Thus the “victim” rewrite command requires no extra delay waiting for a connection to the correct main memory element, and improved data coherency and transmission bandwidth result.
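One way to picture such a mapping is the following Python sketch. It is not the mapping of the invention itself but a minimal illustration under assumed parameters: the field widths (`OFFSET_BITS`, `INDEX_BITS`, `MODULE_BITS`), the placement of the module-select and row bits inside the cache index field, and all function names are hypothetical. Because a victim and the fill that displaces it occupy the same cache location, they share the same index bits; if the interleaving bits and the row address are drawn from that index, both addresses resolve to the same module and row.

```python
# Assumed address layout (illustrative only): [ tag | index | block offset ]
OFFSET_BITS = 6     # hypothetical 64-byte cache blocks
INDEX_BITS = 12     # hypothetical 4096 cache locations
MODULE_BITS = 2     # four memory modules, as in the four-CPU example


def make_address(tag: int, index: int, offset: int = 0) -> int:
    """Assemble an address from its tag, index, and block-offset fields."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset


def cache_index(address: int) -> int:
    """Index portion of the address: selects the cache location."""
    return (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def module_of(address: int) -> int:
    """Interleaving bits, taken here from the low bits of the index."""
    return cache_index(address) & ((1 << MODULE_BITS) - 1)


def row_of(address: int) -> int:
    """Row address, taken here from the remaining index bits."""
    return cache_index(address) >> MODULE_BITS


# Victim and fill share a cache location (same index) but differ in tag.
victim_address = make_address(tag=0x3A, index=0x5C7)
fill_address = make_address(tag=0x11, index=0x5C7)

same_module = module_of(victim_address) == module_of(fill_address)
same_row = row_of(victim_address) == row_of(fill_address)
```

Under these assumptions the connection opened for the fill already reaches the module and row that the victim must be written back to, whatever the two tags happen to be.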