This invention relates generally to computing systems with multiple shared memory resources and, more particularly, to methods for increasing the exchange rate between main memories and cache memories in multiple central processing unit (CPU) computing systems.
As is known in the art, complex computing systems, in particular systems which have multiple CPUs, may contain what are known as "dirty" cache entries. A cache entry, or piece of data, is "dirty" if it has been modified since the time it was fetched from the main memory into the cache. The "dirty" cache entry therefore holds a different value from the original data fetched from the main memory, due to an action in the CPU, typically an arithmetic operation. In other words, some step in the computation process has modified the information in the cache memory, and the original data entry in the main memory no longer matches the newly calculated value held in the cache. By contrast, a "clean" cache value is a memory value that has not been modified by the CPU, typically an instruction or a reference data value. Because the main memory no longer holds the current value of a "dirty" entry, the original data value in the main memory must be updated to equal the modified or "dirty" cache entry. This is typically done by a "write-back" command, also known as "retiring the victim". In a "write-back", the modified or "dirty" cache entry is written back into the main memory location from which it was initially fetched, thus updating the data value. After the "dirty" cache entry has been written back into the main memory (i.e., the main memory location for that particular value has been updated to the new value), the memory is said to be "coherent" (i.e., there are not multiple versions of the same data value in the computer system), and the computer system memory is said to be maintaining its "coherency".
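The dirty/clean distinction and the write-back step can be illustrated with a minimal Python sketch. It assumes a single cache line tracked with a boolean dirty flag and models main memory as a dictionary from address to value; the names CacheLine, cpu_write, write_back and is_coherent are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    address: int          # main-memory address the data was fetched from
    data: int             # cached copy of the value
    dirty: bool = False   # set once the CPU modifies the cached copy


def cpu_write(line: CacheLine, new_value: int) -> None:
    """The CPU modifies the cached copy; the line becomes 'dirty'."""
    line.data = new_value
    line.dirty = True


def write_back(line: CacheLine, main_memory: dict) -> None:
    """'Retire the victim': copy a dirty line back to its home address."""
    if line.dirty:
        main_memory[line.address] = line.data
        line.dirty = False  # cache and main memory now agree


def is_coherent(line: CacheLine, main_memory: dict) -> bool:
    """True when main memory holds the same value as the cached copy."""
    return main_memory[line.address] == line.data
```

A clean line can simply be discarded, since main memory already holds its value; only a dirty line needs the write-back before coherency is restored.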
During the time period in which this "dirty" cache entry value is waiting to be written back to the main memory, it is necessary to prevent the CPU from "writing over" the "dirty" cache entry with a different piece of information from a different location in the main memory. Such a different piece of data may be required to continue the progress of the program being run by the CPU. For example, if a new piece of information is fetched from some portion of the main memory and placed into the particular cache memory location that currently holds the "dirty" information, the "dirty" information is said to be "run over" by an "impending fill". Note that the "run over" "dirty" data can no longer be used to properly update the main memory. A CPU cache block that is displaced or "run over" by an "impending fill" is known as a "victim". Another way of looking at this is to note that the "impending fill" is the new data that will be stored at the cache memory location, while the "victim" is the old data that was previously stored at that cache location and that must be written back into the main memory location from which it was originally fetched in order to keep the data in the memory up to date.
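The way an impending fill displaces a victim can be sketched as follows, assuming (hypothetically) a small direct-mapped cache in which each main-memory address maps to exactly one slot by address modulo the number of slots. The helper fill is an illustrative name, not part of any described design; it returns the displaced line only when that line is dirty, because only a dirty victim must be written back before its data is lost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheLine:
    address: int          # main-memory address the data came from
    data: int
    dirty: bool = False


def fill(cache: List[Optional[CacheLine]], address: int,
         main_memory: dict) -> Optional[CacheLine]:
    """Bring `address` into its cache slot (the 'impending fill').

    Returns the displaced line if it was dirty (the 'victim'), else None.
    """
    slot = address % len(cache)       # hypothetical direct-mapped indexing
    old = cache[slot]
    if old is not None and old.address == address:
        return None                   # already cached: nothing is displaced
    cache[slot] = CacheLine(address, main_memory[address])
    if old is not None and old.dirty:
        return old                    # must be written back, or its value is lost
    return None                       # clean or empty slot: safe to overwrite
```

The caller is responsible for writing the returned victim back (for example, memory[victim.address] = victim.data); until it does, main memory still holds the outdated value.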
As is known in the art, any potential "victim" may be exchanged with the "impending fill" data on the bus system, with the "victim" data then directed back to the portion of the main memory from which it was initially fetched, thus writing the updated data back into the original main memory location. This system of exchanging data works well in computing systems that use bus lines to connect the CPU or CPUs to the main memory or memory modules.
A problem with exchanging data in computing systems that use bus lines is that the time spent waiting for the main memory (generally composed of Dynamic Random Access Memories (i.e., DRAMs)) to access the correct memory location and to ship the exchanged information on the bus, known as the "latency" period, reduces the operational speed of the system. For example, the sequence of events on a typical bus system might be:
1. The exchange command, carrying the address of the "fill" data in the main memory and the address of the "victim" cache data that will be run over, is sent out over the bus line;
2. The "victim" (i.e., the "dirty" data) and its address appear on the bus, and that portion of the exchange transfer is written back into the main memory at the victim's address;
3. The main memory provides the new "fill" data.
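The three serialized bus steps can be modeled as below to show why the latencies add up. The cycle counts are hypothetical placeholders, not figures from any real bus or DRAM, and ExchangeCommand and bus_exchange are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ExchangeCommand:
    fill_address: int     # where the new "fill" data lives in main memory
    victim_address: int   # where the dirty "victim" must be written back
    victim_data: int      # the dirty value being retired


def bus_exchange(cmd: ExchangeCommand, main_memory: dict,
                 cmd_cycles: int = 1, victim_cycles: int = 4,
                 dram_cycles: int = 10, fill_cycles: int = 4):
    """Perform the three bus steps one after another.

    Returns (fill_data, total_cycles). All cycle counts are hypothetical.
    """
    t = 0
    # Step 1: the exchange command (both addresses) goes out on the bus.
    t += cmd_cycles
    # Step 2: victim data and address appear on the bus and are written back.
    t += victim_cycles
    main_memory[cmd.victim_address] = cmd.victim_data
    # Step 3: the DRAM accesses the fill location and ships the fill data.
    t += dram_cycles + fill_cycles
    return main_memory[cmd.fill_address], t
```

With the default placeholder values the fill data is returned only after 1 + 4 + 10 + 4 = 19 cycles, because each step must wait for the previous one on the single shared bus.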
The above sequence of events slows down the overall functional speed of the system; in other words, the "latency" period is increased. This high exchange "latency" cannot be avoided because, as noted above, it is important to maintain "data coherency", i.e., not to have multiple versions of the "same" data value in a computer.
The above-mentioned situation with system coherency becomes even more serious, as compared to the bus type system discussed above, in what is known as a "crossbar switch" type system. A "crossbar switch" is a circuit which connects any of a series of CPUs or other data users (known as commanders) to any of a series of memory resources, either in an arbitrary fashion or in a fashion dictated by the program. Any one of the data users can attach at any time to any one of the memory resources. This type of arrangement is faster than the bus system used in the prior art because each CPU and each memory has what is known as a "hard link" with each of the other units. Data values are not simply placed onto a shared bus in the hope that they arrive at the desired location without colliding with a data value from another one of the CPUs. Rather, the commander is connected directly to the specific memory resource containing the data value it needs, and no other commander may access that memory resource while the connection exists. This results in what is known as a wider data transmission bandwidth. With a crossbar switch, the data transmission bandwidth may be the sum of the bandwidths of all the individual paths. In other words, in a four processor computer system, a crossbar switch may provide up to four times the individual serial-port bandwidth of an equivalent bus, since all four CPUs may be connected to different memory resources at the same time.
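A crossbar switch can be sketched as a set of exclusive commander-to-resource links. In this toy model (all names hypothetical), a memory resource accepts at most one commander at a time, and the aggregate bandwidth is simply the number of active links times an assumed per-port bandwidth, which reproduces the four-times figure for four CPUs.

```python
class Crossbar:
    """Toy crossbar: any commander may be linked to any memory resource,
    but each resource serves at most one commander at a time."""

    def __init__(self, n_commanders: int, n_resources: int) -> None:
        self.n_commanders = n_commanders
        self.n_resources = n_resources
        self.links = {}  # resource index -> commander index

    def connect(self, commander: int, resource: int) -> bool:
        """Try to link a commander to a resource; False if either is busy."""
        if not (0 <= commander < self.n_commanders
                and 0 <= resource < self.n_resources):
            raise ValueError("no such commander or resource")
        if resource in self.links or commander in self.links.values():
            return False  # resource already linked, or commander already busy
        self.links[resource] = commander
        return True

    def release(self, resource: int) -> None:
        """End the connection to a resource, if any."""
        self.links.pop(resource, None)

    def aggregate_bandwidth(self, port_bandwidth: float) -> float:
        """Every active link runs at full per-port speed in parallel."""
        return len(self.links) * port_bandwidth
```

Four simultaneous links at an assumed 1.0 unit each give 4.0 units, compared with 1.0 unit for a single shared bus that carries one transfer at a time.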
A problem with a bus type computing system, as noted above, is that two or more memory data user elements (commanders) or memory resource elements may be trying to "write" data onto the same bus at the same time. This results in what are known as "contentions" or "collisions" between the multiple commanders and memory units, as each of these users and memories competes for access to what is in essence a single communication resource. The need to detect "collisions", and to use an arbiter chip to resolve the "collisions" and "contentions", contributes to the lack of speed in bus systems, particularly bus systems that have large numbers of commanders (or data users) and bystanders (or data resources) connected to them. Typically, arbiters resolve collisions by notifying each of the two contending commanders or bystanders that there was a "collision", i.e., that the data did not get to its intended location, and ordering each of the contenders to back off and wait for a random period of time before attempting to access the bus again. Clearly, time is lost when the data does not get to its intended destination, and the random waiting period required to decrease the probability of another collision between the same two contenders also represents lost time.
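The cost of collisions and random back-off on a single bus can be illustrated with a small slotted simulation. This is a toy model under stated assumptions, not an arbiter design: every commander wants one transfer, all start in the same slot, and after a collision each contender waits a uniformly random number of slots between 1 and max_wait (a hypothetical parameter).

```python
import random


def simulate_bus(n_commanders: int, max_wait: int = 4, seed: int = 0) -> int:
    """Slotted single bus with random back-off after collisions.

    Returns the number of slots until every commander's transfer succeeds.
    """
    rng = random.Random(seed)                         # seeded for repeatability
    next_try = {c: 0 for c in range(n_commanders)}    # everyone tries in slot 0
    slot = 0
    while next_try:
        contenders = [c for c, t in next_try.items() if t == slot]
        if len(contenders) == 1:
            del next_try[contenders[0]]               # sole driver: transfer succeeds
        elif len(contenders) > 1:
            for c in contenders:                      # collision: all contenders back off
                next_try[c] = slot + rng.randint(1, max_wait)
        slot += 1                                     # idle or wasted slots still cost time
    return slot
```

Because the first slot is always lost to a collision when two or more commanders contend, n transfers take at least n + 1 slots in this model, whereas a crossbar with a distinct memory resource per commander could carry them in parallel.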
Another way of looking at this problem is to say that a bus type system is limited to some maximum serial bandwidth, whereas a crossbar switch can move data in parallel, so that its bandwidth is the sum of the individual serial-port bandwidths of all of its data paths.
For example, in a computer system having four CPUs (or commanders), four main memory modules, and a crossbar switch, there is only a one-in-four chance that the original memory address of the cache "victim" data and the memory address of the new "fill" data happen to be in the same main memory module. Recall that with a crossbar switch a specific CPU is hardwired to the specific memory module from which it is receiving data. The problem is then clear: the "fill" data (i.e., the new data coming in from the main memory that is about to run over the "dirty" cache data) is likely (a 75% chance) to come from a different physical memory module than the one the "dirty" data came from. Since the crossbar switch has now connected that CPU to a memory module that is probably not the module from which the "victim" data came, there is a problem in writing the "victim" data back to the correct memory location, and a memory-coherency problem may result. Therefore, in a typical crossbar switch system, the "fill" command may have to be delayed until the potential "victim" data can be written to the main memory.
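The one-in-four figure follows from counting, assuming that the home modules of the victim and of the fill are independent and uniformly distributed over the modules (an assumption; real address interleaving may differ). The function below enumerates every (victim module, fill module) pair.

```python
from fractions import Fraction
from itertools import product


def same_module_probability(n_modules: int) -> Fraction:
    """Probability that the victim and the fill live in the same module,
    assuming both home modules are independent and uniformly distributed."""
    pairs = list(product(range(n_modules), repeat=2))
    matches = sum(1 for victim_mod, fill_mod in pairs if victim_mod == fill_mod)
    return Fraction(matches, len(pairs))


p_same = same_module_probability(4)   # 4 matching pairs out of 16
p_different = 1 - p_same
```

For four modules this gives 4 matching pairs out of 16, i.e. 1/4, so under the stated assumption the fill comes from a different module with probability 3/4.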
Another known problem with a crossbar switch type of computing system is that the main memory module to which the "dirty" or "victim" data is to be written back may be accessed at that same time by another CPU, so that the memory access port is busy with "fills" to other CPUs. In this case the "victim" data may have to remain buffered in the crossbar switch until the port of the memory module to which the "victim" data is addressed is no longer busy.
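Buffering a victim inside the switch until its home port is free can be sketched as a queue that is drained each cycle. The model is hypothetical: busy_until records, per module, the first cycle at which that module's port is free, and each write-back is assumed to occupy the port for one cycle.

```python
from collections import deque


class VictimBuffer:
    """Toy model of victims held in the switch while their port is busy."""

    def __init__(self) -> None:
        self.pending = deque()  # entries: (module, address, data)

    def enqueue(self, module: int, address: int, data: int) -> None:
        self.pending.append((module, address, data))

    def drain(self, cycle: int, busy_until: dict, memories: list) -> int:
        """Write back every victim whose module port is free at `cycle`.

        Victims for busy ports stay buffered. Returns the number retired.
        """
        still_waiting = deque()
        retired = 0
        while self.pending:
            module, address, data = self.pending.popleft()
            if busy_until.get(module, 0) <= cycle:
                memories[module][address] = data
                busy_until[module] = cycle + 1  # port occupied by this write
                retired += 1
            else:
                still_waiting.append((module, address, data))
        self.pending = still_waiting
        return retired
```

While an entry stays in pending, main memory still holds the old value at that address, which is exactly the window in which another CPU can read stale data.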
The two crossbar switch problems described above may lead to catastrophic data coherency problems. If another CPU accesses the outmoded (i.e., incoherent) data from the memory BEFORE the "victim" data can be written back, then that CPU receives "stale" data. This results in incorrect data being used, or even in system crashes. The use of improperly updated data by another portion of a multiple CPU system, caused by a long "latency" period for writing "victim" data back into the main memory resource, is a major performance and reliability problem for multiple processor computer systems.
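The stale-data hazard is a matter of ordering, which a short replay of events makes concrete. In this hypothetical example, a dirty value 42 for address 0x80 is still buffered while main memory holds the old value 7; whether another CPU's read sees the stale or the current value depends only on whether it happens before or after the write-back. The function name replay and the event format are illustrative assumptions.

```python
def replay(events, memory):
    """Apply (kind, address, value) events in order to a dict modeling main
    memory. 'write_back' retires a buffered victim; 'read' records the value
    another CPU observes. Returns the list of observed values."""
    observed = []
    for kind, address, value in events:
        if kind == "write_back":
            memory[address] = value
        elif kind == "read":
            observed.append(memory[address])
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return observed


# Read races ahead of the write-back: the other CPU sees stale data (7).
stale = replay([("read", 0x80, None), ("write_back", 0x80, 42)], {0x80: 7})
# Write-back completes first: the other CPU sees the current value (42).
fresh = replay([("write_back", 0x80, 42), ("read", 0x80, None)], {0x80: 7})
```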