The advent of parallel processing in MP systems has resulted in the potential for a substantial increase in performance over traditional uniprocessor systems. The numerous processors in an MP system can simultaneously communicate in parallel with memory via a multistage interconnection network (MIN), which has been known in the art for decades.
More specifically, in a typical MIN configuration, each processor is connected to a unique port of the MIN. A conventional MIN has stages of controllable switches. By way of the controllable switches, the MIN can channel one or more of the memory lines to any of the processors at any given time. Essentially, the MIN permits several processors to communicate simultaneously with the memory, thus facilitating true parallel processing.
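The staged-switch routing described above can be illustrated with a minimal sketch. The omega network chosen here is one common MIN topology, and is purely an illustrative assumption (the text above does not name a specific topology): each stage is preceded by a perfect-shuffle wiring, and each 2x2 switch routes straight or crossed based on one bit of the destination address.

```python
# Destination-tag routing in an 8x8 omega network, one common MIN topology.
# (The topology is an illustrative assumption; the text does not name one.)
# Each of log2(N) stages is preceded by a perfect-shuffle wiring; each 2x2
# switch then routes "straight" or "crossed" based on one destination bit.

def shuffle(port, n_bits):
    """Perfect shuffle: rotate the port's address bits left by one."""
    return ((port << 1) | (port >> (n_bits - 1))) & ((1 << n_bits) - 1)

def route(src, dst, n_bits=3):
    """Return the list of ports visited from source to destination."""
    path = [src]
    port = src
    for stage in range(n_bits):
        port = shuffle(port, n_bits)             # wiring into this stage
        bit = (dst >> (n_bits - 1 - stage)) & 1  # destination bit for stage
        port = (port & ~1) | bit                 # switch output: 0=upper, 1=lower
        path.append(port)
    return path

# Processor 5 (0b101) reaching memory port 3 (0b011):
print(route(5, 3))
```

Because each switch setting is independent, several such paths can be active at once, which is what allows multiple processors to reach memory in parallel.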
However, as more processors are added to an MP system and as the speed of processors is continuously enhanced in the art, the main memory bandwidth has not been able to keep pace with the demands imposed by the numerous high performance processors. More specifically, the memory access time for processors generally increases as the main memory is situated further away from the processors and also as more and more processors contend for access to the main memory. Thus, the main memory bandwidth has become the primary bottleneck for high performance data processing in an MP system.
In order to alleviate this bottleneck, a cache memory can be associated with a processor to reduce the memory access time for the processor. Cache memories are well known in the art. A cache memory is a high speed, hardware-managed buffer which is virtually transparent to computer programming. A cache memory comprises a data array having cache data lines, the cache data line being the basic unit of data transfer between the main memory and the cache, and a directory for mapping a data address to the location of the data within the data array. Substantively, cache data lines could contain either instructions or actual data. Further, the cache is typically an order of magnitude faster than the main memory, and usually matches the high speed of its associated processor.
A cache associated with a processor enhances performance by taking advantage of the structure of the program being executed in the associated processor. Many instructions in an instruction set within the program are repetitious. The cache can be filled with cache lines, which can sustain the processor's need for data words and instructions over a period of time, before a refill of cache lines is needed. Notably, processors request data words (e.g., words, dwords, or bytes), which are much smaller than cache data lines. Moreover, if a data word is sought in a cache by a processor and the data word is found in a cache data line, then a cache "hit" is said to have occurred. If a data word is sought in a cache by a processor and the data word is not found in any cache data line, then a cache "miss" is said to have occurred, and accordingly, a refill of a cache line is sought. In a sense, the cache serves as a large buffer between the processor and the main memory.
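The hit/miss behavior described above can be sketched as follows. The direct-mapped organization and the sizes are illustrative assumptions, not taken from the text: the directory (tag array) maps an address to a slot in the data array, a matching tag is a "hit", and a mismatch is a "miss" that triggers a refill of the whole line.

```python
# A minimal sketch of a direct-mapped cache lookup; line and array sizes
# are illustrative. The directory (tag array) maps an address to a data
# array slot; a matching tag is a "hit", otherwise a "miss" refills the line.

LINE_WORDS = 4   # words per cache data line (illustrative size)
NUM_LINES = 8    # lines in the data array (illustrative size)

class Cache:
    def __init__(self, memory):
        self.memory = memory                  # backing main memory (list of words)
        self.tags = [None] * NUM_LINES        # directory: one tag per line slot
        self.data = [[0] * LINE_WORDS for _ in range(NUM_LINES)]
        self.hits = self.misses = 0

    def read(self, addr):
        line_addr = addr // LINE_WORDS
        index = line_addr % NUM_LINES         # which directory/data slot
        tag = line_addr // NUM_LINES
        if self.tags[index] == tag:
            self.hits += 1                    # cache "hit"
        else:
            self.misses += 1                  # cache "miss": refill whole line
            base = line_addr * LINE_WORDS
            self.data[index] = self.memory[base:base + LINE_WORDS]
            self.tags[index] = tag
        return self.data[index][addr % LINE_WORDS]

mem = list(range(100))
c = Cache(mem)
c.read(0); c.read(1); c.read(2)   # one miss fills the line, then two hits
print(c.hits, c.misses)
```

Because nearby words land in the same line, the one refill above services several subsequent requests, which is the buffering effect the paragraph describes.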
In an MP system with many processors which share memory space, or have a global memory, the MP system must maintain "coherency", or consistency, among all data in the shared memory space. Data could exist in several different locations including in the main memory and in other remote memory locations, such as in caches.
"Coherency" refers to the concept in which each processor must have access to the latest data corresponding to a particular address in the shared memory. In other words, if a data word at a certain address is simultaneously shared by one or more caches and/or the main memory, then as the data word is updated or changed in one of the foregoing memory locations, the latest version of the data word must be identified and available to all of the processors in order to maintain data consistency. Note that in this document, "data" refers to any information stored in memory, including instructions or actual, processed or unprocessed data.
In order to maintain coherency, both software and hardware approaches are conventionally employed. Moreover, the hardware approaches can generally be divided into two types: bus-based ("snoopy") protocols and directory-based protocols. Bus-based protocols are used in MP systems with a relatively small number of processors, whereas directory-based protocols are used in larger MP systems with improved scalability. Because the latest trend is towards the use of many processors for parallel processing, which has led to the common use of MINs, the trend in terms of protocols is towards the use of directory-based protocols.
With respect to the directory-based protocols, "cross interrogations" are periodically performed among the caches during the operation of the MP system to ensure coherency, or consistency, among the data. Cross interrogations may be implemented using any of a number of different protocols. Typically, cross interrogations involve the transfer of cache lines and/or the manipulation of control bits in the directories of the caches.
The protocol implemented for the cross interrogations depends, in large part, on the types of caches used in the MP system. Conventionally, caches have been classified as either "write-thru" (WT) or "write-back" (WB). Further, pursuant to current design, some caches have the ability to treat data in either fashion if controlled properly.
In WT caches, a data word is "written through" to the main memory upon each update or change of the data word in the cache line by any processor. Accordingly, the most current data is always in the main memory.
In a WB cache, a data word is written from the WB cache to the main memory only when it is requested by a remote source or when it is replaced in the cache. Consequently, a local processor can change data words in a local WB cache many times without other memory locations in the MP system knowing of or being interrupted by the changes.
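The two write policies described in the preceding paragraphs can be contrasted in a brief sketch, under the simplifying (and purely illustrative) assumption of one word per line: a WT write propagates to main memory immediately, while a WB write merely marks the line dirty and defers the memory update until replacement.

```python
# Sketch contrasting write-thru (WT) and write-back (WB) handling of a
# cached word, under the simplifying assumption of one word per line.

class Line:
    def __init__(self, value):
        self.value = value
        self.dirty = False

def write(policy, line, memory, addr, value):
    line.value = value
    if policy == "WT":
        memory[addr] = value       # write-thru: main memory always current
    else:
        line.dirty = True          # write-back: defer; mark the line dirty

def evict(line, memory, addr):
    if line.dirty:
        memory[addr] = line.value  # WB copies back only on replacement
        line.dirty = False

mem = [0] * 4
wt = Line(mem[1])
write("WT", wt, mem, 1, 42)        # memory updated immediately

wb = Line(mem[2])
write("WB", wb, mem, 2, 7)         # memory still stale
write("WB", wb, mem, 2, 8)         # repeated local writes, no memory traffic
evict(wb, mem, 2)                  # latest value reaches memory on eviction
print(mem)
```

Note that between the two WB writes and the eviction, main memory holds a stale value, which is precisely why WB caches require the coherency machinery discussed below.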
When WB caches are used in an MP system having a MIN, a global directory can be employed, which is well known in the art. The global directory is associated with the main memory in order to aid in maintaining coherency. The global directory primarily contains information used to determine the global state of a cache data line, as well as the number and/or location of the caches having a copy of the cache line. In this regard, see M. Dubois and F. A. Briggs, "Effects of Cache Coherency in Multiprocessors," IEEE Transactions on Computers, vol. C-31, no. 11, November 1982, and A. Agarwal, R. Simoni, J. Hennessy, and M. Horowitz, "An Evaluation of Directory Schemes for Cache Coherence," Proceedings of the 1988 International Symposium on Computer Architecture, pp. 280-289, 1988.
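A global-directory entry of the kind described above can be sketched as a per-line state plus a record of which caches hold a copy. The three-state protocol and all names below are illustrative assumptions; the text does not fix a particular protocol.

```python
# A hedged sketch of a global-directory entry: per-line state plus the set
# of caches holding a copy. The three states and all names are illustrative
# assumptions; the text does not fix a particular protocol.

class DirEntry:
    def __init__(self):
        self.state = "UNCACHED"   # UNCACHED, SHARED, or EXCLUSIVE
        self.sharers = set()      # ids of caches holding the line

def read_request(entry, cache_id):
    """A cache fetches a line for reading."""
    if entry.state == "EXCLUSIVE":
        # Cross interrogation: the current owner must write the line back
        # to main memory before the new reader gets a copy (not modeled).
        pass
    entry.sharers.add(cache_id)
    entry.state = "SHARED"

def write_request(entry, cache_id):
    """A cache wants to modify the line: invalidate all other copies."""
    invalidated = entry.sharers - {cache_id}   # cross interrogations go here
    entry.sharers = {cache_id}
    entry.state = "EXCLUSIVE"
    return invalidated

e = DirEntry()
read_request(e, 0)
read_request(e, 1)
victims = write_request(e, 0)     # cache 1 must give up its copy
print(e.state, sorted(victims))
```

The `invalidated` set is what drives the cross interrogations: each cache in it must be interrupted and queried, which is the per-write cost the following paragraph identifies as the drawback of these protocols.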
While some work has been performed in the art in regard to directory-based protocols, few practical designs are available for use. Moreover, the available conventional protocols are problematic. Each time that a cross interrogation is initiated, any processor using a cache must temporarily wait while an inquiry is made for the data word in the cache. Consequently, the performance of the processors is compromised because of the cache inquiries. Furthermore, as more processors are added to the MP system, a higher number of cross interrogations must take place. Consequently, more interactions must occur with the caches, resulting in much expended time and in heavy traffic on the interconnection network of the MP system. Accordingly, in a broad sense, the numerous requisite cross interrogations reduce the number of processors conducting useful work in the MP system.