Computer systems in general, and special purpose computer systems in particular, have been developed to maximize the throughput of data, as well as to increase data integrity and overall data processing capability. One class of computer systems designed for these objectives is based on a computer system architecture which has a number of data processors, i.e., a multiprocessor architecture. This class of computer system architecture is categorized by the manner in which the multiple data processors communicate, the categories including "loosely coupled" systems, "moderately coupled" systems, and "tightly coupled" systems. For example, a tightly coupled system employs a common or shared memory, such as a random access memory (RAM), for the storage of data, and a number of data processors that access the data stored in the shared memory. Communication, including the transfer of data, between the data processors and the shared memory, and among the data processors themselves, is performed via a bus structure ("bus") which carries control signals, addresses of blocks of data, and the data.
To improve system performance, memory hierarchies are used in computer systems, e.g., in the form of cache or secondary memories in conjunction with the shared or main memory. Each data processor may have a cache memory which temporarily stores copies of the data that are being accessed by the processor. The system performance is improved because the copies of the data that are stored in the cache memory can be accessed by the data processor in much less time than if the same data had to be accessed from the shared memory.
A system design issue related to the use of memory hierarchies concerns "bus bandwidth" or bus utilization. In most computer systems, the bus is bandwidth limited, so that it is important to minimize the use of the bus by each data processor, particularly for a computer system having a large number of processors coupled to the bus. One technique for minimizing the utilization of the bus is based on a cache memory algorithm known as "non-write-through", as opposed to a "write-through" algorithm. Specifically, if data to be accessed by a given data processor are not present in the corresponding cache memory, a copy of the data is obtained from the shared memory and stored in the cache memory. Thereafter, all accesses (read and write) to these data are made by the data processor to the cache memory, until such time as these data and other data previously stored in the cache memory are no longer needed, and still other data not present in the cache memory have to be accessed. At that time, the data processor writes back to the shared memory those data which had been modified by the processor while stored in the cache memory. Data which had not been so modified need not be written back to the shared memory, but are merely invalidated in the cache memory, thereby making the corresponding storage location available for the storage of copies of other data accessed from the shared memory. The bus utilization is minimized, using the non-write-through algorithm, since the modified data are not transferred from the cache memory to the shared memory after every write access in the cache memory, but only periodically when the data are no longer being used and other data must be transferred to the cache memory. By comparison, in accordance with the write-through algorithm, the modified data are transferred to the shared memory after each write access in the cache memory, thereby increasing the bus utilization.
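The difference in bus utilization between the two algorithms can be illustrated with a small simulation. The following is a minimal sketch only, not taken from any system described here: the class `CountingCache`, the function `count_transfers`, the one-block capacity, and the oldest-first replacement choice are all assumptions made for illustration.

```python
class CountingCache:
    """Hypothetical single-processor cache that counts bus transfers."""

    def __init__(self, memory, capacity, write_through):
        self.memory = memory                # shared memory: address -> value
        self.capacity = capacity            # number of blocks the cache holds
        self.write_through = write_through  # True: write-through; False: non-write-through
        self.lines = {}                     # address -> [value, modified_flag]
        self.bus_transfers = 0

    def _evict_one(self):
        # Assumed replacement choice: evict the oldest entry.
        # Only modified data are written back; unmodified data are just dropped.
        address, (value, modified) = next(iter(self.lines.items()))
        if modified:
            self.memory[address] = value
            self.bus_transfers += 1
        del self.lines[address]

    def _fill(self, address):
        # On a miss, fetch a copy from shared memory over the bus.
        if address not in self.lines:
            if len(self.lines) >= self.capacity:
                self._evict_one()
            self.lines[address] = [self.memory[address], False]
            self.bus_transfers += 1

    def read(self, address):
        self._fill(address)
        return self.lines[address][0]

    def write(self, address, value):
        self._fill(address)
        self.lines[address][0] = value
        if self.write_through:
            self.memory[address] = value    # every write goes over the bus
            self.bus_transfers += 1
        else:
            self.lines[address][1] = True   # transfer deferred until eviction


def count_transfers(write_through):
    """Ten writes to one address, then an access that forces it out of the cache."""
    memory = {0: 0, 1: 0}
    cache = CountingCache(memory, capacity=1, write_through=write_through)
    for i in range(10):
        cache.write(0, i)
    cache.read(1)
    return cache.bus_transfers, memory[0]


print(count_transfers(write_through=True))   # (12, 9)
print(count_transfers(write_through=False))  # (3, 9)
```

Under these assumptions both runs leave the same final value in shared memory, but the write-through run uses twelve bus transfers where the non-write-through run uses three.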
The use of a memory hierarchy introduces a problem known as "data coherency." A computer system is data coherent if the data that are accessed by a processor are always the data last written to the address of that data. The problem of data coherency is exacerbated in computer systems employing a non-write-through algorithm.
For example, assume that a computer system has a shared memory and two data processors, each having a cache memory, and all of which are coupled together over a common bus. Also, assume that A is the address of data D that are currently stored only in the shared memory. Thereafter, assume, for example, that one data processor P1 has acquired a copy of the data D of that address A from the shared memory, modified the data D to data D' and stored data D' in its cache memory. Then assume the other data processor P2 acquires a copy of the data D from the shared memory to read the data D. The result will be a violation of data coherency, since, for example, upon a read access by the one processor P1 to its cache memory, the data D' will be read and upon a read access by the other processor P2 to its cache memory the data D will be read. The data coherency problem is exacerbated when non-write-through is employed, since the cache memory of the processor P1 will continue to store data D' for a period of time, during which time the other processor P2 may access the stale data D from the shared memory and read that data D from its cache memory.
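The two-processor example above can be reproduced in a few lines. This is an illustrative sketch under stated assumptions: `PrivateCache` and `run_scenario` are hypothetical names, the address value is arbitrary, and the cache deliberately has no coherency mechanism of any kind.

```python
class PrivateCache:
    """Hypothetical non-write-through cache with no coherency mechanism."""

    def __init__(self, memory):
        self.memory = memory   # shared memory: address -> data
        self.lines = {}        # private copies: address -> data

    def read(self, address):
        # On a miss, acquire a copy from shared memory; afterwards read locally.
        if address not in self.lines:
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        # The modification stays in the cache and is not sent to shared memory.
        self.lines[address] = value


def run_scenario():
    A = 0x100                        # arbitrary address chosen for the example
    shared_memory = {A: "D"}
    p1 = PrivateCache(shared_memory)
    p2 = PrivateCache(shared_memory)
    p1.read(A)                       # first processor acquires a copy of D
    p1.write(A, "D'")                # and modifies it to D' in its own cache
    return p1.read(A), p2.read(A), shared_memory[A]


print(run_scenario())   # ("D'", 'D', 'D')
```

The first processor reads D' while the second processor and the shared memory still hold the stale D, which is the coherency violation described in the example.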
Several practical or commercial computer systems employing a memory hierarchy have been developed and provide for data coherency. In one system, such as the UNIVAC 1100/80 Series, multiple data processors use a single shared cache memory. One problem with this technique is that the bandwidth of the single shared cache memory may not be sufficient to support a large number of data processors. In addition, longer access time delays are incurred, since the single shared cache memory cannot be physically close to all the data processors in the computer system.
In another type of practical computer system, such as the IBM 3033 Series manufactured by IBM Corporation, Armonk, N.Y., each data processor has its own cache memory. When a processor performs a write access to data D of an address A in its cache memory, the processor broadcasts the address A to all other processors. If the same address A is in one or more of the cache memories of these other processors, the corresponding data D in the cache memories are invalidated. One disadvantage with this type of computer system is the increase in bus utilization that is required to broadcast the address A over the bus each and every time such a write access occurs.
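The broadcast-invalidate technique and its bus cost can be sketched as follows. This is a hedged illustration, not a model of any particular commercial machine: the names `Bus`, `BroadcastCache`, and `demo` are hypothetical, and the sketch assumes that each write is also sent to shared memory, which the description above does not state.

```python
class Bus:
    """Hypothetical bus that delivers invalidation broadcasts and counts them."""

    def __init__(self):
        self.caches = []
        self.broadcasts = 0

    def broadcast_invalidate(self, sender, address):
        self.broadcasts += 1
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(address, None)   # drop the copy if one is held


class BroadcastCache:
    """Hypothetical per-processor cache that broadcasts every written address."""

    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory
        self.lines = {}
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value   # assumption: the write also updates shared memory
        self.bus.broadcast_invalidate(self, address)


def demo():
    A = 0x100
    memory = {A: "D"}
    bus = Bus()
    p1, p2 = BroadcastCache(bus, memory), BroadcastCache(bus, memory)
    p2.read(A)                       # second processor caches D
    for value in ("D1", "D2", "D3"):
        p1.write(A, value)           # each write costs one broadcast
    return p2.read(A), bus.broadcasts


print(demo())   # ('D3', 3)
```

The second processor never reads stale data, but the bus carries one broadcast per write access, even when no other cache memory holds the address.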
In yet another type of practical computer system, such as the Honeywell Series 66 and the ELXSI 6400 Series, software control is used in an attempt to guarantee data coherency. A number of addresses of specified data, such as semaphores or job queues, are designated non-cacheable and can only be accessed from the shared memory. One disadvantage of the use of non-cacheable data is that the access time for the processor to access the non-cacheable data in the shared memory is substantially increased. An additional disadvantage of this technique is that the computer system, and in particular, the caching mechanism, is no longer transparent to the software.
Two other conceptual solutions to the data coherency problem have been proposed, and neither of these is believed to have been developed or commercialized. One solution is generally discussed in a paper entitled "A New Solution To Coherence Problems In Multicache Systems," by Censier and Feautrier, IEEE Transactions On Computers, Volume C-27, No. 12, December 1978. In this concept, the shared memory maintains flags for keeping track of individual blocks of data being processed throughout the system to prevent inconsistencies in the data. The flags that are used are called PRIVATE, PRESENT and MODIFIED and have the following properties: (1) if PRESENT is set in the shared memory for a block of data D and a cache memory K, then a valid copy of data D is in cache memory K; (2) if MODIFIED is set in the shared memory for the block of data D, then a valid copy of data D is stored in some cache memory and has been modified in that cache memory since the latest update of the shared memory; (3) if PRIVATE is set in a cache memory K for a valid block of data D, then no copies of data D are in any other cache memories, which implies that there is exactly one PRESENT flag set for that data D in the shared memory; and (4) if PRIVATE is reset in a cache memory K for a valid block of data D, then the data D in that cache memory K are identical to the data D in the shared memory, which implies that MODIFIED is reset for that data D.
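The four flag properties listed above can be expressed as a check over a snapshot of the system. The following is an illustrative sketch only; the function `violated_properties` and the dictionary layout of `memory`, `directory`, and `caches` are assumptions made for this example and do not come from the cited paper. Property (2) is checked only to the extent that some cache memory holds a valid copy.

```python
def violated_properties(memory, directory, caches):
    """Return the sorted property numbers (1 to 4) that a snapshot violates.

    Assumed layout (hypothetical):
      memory:    block -> data held in shared memory
      directory: block -> {"present": set of cache ids, "modified": bool}
      caches:    cache id -> {block: {"private": bool, "data": value}}
    A block listed in a cache is treated as a valid copy.
    """
    violated = set()
    for block, entry in directory.items():
        holders = [k for k, lines in caches.items() if block in lines]
        # (1) PRESENT set for cache K implies a valid copy in K.
        for k in entry["present"]:
            if k not in holders:
                violated.add(1)
        # (2) MODIFIED set implies a valid copy in some cache memory.
        if entry["modified"] and not holders:
            violated.add(2)
        for k in holders:
            line = caches[k][block]
            if line["private"]:
                # (3) PRIVATE set implies no other copies and exactly one PRESENT flag.
                if len(holders) != 1 or entry["present"] != {k}:
                    violated.add(3)
            else:
                # (4) PRIVATE reset implies the copy equals shared memory and MODIFIED is reset.
                if line["data"] != memory[block] or entry["modified"]:
                    violated.add(4)
    return sorted(violated)


shared = {"A": "D"}
consistent = violated_properties(
    shared,
    {"A": {"present": {"K1", "K2"}, "modified": False}},
    {"K1": {"A": {"private": False, "data": "D"}},
     "K2": {"A": {"private": False, "data": "D"}}},
)
inconsistent = violated_properties(
    shared,
    {"A": {"present": {"K1", "K2"}, "modified": True}},
    {"K1": {"A": {"private": True, "data": "D'"}},
     "K2": {"A": {"private": False, "data": "D"}}},
)
print(consistent, inconsistent)   # [] [3, 4]
```

In the second snapshot one cache memory holds a PRIVATE modified copy while another still holds a copy of the same block, so properties (3) and (4) both fail.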
As stated in Censier and Feautrier, the data access algorithms must be defined in such a way that the above four properties are always true, transition times being excepted. However, this exception presents a significant problem in terms of data coherency. That is, if the four properties need not hold while given data D are in transit to one processor, i.e., on the bus, then, for example, a copy of the same data D stored in a cache memory of another processor may be modified by that other processor during the transit period. The result is that the data D in transit may become stale and yet be accessed by the one processor. Another problem is the requirement that the shared memory keep track of all the data via the flags to maintain data coherency. This approach becomes infeasible for a computer system having a large number of processors, since it requires an operation in a central location or central controller, i.e., the shared memory, thereby resulting in substantial and complex hardware and algorithms for the central controller to perform the centralized control function, as well as system performance degradation.
Another publication entitled "Using Cache Memory To Reduce Processor-Memory Traffic," by James R. Goodman, Association for Computing Machinery, Tenth Annual Symposium on Computer Architecture, June 1983, describes generally a concept for a multiprocessor computer system having a memory hierarchy and data coherency schemes. Although Goodman is disclosed herein for background purposes and to help explain the present invention, this publication is not believed to be prior art. Goodman states that his approach has much in common with Censier and Feautrier, but allows the critical information for achieving data coherency to be distributed among the cache memories where it already resides. Furthermore, Goodman proposes a new scheme called "write-once" to solve the data coherency and bus bandwidth problems.
In Goodman, associated with each block of data D in a cache memory, in addition to the addresses of the data D, are two bits defining one of four states for the associated data D: (1) INVALID, (2) VALID, (3) RESERVED, and (4) DIRTY. If INVALID, there are no data D in the block; if VALID, there are data D in the block which have been read by the corresponding processor from the shared memory but which have not been modified; if RESERVED, the data D in the block have been locally modified by the processor exactly once since the data D were stored in the cache memory and the change has been transmitted to the shared memory; and if DIRTY, the data D in the block have been modified by the processor more than once since the data D were stored in the cache memory and the latest change has not been transmitted to the shared memory. Also, an additional copy of the addresses of the cache memory is contained in and employed by a given processor. One such copy is used in a conventional way to support accesses to the cache memory by the processor and the other such copy is used to monitor all accesses to shared memory via the bus by other processors.
For each access by another processor to shared memory, the one processor monitors the address on the bus to check if that address is in its other copy of addresses. If a match is found by the one processor on a write access to shared memory by the other processor, the corresponding data in the cache memory are marked INVALID by the one processor. If a match is found by the one processor on a read access to shared memory by the other processor, nothing is done by the one processor unless the data have been modified, i.e., their state is RESERVED or DIRTY. If so modified, and if the data are just RESERVED, the state bits are changed to VALID by the one processor. If DIRTY, the one processor inhibits the shared memory from supplying the data to the other processor requesting the data. The one processor then supplies the requested data to the other processor and thereafter writes these data to shared memory. In addition, the state bits are changed to VALID by the one processor.
According to Goodman, data coherency is achieved in the following way. Initially, upon the other processor writing through on the bus for a write access to shared memory, only this other processor is guaranteed to have a copy of the data, except for the shared memory, since the one processor (and all other processors in the system) will mark the data INVALID while the other processor will mark the data RESERVED. Thereafter, if another write access occurs for these data, the other processor will change the flag from RESERVED to DIRTY.
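The four states and the bus monitoring behavior described for Goodman can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not Goodman's own implementation: the names `State`, `WriteOnceCache`, `System`, and `trace` are hypothetical, the handling of a write to data not yet in the cache memory (fetch first, then write through) is an assumption, and all bus timing is ignored.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1
    RESERVED = 2
    DIRTY = 3


class WriteOnceCache:
    def __init__(self, system):
        self.system = system
        self.state = {}    # address -> State; a missing entry means INVALID
        self.data = {}
        system.caches.append(self)

    def state_of(self, address):
        return self.state.get(address, State.INVALID)

    def read(self, address):
        if self.state_of(address) is State.INVALID:
            self.data[address] = self.system.bus_read(self, address)
            self.state[address] = State.VALID
        return self.data[address]

    def write(self, address, value):
        if self.state_of(address) is State.INVALID:
            self.read(address)             # assumption: a write miss fetches first
        self.data[address] = value
        if self.state_of(address) is State.VALID:
            # First modification: write through once, others invalidate.
            self.system.bus_write(self, address, value)
            self.state[address] = State.RESERVED
        else:
            # Second or later modification stays local.
            self.state[address] = State.DIRTY

    def snoop_read(self, address):
        """Monitor another processor's read; return data only if DIRTY."""
        current = self.state_of(address)
        if current is State.RESERVED:
            self.state[address] = State.VALID
        elif current is State.DIRTY:
            self.state[address] = State.VALID
            return self.data[address]
        return None

    def snoop_write(self, address):
        """Monitor another processor's write; mark any matching copy INVALID."""
        if self.state_of(address) is not State.INVALID:
            self.state[address] = State.INVALID


class System:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def bus_read(self, requester, address):
        for cache in self.caches:
            if cache is not requester:
                supplied = cache.snoop_read(address)
                if supplied is not None:
                    self.memory[address] = supplied   # DIRTY holder writes back
        return self.memory[address]

    def bus_write(self, requester, address, value):
        self.memory[address] = value
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(address)


def trace():
    A = 0x100
    system = System({A: "D0"})
    p1, p2 = WriteOnceCache(system), WriteOnceCache(system)
    steps = []

    def snap():
        steps.append((p1.state_of(A).name, p2.state_of(A).name, system.memory[A]))

    p2.read(A); snap()
    p1.write(A, "D1"); snap()
    p1.write(A, "D2"); snap()
    value = p2.read(A); snap()
    return steps, value
```

Calling `trace()` walks one address through the sequence VALID in the second cache, RESERVED then DIRTY in the first cache, and finally VALID in both after the first cache supplies its modified data on the second cache's read.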
Thus, in accordance with Goodman, each processor is responsible for maintaining data coherency for those cases where a violation can occur, i.e., whenever a write access is made to a given address, thereby distributing the data coherency function. One problem with Goodman is that the processor having given data D can only modify that data once and then must write that data back to shared memory (write once), thereby increasing bus utilization. Furthermore, being conceptual, Goodman does not solve the data coherency problem that arises under a number of different conditions in a practical or fully developed computer system.