1. Related Applications
The present application discloses certain aspects of a computing system that is further described in the following U.S. patent applications filed concurrently with the present application: Evans et al., AN INTERFACE BETWEEN A SYSTEM CONTROL UNIT AND A SERVICE PROCESSING UNIT OF A DIGITAL COMPUTER, Ser. No. 07/306,325 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,146,564 on Sept. 8, 1992; Arnold et al., METHOD AND APPARATUS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTIPROCESSOR SYSTEM WITH THE CENTRAL PROCESSING UNITS, Ser. No. 07/306,837 filed Feb. 3, 1989; Gagliardo et al., METHOD AND MEANS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH THE SYSTEM MAIN MEMORY, Ser. No. 07/306,326 filed Feb. 3, 1989; D. Fite et al., METHOD AND APPARATUS FOR RESOLVING A VARIABLE NUMBER OF POTENTIAL MEMORY ACCESS CONFLICTS IN A PIPELINED COMPUTER SYSTEM, Ser. No. 07/306,767 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,125,083 on Jun. 23, 1992; D. Fite et al., DECODING MULTIPLE SPECIFIERS IN A VARIABLE LENGTH INSTRUCTION ARCHITECTURE, Ser. No. 07/307,347 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,148,528 on Sept. 15, 1992; D. Fite et al., VIRTUAL INSTRUCTION CACHE REFILL ALGORITHM, Ser. No. 07/306,831 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,113,515 on May 12, 1992; Murray et al., PIPELINE PROCESSING OF REGISTER AND REGISTER MODIFYING SPECIFIERS WITHIN THE SAME INSTRUCTION, Ser. No. 07/306,833 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,167,026 on Nov. 24, 1992; Murray et al., MULTIPLE INSTRUCTION PREPROCESSING SYSTEM WITH DATA DEPENDENCY RESOLUTION FOR DIGITAL COMPUTERS, Ser. No. 07/306,773, and issued as U.S. Pat. No. 5,142,631 on Aug. 25, 1992; Murray et al., PREPROCESSING IMPLIED SPECIFIERS IN A PIPELINED PROCESSOR, Ser. No. 07/306,846 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,142,633 on Aug. 25, 1992; D. Fite et al., BRANCH PREDICTION, Ser. No. 07/306,760 filed Feb. 3, 1989, and issued as U.S. Pat. No.
5,142,634 on Aug. 25, 1992; Fossum et al., PIPELINED FLOATING POINT ADDER FOR DIGITAL COMPUTER, Ser. No. 07/306,343 filed Feb. 3, 1989, and issued as U.S. Pat. No. 4,994,996 on Feb. 19, 1991; Grundmann et al., SELF TIMED REGISTER FILE, Ser. No. 07/306,445 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,107,462 on Apr. 21, 1992; Beaven et al., METHOD AND APPARATUS FOR DETECTING AND CORRECTING ERRORS IN A PIPELINED COMPUTER SYSTEM, Ser. No. 07/306,828 filed Feb. 3, 1989, and issued as U.S. Pat. No. 4,982,402 on Jan. 1, 1991; Flynn et al., METHOD AND MEANS FOR ARBITRATING COMMUNICATION REQUESTS USING A SYSTEM CONTROL UNIT IN A MULTI-PROCESSOR SYSTEM, Ser. No. 07/306,871 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,155,854 on Oct. 13, 1992; E. Fite et al., CONTROL OF MULTIPLE FUNCTION UNITS WITH PARALLEL OPERATION IN A MICROCODED EXECUTION UNIT, Ser. No. 07/306,832 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,067,069 on Nov. 19, 1991; Webb, Jr. et al., PROCESSING OF MEMORY ACCESS EXCEPTIONS WITH PRE-FETCHED INSTRUCTIONS WITHIN THE INSTRUCTION PIPELINE OF A VIRTUAL MEMORY SYSTEM-BASED DIGITAL COMPUTER, Ser. No. 07/306,866 filed Feb. 3, 1989, and issued as U.S. Pat. No. 4,985,825 on Jan. 15, 1991; Hetherington et al., METHOD AND APPARATUS FOR CONTROLLING THE CONVERSION OF VIRTUAL TO PHYSICAL MEMORY ADDRESSES IN A DIGITAL COMPUTER SYSTEM, Ser. No. 07/306,564 filed Feb. 3, 1989; Chinnasway et al., MODULAR CROSSBAR INTERCONNECTION NETWORK FOR DATA TRANSACTIONS BETWEEN SYSTEM UNITS IN A MULTI-PROCESSOR SYSTEM, Ser. No. 07/306,336 filed Feb. 3, 1989, and issued as U.S. Pat. No. 4,968,977 on Nov. 6, 1990; Polzin et al., METHOD AND APPARATUS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH INPUT/OUTPUT UNITS, Ser.
No. 07/306,862 filed Feb. 3, 1989, and issued as U.S. Pat. No. 4,965,793 on Oct. 23, 1990; Gagliardo et al., MEMORY CONFIGURATION FOR USE WITH MEANS FOR INTERFACING A SYSTEM CONTROL UNIT FOR A MULTI-PROCESSOR SYSTEM WITH THE SYSTEM MAIN MEMORY, Ser. No. 07/306,404 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,043,874 on Aug. 27, 1991; Gagliardo et al., METHOD AND MEANS FOR ERROR CHECKING OF DRAM-CONTROL SIGNALS BETWEEN SYSTEM MODULES, Ser. No. 07/306,836 filed Feb. 3, 1989, now abandoned; Hetherington et al., METHOD AND APPARATUS FOR INCREASING THE DATA STORAGE RATE OF A COMPUTER SYSTEM HAVING A PREDEFINED DATA PATH WIDTH, Ser. No. 07/306,826 filed Feb. 3, 1989, and issued as U.S. Pat. No. 5,019,965 on May 28, 1991; and Hetherington et al., METHOD AND APPARATUS FOR ORDERING AND QUEUING MULTIPLE MEMORY REQUESTS, U.S. Ser. No. 07/306,870 filed Feb. 3, 1990.
2. Field of the Invention
This invention relates generally to cache-based multi-processor systems. More particularly, this invention relates to an improved technique for ensuring data consistency between the main memory and the individual processor cache memories in a multi-processor computer system.
3. Description of the Related Art
Cache memories are commonly used in high-performance computer systems in order to optimize the ratio of system memory speed to processor speed. Typically implemented in the form of small, high-speed buffer memories, caches continually obtain and temporarily retain data (typically, the most recently used instructions and data items) that associated system processors are likely to require in executing current operations. The main memory of a computer system is generally accessed in a logical order and often in a sequential fashion. Typical examples include the processing of array structures and the sequencing of instructions in executing a particular program. Alternatively, a program may repeatedly execute an instruction loop prior to transferring control to a localized area. In both of these cases, a substantial increase in the execution speed of the individual processes, and consequently of the overall computer system, can be achieved if an auxiliary memory is provided which is capable of retaining sufficient data to avoid repeated references to the slower system memory; caches associated with system memory provide this function.
In typical cache implementations, the cache memory resides between a processor and the system memory (the primary or main memory). Memory addresses are interpreted by using an associative memory map which defines a correspondence between requested address locations and the cache contents. If a requested data item exists within the cache, requests to main memory are inhibited by the associative memory and the desired data is supplied from the cache to the requesting processor. The system memory is accessed only when a requested data item is not located within the cache; in such a case, the required data is fetched from the system memory and then supplied to the requesting processor. The operation of such cache memory schemes is based upon the phenomenon of locality exhibited by programs in the generation of addresses and memory usage. In essence, cache memories provide a window into the system memory for associated processors and permit high-speed access to data references with both spatial and temporal locality.
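The lookup behavior described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not taken from any particular implementation: the class name SimpleCache, the fully associative organization, the capacity of four entries, and the first-in-first-out replacement rule are all hypothetical choices made for the example.

```python
class SimpleCache:
    """Toy fully associative cache placed in front of a slower main memory."""

    def __init__(self, main_memory, capacity=4):
        self.main_memory = main_memory  # dict: address -> value (stands in for system memory)
        self.capacity = capacity        # assumed number of cache entries
        self.lines = {}                 # associative map: address -> cached value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Hit: the request to main memory is inhibited and the cache supplies the data.
        if address in self.lines:
            self.hits += 1
            return self.lines[address]
        # Miss: fetch from system memory, then retain a copy for later references.
        self.misses += 1
        value = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            # Assumed replacement rule: drop the oldest inserted entry (FIFO).
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = value
        return value


# A loop that touches the same three addresses repeatedly (temporal locality).
memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(memory)
for addr in [0, 1, 2, 0, 1, 2, 0, 1, 2]:
    cache.read(addr)
```

In this run only the first reference to each of the three addresses reaches main memory; the six subsequent references are served from the cache.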
However, because caches duplicate data items that exist in the system memory, it is critical that data consistency be maintained between the system memory and the various cache memories of the system. When individual processors are provided with separate, individual cache memories, the caches may hold different versions of shared data and steps must be taken to update all such differing versions. In addition, when a particular processor modifies information within its cache, the revised information needs to be replaced in the main memory in order that the various caches and the main memory always hold valid copies of stored data.
Cache consistency has been approached through several techniques, including the use of "smart" memories and the more popular software control. Bus-based consistency schemes, for instance, utilize a common path to main memory which is shared by all system processors, and are premised on the detection of data inconsistencies by making each cache maintain an updated directory of transactional data information by monitoring, via a common bus, misses and writes-to-memory by all other caches. The common bus approach, in addition to being fairly complex and requiring special VLSI chips, is operable only with a limited number of processors because of bottleneck problems arising from the transfer of extended amounts of data over the shared bus.
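By way of illustration only, the bus-monitoring idea can be sketched in Python as follows. The names SharedBus, SnoopingCache, and snoop_write are hypothetical, and the sketch assumes a simple invalidate-on-write policy in place of the directory updating described above; it is not a description of any specific bus protocol.

```python
class SharedBus:
    """Single shared path to main memory that every cache monitors."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value
        self.caches = []
        self.broadcasts = 0             # counts traffic on the shared bus

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, source, address, value):
        # Every write-to-memory crosses the common bus and is seen by all other caches.
        self.main_memory[address] = value
        self.broadcasts += 1
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(address)


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                 # address -> cached value
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.bus.main_memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write(self, address, value)

    def snoop_write(self, address):
        # Assumed policy: discard the local copy when another cache writes the address.
        self.lines.pop(address, None)


memory = {0: 1}
bus = SharedBus(memory)
a = SnoopingCache(bus)
b = SnoopingCache(bus)
stale_before = b.read(0)  # b caches the original value
a.write(0, 2)             # b observes the write on the bus and drops its copy
fresh_after = b.read(0)   # b refetches the updated value from main memory
```

Because every write is broadcast to every attached cache, the count in broadcasts grows with the number of writing processors, which is the bottleneck noted above.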
Software-controlled consistency schemes are increasingly being used in multi-processor systems and are predicated on the use of system-controlled microcode which tracks the areas of memory that are shared and relays commands to the various processor caches to make sure that shared data is kept consistent. The commands could, for example, initiate the purging of a processor cache if shared memory is found to have been modified by another processor. In these schemes cache synchronization is critical, and it is possible for system speed to be unduly restricted if the purging of processor caches is required too often.
The two most commonly employed techniques in cache consistency schemes are the "write-back" method and the "write-through" method. With the write-back method, modified or new data is written only to individual caches and not to main memory. Each cache tag has associated with it an additional bit that is set whenever the associated cache entry is modified. When the modified data is to be replaced in the cache, its value is written back to the main memory only if the identifying bit is found to have been set. While this method is quite efficient because it requires writing to main memory only when modified data items are replaced in the individual caches, it does require continual monitoring and updating of memory-resident data and can be problematic when the main memory is shared among several processors. The problems are exacerbated in systems where processors are permitted to write to segments of caches without refilling from main memory the complete block containing the segment being written.
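The write-back behavior, and the difficulty it creates when main memory is shared, can be sketched in Python as follows. This is a hedged illustration rather than a definitive implementation: the class name WriteBackCache, the explicit evict method, and the per-entry [value, dirty] layout are assumptions introduced for the example.

```python
class WriteBackCache:
    """Toy write-back cache: each entry carries a 'dirty' bit set on modification."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict shared by all caches: address -> value
        self.lines = {}                 # address -> [value, dirty_bit]
        self.memory_writes = 0

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = [self.main_memory[address], False]
        return self.lines[address][0]

    def write(self, address, value):
        # Modified data goes only to the cache; the dirty bit records the change.
        self.lines[address] = [value, True]

    def evict(self, address):
        # On replacement, write back to main memory only if the dirty bit is set.
        value, dirty = self.lines.pop(address)
        if dirty:
            self.main_memory[address] = value
            self.memory_writes += 1


memory = {0: 0, 1: 1}
cache_a = WriteBackCache(memory)
cache_b = WriteBackCache(memory)

cache_b.read(0)                        # processor B caches the original value
cache_a.write(0, 99)                   # processor A modifies its own copy only
value_in_memory = memory[0]            # main memory still holds the old value
value_seen_by_b = cache_b.read(0)      # B also still sees the old value
cache_a.evict(0)                       # the dirty entry is written back on replacement
value_after_writeback = memory[0]
```

Even after the write-back, cache_b continues to hold the old value until some additional mechanism updates or purges it, which is the shared-memory problem noted above.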
With the write-through method, data consistency is ensured by immediately propagating all cache writes to main memory and by provision of a shared processor-cache interface that is accessible to all system processors. However, this method can result in a substantial deterioration of cache performance because unnecessary writes to main memory are required if a plurality of writes occurs consecutively.
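The cost of consecutive writes under the write-through method can be seen in a small Python sketch. It is illustrative only; the class name WriteThroughCache and the memory_writes counter are hypothetical, and the shared processor-cache interface mentioned above is not modeled.

```python
class WriteThroughCache:
    """Toy write-through cache: every cache write is propagated to main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value
        self.lines = {}                 # address -> cached value
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.main_memory[address] = value  # immediate propagation to main memory
        self.memory_writes += 1


memory = {7: 0}
cache = WriteThroughCache(memory)
for value in range(5):  # five consecutive writes to the same address
    cache.write(7, value)
```

Main memory is written five times here although only the final value is ever needed, whereas main memory and the cache never disagree at any point during the sequence.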
A common problem associated with most conventional cache consistency schemes is that the acceptance of data transaction requests is contingent on the current status of shared memory. In other words, an incoming data request from a processor or other system unit is not accepted or executed unless it is established that the request will not generate data inconsistencies. This restriction can be quite limiting in high-performance systems where it is essential that all incoming process requests be scheduled for execution regardless of whether or not a particular request could cause an inconsistency between data shared by the processor caches and the main memory.