In a standard computer system, a processor executes a series of instructions, known as the program, which are generally stored in main memory. The processor also performs various mathematical calculations on data that it retrieves from and stores in main memory, which is generally placed on a separate circuit board in the system. FIG. 6 shows a generalized form of a typical computer system architecture having processor cards and memory cards linked by one or more system busses. Thus, both the first processor 310 and the second processor 320 connect to a pair of system busses, system bus A 340 and system bus B 350, which also connect to the main memory modules 380 and 390.
Normally, accessing main memory is slow relative to the time the processor takes to execute an instruction. The processor must generally wait for each memory Read or memory Write operation to complete, and it can do no other useful work while waiting. It is therefore advantageous in such systems to have a cache memory that stores a subset of the data normally found in main memory, so that the cache can serve Read and Write operations for the processor much faster than main memory can. This is the case since the following conditions are generally present:
FIG. 7 indicates the modified architecture of a computer system which includes the implementation of a cache memory. Thus, a first processor 410 has a cache 460, a second processor 420 has a cache 470, and the dual system busses 440 and 450 connect to the main memory modules 480 and 490.
A cache memory can be on the order of 1/1000th the size of main memory; its usefulness relies on the fact that data words and program words are repeatedly accessed after the processor first uses them. Thus, by keeping the most recently used words in the cache memory, the processor has direct access to them in the cache and needs to access main memory only a small percentage of the time, possibly on the order of 5%. The remaining 95% of the time, the data can be found in the cache memory. This latter percentage is commonly known as the "hit ratio" of the cache. If the processor saves a few clock cycles with every cache "hit", the overall performance of the system will generally increase.
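The effect of the hit ratio on performance can be illustrated with a short calculation. The following is a minimal sketch only: the function name and the cycle counts (2 cycles for a cache access, 10 cycles for a main memory access) are hypothetical values chosen for illustration and are not taken from any particular system; only the 95% hit ratio comes from the discussion above.

```python
def average_access_cycles(hit_ratio, cache_cycles, memory_cycles):
    """Average cycles per memory access for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits are served by the cache; misses must go to main memory.
    return hit_ratio * cache_cycles + (1.0 - hit_ratio) * memory_cycles


# Hypothetical latencies, assumed purely for illustration.
CACHE_CYCLES = 2
MEMORY_CYCLES = 10

# 95% hit ratio, as in the discussion above.
with_cache = average_access_cycles(0.95, CACHE_CYCLES, MEMORY_CYCLES)
# A hit ratio of 0 models a system in which every access goes to main memory.
without_cache = average_access_cycles(0.0, CACHE_CYCLES, MEMORY_CYCLES)

print(f"with cache:    {with_cache:.2f} cycles per access")
print(f"without cache: {without_cache:.2f} cycles per access")
```

Under these assumed latencies, the average access costs roughly 2.4 cycles with the cache, against 10 cycles when every access goes to main memory.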
An example of the use of a processor chip in a system is found in the Unisys computer system designated as the A11-411, which uses a processor chip designated as the "IMP" (Integrated Mainframe Processor). This IMP processor chip contains a "code buffer" that can store two code words which the processor may need to execute in the next few moments of time. When the processor takes one of these code words from the code buffer, a FETCHCODE signal is generated so that a new code word can be fetched and placed into the newly vacated code buffer location. The FETCHCODE operation quickly gets the next code word in a sequence, for example, address 101 after address 100. A separate READCODE operation is needed for a branch, when the next code word is not at the address immediately after the previous code word, for example, address 200 after address 100.
Since the computer program is usually executed in a sequential order according to consecutive memory locations, an incrementation register is often used to generate the FETCHCODE addresses and this register would generally be incremented after every code fetch.
If there is a branch to another area of the program, the incrementation register (often called the Program Word Address Register, PWAR) is loaded with the new address and is then incremented from that point onward. This re-loading of the PWAR occurs during the READCODE operation. After the READCODE operation is completed, the PWAR is incremented so that it holds the value of the potential "next address" in the code sequence.
The register designated as the Program Word Address Register (PWAR) resides in the processor logic. However, another copy of this PWAR register is also kept in the cache memory so that the processor does not have to send an address to the cache memory each time a code buffer location in the processor logic becomes empty. The processor merely sends a FETCHCODE signal to the cache memory, and the cache memory then uses its own duplicate program word address register (PWAR) to determine the address from which the next code word will be fetched.
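The interplay of the two PWAR copies can be sketched as follows. This is an illustrative model only, not the actual hardware logic: the class `CodeAddressUnit` and the helper functions `issue_readcode` and `issue_fetchcode` are hypothetical names. The sketch assumes, as described above, that READCODE loads the PWAR and then increments it, and that FETCHCODE uses the current PWAR value and then increments it.

```python
class CodeAddressUnit:
    """Holds one copy of the Program Word Address Register (PWAR)."""

    def __init__(self):
        self.pwar = 0

    def readcode(self, address):
        # Branch: load the new address, then increment so that the PWAR
        # holds the potential next address in the code sequence.
        self.pwar = address + 1
        return address

    def fetchcode(self):
        # Sequential fetch: use the current PWAR value, then increment it.
        address = self.pwar
        self.pwar += 1
        return address


processor = CodeAddressUnit()  # PWAR copy in the processor logic
cache = CodeAddressUnit()      # duplicate PWAR copy in the cache memory


def issue_readcode(address):
    # On a branch, the address is sent to the cache along with the command.
    processor.readcode(address)
    return cache.readcode(address)


def issue_fetchcode():
    # No address is sent: the cache relies on its own PWAR copy.
    processor.fetchcode()
    return cache.fetchcode()


trace = [issue_readcode(100), issue_fetchcode(), issue_fetchcode(),
         issue_readcode(200), issue_fetchcode()]
print(trace)                      # addresses the cache actually accessed
print(processor.pwar, cache.pwar)  # the two copies remain equal
```

In this model the cache accesses addresses 100, 101, 102, 200 and 201 in turn, and both PWAR copies end at the same value even though only the two READCODE operations carried an address.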
This type of arrangement is shown in FIG. 8. As seen in FIG. 8, a computer system involving a processor 20 with its cache unit 40 operates over a system bus 340 to the main memory 380. The data words in the cache data unit 46 are transferred through transceiver 30x to the code buffers 21 and 22 for use by the processor logic 25. The processor logic 25 has a program word address register 24.
Likewise, the cache memory 40 has a program word address register 44 which is loaded and incremented by the state machine 45 which also provides an input to the cache data queue 46.
Typically here, a branch in code execution would result in a READCODE operation and the next group of code data accesses would be to consecutive "memory locations" which would then result in several FETCHCODE operations.
As seen in FIG. 9, a typical cache architecture consists of a cache controller 57 and several RAM memories that store cache data. Thus there is a RAM 51 for the least recently used (LRU) status data, a memory 53 for the valid status of memory locations, a memory section 55 designated as tag RAMs, and the data RAMs 59, which hold the actual code words of many recently used program words.
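The roles of these four memories can be sketched in a small software model. This is a simplified illustration under stated assumptions, not the organization of the cache of FIG. 9: the two-way set-associative layout, the sizes `NUM_SETS` and `NUM_WAYS`, and the class `SketchCache` are all hypothetical.

```python
NUM_SETS = 4  # hypothetical number of sets
NUM_WAYS = 2  # hypothetical associativity (the LRU update below assumes 2)


class SketchCache:
    """Toy model with separate tag, valid, LRU and data memories."""

    def __init__(self):
        self.tag_ram = [[0] * NUM_WAYS for _ in range(NUM_SETS)]
        self.valid_ram = [[False] * NUM_WAYS for _ in range(NUM_SETS)]
        self.data_ram = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        # For each set, the way that was least recently used.
        self.lru_ram = [0] * NUM_SETS

    def lookup(self, address):
        """Return (hit, word). On a hit, update the LRU status."""
        index, tag = address % NUM_SETS, address // NUM_SETS
        for way in range(NUM_WAYS):
            if self.valid_ram[index][way] and self.tag_ram[index][way] == tag:
                self.lru_ram[index] = 1 - way  # the other way is now LRU
                return True, self.data_ram[index][way]
        return False, None

    def fill(self, address, word):
        """After a miss, store a word in the least recently used way."""
        index, tag = address % NUM_SETS, address // NUM_SETS
        way = self.lru_ram[index]
        self.tag_ram[index][way] = tag
        self.valid_ram[index][way] = True
        self.data_ram[index][way] = word
        self.lru_ram[index] = 1 - way


demo = SketchCache()
first = demo.lookup(100)   # miss: nothing is valid yet
demo.fill(100, "word@100")
second = demo.lookup(100)  # hit: tag matches and the entry is valid
print(first, second)
```

The model calls `fill` only after a miss, as a cache controller would after fetching the word from main memory; it does not attempt to model write handling or the timing of the RAMs.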
For better system performance and improvements in reliability, a cache memory module may be divided up into "slices", each of which contains its own cache controller, such as 57 and its own RAM memories such as 51, 53, 55, and 59.
In the aforementioned A11-411 computer system, two cache slices are utilized, one of which handles only odd addresses while the other handles only even addresses. FIG. 10 is an architectural illustration of how the various resources are divided between the two cache slices, which are implemented inside two identical Very Large Scale Integrated (VLSI) custom-built gate arrays 60, 61. In this architecture, it is important that the chips be identical because of the higher cost of designing and manufacturing slightly different VLSI chips. Thus, in the cache memory module of FIG. 10, the cache is split into two slices, shown as the cache gate array "0", item 60, and the cache gate array "1", item 61, both of which are connected to and under the control of the cache control logic 67. The processor chip 25 communicates through the I bus 25i with the cache slices "0" and "1". The cache control logic 67 can make a cache request to the slice "0" and/or the slice "1". Each of the two slices can provide a busy signal to the cache control logic 67. Also, the cache gate arrays "0" and "1" can inform each other of their presence through interconnecting signal lines.
Since both of the cache slices 60 and 61 execute READCODE and FETCHCODE commands from the processor 25, a copy of the program word address register (PWAR) exists in each slice, i.e., there is one PWAR 60p0 in the cache slice 60 and one PWAR 60p1 in the cache slice 61. Communication must exist between the two slices to make sure that their copies of the program word address always agree. However, since the two slices 60 and 61 handle mutually exclusive addresses (odd and even), only one slice can execute a request from the processor at any given time. The "active" slice is determined by the address of the operation being executed.
If the address is "even", then slice "0", item 60, will execute the operation. If the address is "odd", then slice "1", item 61, will be activated. The "non-active" slice does no work other than making sure that its own program word address register stays synchronized and coherent with the PWAR in the active slice. Thus, in FIG. 10, the registers 60p0 and 60p1 must be loaded at the same moment of time and must be accurate duplicates of each other, that is to say, "coherent".
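The slice selection and the PWAR coherence requirement can be sketched as follows. This is an illustrative model under assumptions, not the gate array logic itself: the classes `CacheSlice` and `SlicedCache` and the `executed` list are hypothetical, and the model simply updates both PWAR copies in the same step to represent loading them at the same moment.

```python
class CacheSlice:
    """One cache slice with its own copy of the PWAR."""

    def __init__(self, parity):
        self.parity = parity  # 0 handles even addresses, 1 handles odd
        self.pwar = 0
        self.executed = []    # addresses this slice actually serviced

    def handles(self, address):
        return address % 2 == self.parity


class SlicedCache:
    """Two slices that keep their PWAR copies coherent."""

    def __init__(self):
        self.slices = [CacheSlice(0), CacheSlice(1)]

    def _run(self, address):
        for cache_slice in self.slices:
            if cache_slice.handles(address):
                # Only the active slice executes the operation.
                cache_slice.executed.append(address)
            # Both slices, active or not, update their PWAR together.
            cache_slice.pwar = address + 1
        return address % 2  # number of the active slice

    def readcode(self, address):
        return self._run(address)

    def fetchcode(self):
        # The copies agree, so either PWAR gives the fetch address.
        return self._run(self.slices[0].pwar)


sliced = SlicedCache()
active = [sliced.readcode(100), sliced.fetchcode(), sliced.fetchcode()]
print(active)                            # active slice for 100, 101, 102
print([s.pwar for s in sliced.slices])   # both PWAR copies hold the same value
```

In this run the even slice services addresses 100 and 102 and the odd slice services 101, so consecutive FETCHCODE operations alternate between the slices while the two PWAR copies stay equal.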