In its generalized form, a computer system consists of one or several processors, main memory, input/output modules and, often, maintenance modules. The processors execute instructions stored in memory and also process and manipulate data. The main memory contains data and programs which are accessed by the processors and the input/output modules. The input/output modules permit communication between the system and peripheral devices (disks, tape drives, printers, modems, etc.). The maintenance module is used to initialize the system, run diagnostics, determine failures, monitor the state of the system, store system information, note errors and take corrective actions. All of these modules cooperate over a system bus such as the one indicated in FIG. 2.
A typical example of such a computer system is indicated in FIG. 2, where a processor card 600 includes a processor 100 connected by an internal bus 110 to a cache unit 200. A system bus 120 connects the processor 100 and the cache 200 to other system modules such as the main memory 300 and the I/O subsystem 400, which communicates with peripheral devices 450. Additionally, the maintenance module 500 connects to each of the other modules in order to perform initialization and diagnostic operations.
Although each of these modules is connected to the system bus 120, each still operates independently. However, no one module should gain control over the system bus 120 for indefinite or long periods of time, since this would limit the access and work of the other modules. Likewise, if any one of the modules on the system bus could not keep up with the data traffic on the system bus, that module would tie up the bus and slow the work of the other modules. One recurring problem involves sequences in which a processor module may tie up the system bus 120 and thus impede system bus traffic, degrading system performance. It would be most desirable to provide a processor operational sequence which does not impede system bus traffic, even under worst-case conditions, and thus enables much higher overall system performance.
As the performance capability of any of the modules increases, so does the data traffic on the system bus 120. Considering first the I/O module 400, for example, the faster this module handles transactions to the peripheral devices 450 (disks, tape drives, monitoring terminals, etc.), the more Reads and Writes it will generate to the main memory 300 over the system bus 120.
Likewise, the processor 100 will have a similar impact on the system bus 120. The faster and more capable the processor 100 operates, the more traffic it will generate on the system bus 120 with Read and Write command operations.
Thus, the system bus 120 is most critical to the overall performance of the computer system. Consequently, the bus is designed to operate at the maximum frequency that is technically possible. In any new system under development, it is desired that the system bus be able to operate at the physical limits of the technology available at that time.
The process of designing new computer systems has generally become much more complicated and expensive, and it is constantly desired that these costs be kept to a minimum. One approach to lowering development cost is to design the best system bus possible and then use it for several computer systems. Only one or two of the cooperating modules are changed for each new system, so that an entire new system does not have to be designed for each phase of the computer development cycle. As an example, the Unisys A11-211 computer system, manufactured by the Unisys Corporation of Blue Bell, Pa., was developed with a high-volume system bus which is applicable over a number of computer system developments. This computer has a processor which operates at 12 megahertz and interfaces asynchronously to the system bus, which operates at 16 megahertz.
Subsequently, with the development of a new processor with a new architecture, the new processor would operate at a higher frequency, such as 16 megahertz, matching the frequency at which the originally developed system bus operates. However, with the new architecture and higher operating frequency, the processor now generates much greater traffic than the previous processor. It might be expected that the new processor would generate approximately twice the traffic of the earlier processor because of its enhanced architecture and design improvements. Furthermore, future processors may soon operate at 32 megahertz or more, thus quadrupling the traffic on the system bus 120.
An increase in data traffic on the system bus is good for system performance, since more work gets done. However, in this case all modules attached to the system bus must be able to handle the maximum I/O traffic generated by any one module, as well as the cumulative traffic of all the modules working independently. Otherwise, the computer system is slowed down to the data processing rate of the slowest module in the system and cannot operate at its proper capacity. This is similar to a well-maintained superhighway with several lanes in each direction plus overpasses and ramps. Under normal traffic conditions, cars can move safely at over 60 miles per hour. However, if the number of cars rises above a certain threshold, with cars getting on and off the freeway randomly, the highway becomes inefficient and speeds may drop to 30 miles per hour or less, as is commonly seen in large cities during rush hour.
Thus, whenever any one of the modules gains a performance increase, the other modules need to be capable of handling the extra traffic which is generated. One general approach is to "over-design" the data handling capabilities of each module at the system bus interface. For example, in the previously mentioned Unisys A11-211 computer, the I/O module was "over-designed" to handle the worst-case scenario of data traffic. Where previous systems could generate data traffic of eight megabytes per second (eight million bytes per second), the I/O module of the A11-211 system was designed to handle up to 48 megabytes per second, or six times as much. It was thus over-designed with future growth in mind, so that the I/O module would not have to be redesigned each time a new system was introduced. Thus, with the design of a higher-capability system designated the A11-411, the previously designed I/O module would remain compatible with the higher-capacity system.
Even though neither the A11-211 nor the upgraded A11-411 system generates a 48 megabyte per second transfer rate on the system bus, it is standard practice to test the system bus at the highest operable frequency. Thus, it is tested with "bursts" of 48 megabytes per second or higher. When the A11-211 system was tested at such a high data traffic rate, it was found that its processor could not keep up with the high I/O data transfer rates known as "bursts". A burst of I/O occurs when the I/O module has sufficient data to Read or Write consecutively to the main memory 300 for long periods of time. The I/O module 400 does consecutive Reads or Writes as fast as the system bus can handle them.
In the A11-211 system, the I/O module could typically do 250 back-to-back Reads or Writes, with one Write occurring every 14 clocks. A maximum configuration has two I/O modules 400 (FIG. 2). Therefore, the cumulative traffic on the system bus 120 would be twice as many back-to-back Reads and Writes (500 in total), that is to say, two Writes every 14 clocks, which is equivalent to one Write every 7 clocks.
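The traffic arithmetic above can be checked with a short Python sketch. The per-module Write interval, burst length and module count come from the text; applying the 16 megahertz bus clock quoted earlier for the A11-211 to this burst scenario is an assumption made only to convert clocks into a rate.

```python
from fractions import Fraction

# Figures from the text: each I/O module issues one Write every 14 bus
# clocks, a burst is about 250 back-to-back operations, and a maximum
# configuration has two I/O modules.
CLOCKS_PER_WRITE_PER_MODULE = 14
WRITES_PER_BURST_PER_MODULE = 250
IO_MODULES = 2
# Assumption: the 16 MHz system bus clock quoted for the A11-211.
BUS_CLOCK_HZ = 16_000_000


def combined_write_interval(modules: int, clocks_per_write: int) -> Fraction:
    """Average bus clocks between Writes when all modules burst together."""
    return Fraction(clocks_per_write, modules)


interval = combined_write_interval(IO_MODULES, CLOCKS_PER_WRITE_PER_MODULE)
total_writes = IO_MODULES * WRITES_PER_BURST_PER_MODULE
writes_per_second = BUS_CLOCK_HZ / interval  # exact Fraction

print(f"one Write every {interval} clocks")            # one Write every 7 clocks
print(f"{total_writes} consecutive Writes to spy on")  # 500 consecutive Writes to spy on
print(f"about {float(writes_per_second) / 1e6:.2f} million Writes/s to spy on")
# about 2.29 million Writes/s to spy on
```

Every one of these Writes is a bus operation the processor's cache must spy on, which is why the spying rate, not just the processor's own work, sets the load on the cache.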
The earlier A11-211 processor could not "spy" on the system bus and still sustain the 500 consecutive Writes. This earlier processor would issue "RETRYs", causing the I/O module 400 to stop and repeat the Write operation later. At that later time, the processor, having possibly caught up, would be ready to spy on the system bus again. Once "RETRYs" start occurring, system bus traffic drops significantly, similar to rush hour on a highway where incoming cars have to wait at the on-ramps. The use of the "RETRY" operation in a computer system is described in co-pending application U.S. Ser. No. 961,744, entitled "Programmable Timing Logic System For Dual Bus Interface", which has been allowed.
The "traffic problem" caused by the earlier processor of the A11-211 system occurred because the processor 100 and the system bus 120 shared a common resource: the cache memory 200 seen in FIG. 2. The processor 100 interfaces over an internal bus 110 to the cache memory 200, while the system bus 120 interfaces to the cache memory via the spur line 120s. The cache 200 is a fast memory which contains a subset of the locations in main memory 300. Each time the processor 100 issues a Read or a Write, the cache memory 200 checks whether it contains the data internally. If the cache does contain the requested memory location, it is a cache "hit" and the requested data is returned to the processor 100 on the next clock. If the cache memory 200 does not have the data, it is a cache "miss". In the case of a miss, the processor 100 has to access the system bus 120 and get the data from the main memory 300; this extra step is much slower, taking 8 or 9 more clock periods.
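The hit/miss behavior described above can be sketched as a small cache model in Python. The class name `DirectMappedCache`, the direct-mapped organization, the line count and the choice of 9 (rather than 8) extra miss clocks are illustrative assumptions, not details of the cache 200.

```python
from dataclasses import dataclass, field

# Latencies from the text: a hit returns data on the next clock; a miss
# costs 8 or 9 more clocks. This sketch assumes the worse case of 9.
HIT_CLOCKS = 1
MISS_EXTRA_CLOCKS = 9


@dataclass
class DirectMappedCache:
    """Hypothetical direct-mapped cache holding a subset of main memory."""
    lines: int
    main_memory: dict[int, int]
    tags: list = field(init=False)
    data: list = field(init=False)

    def __post_init__(self) -> None:
        self.tags = [None] * self.lines
        self.data = [0] * self.lines

    def read(self, address: int) -> tuple[int, int]:
        """Return (value, clocks spent) for a processor Read."""
        index = address % self.lines
        # The full address is kept as the tag for simplicity.
        if self.tags[index] == address:          # cache "hit"
            return self.data[index], HIT_CLOCKS
        # Cache "miss": fetch from main memory over the system bus.
        value = self.main_memory.get(address, 0)
        self.tags[index], self.data[index] = address, value
        return value, HIT_CLOCKS + MISS_EXTRA_CLOCKS


memory = {0x100: 42, 0x104: 7}
cache = DirectMappedCache(lines=64, main_memory=memory)
print(cache.read(0x100))  # (42, 10): first access misses
print(cache.read(0x100))  # (42, 1): second access hits
```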
As indicated in the aforementioned co-pending application U.S. Ser. No. 018,996, entitled "Dual Bus System Providing Compatibility For Store-Through And Non-Store-Through Cache Memories", now abandoned, there are two types of cache memories: "store-through" (ST) and "non-store-through" (NST). Store-through cache memories operate such that whenever the processor 100 issues a Write command, the Write data is sent to the cache 200 as well as to the main memory 300. In non-store-through (NST) cache memories, however, the Write commands and data are sent to the local cache 200 alone and not to the main memory 300. Thus, the data in the cache memory 200 may differ from the data residing in main memory at any given time, leading to a period of non-coherency.
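The difference between the two Write policies can be illustrated with a minimal Python sketch. The `Cache` class and its dictionary-based storage are hypothetical; only the rule for when main memory is updated follows the text.

```python
class Cache:
    """Hypothetical cache contrasting store-through (ST) and
    non-store-through (NST) handling of processor Writes."""

    def __init__(self, main_memory: dict[int, int], store_through: bool) -> None:
        self.main_memory = main_memory
        self.store_through = store_through
        self.lines: dict[int, int] = {}

    def write(self, address: int, value: int) -> None:
        self.lines[address] = value            # local cache is always updated
        if self.store_through:
            self.main_memory[address] = value  # ST: main memory updated as well
        # NST: main memory is left unchanged, so the two copies now differ

    def is_coherent(self, address: int) -> bool:
        """True when the cached copy matches main memory."""
        return self.lines.get(address) == self.main_memory.get(address)


st = Cache({0x200: 0}, store_through=True)
nst = Cache({0x200: 0}, store_through=False)
st.write(0x200, 5)
nst.write(0x200, 5)
print(st.is_coherent(0x200), nst.is_coherent(0x200))  # True False
```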
The non-store-through cache memories are more complicated to control in system operations than are the store-through cache memories. The cache memories in both the earlier A11-211 and the upgraded A11-411 are "ST" (store-through) cache memories.
The purpose of a cache memory such as cache 200 is to provide data to the processor very quickly. Another main task of the cache is to maintain "data coherency", that is to say, the data in the cache 200 has to accurately match the data residing in the main memory 300. For example, when the I/O module 400 does a Write of data on the system bus 120, the main memory has then received the "latest" data. In this situation, the cache memories 200 on the processor card 600 must invalidate any copy they hold of the address location that the I/O module wrote to main memory. To make this possible, the cache memory 200 stores, along with each word it holds, the main memory address of that word.
Because the cache memory may, at certain times, hold data that differs from that residing in the main memory 300, the cache provides an "invalidation queue" to hold the invalidation addresses (the addresses of the words being updated) derived from spying on the system bus 120. The cache memory 200 itself performs the invalidation operations in between its service to the processor 100. However, during high data traffic on the system bus 120, the invalidation queue (260, FIG. 1) can become full. Thus, the cache 200 has two requestors: the processor 100, and the system bus 120, which requires the invalidation operations. These two requestors can also be seen in FIG. 2, where the processor uses the internal bus 110 to access the cache while the system bus uses its separate spur line 120s to access the cache 200.
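The spying and queuing described above might be modeled as follows. This is a sketch under stated assumptions: the `SnoopingCache` class, the queue depth, and the "ACK"/"RETRY" return strings are hypothetical stand-ins for the hardware handshake, and the deque is only a software analogue of invalidation queue 260.

```python
from collections import deque


class SnoopingCache:
    """Hypothetical cache that spies on system-bus Writes and queues the
    written addresses for later invalidation."""

    def __init__(self, queue_depth: int = 4) -> None:
        self.valid: dict[int, int] = {}       # main-memory address -> cached word
        self.queue_depth = queue_depth        # assumed depth, not from the text
        self.invalidation_queue: deque[int] = deque()

    def spy_bus_write(self, address: int) -> str:
        """Called for each Write another module places on the system bus."""
        if len(self.invalidation_queue) >= self.queue_depth:
            return "RETRY"                    # queue full: the writer must retry later
        self.invalidation_queue.append(address)
        return "ACK"

    def invalidate_one(self) -> None:
        """Perform one invalidation between services to the processor."""
        if self.invalidation_queue:
            address = self.invalidation_queue.popleft()
            self.valid.pop(address, None)     # drop the stale copy if present


cache = SnoopingCache(queue_depth=2)
cache.valid[0x300] = 1
print([cache.spy_bus_write(a) for a in (0x300, 0x304, 0x308)])
# ['ACK', 'ACK', 'RETRY']
cache.invalidate_one()
print(0x300 in cache.valid)  # False
```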
However, while the processor 100 is doing a Read or a Write operation, the cache memory 200 cannot do any invalidation operations. Similarly, while the cache memory is doing its invalidation operations, the processor 100 cannot access the cache memory 200. Thus, if the I/O traffic were high enough to keep the invalidation queue (260, FIG. 1) constantly filled, the processor 100 would never be able to access the cache memory 200 and no effective work could be accomplished. On the other hand, if the processor 100 were constantly accessing the cache 200, the cache could never perform its invalidation operations, the invalidation queue 260 would fill up, and traffic on the system bus 120 would stop.
The earlier A11-211 computer system had a cache memory, but under conditions of high data traffic on the system bus the cache could not operate efficiently; that is to say, the cache memory could not keep up with the required invalidation operations, and thus Write operations had to be retried on the system bus 120. In trying to complete its invalidation sequences, the cache memory blocked the processor 100 from accessing it for long periods of time, since the cache could not keep up with the high I/O traffic. This also affected system bus traffic because, while the invalidation queue was full, the I/O traffic stopped, creating a bottleneck for both the processor and the system bus.
Thus, since the upgraded processor (A11-411) was being developed to introduce even more traffic onto the system bus 120, it was seen that the earlier A11-211 processor design (which could not keep up with system bus traffic) would lead to even greater problems once the upgraded processor was implemented. This was further complicated by the fact that the maximum computer system configuration involved two processors rather than just one.
Thus, it was necessary to provide a design which would overcome these problems and allow the processor or processors to access the cache memory as often as necessary without causing a slowdown on the system bus. Further, from the system bus viewpoint, the cache memory is required to "spy" on the maximum possible I/O traffic without hindering the processor's access to the cache memory. Thus, it was necessary to find a system and method of operation which would serve both of these situations, providing (i) immediate processor access to the cache as well as (ii) immediate access by the system bus to the invalidation queue in the cache memory.
The presently described system solves the above-described problems by providing processor access to the cache memory while at the same time allowing equitable access by the system bus to the invalidation queue in the cache memory.
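The two starvation cases above can be reproduced with a cycle-by-cycle Python sketch in which each clock is granted to exactly one requestor under a fixed priority. The `simulate` function, its default queue depth of 8 and its traffic parameters are illustrative assumptions; the first call uses the one-Write-every-7-clocks rate from the burst example, the second an extreme rate of one Write per clock.

```python
def simulate(clocks: int, write_every: int, processor_busy: bool,
             priority: str, queue_depth: int = 8) -> dict[str, int]:
    """Hypothetical model of one cache shared by two requestors.

    Each clock the cache serves either the processor or one queued
    invalidation, never both. `priority` is "processor" or "invalidation".
    """
    queue = 0
    stats = {"processor_accesses": 0, "invalidations": 0, "retries": 0}
    for clock in range(clocks):
        # System bus side: an I/O Write arrives every `write_every` clocks.
        if clock % write_every == 0:
            if queue < queue_depth:
                queue += 1
            else:
                stats["retries"] += 1        # queue full: the Write is retried
        wants_processor = processor_busy
        wants_invalidate = queue > 0
        if wants_processor and (priority == "processor" or not wants_invalidate):
            stats["processor_accesses"] += 1
        elif wants_invalidate:
            queue -= 1
            stats["invalidations"] += 1
    return stats


print(simulate(700, write_every=7, processor_busy=True, priority="processor"))
# {'processor_accesses': 700, 'invalidations': 0, 'retries': 92}
print(simulate(700, write_every=1, processor_busy=True, priority="invalidation"))
# {'processor_accesses': 0, 'invalidations': 700, 'retries': 0}
```

Under processor priority the queue fills after eight Writes and every later Write is retried; under invalidation priority with a constantly refilled queue, the processor never gets a clock.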
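To make requirements (i) and (ii) concrete, the sketch below replaces fixed priority with a textbook round-robin grant between the two requestors. This is a generic illustration under assumed parameters, not the mechanism of the presently described system; the function name `round_robin` and the queue depth of 8 are hypothetical.

```python
def round_robin(clocks: int, write_every: int, queue_depth: int = 8) -> dict[str, int]:
    """Hypothetical round-robin arbiter: when both the processor and a
    queued invalidation want the cache, the grant alternates."""
    queue = 0
    last_grant = "invalidation"  # lets the processor win the first tie
    stats = {"processor_accesses": 0, "invalidations": 0, "retries": 0}
    for clock in range(clocks):
        if clock % write_every == 0:          # an I/O Write is spied on
            if queue < queue_depth:
                queue += 1
            else:
                stats["retries"] += 1
        # Worst case for the cache: the processor requests on every clock.
        if queue > 0 and last_grant == "processor":
            grant = "invalidation"
            queue -= 1
            stats["invalidations"] += 1
        else:
            grant = "processor"
            stats["processor_accesses"] += 1
        last_grant = grant
    return stats


print(round_robin(700, write_every=7))
# {'processor_accesses': 600, 'invalidations': 100, 'retries': 0}
```

With a Write every 7 clocks and a processor that requests every clock, this policy gives the processor 600 of 700 clocks while every Write is absorbed without a RETRY.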