The pipelined processor organization is recognized as providing substantial advantages over conventional systems. Likewise, the use of high speed cache storage systems to obtain a better match between processor speed and memory cycle times has been effective in improving the performance of computing systems. Later development led to the incorporation of cache storage in pipelined processor systems, providing further improved performance. A conventional cache storage system often takes more cycles to execute certain types of IBM System/370 370-XA instructions simply because the data must be moved from storage to the execution unit and back to storage. Such operations may encounter interference through the use of common data and/or data paths between successive operations with the cache. For example, with conventional systems, in the case of a storage-immediate instruction, the data must be fetched from cache, so the operation to be performed must wait for the fetch to complete. The operation is then performed and the results are stored to cache. If a common bidirectional data path is used to fetch and store data within cache, further delays may result. Cycle diagrams for such an operation, with conventional execution in the execution unit and with execution within cache according to the invention, are shown below.
______________________________________
Conventional Execution
  1. fetch request
  2. access cache & directory for fetch
  3. fetch transfer to E-unit
  4. execute operation
  5. store transfer to cache
  6. access cache directory for store
  7. access cache for store

Execution in Cache
  1. store immediate request
  2. access cache & directory for fetch and store
  3. execute operation; access cache for store; store transfer to cache
______________________________________
Such operations as those listed under the conventional execution approach for storage-immediate instructions are required for storage-storage instructions as well. It can be seen how the execution of such operations, even when pipelined, can be stretched into large numbers of cycles by virtue of the fact that data must move between different hardware units to perform the different operations. The fact that a fetch requires use of the cache data array and address array (directory) in the same cycle, while a store requires access to the cache address array in the cycle prior to the storing of results in the cache data array, leads to contention for the cache resources.
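The contention described above can be illustrated with a minimal sketch. It assumes, as stated in the text, that a fetch claims the cache address array (directory) and the data array in the same cycle, while a store claims the directory in one cycle and the data array in the next. The resource names, the `conflicts` helper, and the one-cycle granularity are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
# Illustrative model only: resource names and timing are assumptions.
# Each pattern maps a cycle offset to the cache resources claimed in that cycle.
FETCH = {0: {"directory", "data_array"}}          # both arrays in the same cycle
STORE = {0: {"directory"}, 1: {"data_array"}}     # directory first, data array next

def conflicts(ops):
    """Return (cycle, resource, [op names]) for every resource claimed twice.

    ops is a list of (name, start_cycle, pattern) tuples.
    """
    usage = {}
    for name, start, pattern in ops:
        for offset, resources in pattern.items():
            for resource in resources:
                usage.setdefault((start + offset, resource), []).append(name)
    return sorted(
        (cycle, resource, names)
        for (cycle, resource), names in usage.items()
        if len(names) > 1
    )

if __name__ == "__main__":
    # A fetch issued one cycle after a store collides on the data array.
    print(conflicts([("store", 0, STORE), ("fetch", 1, FETCH)]))
    # Delaying the fetch by one more cycle removes the collision.
    print(conflicts([("store", 0, STORE), ("fetch", 2, FETCH)]))
```

Under these assumptions, a fetch issued in the cycle immediately following a store must be delayed, which is one way the cycle count of a pipelined sequence can grow.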
U.S. Pat. No. 4,437,149 shows a data processing system utilizing separate cache memories for data and instructions. The system includes an instruction decode means located astride the bus from main memory to the instruction cache to at least partially decode the instruction and store the result of the decode in the instruction cache. The system does not function to execute instructions on data located in the cache.
In the pipelined processor shown in U.S. Pat. No. 4,471,432, the instructions are examined prior to execution and dispatched to the appropriate processor for execution. There is no showing of cache in which the instructions are executed.
U.S. Pat. No. 4,516,203 describes a cache system which uses special codes to designate encacheable data items. There is no showing of cache in which the instructions are executed.
The system described in U.S. Pat. No. 4,734,852 incorporates a register file in which processor and storage accesses are overlapped to allow data references to storage in parallel with instruction execution. Loads and stores against the register file are partially executed or suspended if the program requires use of the data not yet loaded. The system does not show the execution of instructions in a cache.
U.S. Pat. No. 4,747,044 describes a system in which high level instructions are executed directly by microcode which is contained in main storage. There is no showing of a cache system within which instructions are executed.
The pipeline computing system of U.S. Pat. No. 4,750,112 has an instruction pipeline and an execution pipeline which are used with a microcode controlled instruction execution unit. There is no showing of a cache system which allows execution of instructions within it.
In U.S. Pat. No. 4,775,936, a two cycle memory is shown which permits write requests to be enqueued in a fashion which allows instruction execution to proceed unimpeded. The processor is permitted to fetch from the write queue. There is no showing of instruction execution in the cache storage unit.
The article "Effective Execution of Multicycle Instruction in One Processor Cycle" by J. H. Pomerene, T. R. Puzak, R. N. Rechtschaffen, and F. J. Sparacio, in the IBM Technical Disclosure Bulletin, Vol. 26, No. 9, February 1984, at pages 4667-4668, describes a cache system which reduces a store multiple instruction to one processor cycle which is used to validate a copy of the general purpose registers in the cache. The system does not show the execution of an instruction within the cache unit.
None of the systems known in the prior art avoids the need to pass data contained in the cache to the execution unit for the execution of logical operations essential to the IBM System/370 storage-immediate and storage-storage type instructions. The time required for the transfer of data to the execution unit extends the execution time for completion of the instruction. An additional problem frequently arises due to the interference encountered within the pipeline stages of the system. These problems have prevented full utilization of the cache storage system within a pipeline processor organization.