In the data processing art, including present-day microprocessor technology, it is a known expedient to use pipelining on the primary I/O bus or channel between the CPU and external units such as main storage and the various I/O devices, e.g., disk, display or printer. Such pipelining involves overlapped transactions on the I/O bus, i.e., a plurality of data transfers to and from main storage or the various I/O devices may be overlapped on the primary I/O bus. In other words, the I/O bus need not be locked into a single transaction; a first transaction may be initiated and, before it is completed, a second and a third transaction involving the I/O bus may be initiated. Some typical patents describing such pipelining are Calta et al., U.S. Pat. No. 3,447,135, Peripheral Data Exchange; Dennis, U.S. Pat. No. 4,130,885, Packet Memory System for Processing Many Independent Memory Transactions Concurrently; Levy et al., U.S. Pat. No. 4,232,366, Bus for a Data Processing System with Overlapped Sequences; Dennis, U.S. Pat. No. 4,128,882, Packet Memory System with Hierarchical Structure; and Cassarino, Jr. et al., U.S. Pat. No. 3,997,896, Data Processing System Providing Split Bus Cycle Operation.
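By way of illustration only, the benefit of overlapped bus transactions may be sketched with a simple timing model. The function names, the three-cycle latency and the one-cycle issue interval below are assumptions chosen for the example; they are not taken from any of the cited patents, and the model ignores bus contention and arbitration.

```python
# Illustrative timing model only; the names and figures are hypothetical.

def serialized_cycles(latencies):
    """Bus locked into one transaction at a time: each transfer must
    complete before the next begins, so the total is the sum."""
    return sum(latencies)


def overlapped_cycles(latencies, issue_interval=1):
    """Pipelined bus: transaction i is initiated at cycle
    i * issue_interval without waiting for earlier ones to complete.
    The total is the completion time of the last transfer to finish."""
    return max(
        (i * issue_interval + latency for i, latency in enumerate(latencies)),
        default=0,
    )


if __name__ == "__main__":
    # Three transfers, each assumed to take three CPU time cycles.
    latencies = [3, 3, 3]
    print("serialized:", serialized_cycles(latencies))
    print("overlapped:", overlapped_cycles(latencies))
```

Under these assumptions, three three-cycle transfers occupy nine cycles when the bus is locked into one transaction at a time, and five cycles when a new transaction is initiated on each successive cycle.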
While the art has recognized the need to overlap such data transfers to and from external units over I/O buses in order to speed up data processing operations, there appears to have been little consideration given to overlapping such external transfers with the storage transfers that take place within the CPU itself. The CPU, in carrying out its operational and computing functions, must conduct extensive register-to-register transfers within the local storage means in the CPU. In present-day microprocessor technology, such local storage means customarily comprise a plurality of RAM registers in which the data fetched from the external main storage is temporarily stored while it is being manipulated in the CPU. Such data manipulation normally requires a great number of register-to-register transfers within the CPU. These register-to-register transfers are relatively short in duration, normally requiring an effective throughput of one CPU time cycle to complete. In contrast, transfers over the I/O bus to main storage or other I/O devices are much longer, normally requiring three or more CPU time cycles to complete. In a great many conventional data processing systems, it has been customary to employ a memory cache in the CPU so that a substantial number of data transfers over the I/O bus from main storage or other I/O devices may be carried out before the data is needed by the CPU, with the data stored or buffered in the storage cache associated with the CPU. In systems utilizing such a cache, the relatively long times required to transfer data from main storage or other I/O devices may not present a problem, in that a great many of the instructions or other data required from main storage to carry out CPU operations or computations have been prestored in the CPU cache and are immediately available.
However, with the development of microprocessors, there has been a trend to eliminate or greatly curtail the size of CPU caches because of space limitations imposed by the size of the semiconductor substrate in which the various microprocessor circuits are formed. Accordingly, the technology presents the problem of how to eliminate the need for cache space and yet maintain the high operational speeds required of microprocessors.