The architectural definition of today's general-purpose computers implicitly assumes sequential instruction execution. Where parallel operation is architected (for example, the operation of the channel in parallel with the CPU's program execution), the architecture makes special provisions that allow the programmer to synchronize the CPU program with the asynchronous I/O operation. The programmer correctly assumes that all sequentially architected instructions are started and finished in sequence.
Designers of high-performance sequential machines recognized many years ago that executing instructions in parallel would improve a machine's performance. Their first approach was to provide separate hardware facilities for the different stages of an instruction's execution. Thus, if the execution of an instruction took "N" cycles, and if each cycle used a different hardware facility, the design could theoretically be executing "N" instructions at once, each in a different stage of completion. This concept is commonly called pipelining.
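The arithmetic behind this claim can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls; the function names are hypothetical and do not come from the text.

```python
def sequential_cycles(num_instructions: int, stages: int) -> int:
    """Cycles needed when each instruction must finish before the next starts."""
    return num_instructions * stages


def pipelined_cycles(num_instructions: int, stages: int) -> int:
    """Cycles needed on an ideal pipeline with one hardware facility per stage.

    The first instruction takes `stages` cycles to flow through; after that,
    one further instruction completes on every cycle.
    """
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)


if __name__ == "__main__":
    n, stages = 100, 4
    print(sequential_cycles(n, stages))  # 400
    print(pipelined_cycles(n, stages))   # 103
```

As the instruction count grows, the pipelined figure approaches one cycle per instruction, which is the theoretical limit of such a design.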
Designers of pipelined machines use few or many hardware facilities to provide different degrees of pipelining, and hence of performance. It should be noted that logical interlocks (for example, the N'th instruction's input coming from the (N-1)'th instruction's output) and queuing on hardware facilities have caused pipelined machines to perform far below their theoretical limit of one instruction per cycle. For a pipelined System/370 machine, an execution rate of one instruction for every three machine cycles on normal commercial code (assuming all data and instructions are resident in the cache) is considered good. The 3081 and 3090 are examples of IBM System/370 pipelined high-performance systems. These machines always finish instructions in sequential order.
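The effect of a logical interlock on issue rate can be sketched with a toy model. It is an illustration only, not a model of the 3081 or 3090: it assumes in-order issue of at most one instruction per cycle, a single fixed result latency, and no queuing on hardware facilities. The name `issue_cycles` and the `(dest, sources)` program format are hypothetical.

```python
def issue_cycles(program, latency):
    """Return the cycle at which each instruction can issue.

    program: list of (dest, sources) tuples naming registers.
    latency: cycles from issue until the result can be used (assumed fixed).
    """
    ready = {}      # register -> cycle at which its newest value is available
    next_free = 0   # earliest cycle at which the next instruction may issue
    cycles = []
    for dest, sources in program:
        # Interlock: wait until every source operand has been produced.
        start = max([next_free] + [ready.get(src, 0) for src in sources])
        cycles.append(start)
        ready[dest] = start + latency
        next_free = start + 1
    return cycles


if __name__ == "__main__":
    independent = [("r1", []), ("r2", []), ("r3", [])]
    chained = [("r1", []), ("r1", ["r1"]), ("r1", ["r1"])]
    print(issue_cycles(independent, latency=3))  # [0, 1, 2]
    print(issue_cycles(chained, latency=3))      # [0, 3, 6]
```

In this model, independent instructions issue on consecutive cycles, while a chain in which each instruction consumes its predecessor's result issues only once per latency period.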
To increase performance further, designers can duplicate the hardware facilities on which queuing becomes a problem (usually the execution units, caches, and data buses) and allow instructions to complete execution out of sequence. These extensions of pipelining alleviate the queuing problem on hardware facilities and reduce the effects of the logical interlocks. (They allow, for example, more than "N" instructions to be initiated before the first is completed.)
The price paid for the increased performance is additional hardware and increased control complexity to maintain the logical sequentiality that the architecture demands. The IBM 91/95/195 processors used multiple execution units and out-of-sequence execution completion for the floating-point operations of the System/360 architecture. The CDC 6600 and 7600 systems also used both of these extensions of pipelining.
The present invention provides a unique method that, through a tagging and counting scheme, allows each general purpose register (GPR) of the System/370 architecture to be assigned a string of operations that must be executed sequentially upon said GPR. Further, these assignments can be intermixed with similar strings of operations for other GPRs. This technique provides a framework that allows multiple independent streams of operations each to proceed sequentially, but concurrently with one another. It provides the proper logical interlocks, both within each stream and between concurrent streams, which preserve the logical integrity of the use of the GPRs.
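One way to picture a tagging and counting scheme of this general kind is sketched below. This is an illustrative assumption, not the mechanism of the invention, whose details are not given in this passage. In the sketch, each GPR keeps a count of operations issued and a count of operations completed; an operation is tagged with its position in its destination GPR's stream and, for each other GPR it reads, with the completion count it must wait for. The names `Op` and `GprTracker` are hypothetical, and hazards such as a later write overtaking an earlier read are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    dest: int    # GPR whose stream this operation belongs to
    tag: int     # position of the operation within that GPR's stream
    waits: dict = field(default_factory=dict)  # source GPR -> completions required


class GprTracker:
    """Hypothetical per-GPR issue and completion counters."""

    def __init__(self, num_gprs: int = 16):
        self.issued = [0] * num_gprs
        self.completed = [0] * num_gprs

    def dispatch(self, name, dest, sources=()):
        # Record how many operations on each source GPR must finish first.
        waits = {s: self.issued[s] for s in sources if s != dest}
        op = Op(name, dest, self.issued[dest], waits)
        self.issued[dest] += 1
        return op

    def can_execute(self, op):
        # Within a stream: strictly sequential, enforced by tag == count.
        in_order = op.tag == self.completed[op.dest]
        # Between streams: every source GPR has reached the required count.
        sources_ready = all(self.completed[s] >= n for s, n in op.waits.items())
        return in_order and sources_ready

    def complete(self, op):
        if not self.can_execute(op):
            raise RuntimeError(f"{op.name} is still interlocked")
        self.completed[op.dest] += 1


if __name__ == "__main__":
    t = GprTracker()
    a = t.dispatch("load", dest=1)
    b = t.dispatch("add", dest=1)
    c = t.dispatch("load", dest=2)
    t.complete(c)            # the GPR 2 stream proceeds independently of GPR 1
    print(t.can_execute(b))  # False: "add" must follow "load" on GPR 1
    t.complete(a)
    print(t.can_execute(b))  # True
```

In this sketch, operations on different GPRs may complete in any relative order, while operations on the same GPR complete strictly in the order in which they were dispatched.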