Out-of-order, or out-of-sequence, instruction processors are the area in which my inventions are most useful. My early work has been published, as noted above [DwTo 87]. It related to a Fast Instruction Dispatch Unit for Multiple and Out-of-Sequence Issuances. Portions of this work are relevant to the preferred embodiments of my inventions, which include features and aspects amounting to inventions that not only improve on my prior work but are also applicable to different classes of computer systems in general.
A recurring problem in computer systems is throughput. Machine cycle time is wasted when a machine part sits idle. I have felt that the cycle time could be decreased by better handling of multiple, possibly out-of-order, instructions. The machines described herein may be employed in RISC type processors. In addition, many elements of my inventions can be employed in superscalar machines, as illustrated by the more advanced IBM systems.
Generally, two operations take a lot of machine time in processing. These operations are branching and the moving of data between storage (memory) and the instruction processing unit (CPU). The second generation of RISC processors (which typically separate data processing and instruction processing) examined and developed solutions to improve performance by emphasizing a machine organization and architecture which allowed parallel execution and pipelining of instructions. As illustrated by the manual "IBM RISC System/6000 Technology", published by International Business Machines Corporation, 1990 (SA23-2619-00), the RISC RS/6000 architecture resulted in an implementation that could execute more than one instruction per cycle. It accomplished this by separating the processor into functional units, allowing each functional unit to act as a machine to process instructions in parallel. The three functional units were the branch unit, the fixed point (integer) unit, and the floating point unit. The organization of these units (see FIG. 1, called Logical View of RS/6000 Architecture, page 17 of the referenced manual) was something like placing each of these processors at a corner of a processing triangle. At one apex was the branch processor, through which instructions passed over the connections to the other functional units at the other apexes of the triangle. The branch processor functional unit obtained its instructions from an instruction cache located between the branch processor functional unit and the main memory of the system. The other two apexes of the system shared a cache, but the shared cache here is a data cache, which is located between these processor functional units (fixed point and floating point) and the main memory. This machine organization, like certain earlier machines of a more mature architecture, such as certain System/370s, increased throughput by allowing multiple operations to be performed at the same time.
Like the RS/6000, my preferred computer system would be provided with an instruction issue unit. This unit would schedule processing, similar to the function performed by the branch processor of the RS/6000. There would be multiple execution units. The RS/6000 provides multiple execution units: a fixed point functional unit and a floating point functional unit. Each computer system has a register file and a main memory. A cache is provided between main memory and the functional units. In a RISC architecture the cache may be provided as separate units, one being a data cache and another being an instruction cache.
Machines like the RS/6000 provide an interconnection unit, or interconnect network, between the functional units, the register file(s) and the instruction unit. Such machines use multiple functional units in parallel, but there is only one functional unit for each functional process (fixed point, floating point). Each of the functional units processes its instructions sequentially, in order. The branch processor of the RS/6000 issues instructions for processing to the other functional units in the order they are to be processed.
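By way of illustration only, the in-order issue just described might be modeled as in the following minimal sketch. It is not taken from the referenced manual. The function name `dispatch_in_order`, the queue labels and the instruction names are hypothetical, and the sketch assumes only that each instruction is routed to the single functional unit of its type and that each unit consumes its queue in first-in, first-out order.

```python
from collections import deque

def dispatch_in_order(program):
    """Route each instruction, in program order, to the one unit of its type.

    `program` is a list of (name, kind) pairs, where kind is "fixed" or
    "float". Each unit gets a FIFO queue, so it processes its own
    instructions strictly in the order they were issued.
    """
    queues = {"fixed": deque(), "float": deque()}  # one unit per function
    for name, kind in program:
        queues[kind].append(name)
    return queues

# Hypothetical instruction stream, used only to exercise the sketch.
program = [("add", "fixed"), ("fmul", "float"), ("sub", "fixed")]
queues = dispatch_in_order(program)
```

In this model the two units can work in parallel, but an instruction such as "sub" can never start ahead of "add", because both wait in the same fixed point queue.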
Some time ago I developed a machine organization that provided elements not common to systems like the RS/6000. I prefer that there be several copies of the same function provided as functional units. In order to improve throughput, I prefer that there be provided some means whereby the issue unit can detect register dependencies. The machine which I describe has been provided with means for scheduling register-to-register instructions for multiple out-of-order execution.
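By way of illustration only, one way an issue unit might detect register dependencies and select several register-to-register instructions for out-of-order issue is sketched below. This is a minimal sketch under stated assumptions and is not the Dispatch Stack itself. The names `Instr`, `depends_on` and `select_issuable` are hypothetical. The sketch assumes that every instruction names one destination register and its source registers, that the window holds pending instructions in program order, and that an instruction is held back conservatively if it has any read-after-write, write-after-read or write-after-write conflict with an earlier pending instruction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str     # destination register
    srcs: tuple   # source registers

def depends_on(later, earlier):
    """True if `later` must wait for `earlier` (conservative check)."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    war = later.dest in earlier.srcs   # later overwrites what earlier reads
    waw = later.dest == earlier.dest   # both write the same register
    return raw or war or waw

def select_issuable(window, free_units):
    """Pick up to `free_units` instructions that conflict with no earlier
    pending instruction in `window` (which is in program order)."""
    chosen = []
    for i, instr in enumerate(window):
        if len(chosen) == free_units:
            break
        if not any(depends_on(instr, prev) for prev in window[:i]):
            chosen.append(instr)
    return chosen

# Hypothetical window, used only to exercise the sketch.
window = [
    Instr("i0", "r1", ("r2", "r3")),
    Instr("i1", "r4", ("r1", "r5")),  # reads r1 written by i0: must wait
    Instr("i2", "r6", ("r7", "r8")),  # independent: may pass i1
    Instr("i3", "r2", ("r9", "r9")),  # writes r2 read by i0: held back
]
issued = [ins.name for ins in select_issuable(window, free_units=3)]
```

With three free functional units, only i0 and i2 are selected here. Instruction i2 issues ahead of i1, which is the out-of-order behavior described above, while i1 and i3 remain in the window until the instruction they conflict with has completed.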
Generally, such a machine would be similar to the one described in the report I made several years ago on my early work regarding the Dispatch Stack. See References. The suggestions included in my report would improve throughput, but they do not satisfy all of the needs or incorporate the further new elements, units, features and improvements which I will describe herein.