1. Field of the Invention
The invention is generally directed to digital computers. The invention is more specifically related to the execution of pipelined commands in which operands from first and second data sets are to be fetched from memory and presented sequentially to an execution unit for processing in an order or manner, or at a rate, specified by at least one of the fetched operands.
2a. Cross Reference to Micro-fiche Appendix
This application includes listings of two interacting computer programs, one for operating an instruction unit (IU) and one for operating an execution unit (EU). The listings are respectively provided in accordance with 37 C.F.R. §1.96 as microfiche Appendix A, comprising 14 frames (not counting title and target frames) distributed on a first sheet of microfiche, and microfiche Appendix B, comprising 26 frames (not counting title and target frames) distributed on a second sheet of microfiche, and are incorporated into the specification by reference.
The assignee of the present application claims certain copyrights in said computer program listings. The assignee has no objection, however, to the reproduction by others of the listings if such reproduction is for the sole purpose of studying them to understand the invention. The assignee reserves all other copyrights in the program listings including the right to reproduce the computer programs in machine-executable form.
2b. Cross Reference to Related Patents
The following U.S. patents are assigned to the assignee of the present application and incorporated herein by reference:
(A) U.S. Pat. No. 3,840,861, DATA PROCESSING SYSTEM HAVING AN INSTRUCTION PIPELINE FOR CONCURRENTLY PROCESSING A PLURALITY OF INSTRUCTIONS, issued to Amdahl et al, Oct. 8, 1974;
(B) U.S. Pat. No. 4,244,019, DATA PROCESSING SYSTEM INCLUDING A PROGRAM-EXECUTING SECONDARY SYSTEM CONTROLLING A PROGRAM-EXECUTING PRIMARY SYSTEM, issued to Anderson et al, Jan. 6, 1981;
(C) U.S. Pat. No. 4,654,790, TRANSLATION OF VIRTUAL AND REAL ADDRESSES TO SYSTEM ADDRESSES, issued to Woffinden, Mar. 31, 1987;
(D) U.S. Pat. No. 4,661,953, ERROR TRACKING APPARATUS IN A DATA PROCESSING SYSTEM, issued to Venkatesh et al, Apr. 28, 1987;
(E) U.S. Pat. No. 4,685,058, TWO-STAGE PIPELINED EXECUTION UNIT AND CONTROL STORES, issued to Lee et al, Aug. 4, 1987;
(F) U.S. Pat. No. 4,752,907, INTEGRATED CIRCUIT SCANNING APPARATUS HAVING SCANNING DATA LINES FOR CONNECTING SELECTED DATA LOCATIONS TO AN I/O TERMINAL, issued to Si et al, Jun. 21, 1988;
(G) U.S. Pat. No. 4,802,088, METHOD AND APPARATUS FOR PERFORMING A PSEUDO BRANCH IN A MICROWORD CONTROLLED COMPUTER SYSTEM, issued to Rawlinson et al, Jan. 31, 1989; and
(H) U.S. Pat. No. 4,855,947, MICROPROGRAMMABLE PIPELINE INTERLOCKS BASED ON THE VALIDITY OF PIPELINE STATES, issued to Zmyslowski et al, Aug. 8, 1989.
3. Description of the Related Art
Pipelined architectures are employed in high-performance computer systems such as the Amdahl model 5890 or Amdahl model 5995A mainframes to speed the processing of program instructions and related data.
A pipelined computer system may be visualized as a series of interconnected pipe segments where each pipe segment has an inlet for receiving a flow of input data, a midsection for processing inflowing data and an outlet end pumping processed data out to one or more next succeeding pipe segments.
Efficiency and throughput are said to be optimized in a pipelined system when the midsections of all pipe segments in the system are continuously and simultaneously kept busy processing data and, as a result, all are producing usable output data for immediate consumption by their respective next-succeeding pipe segments. When this occurs, data moves through the pipeline as a relatively continuous flow. No pipe segment is left waiting idly for operations to complete in a preceding pipe segment.
Pipelined computer systems can be constructed with one or more processor units. Each processor unit in such a system may be visualized as being composed of two main pipe segments: an instruction unit (I-unit) and an execution unit (E-unit).
The E-unit contains operand-processing means for processing one or more operands in accordance with a microinstruction when such operands are delivered to the E-unit in conjunction with the microinstruction.
The I-unit contains operand-fetch/delivery means for fetching operands and delivering them to the E-unit in conjunction with corresponding microinstructions.
Operands cannot be fetched and delivered haphazardly. Among the things to be considered during operand fetch and delivery are how to time operand deliveries so that operands arrive at the E-unit in timely conjunction with specific microinstructions and how to order operand fetches so that deliveries will comply with specific orderings when such orderings are required.
By way of example, consider how operands are delivered to an Arithmetic Logic Unit (ALU). The operand-processing means of an E-unit typically includes such an ALU for performing arithmetic computations.
When two operands are delivered to the ALU in conjunction with a microinstruction which says "add", the ALU processes the operands and produces a result equal to their sum. Operand ordering is not important in this case. If the same operands are instead delivered to the ALU bundled with a microinstruction which says "subtract", not only is the result different but the order in which the operands are delivered becomes important because one operand will serve as the minuend while the other serves as the subtrahend.
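The ordering sensitivity described above can be illustrated with a toy ALU model. This sketch is hypothetical and not part of the original disclosure; the function name and microinstruction mnemonics are invented for illustration only.

```python
def alu(microinstruction, op1, op2):
    """Toy ALU: process two delivered operands per one microinstruction."""
    if microinstruction == "add":
        return op1 + op2            # order-insensitive
    if microinstruction == "subtract":
        return op1 - op2            # op1 is the minuend, op2 the subtrahend
    raise ValueError("unknown microinstruction")

# Addition gives the same result either way; subtraction does not,
# so the delivery order of the two operands matters for "subtract".
assert alu("add", 7, 3) == alu("add", 3, 7) == 10
assert alu("subtract", 7, 3) == 4
assert alu("subtract", 3, 7) == -4
```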
In many pipeline architectures, the I-unit is given primary responsibility for not only fetching instructions and their corresponding operands but also for delivering them to the E-unit in the form of immediately-executable bundles of data. Each bundle can be pictured as a package consisting of a microinstruction and/or one or more operands arranged in an appropriate order for immediately carrying out an accompanying microinstruction or a previously delivered microinstruction.
To meet its responsibility, the I-unit typically includes: (a) fetch means for fetching instruction words and operands from memory and storing them in local registers; (b) decode means for decoding each instruction word into a series of microinstructions; (c) packaging means for bundling or aligning each microinstruction with a corresponding one or more operands (if any); and (d) delivery means for sequentially delivering each bundle to the E-unit in an appropriate and timely order. Results from the E-unit are returned to (e) a receiving means of the I-unit and forwarded therefrom to (f) a storage means of the I-unit for storage in main memory or local registers.
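The notion of an immediately-executable bundle can be sketched as a simple record pairing one microinstruction with its operands in delivery order. The representation below is hypothetical, chosen only to make the packaging step concrete.

```python
from collections import namedtuple

# Hypothetical representation of a bundle: a microinstruction paired
# with its operands, kept in the order required for execution.
Bundle = namedtuple("Bundle", ["microinstruction", "operands"])

def package(microinstruction, *operands):
    """Packaging means: align a decoded microinstruction with its
    already-fetched operands into one ready-to-execute bundle."""
    return Bundle(microinstruction, tuple(operands))

b = package("subtract", 9, 4)
assert b.operands[0] == 9  # minuend delivered first: ordering preserved
```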
An I-unit flow-control program is stored in a control store of the I-unit for controlling the activities of the I-unit and for telling the I-unit when it should fetch a next operand or instruction from memory and for further telling it which operands it should bundle with which microinstructions as it delivers them to the E-unit.
Under optimum conditions, the fetch means of the instruction unit (I-unit) is continuously kept busy sequencing through locations of the processor's local memory (cache memory), fetching instructions (opcodes) and/or corresponding data words (operands) out of the local memory. The decode means of the I-unit is simultaneously kept busy decoding already-fetched instructions and generating a corresponding series of microinstructions. The packaging means of the I-unit is simultaneously kept busy aligning or bundling each prefetched operand with an already-generated, corresponding microinstruction. The delivery means of the I-unit is simultaneously kept busy continuously delivering ready-to-execute bundles of data to the E-unit.
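The staged division of labor just described can be sketched with chained generators, where each stage consumes the output of the preceding stage. The memory layout and mnemonics below are invented for illustration; this is a functional sketch, not the patented mechanism.

```python
def fetch(addresses, memory):
    # Fetch stage: sequence through local-memory locations.
    for a in addresses:
        yield memory[a]

def decode(words):
    # Decode stage: expand each instruction word into microinstructions.
    for w in words:
        for micro in w["micro"]:
            yield micro

def bundle(micros, operands):
    # Packaging stage: align each microinstruction with a prefetched operand.
    return [(m, o) for m, o in zip(micros, operands)]

memory = {0: {"micro": ["load", "add"]}, 1: {"micro": ["store"]}}
stream = bundle(decode(fetch([0, 1], memory)), [5, 6, 7])
# The delivery stage would then stream these bundles to the E-unit.
assert stream == [("load", 5), ("add", 6), ("store", 7)]
```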
A control store of the E-unit stores an E-unit flow-control program which coordinates the activities of the E-unit with those of the I-unit, telling the E-unit when and how it should respond to each delivered microinstruction and its associated operands.
Under optimum conditions, as the I-unit continuously delivers ready-to-execute bundles to the E-unit, the operand-processing means within the E-unit should be accepting each bundle and immediately processing the operands of that bundle according to their microinstruction. Simultaneously, an outlet portion of the E-unit should be outputting result signals derived from already processed operands to other parts of the machine in accordance with the I-unit flow-control program.
One advantage of such a continuous flow scheme is that data-processing resources within the E-unit do not have to wait idly while precursory operations occur, such as memory fetches, instruction decoding, or the pairing of microinstructions with associated operands. The I-unit performs these operations ahead of time and presents immediately-executable packages of data (each having a microinstruction and/or associated operands) to the E-unit.
It takes a finite amount of time for the I-unit to output address signals to its associated memory and to fetch operands. It takes a finite amount of time for the I-unit to decode each instruction. And it takes a finite amount of time for the I-unit to align (bundle) each fetched operand with an associated microinstruction. These latencies constrain the ability of the I-unit to respond to immediate needs of the E-unit.
If continuous throughput is to be maintained in the pipeline, the I-unit flow-control program has to predict ahead of time what bundle of operands and microinstruction the E-unit will next need for every machine cycle. As long as the I-unit flow-control program continues to correctly anticipate the needs of the E-unit, the I-unit can begin its job of fetching appropriate operands ahead of time and the I-unit can continue to deliver immediately-executable packages of data to the E-unit according to a just-in-time delivery schedule. Correspondingly, the E-unit can continue to receive and process the delivered signals at a peak operating speed and the overall throughput of the pipelined system advantageously remains at a peak level.
If, on the other hand, the I-unit flow-control program is unable for some reason to deliver an immediately-executable bundle of signals to the E-unit just as the E-unit finishes processing a previously delivered bundle, the utilization of resources within the E-unit becomes less efficient. Pipeline throughput and/or efficiency becomes disadvantageously degraded.
A number of conditions can place the I-unit in a state where it is unable to deliver an immediately-executable bundle of signals to the E-unit just as the E-unit becomes ready to accept the bundle.
As an example, consider what happens when the I-unit flow-control program intentionally delays the I-unit from incrementing an operand pointer to point to a next-needed operand, and from then fetching the new operand and pairing it with the microinstruction next to be delivered to the E-unit. The E-unit is disadvantageously caught waiting for the I-unit as the I-unit later increments its pointer, fetches the operand and delivers it to the E-unit.
This typically occurs when the I-unit needs to wait until the E-unit finishes executing a first bundle of data before the I-unit can determine, from the execution results of that first bundle, whether the I-unit should increment its operand pointer. A double-wide gap develops in the continuity of the usable data flowing through the pipeline. Not only does the E-unit have to stand by idly while the I-unit completes its operand-fetch operations, but the I-unit also has to idle uselessly while the E-unit completes execution of the precursory data bundle.
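A rough cycle-count model makes the double-wide gap concrete. The cycle costs below are invented for illustration; the point is only that a data-dependent fetch adds both an execution wait and a fetch wait to the total.

```python
FETCH_TIME, EXEC_TIME = 3, 1  # hypothetical cycle counts

def cycles(bundles, dependent):
    """Count total cycles for a bundle sequence. A bundle whose fetch
    depends on the prior bundle's execution result cannot be fetched
    ahead of time: the I-unit idles through that execution, and the
    E-unit then idles through the fetch (the double-wide gap)."""
    total = 0
    for i, _ in enumerate(bundles):
        if i in dependent:
            total += EXEC_TIME + FETCH_TIME  # both units stall in turn
        total += EXEC_TIME                   # normal execution cost
    return total

# With no dependence, prefetching overlaps execution entirely.
assert cycles(range(5), dependent=set()) == 5
# One dependent bundle costs an extra EXEC_TIME + FETCH_TIME.
assert cycles(range(5), dependent={3}) == 9
```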
With regard to the above example, there is a variation of particular interest here. It is one where the I-unit begins to fetch a first operand and, immediately thereafter, has to decide, based on the execution results of that first operand, what second operand it should next begin to fetch for subsequent processing in the E-unit. The I-unit has to wait for the first operand to make its way through the entire length of the I-unit to the E-unit, and for the results from the E-unit to return, before the I-unit can determine what second operand is to be fetched and bundled with the next microinstruction. Then the E-unit has to wait in turn while the second operand moves through the entire length of the I-unit. Such a waste of time is undesirable.
To appreciate the above problem more specifically, consider the sequence of operands shown by the below DIAGRAM-1. The operands are to be delivered to the E-unit in the illustrated left-to-right order.

DIAGRAM-1

Position:    1      2      3             n-1     n
           [PB]   [PB]   [PB]  . . .   [PB]    [PS]   . . .
Each bundle is represented by the bracketed symbol, "[ ]". Every bundle contains two operands (represented by letters inside the brackets) which are to be delivered to the E-unit in conjunction with a corresponding microinstruction (not shown). The symbol "P" represents a "pattern" operand. "S" represents a "source" operand. "B" represents a "blank" or "don't care" piece of data. For microinstructions having a "B" in a particular position of their respective bundles, it is not important what data is being held in that "B" position.
Instructions like the EDIT and EDMK (Edit/Mark) instructions of the IBM 390/ESA mainframe computer are examples of operations which can generate a series of bundles having such "pattern" and "source" operands. (A detailed description of the EDIT and EDMK instructions may be found in IBM ESA/370 Principles of Operation, Rev. SA22-7201-0, pages 8-6 to 8-10.)
The leftmost bundle in DIAGRAM-1 (referenced as the bundle at position number 1) is the first in time to be delivered to the E-unit. It is to be noted that all bundles shown in DIAGRAM-1 contain the operands P and B, except for one bundle at position number "n". That bundle contains the P and S operands. The three dot symbol ". . . " preceding bundle "n-1" indicates a repetitious series of [PB] bundles.
The decision to place the [PS] bundle at position "n" is not made until after the [PB] bundle at position "n-1" is processed by the E-unit. Until the bundle of position "n-1" is processed, neither the I-unit nor the E-unit knows that the [PS] bundle is to be delivered next to the E-unit. It is also possible that a [PB] bundle might be required at position "n".
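The data dependence of this bundle choice can be sketched as follows. The result values are invented stand-ins (loosely suggested by the significance indicator of EDIT-type operations), and the decision rule is hypothetical; only the dependence structure matters: bundle n is chosen from the execution result of bundle n-1.

```python
def next_bundle(prev_result):
    """Choose the next bundle type. The choice between [PS] and [PB]
    is data-dependent: it is known only after the prior bundle runs."""
    return "[PS]" if prev_result == "significance" else "[PB]"

def run(execution_results):
    """Build the delivered bundle stream; each bundle is decided from
    the PRIOR bundle's execution result (None before the first)."""
    stream, result = [], None
    for r in execution_results:
        stream.append(next_bundle(result))
        result = r
    return stream

# Hypothetical: the (n-1)th execution signals "significance", so the
# nth bundle must carry a source operand.
assert run(["none", "none", "significance", "none"]) == \
       ["[PB]", "[PB]", "[PB]", "[PS]"]
```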
(Artisans familiar with the IBM EDIT and EDMK instructions will recognize this explanation to be rather simplistic. Actually, for the case of the EDIT and EDMK instructions, each "P" and "S" is considered as an individual bundle and it is not until after the E-unit has received and processed the "P" at position "n" of the above DIAGRAM-1 that the E-unit can decide it next needs to receive an "S" operand.)
Under a conventional approach, the I-unit flow-control program instructs the I-unit to wait for the E-unit to finish processing each bundle before it fetches a next operand from memory, and before it completes assembly of the next bundle and before it delivers that next bundle to the E-unit.
From a timing perspective, the delivery stream heading toward the E-unit appears as shown in the following DIAGRAM-2.

DIAGRAM-2

[PB] (dly) [PB] (dly) [PB] . . . (dly) [PB] (dly) [PS] . . .
Each delay, "(dly)", in the above operand delivery stream is, at a minimum, equal to the length of time it takes for the I-unit to fetch the "S" operand of the [PS] bundle and to deliver the same to the E-unit.
(For the case of the IBM EDIT and EDMK instructions, it is more accurate to illustrate the delay as being interposed between each "P" and its following operand, as in the symbol: [P (dly) B], but for the sake of initial simplicity, we show delays inserted between [PB] bundles in DIAGRAM-2.)
The total delay penalty for processing the bundles of positions 1, 2, 3, . . . , n-1, n, . . . will be a corresponding multiple of the single delay time, "(dly)". When the number of bundles to be processed in the sequence grows, the total time penalty also grows.
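The linear growth of the penalty is easy to quantify. Assuming an invented per-bundle delay of, say, three cycles:

```python
def total_delay_penalty(n_bundles, dly):
    """Conventional approach: every bundle delivery waits one full
    fetch-and-deliver delay "(dly)", so the aggregate penalty is a
    straight multiple of the single delay time."""
    return n_bundles * dly

assert total_delay_penalty(10, 3) == 30
assert total_delay_penalty(1000, 3) == 3000  # penalty grows with n
```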
A better approach is needed.