1. Field of the Invention
This invention relates to a method for operating a computing system to optimize the utilization of machine resources. More specifically, it relates to the compilation of a high level language instruction stream to optimize in the object code instruction stream the assignment of unlike subsets of registers across basic blocks of straight line code and among non-uniform data register requests.
2. Description of the Prior Art
As used in this specification, the term "computing system" includes a central processing unit (CPU) with main storage, the CPU including a plurality of registers, and input/output (I/O) and storage devices coupled thereto, such as is described in G. M. Amdahl, et al, U.S. Pat. No. 3,400,371, issued Sept. 3, 1968 and entitled, "Data Processing System".
An application program, or high level language program, is a program written in a form with which a user of a computing system is familiar, rather than in machine language, and includes a coded instruction stream which is machine convertible into a plurality of serially executable (straight line code) source statements, selected source statements including one or more operands in the form of symbolic addresses, and other selected source statements requiring conditional or unconditional branches to still another source statement identified by a label (basic blocks of code being straight line code bounded by branches and identified, in some embodiments, by labels).
A compiler is a program which operates a computing system, taking as its input the machine readable instruction stream of a program written in a high level language, to interpret the source statements and produce object code. The object code is suitable for link editing into a load module which is directly executable by the computing system, and generally includes more than one object (machine language) instruction for each source statement. One function of a compiler is to allocate or assign quantities referenced in the source statement operands to specific machine registers (see U.S. Pat. No. 3,400,371, column 38, lines 145-152). In so doing, because there are usually many more unique operand quantities than machine registers, it is necessary to include in the object code stream register load instructions, such as the LR, L, LH, LER, LE, LDR, LD instructions described at columns 66 and 76 of Amdahl U.S. Pat. No. 3,400,371, and backstore instructions, such as the ST, STE, and STD store instructions described at columns 70, 71, 79 of Amdahl U.S. Pat. No. 3,400,371. A significant characteristic of a register set such as that described in Amdahl U.S. Pat. No. 3,400,371 (column 38) is that it comprises many unlike subsets; thus, it is partitioned into a plurality of disjoint sets (e.g., fixed vs. floating point), overlapping sets (e.g., general purpose registers vs. general purpose pairs) and/or general purpose registers except a given register for addressing. Furthermore, specified registers may be temporarily restricted from assignment, either because they are pre-empted by the hardware/microcode implementation (e.g., registers 1,2 used in the translate and test instruction) or because they are pre-empted by software architecture (e.g., register convention on linkage registers to subroutines).
Finally, one or more general purpose registers may be preempted by a compiler as global registers, the contents of which must be known and managed by the compiler register assignment facility but may not be displaced (e.g., registers reserved for address path calculations). The need for optimization of register assignment occurs when, for a machine with a fixed number of registers (N), all N registers contain quantities which may be used further on in the computation and a register is required to contain yet another unique quantity. There is a cost associated with a subsequent register load of the quantity which is displaced if that quantity is again referenced. Also, a cost is associated with saving in storage a quantity which is not "read only". Optimization requires that the registers be chosen for quantities in a way which will minimize the cost of instructions to displace and restore quantities in registers.
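The two costs just described can be illustrated with a minimal sketch. The names `Quantity`, `LOAD_COST`, `STORE_COST`, `displacement_cost`, and `cheapest_to_displace` are hypothetical, and the unit costs are assumed values chosen only for illustration; the sketch also assumes it is already known whether a displaced quantity will be referenced again.

```python
from dataclasses import dataclass

# Hypothetical unit costs; real values depend on the machine and instruction.
LOAD_COST = 1
STORE_COST = 1


@dataclass(frozen=True)
class Quantity:
    name: str
    read_only: bool          # True if storage already holds a valid copy
    referenced_again: bool   # True if the stream references it after displacement


def displacement_cost(q: Quantity) -> int:
    """Cost of displacing q: a store if it is not read only, plus a reload if reused."""
    cost = 0
    if not q.read_only:
        cost += STORE_COST   # quantity must be saved in storage
    if q.referenced_again:
        cost += LOAD_COST    # quantity must be loaded again later
    return cost


def cheapest_to_displace(in_registers):
    """Return the register-resident quantity with the lowest displacement cost."""
    return min(in_registers, key=displacement_cost)


a = Quantity("a", read_only=True, referenced_again=False)
b = Quantity("b", read_only=False, referenced_again=True)
c = Quantity("c", read_only=True, referenced_again=True)
victim = cheapest_to_displace([b, c, a])
```

Here `a` is the preferred candidate because displacing it requires neither a store nor a later load.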
The solution to optimal register assignment is to provide an oracle that can look ahead, when a register is required, to find out which of the quantities currently in registers may be displaced with the lowest cost. This can be done in a number of ways.
If the text (or instruction stream) is all available for perusal then the oracle is easily implemented. Many optimization approaches in the theoretical prior art assume this condition. However, such is not the case when performing the register assignment activity for almost all "real-life" application programs because the main storage available to the compiler is insufficient in size to store the entire application program. Alternatively, at the point in the instruction stream where the oracle must be invoked, the instruction stream is read and saved in storage until the determination of the best register to use is made. However, because instruction streams can be arbitrarily long and the storage capacity of computing systems is finite, this strategy is bounded by the space available for the stream. A solution to the above bounding problem is to first read the text backwards, recording at each reference to a quantity the point at which a previous reference to the quantity was made. The result of that operation is readily translatable to a NEXT function and in the subsequent forward pass the quantities in registers each have at all times a NEXT attribute which allows the oracle to determine which is used the furthest away or not at all. However, to implement this approach, instruction streams must be traversable in a backward direction--that is, the instruction coding must contain overhead for that purpose, or be in fixed length format, or, if blocks of the encoding are read in reverse, the blocks must first be traversed forward to make the backward sequence for traversal. Further, for many secondary storage devices, reading backwards is difficult, and writing the modified instruction stream augmented with the additional NEXT information is even more complex.
More important, however, this "distance to next use" strategy does not provide an optimum solution for mixed cost register assignment--where all registers are not available for all register requests at a uniform cost (in time or machine instructions).
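The backward scan and the subsequent forward pass can be sketched as follows. This is an illustration under simplifying assumptions, not a description of any particular prior art implementation: the instruction stream is modeled as a list of quantity names held entirely in storage, all registers are treated as alike and of uniform cost, and the function names `next_use_table` and `assign` are hypothetical.

```python
import math


def next_use_table(stream):
    """Backward scan: next_use[i] is the position of the next reference to
    stream[i] after position i, or math.inf if it is never referenced again."""
    next_use = [math.inf] * len(stream)
    last_seen = {}  # quantity -> position of the reference seen most recently
    for i in range(len(stream) - 1, -1, -1):
        q = stream[i]
        next_use[i] = last_seen.get(q, math.inf)
        last_seen[q] = i
    return next_use


def assign(stream, n_registers):
    """Forward pass: when no register is free, displace the quantity whose
    NEXT attribute is furthest away (or which is never used again)."""
    next_use = next_use_table(stream)
    in_regs = {}  # quantity -> its current NEXT attribute
    loads = []    # (position, quantity loaded, quantity displaced or None)
    for i, q in enumerate(stream):
        if q not in in_regs:
            victim = None
            if len(in_regs) == n_registers:
                victim = max(in_regs, key=in_regs.get)
                del in_regs[victim]
            loads.append((i, q, victim))
        in_regs[q] = next_use[i]  # refresh NEXT on every reference
    return loads


stream = list("abcabdab")
loads = assign(stream, n_registers=3)
```

With three registers, the only displacement in this stream occurs at the reference to `d`, which displaces `c` because `c` is never referenced again.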
One prior art approach to the optimization of register assignments which does not require a backward scan of the instruction stream nor implement a "distance to next use" strategy is suggested by F.R.A. Hopgood, Compiling Techniques, American Elsevier, New York, 1969, pp. 96-103--and is named for and based in part on a block replacement algorithm for a virtual storage computer studied by L. A. Belady, IBM Systems Journal, Vol. 5, No. 2, 1966, pp. 86-89.
The Belady algorithm introduces the concepts of "load point", "decision delay", and "complete set" (which concepts are important to an understanding of the method of the present invention) but does not provide for the assignment of unlike subsets of registers.
The prior art (Belady) procedure begins with the registers free. As long as a register is free it may be assigned to the next unique quantity referenced in the instruction stream. When a reference in the instruction stream is to a quantity already in a register, an instruction to load that register is not necessary. Once the registers are all in use and another unique quantity is referenced in the instruction stream, a decision delay starts because it is not yet apparent which quantity already assigned to a register should be displaced. The beginning of such a decision delay occurs at a load point. The load point for a quantity is that point in the instruction stream at which the quantity (referred to as a load point quantity) must be inserted into a register--and a load point quantity will enter its register at its load point. However, during a decision delay, which register the load point quantity will enter is not yet determined. Consequently, the load point quantity is remembered, and the scan of the instruction stream continues for the purpose of defining the complete set of N-1 (where N is the number of registers) quantities assigned to registers which should not be displaced by the load point quantity. For this purpose, as the serial scan of the instruction stream continues, if a quantity already assigned to a register is referenced (herein, a reuse quantity), the reuse quantity is temporarily disqualified as a candidate for displacement at the load point, as it (the reuse quantity) should be kept in its register to avoid another load instruction in the compiled instruction stream. When N-1 reuse quantities have been located in the instruction stream, only one register remains as a candidate to receive the load point quantity. Together with the load point quantity, the N-1 reuse quantities (or disqualified quantities) form the complete set of quantities that define the register state at the load point of the load point quantity.
Consequently, the appropriate register load and backstore instructions may be inserted in the compiled instruction stream (object code stream). The decision delay has ended.
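The decision delay for a single load point can be sketched as follows. The function name `resolve_decision_delay` and its arguments are hypothetical. Because the literature does not make clear how a second load point within the delay is handled, this sketch simply stops scanning at that point and reports the delay as unresolved; that behavior is an assumption of the sketch, not part of the described procedure.

```python
def resolve_decision_delay(in_registers, load_point_quantity, lookahead):
    """Scan forward from a load point, disqualifying reuse quantities.

    in_registers: the N quantities held in registers at the load point.
    lookahead: quantities referenced in the stream after the load point.
    Returns (victim, candidates); victim is None if the delay is unresolved.
    """
    candidates = list(in_registers)  # quantities still eligible for displacement
    for q in lookahead:
        if len(candidates) == 1:
            break  # N-1 reuse quantities found: the complete set is known
        if q in candidates:
            candidates.remove(q)  # reuse quantity: keep it in its register
        elif q != load_point_quantity and q not in in_registers:
            break  # second load point (assumption: stop, leave unresolved)
    victim = candidates[0] if len(candidates) == 1 else None
    return victim, candidates


# Three registers hold a, b, c; the load point quantity is d.
resolved = resolve_decision_delay(["a", "b", "c"], "d", ["b", "d", "a", "c"])
unresolved = resolve_decision_delay(["a", "b", "c"], "d", ["b", "e", "a"])
```

In the first call, `b` and `a` are reused before `c`, so `c` is the one remaining candidate and is displaced by `d`. In the second call, the reference to `e` is a second load point that arrives before N-1 reuse quantities have been found, so two candidates remain.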
The manner in which the Belady algorithm handles the situation where the delayed displacement of a quantity in a register to make room for a load point quantity cannot be made before a second load point quantity is referenced in the instruction stream (necessitating a second delayed displacement before N-1 reuse quantities have been identified) is not clear from the literature. Nor, as previously mentioned, does the Belady algorithm provide for the "real world" requirement of assigning unlike quantities to different classes, or unlike subsets, of registers, such as anomalous registers, registers reserved for specific purposes, such as, for example, when register 0 cannot be used to address storage, permanent registers (e.g., registers reserved for use as anchors for addressing path instructions), and global temporary registers (e.g., register assignments required by a prior global optimization pass through the instruction stream). Other classes of registers include coupled registers (e.g., register pairs, or registers consumed two at a time), implicit registers (e.g., Translate and Test result register 1), registers specified as absolute registers for sections of the instruction stream, and multiple concurrently available registers required for multiple operand instructions.
Further, the register assignment method of a compiling operation, in order to optimize the assignment of registers and, consequently, the operation of a computing system under control of the compiled, or executable, instruction stream, should extend the optimization steps across basic blocks of straight line code (such as are defined by branch instructions) and also adjust for the difference in the cost of moving a quantity from one register to another or of moving between a register and storage.
Quantities to be referenced in main storage are accessed in System/370 architecture by specifying a base register and displacement. The base must be in a register to perform the load and store operations previously mentioned for inclusion in the compiled instruction stream to manage the register quantities. Consequently, a register assignment method implemented by a compiler is needed which provides for addressability to quantities in main storage by managing the register contents for the address base quantities in such a manner as to avoid unnecessary load instructions in the compiled code stream.