Most programming languages operate with different types of data, each of which may use a different level of precision. Two examples of common data types are integer and floating point data. Operations involving floating point data conventionally use a higher precision than integer operations. The actual precision value often depends on the characteristics of the processor. In some processors, regular (single) precision might be 32 bits, while double precision would be 64 bits. Other example precisions include 16 bit single and 32 bit double, 8 bit single and 16 bit double, and the like. In such a computer, floating point operations would be assigned higher precision data compared to integer operations.
Computer code is often written in a high level programming language that is conducive to human design and authorship. However, in order to execute this high level programming code, a computer will convert or compile the high level code into a low level code that can be directly executed by the processor. This low level code can be machine language, assembly language, or the like. When converting and processing the high level code into the low level code, a performance metric that is often monitored is the total runtime of the resultant code. Among other factors, the runtime is a function of the number of instructions and their individual latencies. Therefore, reducing the number of instructions and using instructions with lower latency can improve the performance of an application.
In many compiler architectures, the conversion process involves multiple stages in which various intermediate level code representations are generated, after which differing code optimizations are applied before finally converting the code into the low level equivalent. This multi-stage process is used because many algorithms for code optimization are easier to apply one at a time to an intermediate level code, or because the input to one optimization relies on the processing performed by another optimization. The manner in which this multi-stage process proceeds also depends on the processor architecture.
Modern processors generally operate with instruction sets. An instruction set, or instruction set architecture (ISA), is the part of the computer architecture related to programming, and it addresses the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, external input/output (I/O), and the like. An ISA can also include a specification of the set of machine language operation codes, which are the native commands implemented by a particular processor. There are various types of ISAs for modern processors including, for example, reduced instruction set computer (RISC), complex instruction set computer (CISC), explicitly parallel instruction computing (EPIC), and the like. The compiler will generally use a code generator to generate low level machine or assembly code using the associated instruction set. The code generator may take as input a parsed or abstract syntax tree of the high level code, or the output of an optimization, and convert that input into a linear sequence of instructions in the low level code.
Instruction sets may provide separate operations depending on the precision of the data being operated on. For example, an ISA may define a full precision add operation and a half precision add operation. In the integer/floating point example from above, the addition of two floating point data types will be handled by the full precision add operation, while the addition of two integers will be handled by the half precision add operation. These particular operations use corresponding registers. In general, the computer architecture will define full precision and half precision registers for use with the related operations. Therefore, code generation in such architectures is generally driven by the precision of the data types of the data that resides in the registers.
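The precision-driven selection described above can be sketched as follows. This is a minimal illustration only: the 16 and 32 bit widths, the mnemonics ADD_HALF and ADD_FULL, and the names Operand and select_add are hypothetical and are not taken from any particular ISA.

```python
from dataclasses import dataclass

# Hypothetical widths and mnemonics, chosen only for illustration.
ADD_OPCODES = {16: "ADD_HALF", 32: "ADD_FULL"}


@dataclass(frozen=True)
class Operand:
    register: str
    bits: int  # precision of the register that holds the value


def select_add(lhs: Operand, rhs: Operand) -> str:
    """Pick the add opcode whose precision matches both operands."""
    if lhs.bits != rhs.bits:
        # Mixed precision cannot be added directly; a conversion must come first.
        raise ValueError("operands differ in precision")
    return ADD_OPCODES[lhs.bits]


half_add = select_add(Operand("H0", 16), Operand("H1", 16))
full_add = select_add(Operand("F0", 32), Operand("F1", 32))
print(half_add, full_add)
```

In this sketch a mixed precision pair is rejected rather than silently widened, which mirrors the point that the opcode is determined by the precision of the registers involved.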
Because high level programming code can operate on both half precision and full precision data types, an ISA also usually includes conversion operations that up-convert half precision data types to full precision for an operation and then down-convert the full precision data types back to half precision after the operation is completed. However, these conversion operations usually bring higher processing cycle costs and latency. It would be advantageous to produce the most efficient, lowest latency set of operations possible without sacrificing the programmer's intended precision of the output of the program.
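The up-convert, operate, down-convert pattern can be illustrated numerically. The following sketch assumes, purely for illustration, that half precision corresponds to IEEE 754 binary16 and full precision to binary32; the helper names to_half and to_single are hypothetical. Python's struct module performs the rounding.

```python
import struct


def to_half(value: float) -> float:
    """Round a value to the nearest IEEE 754 binary16 (half precision) value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_single(value: float) -> float:
    """Round a value to the nearest IEEE 754 binary32 (single precision) value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


half_x = to_half(0.1)               # operand as stored in a half precision register
up = to_single(half_x)              # up-conversion: every binary16 value fits in binary32
product = to_single(up * 3.0)       # the operation runs at the wider precision
down = to_half(product)             # down-conversion rounds back to binary16

print(half_x, up, product, down)
```

Under these assumptions the up-conversion is exact, while the down-conversion may round the result. Both conversions are extra instructions around the arithmetic itself.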
Turning now to FIG. 1, a block diagram is shown which illustrates an example compilation process 10. Input code 101 represents a segment of high level programming code that a compiler 100 will compile. The input code 101 represents a loop that is executed as a part of the high level code and that performs some arithmetic operations. Line 1 sets up the loop operation; line 2 performs a first arithmetic operation using two single precision variables and a constant; line 3 performs a second arithmetic operation using three variables, in which one of the variables, ‘z’, has been defined as a double precision data type; and line 4 defines the end point of the loop. The compiler 100 processes the input code 101 and produces output code 102, which is the output loop segment resulting from compilation of the input code 101. Further, the output code 102 is in an intermediate representation in which the variables have been replaced by virtual registers. An appropriate single or double precision register is used based on the type of the variable. The output code 102 may be converted into instructions of the ISA at a later time by the compiler 100.
The loop defined in four lines of high level code in the input code 101 results in a loop defined in six lines in the output code 102. Lines 1 and 6 of the output code 102 define the loop. Single precision registers are denoted by SR followed by a number. Double precision registers are denoted by DR followed by a number. The precision of the instruction is denoted by the suffix number. In line 2, a single precision add, FADD16, is defined in which SR1 represents the single precision register that holds the constant “1.5”, SR2 represents the variable ‘y’, and SR0 represents the result of the add, ‘x’. Because the multiplication instruction of line 3 in the input code 101 involves a double precision data type, line 3 of the output code 102 provides an up-conversion instruction that up-converts the variable ‘x’, in register SR0, into a double precision data type in register DR10. The double precision multiplication is defined in line 4, in which ‘x’ (DR10) is multiplied with ‘z’ (DR11), with the result being stored back in DR10, which now represents the variable ‘y’. The variable ‘y’ is defined as a single precision data type in the remainder of the program. Therefore, line 5 in the output code 102 provides a down-conversion instruction that down-converts the double precision variable ‘y’ (DR10) into a single precision data type in register SR2.
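As a rough illustration of how a code generator could arrive at such a six-line loop, the following sketch lowers the two arithmetic statements and inserts conversions wherever the operand precisions differ. Only the FADD16 mnemonic and the register names come from the description above. The FMUL, CVT, LOOP, and ENDLOOP mnemonics, the 32 bit width for double precision, the operand order, and the function names are assumptions made for this example, and the listing is not a reproduction of FIG. 1.

```python
def lower_binop(mnemonic, dest, lhs, rhs, scratch):
    """Lower dest = lhs <op> rhs, where each operand is a (register, bits) pair.

    The narrower source operand is up-converted into the scratch register,
    and the result is down-converted when the destination is narrower than
    the operation. Destinations wider than the operation are not handled.
    """
    width = max(lhs[1], rhs[1])
    dest_reg, dest_bits = dest
    if dest_bits > width:
        raise NotImplementedError("destination wider than the operation")
    code, sources = [], []
    for reg, bits in (lhs, rhs):
        if bits < width:
            code.append(f"CVT{bits}TO{width} {scratch}, {reg}")  # up-conversion
            reg = scratch
        sources.append(reg)
    result = dest_reg if dest_bits == width else scratch
    code.append(f"{mnemonic}{width} {result}, {sources[0]}, {sources[1]}")
    if dest_bits < width:
        code.append(f"CVT{width}TO{dest_bits} {dest_reg}, {scratch}")  # down-conversion
    return code


def lower_loop():
    # Register assignment as described in the text; widths are assumed.
    x, const, y, z = ("SR0", 16), ("SR1", 16), ("SR2", 16), ("DR11", 32)
    body = lower_binop("FADD", x, const, y, "DR10")
    body += lower_binop("FMUL", y, x, z, "DR10")
    return ["LOOP 10"] + body + ["ENDLOOP"]


output_code = lower_loop()
for line in output_code:
    print(line)
```

The single precision add needs no conversion, while the mixed precision multiply is bracketed by one up-conversion and one down-conversion, which accounts for the two additional lines in the lowered loop.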
Each of these instructions is processed 10 times, as defined in the loop. In some processor architectures, the processor cycle cost for a conversion operation is usually higher than a simple add or other simple arithmetic operation. Thus, the conversion costs that result from the loop defined in the output code 102 are relatively high compared with the actual cycle costs for the defined arithmetic functions. Many modern processors provide only for instruction set operations between same precision type variables. Moreover, several multiple pipeline processor architectures will organize the instructions into separate pipelines in which each pipeline will only handle instructions of a particular precision. Therefore, conversions are essentially necessary for instruction processing.
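The relative weight of the conversions can be estimated with a simple tally. In the sketch below, the iteration count of 10 and the shape of the loop body follow the description above, while the per-instruction cycle costs (1 cycle for arithmetic, 2 cycles for a conversion) are assumed values chosen only to illustrate the calculation. Actual costs depend on the processor.

```python
ITERATIONS = 10  # loop trip count stated in the description

# Assumed cycle costs, for illustration only.
CYCLES = {"arith": 1, "convert": 2}

# Loop body of the output code: add, up-convert, multiply, down-convert.
BODY = ["arith", "convert", "arith", "convert"]


def loop_cycles(body, cycles, iterations):
    """Total cycles per instruction kind across all iterations of the loop."""
    totals = {}
    for kind in body:
        totals[kind] = totals.get(kind, 0) + cycles[kind] * iterations
    return totals


totals = loop_cycles(BODY, CYCLES, ITERATIONS)
conversion_share = totals["convert"] / sum(totals.values())
print(totals, conversion_share)
```

With these assumed costs, the two conversions account for about two thirds of the cycles spent in the loop body, even though they perform none of the arithmetic the programmer wrote.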