Software compilers produce object code from source code. The source code includes a plurality of human-readable instructions (e.g., If x>3 then y=log(x)). The object code produced from the source code includes a plurality of machine-readable instructions (e.g., FF08AB12 . . . ). The machine-readable instructions include opcodes and operands. An opcode is a number that a particular processor interprets as a command to perform a certain operation (e.g., addition). An operand is data used in the operation (e.g., a number to be added to another number).
The opcodes that are generated by the compiler for the object code depend on what instructions are included in the source code. For example, a read operation has a different opcode than a write operation. In addition, the opcodes that are generated by the compiler depend on what type of processor is used by the target system. For example, the opcode for a read operation may differ from one processor to another.
Similarly, some processors may include native hardware that other processors do not include. For example, some processor architectures include an opcode for floating-point addition. When the processor encounters this opcode, it causes a floating-point unit (FPU) to execute the floating-point addition operation. However, other processors do not include a floating-point unit. As a result, the architectures of these processors do not include opcodes for floating-point operations.
Typically, when a prior-art compiler encounters a transcendental floating-point function (e.g., log(x)), it retrieves from a run-time library a plurality of instructions that include one or more primitive floating-point operations (e.g., floating-point addition, floating-point multiplication, etc.). The prior-art compiler then compiles the retrieved instructions. When the prior-art compiler encounters each primitive floating-point operation, it typically uses one of two approaches based on the capabilities of the target processor. If the target processor includes a floating-point unit, the prior-art compiler typically generates the appropriate floating-point opcode.
However, if the prior-art compiler encounters a primitive floating-point operation (e.g., floating-point addition) associated with a transcendental floating-point function (e.g., log(x)), and the target processor does not include a floating-point unit, the prior-art compiler typically retrieves from the run-time library a primitive floating-point emulation function (e.g., emulation of floating-point addition) that includes one or more primitive integer-based operations (e.g., integer-based addition). The prior-art compiler then compiles the newly retrieved instructions. When the prior-art compiler encounters each primitive integer-based operation, it typically generates the appropriate integer opcode.
In this manner, the prior-art compiler is able to generate object code from source code that includes calls to transcendental floating-point functions for execution on target processors that do not include a floating-point unit. In addition, the only additional run-time library routines that are required for this approach are the primitive floating-point emulation functions (e.g., emulation of floating-point addition using integer-based addition).
However, this approach produces inefficient object code for several reasons. First, the nesting of function calls required by this approach produces a large number of overhead instructions (e.g., return-from-subroutine opcodes), thereby increasing the object code size. Similarly, the nesting of function calls produces a large number of stack operations (e.g., popping a return address from the stack), thereby increasing object code execution time. For example, if a call to log(x) produces ten calls to floating-point primitives (e.g., floating-point additions), and each call to a floating-point primitive produces ten calls to integer-based emulation functions, the total number of subroutine calls is at least one hundred. In addition, the overall algorithm under this approach may be inefficient because it is not optimized for integer-based operations.