1. Field of the Invention
This invention relates generally to the field of microprocessors and, more particularly, to execution units within microprocessors.
2. Description of the Related Art
Microprocessors are typically designed with a number of "execution units" that are each optimized to perform a particular set of functions or instructions. For example, one or more execution units within a microprocessor may be optimized to perform memory accesses, i.e., load and store operations. Other execution units may be optimized to perform general arithmetic and logic functions, e.g., shifts and compares. Many microprocessors also have specialized execution units configured to perform more complex floating-point arithmetic operations including multiplication and reciprocal operations. These specialized execution units typically comprise hardware that is optimized to perform one or more floating-point arithmetic functions.
Most microprocessors must support multiple data types. For example, x86 compatible microprocessors must execute instructions that are defined to operate upon an integer data type and instructions that are defined to operate upon floating-point data types. Floating-point data can represent numbers within a much larger range than integer data. For example, a 32-bit signed integer can represent the integers between −2^31 and 2^31 − 1 (using two's complement format). In contrast, a 32-bit ("single precision") floating-point number as defined by the Institute of Electrical and Electronics Engineers (IEEE) Standard 754 has a range (in normalized format) from 2^−126 to 2^127 × (2 − 2^−23) in both positive and negative numbers.
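The two ranges stated above can be checked with a short Python sketch. The helper name as_float32 and the two bit patterns are introduced here for illustration only and are not taken from the specification; the patterns are the single-precision encodings of the smallest and largest normalized values.

```python
import struct

# Range of a 32-bit two's complement signed integer.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

# Range of a normalized IEEE-754 single-precision number.
FLT_MIN_NORMAL = 2.0 ** -126
FLT_MAX = 2.0 ** 127 * (2 - 2.0 ** -23)


def as_float32(bits):
    """Reinterpret a 32-bit pattern as a single-precision value (illustrative helper)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# 0x00800000: exponent field 1, fraction 0 (smallest normalized value).
# 0x7F7FFFFF: exponent field 254, fraction all ones (largest finite value).
smallest_matches = FLT_MIN_NORMAL == as_float32(0x00800000)
largest_matches = FLT_MAX == as_float32(0x7F7FFFFF)
```

Both comparisons are exact because every quantity involved is representable in the double-precision arithmetic that Python uses for its float type.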
Turning now to FIG. 1A, an exemplary format for an 8-bit integer 100 is shown. As illustrated in the figure, negative integers are represented using the two's complement format 104. To negate an integer, all bits are inverted to obtain the one's complement format 102. A constant of one is then added to the least significant bit (LSB).
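The negation procedure just described (invert all bits, then add one at the LSB) may be sketched as follows for an 8-bit integer. The function names are hypothetical and chosen only for this illustration.

```python
def negate_8bit(value):
    """Negate an 8-bit two's complement pattern: invert all bits, then add one."""
    ones_complement = ~value & 0xFF       # one's complement, kept to 8 bits
    return (ones_complement + 1) & 0xFF   # add one at the LSB, discard any carry out


def to_signed(value):
    """Interpret an 8-bit pattern as a two's complement signed integer."""
    return value - 256 if value & 0x80 else value


minus_five = negate_8bit(0b00000101)      # bit pattern 0b11111011
```

Note that the most negative pattern, 0b10000000, negates to itself, because +128 is not representable in eight bits.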
Turning now to FIG. 1B, an exemplary format for a 32-bit (single precision) floating-point number is shown. A floating-point number is represented by a significand, an exponent and a sign bit. The base for the floating-point number is raised to the power of the exponent and multiplied by the significand to arrive at the number represented. In microprocessors, base 2 is typically used. The significand comprises a number of bits used to represent the most significant digits of the number. Typically, the significand comprises one bit to the left of the radix point and the remaining bits to the right of the radix point. In order to save space, the bit to the left of the radix point, known as the integer bit, is not explicitly stored. Instead, it is implied in the format of the number. Additional information regarding floating-point numbers and operations performed thereon may be obtained in IEEE Standard 754 (IEEE-754). Unlike the integer representation, two's complement format is not typically used in the floating-point representation. Instead, sign and magnitude form is used. Thus, only the sign bit is changed when converting from a positive value 106 to a negative value 108.
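As an illustration of the format described above, the following Python sketch splits a 32-bit pattern into its sign, exponent, and stored fraction, and rebuilds the value of a normalized number by restoring the implied integer bit. The field widths (1, 8, and 23 bits) and the exponent bias of 127 come from IEEE-754 single precision rather than from the text above, and the function names are hypothetical.

```python
def decode_single(bits):
    """Split a 32-bit pattern into sign, biased exponent, and stored fraction."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def normalized_value(bits):
    """Value of a normalized single-precision number, restoring the implied integer bit."""
    sign, exponent, fraction = decode_single(bits)
    if not 1 <= exponent <= 254:
        raise ValueError("not a normalized number")
    significand = 1 + fraction / 2 ** 23           # implied integer bit plus fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


positive = 0x3FC00000                  # encodes +1.5
negative = positive ^ 0x80000000       # sign and magnitude: only the sign bit differs
```

Flipping the single sign bit is sufficient to negate the value, in contrast to the invert-and-add-one procedure used for two's complement integers.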
Numerical data formats, such as the IEEE-754, often include a number of special and exceptional cases. These special and exceptional cases may appear in one or more operands or one or more results for a particular instruction. FIG. 2 illustrates the sign, exponent, and significand formats of special and exceptional cases that are included in the IEEE-754 floating-point standard. The special and exceptional cases shown in FIG. 2 include a zero value, an infinity value, NaN (not-a-number) values, and a denormal value. An 'x' in FIG. 2 represents a value that can be either one or zero. NaN values may include a QNaN (quiet not-a-number) value and a SNaN (signaling not-a-number) value as defined by a particular architecture. The numbers depicted in FIG. 2 are shown in base 2 format as indicated by the subscript 2 following each number. As shown, a number with all zeros in its exponent and significand represents a zero value in the IEEE-754 floating-point standard. A number with all ones in its exponent, a one in the most significant bit of its significand, and zeros in the remaining bits of its significand represents an infinity value. The remaining special and exceptional cases are depicted similarly.
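The special and exceptional cases may be distinguished by inspecting the exponent and significand fields, as in the following Python sketch. Two assumptions are made that go beyond the text above. First, the sketch uses the 32-bit single-precision encoding, in which the integer bit is implied, so that infinity has an all-zero stored fraction; in a format that stores the integer bit explicitly, that bit would be set as described above. Second, the most significant fraction bit is assumed to separate QNaN (one) from SNaN (zero), which is an architecture-specific choice. The function name is hypothetical.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern (illustrative sketch)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: zero if the fraction is zero, otherwise denormal.
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        # All-one exponent: infinity or NaN.
        if fraction == 0:
            return "infinity"
        # Assumed convention: top fraction bit set marks a quiet NaN.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return "normal"
```

The sign bit is ignored by the classifier, since each special case exists in both positive and negative forms.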
Floating-point execution units will occasionally generate results that are smaller in magnitude than the smallest normalized number representable in a given floating-point precision, i.e., the exponent of the result is less than the minimum exponent for normalized numbers in that precision. These results are often referred to as "tiny" results. A tiny result may eventually yield a final result of either zero, a denormal, or the smallest normalized number in that precision. Despite the fact that tiny results occur rarely in many floating-point execution units, a floating-point execution unit must spend additional processing time and/or include additional hardware to correctly handle the tiny result and produce the desired final result. Thus, a system and method to handle tiny numbers without increasing microprocessor hardware are desired.
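By way of example, the condition for a tiny result in single precision may be expressed with exact rational arithmetic, as in the Python sketch below. The names are hypothetical, and the exclusion of an exact zero from the tiny category follows the usual IEEE-754 convention rather than any statement in the text above.

```python
from fractions import Fraction

# Smallest normalized single-precision magnitude, 2^-126, held exactly.
MIN_NORMAL_SINGLE = Fraction(1, 2 ** 126)


def is_tiny(exact_result):
    """True if a nonzero exact result lies below the normalized range."""
    return exact_result != 0 and abs(exact_result) < MIN_NORMAL_SINGLE


# The product of two normalized operands can be tiny: 2^-100 * 2^-100 = 2^-200.
product = Fraction(1, 2 ** 100) * Fraction(1, 2 ** 100)
product_is_tiny = is_tiny(product)
```

Both operands in the example are themselves normalized, which shows that a tiny result can arise from ordinary inputs and cannot be ruled out by inspecting the operands alone.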
The problems outlined above are in large part solved by the apparatus and method described herein. Generally speaking, an apparatus and method for handling tiny numbers using a super sticky bit are provided. In response to detecting that a preliminary result of an instruction corresponds to a tiny number and an underflow exception is masked, an execution pipeline can be configured to store a value corresponding to the preliminary result and a super sticky bit in a destination register. Also, a destination register tag corresponding to the destination register and a denormal exception indicator corresponding to the tiny number and masked underflow exception can be stored. A trap handler can be initiated to generate a corrected result for the instruction. The trap handler can detect that the denormal exception indicator has been set and can read the value and the super sticky bit from the destination register using the destination register tag. The trap handler can generate a corrected result for the instruction based on the value and the super sticky bit. An instruction subsequent to the trapping instruction can then be restarted.
The use of the apparatus and method for handling tiny numbers using a super sticky bit may provide performance advantages over other systems. The apparatus and method may reduce the hardware needed to handle results that correspond to tiny numbers. The apparatus and method may also allow instructions to execute more efficiently by executing the more common non-tiny result cases faster while ensuring that a correct result is generated for the rare tiny result cases.
Broadly speaking, an execution unit is contemplated. In one embodiment, the execution unit includes an execution pipeline, a retire queue coupled to said execution pipeline, and a trap handler. The execution pipeline is configured to generate a super sticky bit corresponding to an instruction in response to a preliminary result of said instruction corresponding to a tiny number and in response to an underflow exception mask being set. The execution pipeline is configured to store a value corresponding to the preliminary result and the super sticky bit in a destination register. The retire queue is configured to store a denormal exception indicator corresponding to the instruction and a destination register tag corresponding to said destination register. The trap handler is configured to generate a corrected result using the value and the super sticky bit in response to the denormal exception indicator being set. The trap handler is configured to store the corrected result in the destination register using the destination register tag.
A method is also contemplated. The method includes determining that a preliminary result of an instruction corresponds to a tiny number, determining that an underflow exception is masked, and generating a super sticky bit. The method also includes writing a value corresponding to the preliminary result to a destination register, writing the super sticky bit to the destination register, and setting a denormal exception indicator corresponding to the instruction. The method further includes initiating a trap handler, generating a corrected result using the value and the super sticky bit, and writing the corrected result to the destination register.
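The steps of the method may be modeled in software, purely for illustration, as in the Python sketch below. Several assumptions are made that are not stated above: the super sticky bit is taken to indicate that nonzero bits were discarded below the stored significand, the preliminary value is modeled as a sign, an unbiased exponent, and a 24-bit significand with the integer bit explicit, the target is single precision, and rounding is to nearest even. All class and function names are hypothetical and do not describe the actual hardware.

```python
from dataclasses import dataclass

MIN_EXPONENT = -126  # minimum normalized exponent in single precision


@dataclass
class Preliminary:
    """Hypothetical contents of the destination register before the trap."""
    sign: int          # 0 or 1
    exponent: int      # unbiased; may lie below MIN_EXPONENT
    significand: int   # 24 bits with the integer bit explicit
    super_sticky: int  # assumed: 1 if nonzero bits were discarded


@dataclass
class RetireEntry:
    """Hypothetical retire queue entry."""
    dest_tag: int
    denormal_exception: bool


def execute(registers, tag, preliminary, underflow_masked):
    """Pipeline side: store the preliminary result and record the indicator."""
    registers[tag] = preliminary
    tiny = preliminary.exponent < MIN_EXPONENT
    return RetireEntry(tag, tiny and underflow_masked)


def trap_handler(registers, entry):
    """Trap handler side: replace a tiny preliminary result with its encoding."""
    if not entry.denormal_exception:
        return None
    pre = registers[entry.dest_tag]
    shift = MIN_EXPONENT - pre.exponent          # at least 1 for a tiny result
    kept = pre.significand >> shift              # magnitude in units of 2^-149
    remainder = pre.significand & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    above_half = remainder > half or (remainder == half and pre.super_sticky)
    exact_tie = remainder == half and not pre.super_sticky
    if above_half or (exact_tie and kept & 1):
        kept += 1                                # may reach 0x800000
    corrected = (pre.sign << 31) | kept          # 32-bit single-precision encoding
    registers[entry.dest_tag] = corrected        # write back using the stored tag
    return corrected
```

Under these assumptions, a preliminary result of 1.0 × 2^−127 (significand 0x800000, exponent −127) is corrected to the denormal encoding 0x00400000. When rounding carries into bit 23, the corrected encoding is 0x00800000, the smallest normalized number, and a sufficiently small preliminary result is corrected to zero; these are the three outcomes a tiny result may yield.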
In addition, a microprocessor is contemplated. In one embodiment, the microprocessor includes an execution unit and a reorder buffer coupled to the execution unit. The execution unit includes an execution pipeline, a retire queue coupled to said execution pipeline, and a trap handler. The execution pipeline is configured to generate a super sticky bit corresponding to an instruction in response to a preliminary result of said instruction corresponding to a tiny number and in response to an underflow exception mask being set. The execution pipeline is configured to store a value corresponding to the preliminary result and the super sticky bit in a destination register. The retire queue is configured to store a denormal exception indicator corresponding to the instruction and a destination register tag corresponding to said destination register. The trap handler is configured to generate a corrected result using the value and the super sticky bit in response to the denormal exception indicator being set. The trap handler is configured to store the corrected result in the destination register using the destination register tag. The reorder buffer is configured to convey an abort signal corresponding to the instruction to the retire queue. In some embodiments, the microprocessor may be configured to retire the instruction that produces the tiny result, but abort subsequent instructions in order to start a trap handler.
In addition, a computer system comprising a microprocessor and an input/output device is contemplated. The microprocessor includes an execution unit and a reorder buffer coupled to the execution unit. The execution unit includes an execution pipeline, a retire queue coupled to said execution pipeline, and a trap handler. The execution pipeline is configured to generate a super sticky bit corresponding to an instruction in response to a preliminary result of said instruction corresponding to a tiny number and in response to an underflow exception mask being set. The execution pipeline is configured to store a value corresponding to the preliminary result and the super sticky bit in a destination register. The retire queue is configured to store a denormal exception indicator corresponding to the instruction and a destination register tag corresponding to said destination register. The trap handler is configured to generate a corrected result using the value and the super sticky bit in response to the denormal exception indicator being set. The trap handler is configured to store the corrected result in the destination register using the destination register tag. The reorder buffer is configured to convey an abort signal corresponding to the instruction to the retire queue. In some embodiments, the microprocessor may be configured to retire the instruction that produces the tiny result, but abort subsequent instructions in order to start a trap handler. The input/output device is configured to communicate between the microprocessor and another computer system.