The present invention is directed, in general, to microprocessors and, more particularly, to a processor architecture employing an improved floating point unit (FPU).
The ever-growing requirement for high performance computers demands that computer hardware architectures maximize software performance. Conventional computer architectures are made up of three primary components: (1) a processor, (2) a system memory and (3) one or more input/output devices. The processor controls the system memory and the input/output ("I/O") devices. The system memory stores not only data, but also instructions that the processor is capable of retrieving and executing to cause the computer to perform one or more desired processes or functions.
The I/O devices are operative to interact with a user through a graphical user interface ("GUI") (such as provided by Microsoft Windows™ or IBM OS/2™), a network portal device, a printer, a mouse or other conventional device for facilitating interaction between the user and the computer.
Over the years, the quest for ever-increasing processing speeds has followed different directions. One approach to improve computer performance is to increase the rate of the clock that drives the processor. As the clock rate increases, however, the processor's power consumption and temperature also increase. Increased power consumption is expensive and high circuit temperatures may damage the processor. Further, the processor clock rate may not increase beyond a threshold physical speed at which signals may traverse the processor. Simply stated, there is a practical maximum to the clock rate that is acceptable to conventional processors.
An alternate approach to improve computer performance is to increase the number of instructions executed per clock cycle by the processor ("processor throughput"). One technique for increasing processor throughput is pipelining, which calls for the processor to be divided into separate processing stages (collectively termed a "pipeline"). Instructions are processed in an "assembly line" fashion in the processing stages. Each processing stage is optimized to perform a particular processing function, thereby causing the processor as a whole to become faster.
"Superpipelining" extends the pipelining concept further by allowing the simultaneous processing of multiple instructions in the pipeline. Consider, as an example, a processor in which each instruction executes in six stages, each stage requiring a single clock cycle to perform its function. Six separate instructions can therefore be processed concurrently in the pipeline; i.e., the processing of one instruction is completed during each clock cycle. The instruction throughput of an n-stage pipelined architecture is therefore, in theory, n times greater than the throughput of a non-pipelined architecture capable of completing only one instruction every n clock cycles.
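By way of illustration only, the theoretical n-fold throughput gain may be sketched as a cycle count. The sketch assumes an ideal pipeline with no stalls, one clock cycle per stage, and at least one instruction; the function names are hypothetical and are not part of any disclosed embodiment.

```python
def unpipelined_cycles(stages, instructions):
    # Each instruction occupies the whole processor for `stages` cycles.
    return stages * instructions


def pipelined_cycles(stages, instructions):
    # The first instruction needs `stages` cycles to fill the pipeline;
    # every later instruction completes one cycle after its predecessor.
    return stages + instructions - 1


def speedup(stages, instructions):
    # Ratio of the two cycle counts; approaches `stages` for long runs.
    return unpipelined_cycles(stages, instructions) / pipelined_cycles(stages, instructions)
```

For six stages and 100 instructions, the sketch yields 105 cycles against 600, a speedup of roughly 5.7 that approaches the theoretical factor of six as the instruction count grows.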
Another technique for increasing overall processor speed is "superscalar" processing. Superscalar processing calls for multiple instructions to be processed per clock cycle. Assuming that instructions are independent of one another (the execution of each instruction does not depend upon the execution of any other instruction), processor throughput is increased in proportion to the number of instructions processed per clock cycle ("degree of scalability"). If, for example, a particular processor architecture is superscalar to degree three (i.e., three instructions are processed during each clock cycle), the instruction throughput of the processor is theoretically tripled.
These techniques are not mutually exclusive; processors may be both superpipelined and superscalar. However, operation of such processors in practice is often far from ideal, as instructions tend to depend upon one another and are also often not executed efficiently within the pipeline stages. In actual operation, instructions often require varying amounts of processor resources, creating interruptions ("bubbles" or "stalls") in the flow of instructions through the pipeline. Consequently, while superpipelining and superscalar techniques do increase throughput, the actual throughput of the processor ultimately depends upon the particular instructions processed during a given period of time and the particular implementation of the processor's architecture.
The speed at which a processor can perform a desired task is also a function of the number of instructions required to code the task. A processor may require one or many clock cycles to execute a particular instruction. Thus, in order to enhance the speed at which a processor can perform a desired task, both the number of instructions used to code the task as well as the number of clock cycles required to execute each instruction should be minimized.
Statistically, certain instructions are executed more frequently than others. If the design of a processor is optimized to rapidly process the instructions that occur most frequently, then the overall throughput of the processor can be increased. Unfortunately, the optimization of a processor for certain frequent instructions is usually obtained only at the expense of other less frequent instructions, or requires additional circuitry, which increases the size of the processor.
As computer programs have become increasingly more graphic-oriented, processors have had to deal more and more with the conversion between integer and floating point representations of numbers. Thus, to enhance the throughput of a processor that must generate data necessary to represent graphical images, it is desirable to optimize the processor to efficiently convert between integer and floating point representations of data.
U.S. Pat. No. 5,257,215 to Poon, issued Oct. 26, 1993, describes a circuit and method for performing integer to floating point conversions in a floating point unit. The method disclosed, however, requires a two's complement operation for the conversion of negative numbers; a two's complement operation requires additional clock cycles and is thus undesirable if the throughput of the floating point unit is to be optimized.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide an efficient system and method for converting numbers from integer notation to floating point notation and a computer system employing the same. Preferably, the optimization of the processor should not require any additional hardware or degrade the performance of the processor in performing tasks other than integer to floating point conversions; in particular, the conversion of negative numbers should not require the performance of a two's complement operation.
In the attainment of the above primary object, the present invention provides, for use in a processor having a floating point execution core, logic circuitry for, and a method of, converting negative numbers from integer notation to floating point notation. In one embodiment, the logic circuitry includes: (1) a one's complementer that receives a number in integer notation and inverts the received number to yield an inverted number, (2) a leading bit counter, coupled to the one's complementer, that counts leading bits in the inverted number to yield leading bit data, (3) a shifter, coupled to the one's complementer and the leading bit counter, that normalizes the inverted number based on the leading bit data to yield a shifted inverted number, (4) an adder, coupled to the shifter, that increments the shifted inverted number to yield a fractional portion of the received number in floating point notation and overflow data, the adder renormalizing the fractional portion based on the overflow data and (5) exponent generating circuitry, coupled to the leading bit counter and the adder, that generates an exponent portion of the received number in floating point notation as a function of the leading bit data and the overflow data.
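The five elements enumerated above may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the disclosed circuit: it assumes a 32-bit two's-complement input, assumes that the adder applies the increment at the bit position to which the original least significant bit has been shifted, and returns an unbiased exponent. The function name convert_negative and the constants WIDTH and MASK are hypothetical.

```python
WIDTH = 32                      # assumed integer width
MASK = (1 << WIDTH) - 1


def convert_negative(x):
    """Return (sign, exponent, significand) for a negative WIDTH-bit integer.

    The magnitude of x equals significand * 2**(exponent - (WIDTH - 1)),
    and the top bit of the WIDTH-bit significand is always set.
    """
    assert -(1 << (WIDTH - 1)) <= x < 0
    received = x & MASK                        # two's-complement bit pattern
    inverted = ~received & MASK                # (1) one's complementer
    lead = WIDTH - inverted.bit_length()       # (2) leading zero count
    shifted = inverted << lead                 # (3) shifter normalizes
    total = shifted + (1 << lead)              # (4) adder: +1 at the shifted
                                               #     LSB position (assumption)
    overflow = total >> WIDTH                  # carry out of the adder, 0 or 1
    significand = total >> overflow            # renormalize on overflow
    exponent = (WIDTH - 1) - lead + overflow   # (5) exponent generation
    return 1, exponent, significand
```

For an input of -8, the inverted number is 7, the leading zero count is 29, and the increment overflows, so the sketch returns an exponent of 3 with a significand of 0x80000000. At no point is a full-width two's complement of the received number computed before normalization.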
The present invention therefore fundamentally reorders the process by which numbers are converted from integer to floating point notation to allow such numbers to be converted in a pipelined process. The present invention is founded on the novel realization that one's complementing (a part of the two's complementing process required in converting negative numbers) can be allowed to occur before normalization (shifting). The present invention is therefore particularly suited to floating point units ("FPUs") having a pipelined load converter and adder, as the hardware already present in the converter and adder can be employed to perform integer to floating point conversion.
In one embodiment of the present invention, the logic circuitry further includes a multiplexer, interposed between the one's complementer and the shifter, that selects one of the received number and the inverted number based on a sign of the received number. Thus, the present invention can be adapted for use in additionally converting positive numbers. Positive numbers have no need to be two's complemented during conversion. Therefore, in this embodiment, steps are taken to bypass the one's complementing to which negative numbers are subjected.
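The selection performed by the multiplexer may be sketched as follows. The sketch assumes a 32-bit two's-complement input, assumes that the increment is suppressed on the positive path, and treats zero as a separately handled case; these choices, together with the names select_operand and int_to_fields, are illustrative assumptions rather than details taken from the embodiment.

```python
WIDTH = 32                      # assumed integer width
MASK = (1 << WIDTH) - 1


def select_operand(received):
    """Multiplexer model: inverted pattern for negatives, else the received one."""
    sign = received >> (WIDTH - 1)             # top bit of the bit pattern
    inverted = ~received & MASK                # one's complementer output
    return sign, (inverted if sign else received)


def int_to_fields(x):
    """Return (sign, exponent, significand) for any WIDTH-bit signed integer.

    |x| equals significand * 2**(exponent - (WIDTH - 1)).
    """
    sign, operand = select_operand(x & MASK)
    if sign == 0 and operand == 0:
        return 0, 0, 0                         # zero handled apart (assumption)
    lead = WIDTH - operand.bit_length()        # leading zero count
    # Increment only on the negative path; positives pass through unchanged.
    total = (operand << lead) + (sign << lead)
    overflow = total >> WIDTH                  # carry out, 0 or 1
    return sign, (WIDTH - 1) - lead + overflow, total >> overflow
```

With this arrangement, an input of 5 and an input of -5 both produce an exponent of 2 and a significand of 0xA0000000, differing only in the sign.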
In one embodiment of the present invention, the exponent generating circuitry comprises a bias converter that generates an uncompensated biased exponent, the exponent generating circuitry adjusting the uncompensated biased exponent as a function of the leading bit data and the overflow data to yield the exponent portion. Those skilled in the art are familiar with the manner in which exponents are biased or unbiased during notation conversion. In this embodiment, the present invention enhances the bias process by further adjusting for any "overguessing" that may occur in the adder.
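One possible reading of this exponent adjustment is sketched below. The sketch assumes a 32-bit integer input and the IEEE 754 single-precision bias of 127, neither of which is specified above, and it interprets the uncompensated biased exponent as the value that would apply if the operand filled every bit position. The names BIAS, WIDTH and biased_exponent are hypothetical.

```python
BIAS = 127                      # IEEE 754 single-precision bias (assumed format)
WIDTH = 32                      # assumed integer width


def biased_exponent(lead, overflow):
    """Adjust an initial exponent guess using leading bit and overflow data."""
    # Initial guess: the operand's leading one sits in the top bit position.
    uncompensated = BIAS + (WIDTH - 1)
    # Subtract the leading bit count, then add back any adder overflow.
    return uncompensated - lead + overflow
```

For an input of -8 (leading bit count 29, overflow 1), the sketch yields 130, which is the biased exponent of 8.0 in IEEE 754 single precision (127 + 3).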
In one embodiment of the present invention, the leading bit counter counts a number of leading zeroes in the inverted number. Alternatively, leading ones in the received (uninverted) number may be counted. Those skilled in the art are familiar with conventional normalization processes in which integers are shifted and thereby normalized.
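The equivalence of the two counting alternatives may be checked with a short sketch. A 32-bit width is assumed, and the function names leading_zeros and leading_ones are hypothetical.

```python
WIDTH = 32                      # assumed integer width
MASK = (1 << WIDTH) - 1


def leading_zeros(value):
    # Number of zero bits above the highest set bit of a WIDTH-bit pattern.
    return WIDTH - value.bit_length()


def leading_ones(value):
    # Scan from the top bit downward until the first zero bit is found.
    count = 0
    for bit in range(WIDTH - 1, -1, -1):
        if not (value >> bit) & 1:
            break
        count += 1
    return count
```

For any bit pattern r, leading_ones(r) equals leading_zeros(~r & MASK), so either count can supply the same leading bit data to the shifter.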
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.