Field of the Invention
The present invention relates to the field of digital computer processors and, more particularly, to fast, very large scale integrated circuit (VLSI) floating point processors for use in computer systems for computation-intensive tasks.
In the field of technical computing, such as digital signal processing, graphics processing, CAD and simulation applications, it is necessary to perform large numbers of arithmetic operations very quickly. Additionally, it is desirable for many applications to perform floating point operations in order to accommodate numbers which vary widely in magnitude.
Most general purpose programmable digital computers may be programmed to perform various arithmetic operations, including floating point operations. That software approach is too slow, however, for modern applications such as those mentioned above. One way to hasten arithmetic operations is to use special purpose circuits. These circuits, commonly called coprocessors, are dedicated to performing mathematical operations in conjunction with a general purpose CPU such as that found in a modern microprocessor chip. Dedicated circuits of that type perform arithmetic operations significantly faster than software does. The size and speed of coprocessors, nonetheless, are inadequate for use in technical computing systems of the classes currently known as super-minicomputers or mini-supercomputers. Present day machines of that type operate at speeds (bandwidths) on the order of 20-50 MHz. Yet available floating point hardware meeting the size, cost and power constraints of such systems operates at half that rate, thereby wasting other system resources.
System designers face the restrictions of traditional hardware options: floating point coprocessors are easy to design with but offer limited performance. CMOS integrated chip sets can boost speed but demand substantial design effort. Custom or semi-custom components can provide excellent speed but also take considerable design resources and are very expensive to design and implement.
Many designers rely on a CMOS floating point chip set in a pipelined system to maximize performance. But latency can extend the computing time of a CMOS integrated circuit from, for example, 100 nanoseconds to three or four times that period for double-precision multiplication. In fact, for most technical applications it is latency, not the pipeline rate, that determines computation speed. Moreover, pipelining complicates both hardware and software design.
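The distinction between latency and pipeline rate can be illustrated with a minimal sketch. The function name, the four-stage depth, and the operation count below are assumptions chosen for illustration only; the 100 nanosecond stage time and the factor of four follow the example figures given above. The sketch compares a chain of operations in which each needs the previous result with a stream of independent operations that can overlap in the pipeline.

```python
def pipelined_time_ns(n_ops, stage_ns, stages, dependent):
    """Total time, in nanoseconds, for n_ops operations in a simple pipeline.

    Hypothetical model for illustration: every stage takes stage_ns, and an
    operation's latency is stage_ns * stages.
    """
    latency_ns = stage_ns * stages
    if dependent:
        # Each operation waits for the previous result, so nothing overlaps
        # and every operation pays the full latency.
        return n_ops * latency_ns
    # Independent operations overlap: the pipeline fills once, after which
    # one result emerges per stage time.
    return latency_ns + (n_ops - 1) * stage_ns


# Assumed figures: 100 ns per stage, 4 stages (latency of 400 ns).
chained = pipelined_time_ns(1000, 100, 4, dependent=True)
streamed = pipelined_time_ns(1000, 100, 4, dependent=False)
print(chained, streamed)
```

Under these assumptions the dependent chain takes roughly four times as long as the independent stream, which is why latency rather than pipeline rate governs speed when results feed directly into subsequent operations.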
Bipolar technology was used to implement first-generation technical computing and digital signal processing systems because of its speed and availability. Since then, significant enhancements in CMOS technology have resulted in CMOS being favored over bipolar technology for many applications. Using VLSI technology, pipelined CMOS systems have achieved system-level throughput rates of 20 MFlops while maintaining power dissipation levels below 2 watts per package.
The most advanced integrated circuits known to be available for operations including floating point multiply are the Analog Devices ADSP3210 and the Texas Instruments 8847. The Analog Devices chip employs CMOS technology and requires four passes through the multiplier hardware to perform a double precision floating point multiply in about 400 nanoseconds. The Texas Instruments chip employs a two-stage pipelined architecture having approximately a 50 nanosecond delay per stage in performing double precision multiply. A relatively fast bipolar integrated circuit presently available is the AMD 29325. That device performs a 32-bit single precision operation in about 100 nanoseconds. It is not capable of double precision operations.
Many designers perceive bipolar emitter-coupled logic (ECL) technology as having a less attractive price-performance combination compared to CMOS. To be suited for high-speed VLSI, the fabrication technology selected must be based on small, fast transistors. Traditional bipolar transistors were fast, but their relatively large size resulted in large device and interconnect capacitances. This, of course, limited their speed. Integrated circuits based on ECL technology also dissipated a great deal of power and were not densely packaged. Where speed was critical, designers would use CMOS technology to extract parallelism from algorithms and implement systems rather than designing with SSI and MSI components required by bipolar technology.
We have reassessed the architectural configuration of floating point processors, multipliers and ALUs commonly used in conventional practice, particularly in light of the development of more advanced integrated circuit processes. Several such processes have recently been described in the literature, including: Downing, P., et al., "Denser Process Gets the Most Out of Bipolar VLSI," Electronics, pp. 131-133, June 28, 1984; "A Bipolar Process That's Repelling CMOS," Electronics, pp. 45-47, Dec. 23, 1985; "Surprise! ECL Runs on Only Microwatts," Electronics, pp. 35-38, Apr. 7, 1986; and Wilson, G., "Creating Low-Power Bipolar ECL at VLSI Densities," VLSI Systems Design, pp. 84-86, May 1986. Other VLSI bipolar processes include the National Semiconductor/Fairchild ASPECT process and the AMCC/Plessey HE1 process.
These more advanced processes provide increased speed and device density and lower power dissipation levels, which in turn offer several significant benefits to the system designer and user. First, smaller transistors enable higher density and thereby allow implementation of more complex functions on a chip. Second, with greater density, the system designer can use fewer parts, and power requirements are reduced. As a result, the speed and throughput of the overall system can be increased because the interconnection delay between parts can be readily reduced.