1. Technical Field of the Invention
The present invention relates to the field of integrated circuits, and more particularly to processors.
2. Prior Art
Conventional processors use logic-based computation (LBC), which carries out computation primarily with logic circuits (e.g. XOR circuits). Logic circuits are suitable for arithmetic operations (i.e. addition, subtraction and multiplication), but not for non-arithmetic functions, i.e. mathematical functions that require operations beyond the arithmetic operations performable by conventional logic circuits (e.g. elementary functions, special functions). Non-arithmetic functions are computationally hard, and realizing them rapidly and efficiently has been a major challenge.
In conventional processors, only a few basic non-arithmetic functions (e.g. basic algebraic functions and basic transcendental functions) are implemented in hardware; these are referred to as built-in functions. Built-in functions are realized by a combination of logic circuits and look-up tables (LUTs). For example, U.S. Pat. No. 5,954,787 issued to Eun on Sep. 21, 1999 taught a method for generating sine/cosine functions using LUTs; U.S. Pat. No. 9,207,910 issued to Azadet et al. on Dec. 8, 2015 taught a method for calculating a power function using LUTs.
Realization of built-in functions is further illustrated in FIG. 1A. A conventional processor 300 generally comprises a logic circuit 380 and a memory circuit 370. The logic circuit 380 comprises an arithmetic logic unit (ALU) for performing arithmetic operations, while the memory circuit 370 stores an LUT for the built-in function. To obtain a desired precision, the built-in function is approximated by a polynomial of sufficiently high order. The LUT 370 stores the coefficients of the polynomial, and the ALU 380 calculates the polynomial. Because the ALU 380 and the LUT 370 are formed side by side on a semiconductor substrate 0, this type of horizontal integration is referred to as two-dimensional (2-D) integration.
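The LUT-plus-ALU split can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the processor's actual design: sin(x) is taken as the built-in function, a degree-15 Taylor polynomial as the approximation, and a Python list as the "LUT"; the names SIN_LUT, alu_horner and builtin_sin are hypothetical. The "ALU" step uses only multiplication and addition (Horner's rule).

```python
import math

# Hypothetical "LUT": Taylor coefficients of sin(x) about 0 up to degree 15.
# Entry k holds the coefficient of x**k (zero for even k).
DEGREE = 15
SIN_LUT = [0.0 if k % 2 == 0 else (-1) ** (k // 2) / math.factorial(k)
           for k in range(DEGREE + 1)]


def alu_horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with only multiplies and adds (Horner's rule)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def builtin_sin(x):
    """Approximate sin(x) for |x| <= pi/2 using the coefficient LUT."""
    # This sketch omits argument range reduction, so it rejects larger inputs.
    if abs(x) > math.pi / 2:
        raise ValueError("this sketch handles only |x| <= pi/2")
    return alu_horner(SIN_LUT, x)


print(abs(builtin_sin(0.5) - math.sin(0.5)) < 1e-12)
```

Raising the polynomial degree improves precision at the cost of more ALU operations and more LUT entries, which is why the order must be chosen "sufficiently high" for the target precision.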
Computation has developed along two directions: computational density and computational complexity. Computational density is a figure of merit for parallel computation; it refers to the computational power (e.g. the number of floating-point operations per second) per unit die area. Computational complexity is a figure of merit for scientific computation; it refers to the total number of built-in functions supported by a processor. 2-D integration severely limits both computational density and computational complexity.
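The dependence of computational density on die area can be written as a short worked formula. This is a sketch under the assumption that the computational power P is set by the ALU and does not change when LUT area is added; the symbols \rho, P, A_{die}, A_{ALU}, A_{LUT} and A_{other} are introduced here for illustration only.

```latex
\rho = \frac{P}{A_{\text{die}}}, \qquad
A_{\text{die}} = A_{\text{ALU}} + A_{\text{LUT}} + A_{\text{other}}
\quad\Longrightarrow\quad
\frac{\partial \rho}{\partial A_{\text{LUT}}} = -\frac{P}{A_{\text{die}}^{2}} < 0
```

Under 2-D integration, every unit of area given to the LUT therefore lowers \rho for a fixed P, while the area left for the LUT bounds how many built-in functions (computational complexity) can be supported.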
With 2-D integration, inclusion of the LUT 370 increases the die size of the conventional processor 300 and lowers its computational density, which has an adverse effect on parallel computation. Moreover, because the ALU 380 is the primary component of the conventional processor 300 and occupies a large die area, the LUT 370 is left with only a small die area and supports only a few built-in functions. FIG. 1B lists all built-in transcendental functions supported by an Intel Itanium (IA-64) processor (see Harrison et al., “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4 1999, hereinafter Harrison). The IA-64 processor supports a total of 7 built-in transcendental functions, each using a relatively small LUT (from 0 to 24 kb) in conjunction with a relatively high-degree Taylor-series polynomial (of degree 5 to 22).
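The interplay between LUT size and polynomial degree can be illustrated with a common table-driven scheme for exp(x). This is a hedged sketch of the general table-plus-polynomial technique, not necessarily the exact IA-64 algorithm described in Harrison; the table size N = 32, the default degree of 5 and the name exp_table_driven are assumptions. The input is split as x = m*ln2/N + r, the LUT supplies 2**(j/N), and a short polynomial handles the small remainder r.

```python
import math

N = 32  # assumed table size: the LUT holds 2**(j/N) for j = 0..N-1
TABLE = [2.0 ** (j / N) for j in range(N)]
LN2 = math.log(2.0)


def exp_table_driven(x, degree=5):
    """Approximate exp(x) as 2**k * TABLE[j] * poly(r)."""
    # Choose m so that the remainder r satisfies |r| <= ln2 / (2N).
    m = round(x * N / LN2)
    k, j = divmod(m, N)  # m = k*N + j with 0 <= j < N (also for negative m)
    r = x - m * LN2 / N
    # Truncated Taylor series of exp(r), evaluated with Horner's rule.
    p = 0.0
    for i in range(degree, -1, -1):
        p = p * r + 1.0 / math.factorial(i)
    # Scale by 2**(j/N) from the LUT and by 2**k exactly.
    return math.ldexp(TABLE[j] * p, k)


print(abs(exp_table_driven(1.0) - math.e) < 1e-13)
```

Doubling N halves the largest possible |r|, so fewer polynomial terms reach the same precision: a larger table buys a lower polynomial degree. Pairing a small LUT with a high-degree polynomial sits at the other end of this trade-off.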
This small set of built-in functions (~10 types, including arithmetic operations) is the foundation of scientific computation. Scientific computation uses advanced computing capabilities to advance human understanding and solve engineering problems. It has wide applications in computational mathematics, computational physics, computational chemistry, computational biology, computational engineering, computational economics, computational finance and other computational fields. The prevailing framework of scientific computation comprises three layers: a foundation layer, a function layer and a modeling layer. The foundation layer includes built-in functions that can be implemented in hardware. The function layer includes mathematical functions that cannot be implemented in hardware (e.g. non-basic non-arithmetic functions). The modeling layer includes mathematical models, which are mathematical descriptions of the input-output characteristics of system components.
The mathematical functions in the function layer and the mathematical models in the modeling layer are implemented in software. The function layer involves one software-decomposition step: mathematical functions are decomposed by software into combinations of built-in functions, and these built-in functions and the associated arithmetic operations are then calculated by hardware. The modeling layer involves two software-decomposition steps: the mathematical models are first decomposed into combinations of mathematical functions, and the mathematical functions are then further decomposed into combinations of built-in functions. Consequently, software-implemented functions (e.g. mathematical functions, mathematical models) run much more slowly and less efficiently than hardware-implemented functions (i.e. built-in functions), and each extra software-decomposition step (e.g. for mathematical models) widens this performance gap further.
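The two decomposition paths can be traced with a toy Python sketch. All names here are illustrative assumptions: math.exp, math.log and math.sqrt stand in for hardware built-ins (wrapped as hw_exp, hw_log, hw_sqrt so calls can be counted), cosh and asinh are software function-layer versions (not the math module's), and a catenary cable height serves as a hypothetical modeling-layer model.

```python
import math
from collections import Counter

# Foundation layer (assumed): built-ins that hardware would provide.
# The math module stands in for the hardware; the Counter tallies calls.
builtin_calls = Counter()


def hw_exp(x):
    builtin_calls["exp"] += 1
    return math.exp(x)


def hw_log(x):
    builtin_calls["log"] += 1
    return math.log(x)


def hw_sqrt(x):
    builtin_calls["sqrt"] += 1
    return math.sqrt(x)


# Function layer: one software decomposition into built-ins plus arithmetic.
def cosh(x):
    e = hw_exp(x)
    return 0.5 * (e + 1.0 / e)


def asinh(x):
    # Accurate for x >= 0; negative x would suffer cancellation in this sketch.
    return hw_log(x + hw_sqrt(x * x + 1.0))


# Modeling layer: the model is decomposed into function-layer calls,
# which are in turn decomposed into built-ins (two decomposition steps).
def catenary_height(x, a):
    return a * cosh(x / a)


builtin_calls.clear()
print(catenary_height(1.0, 2.0), dict(builtin_calls))
```

Every model evaluation passes through both software layers before any built-in is reached, which is where the software overhead described above accumulates.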
Because the arithmetic operations performable by the ALU consist of addition, subtraction and multiplication, the mathematical models that the ALU alone can implement are limited to polynomial (including linear) models. Typical mathematical models involve non-arithmetic functions (e.g. exponentials, logarithms and square roots) and cannot be represented by a combination of these arithmetic operations alone. To illustrate how computationally intensive a mathematical model can be, FIGS. 2A-2B show a simple example: the simulation of an amplifier circuit 20. The amplifier circuit 20 comprises a transistor 24 and a resistor 22 (FIG. 2A). All transistor models (e.g. MOS3, BSIM3 V3.2, BSIM4 V3.0 and PSP of FIG. 2B) model transistor behavior based on the small set of built-in functions provided by the conventional processor 300. Because the choice of built-in functions is limited, calculating even a single current-voltage (I-V) point for the transistor 24 requires a large amount of computation (FIG. 2B). For example, the BSIM4 V3.0 transistor model needs 222 additions, 286 multiplications, 85 divisions, 16 square-root operations, 24 exponential operations and 19 logarithmic operations. This large amount of computation makes simulation extremely slow and inefficient.
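The BSIM4 V3.0 operation counts quoted above can be tallied to show how much of a single I-V evaluation falls outside the addition, subtraction and multiplication set. The classification follows the definition used in this section; the dictionary and variable names are illustrative only.

```python
# Per-I-V-point operation counts for BSIM4 V3.0, as quoted for FIG. 2B.
BSIM4_V30_OPS = {
    "addition": 222,
    "multiplication": 286,
    "division": 85,
    "square root": 16,
    "exponential": 24,
    "logarithm": 19,
}

# Operations an ALU limited to addition, subtraction and multiplication performs directly.
ALU_OPS = {"addition", "subtraction", "multiplication"}

total = sum(BSIM4_V30_OPS.values())
alu_only = sum(n for op, n in BSIM4_V30_OPS.items() if op in ALU_OPS)
other = total - alu_only  # division, square root, exponential, logarithm

print(total, alu_only, other)  # 652 508 144
```

A single I-V point thus costs 652 operations, 144 of which are divisions or non-arithmetic functions that each expand into further work on a conventional processor.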