1. Field of the Invention
The present invention relates to processing units for computers. In particular, the present invention relates to a processing unit that performs numerical computation, such as floating-point operations.
2. Description of the Related Art
There has long been a demand for computers with higher processing speed. In particular, in scientific and engineering simulations that involve a heavy computational load, processing units that perform numerical computation at high speed are in demand.
Most conventional computers use what is called the von Neumann architecture (or stored-program architecture). In the von Neumann architecture, the bandwidth (data transfer rate) between the central processing unit (CPU) and the memory limits computational performance. This limitation is called the von Neumann bottleneck. With current semiconductor process technology, it is difficult to integrate a CPU and memory on the same chip, because the manufacturing process for CPUs has not been successfully combined with that for memories. CPUs and memories are therefore typically implemented as separate semiconductor integrated circuits, and the bandwidth between them has become a significant factor in computational speed. Accordingly, attempts are being made to overcome the von Neumann bottleneck.
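The effect of the CPU-memory bandwidth on computational speed can be illustrated with the simple "roofline" bound, a standard performance model that is not taken from this document: the sustained rate is the lesser of the CPU's peak rate and the memory bandwidth multiplied by the number of floating-point operations performed per byte transferred. The following is a minimal sketch under that model; the peak rate, bandwidth, and kernel intensities are hypothetical figures chosen only for illustration.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: the rate is capped by compute or by memory traffic."""
    memory_bound = bandwidth_bytes_per_s * flops_per_byte
    return min(peak_flops, memory_bound)


# Hypothetical CPU: 100 GFLOP/s peak, 10 GB/s bandwidth to external memory.
PEAK = 100e9
BW = 10e9

# Vector update y[i] = a*x[i] + y[i] in double precision: 2 flops per
# 24 bytes moved (read x[i], read y[i], write y[i]), i.e. low data reuse.
daxpy = attainable_flops(PEAK, BW, 2 / 24)

# A hypothetical kernel that reuses each loaded word many times (20 flop/byte).
dense = attainable_flops(PEAK, BW, 20.0)

print(f"low-reuse kernel:  {daxpy / 1e9:.2f} GFLOP/s")
print(f"high-reuse kernel: {dense / 1e9:.2f} GFLOP/s")
```

Under these assumed figures the low-reuse kernel is limited by memory bandwidth to less than 1% of the CPU's peak rate, while only the high-reuse kernel reaches the peak; raising the CPU's peak rate alone would not speed up the former at all.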
To overcome the von Neumann bottleneck, the present inventors developed a computer system called GRAPE (GRAvity PipE), in which a special-purpose (dedicated) computing unit is connected to a general-purpose host computer to perform high-computational-load processing. The dedicated computing unit of GRAPE performs computation specific to particle simulation. It includes a semiconductor chip having a large number of pipelines that implement, in hardware, the arithmetic operations needed to calculate interactions between particles efficiently, and it has a memory unit shared by these pipelines. Owing to this architecture, GRAPE achieves higher computational performance than some supercomputers despite its relatively small circuit scale (see, e.g., J. Makino, E. Kokubo, and M. Taiji, “HARP: A Special-purpose Computer for N-body Simulations”, Publications of the Astronomical Society of Japan, 45, pp. 349-360 (1993)).
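The kind of particle-particle interaction that such hardware pipelines evaluate can be sketched in software. The sketch below assumes Newtonian gravity with G = 1 and a softening length `eps`, neither of which is specified in this document, and it presents a simplified picture: each i-particle is assigned to a pipeline, and the j-particle data is streamed from the shared memory to all pipelines, so every word read from memory is reused by many pipelines. The function name `accelerations` is hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def accelerations(pos: List[Vec], mass: List[float], eps: float = 0.01) -> List[Vec]:
    """Direct O(N^2) summation of softened gravitational accelerations (G = 1).

    eps must be positive: the i == j term then has dx = dy = dz = 0 and
    contributes nothing, so no explicit self-interaction check is needed.
    """
    if eps <= 0.0:
        raise ValueError("softening length eps must be positive")
    acc = []
    for xi, yi, zi in pos:                        # i-particle: one per pipeline
        ax = ay = az = 0.0
        for (xj, yj, zj), mj in zip(pos, mass):   # j-particles: shared memory stream
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))   # 1 / (r^2 + eps^2)^(3/2)
            ax += mj * dx * inv_r3
            ay += mj * dy * inv_r3
            az += mj * dz * inv_r3
        acc.append((ax, ay, az))
    return acc


# Two unit masses one length unit apart pull on each other with equal
# and opposite accelerations of magnitude close to 1.
print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], eps=1e-6))
```

Because the inner loop body is a fixed sequence of arithmetic operations, it is the sort of computation that can be laid out directly as a hardware pipeline; the cost of that specialization is that the pipeline can evaluate only this one interaction law.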
Another type of computer that has been used to overcome the von Neumann bottleneck is the reconfigurable computer (RC), which employs FPGAs (field programmable gate arrays). A computer system that uses an RC for high-speed numerical computation has an architecture similar to that of GRAPE, consisting of a host computer and an FPGA board on which an external memory and an FPGA network made up of multiple FPGAs are mounted.
In addition, a SIMD (single instruction, multiple data) massively parallel computer may be used to perform numerical computations efficiently. A SIMD massively parallel computer uses multiple processor chips, and each chip integrates a number of processor units, each having its own local memory and register file (see Japanese Patent Provisional Publication No. 5-174166).
The GRAPE computer can perform its intended computation at high speed, but the computations it can process are fixed at the stage when the pipelines are implemented in hardware. The GRAPE computer therefore lacks versatility.
RCs, however, also have problems. Because the FPGAs they use are designed to be reconfigurable, the circuit scale of an RC is limited, and its operating speed cannot be raised to match that of other processors. Further, when an RC performs the double-precision floating-point operations used in typical numerical computation, its computing speed decreases. Consequently, an RC is suited to high-speed computation only when low computational accuracy is acceptable (e.g., numerical computation using fixed-point operations). In addition, to make an RC perform an intended computation, the user must configure the FPGAs by programming in a hardware-level language such as VHDL (VHSIC Hardware Description Language), which makes application development difficult.
In a SIMD massively parallel computer, when a large number of processor units are integrated into a single chip, the memory bandwidth becomes relatively insufficient, and a limitation similar to the von Neumann bottleneck arises. As a result, even as semiconductor manufacturing technology advances, the integration of processor units cannot be increased in proportion to that progress.
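One rough way to see why adding processor units to a chip stops paying off is to treat the chip's external memory interface as a fixed bandwidth shared by all of its units. The sketch below assumes a simple min(compute, memory) model and hypothetical figures (1 GFLOP/s per processor unit, a kernel performing 0.5 floating-point operations per byte loaded, and 64 GB/s of chip bandwidth); none of these values come from this document or the cited publication, and the function name `sustained_rate` is hypothetical.

```python
def sustained_rate(n_units: int, unit_peak_flops: float,
                   flops_per_byte: float, chip_bandwidth: float) -> float:
    """Aggregate rate of n processor units sharing one memory interface."""
    compute_limit = n_units * unit_peak_flops       # grows with integration
    memory_limit = chip_bandwidth * flops_per_byte  # fixed by the chip's pins
    return min(compute_limit, memory_limit)


UNIT_PEAK = 1e9   # hypothetical: 1 GFLOP/s per processor unit
INTENSITY = 0.5   # hypothetical: 0.5 flop per byte loaded from memory
CHIP_BW = 64e9    # hypothetical: 64 GB/s external memory bandwidth

for n in (16, 32, 64, 128, 256, 512):
    rate = sustained_rate(n, UNIT_PEAK, INTENSITY, CHIP_BW)
    efficiency = rate / (n * UNIT_PEAK)
    print(f"{n:4d} units: {rate / 1e9:5.1f} GFLOP/s, {efficiency:6.1%} of peak")
```

With these assumed figures, the aggregate rate stops growing at 32 GFLOP/s once the chip holds more than 32 units, so each doubling of integration beyond that point halves the fraction of peak performance that is actually used.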