In the field of data processing systems and circuits, parallel processing refers to the simultaneous use of multiple data processing circuits, such as microprocessor central processing units (CPUs), to execute a single computer program. In this arrangement, the multiple CPUs execute program instructions in parallel, subject to certain constraints on the order of operations and the use of shared resources. The theoretical performance of a parallel processing system can thus be much higher than that of a system having a single CPU. Supercomputers having many parallel processors (e.g., on the order of 256 or more processors) are known in the art.
Many of the known parallel processing systems are arranged using multiple microprocessor integrated circuits, with each integrated circuit having a single CPU. It is contemplated, however, that future parallel processing systems will be implemented, at least in part, by single-chip microprocessor integrated circuits that contain multiple CPUs.
The benefits of parallel processing have been achieved, in part, in modern single-chip microprocessors of the so-called "superscalar" architecture. For example, microprocessors of the well-known x86 architecture, particularly those compatible with and having the capability of the PENTIUM microprocessor available from Intel Corporation, are considered to be superscalar. Superscalar microprocessors include at least two instruction pipelines that can operate in parallel, subject to certain constraints on which types of instructions may execute together (considering the available processor resources). A two-pipeline superscalar microprocessor is thus able to issue up to two instructions in each machine cycle, even though each instruction requires multiple cycles to pass through its pipeline.
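The pairing constraint described above can be illustrated with a minimal sketch. It assumes a hypothetical in-order, two-pipeline machine in which two adjacent instructions may issue in the same cycle only if the second neither reads nor overwrites the register written by the first. The instruction encoding and the names `can_pair` and `issue_cycles` are illustrative assumptions; the actual pairing rules of the PENTIUM microprocessor are more restrictive and are not modeled here.

```python
from typing import List, Tuple

# Hypothetical instruction encoding: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def can_pair(first: Instr, second: Instr) -> bool:
    """Return True if `second` may issue in the same cycle as `first`.

    Assumed rule (not taken from any real processor manual): the second
    instruction must not read the first's destination (read-after-write)
    and must not write the same destination (write-after-write).
    """
    dest1, _ = first
    dest2, sources2 = second
    return dest1 not in sources2 and dest1 != dest2


def issue_cycles(program: List[Instr]) -> int:
    """Count the cycles needed to issue `program` in order, at most two per cycle."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # both pipelines are used in this cycle
        else:
            i += 1  # only one pipeline is used in this cycle
        cycles += 1
    return cycles


if __name__ == "__main__":
    program = [
        ("a", ("x", "y")),
        ("b", ("x", "z")),  # independent of the first instruction
        ("c", ("a", "b")),
        ("d", ("c",)),      # depends on the previous instruction
    ]
    print(issue_cycles(program))
```

Under these assumptions, independent neighbors share a cycle while dependent neighbors issue in separate cycles, which is why the two-instructions-per-cycle figure is an upper bound rather than a sustained rate.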
Typical modern microprocessors also include on-chip floating-point units (FPUs) that perform arithmetic operations on floating-point data operands. As is fundamental in the art, floating-point data operands are digital data words that have both a mantissa portion and an exponent portion, for representing non-integer numbers. Typically, as in the case of 486-class microprocessors, the FPU is implemented in a microprocessor as a separate execution unit having its own instruction pipeline, such that floating-point instructions are forwarded to the FPU for decoding and execution, much as if the FPU were located on a separate integrated circuit. In the case of the PENTIUM microprocessor, which is superscalar in the sense that it has two integer pipelines, the floating-point pipeline is shared with one of the integer pipelines through the integer execution and integer writeback stages, with the floating-point execution stages being added to the length of that integer pipeline.
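The mantissa and exponent portions of a floating-point operand can be made concrete with a short sketch. It assumes the IEEE 754 double-precision (binary64) format, which the text above does not name but which Python uses for `float` on essentially all current platforms. The function names `split_float` and `ieee754_fields` are illustrative only.

```python
import math
import struct


def split_float(x: float):
    """Return (mantissa, exponent) such that x == mantissa * 2**exponent.

    math.frexp normalizes the mantissa so that 0.5 <= |mantissa| < 1
    (or returns (0.0, 0) when x is zero).
    """
    return math.frexp(x)


def ieee754_fields(x: float):
    """Return (sign, biased_exponent, fraction) of x as stored in binary64.

    Layout assumed here: 1 sign bit, 11 exponent bits (bias 1023),
    and 52 fraction bits.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


if __name__ == "__main__":
    mantissa, exponent = split_float(6.5)
    print(mantissa, exponent)
    print(ieee754_fields(6.5))
```

The two views differ only in normalization: `math.frexp` reports a mantissa in the range [0.5, 1), whereas the stored binary64 encoding implies a leading 1 and keeps only the fractional bits.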
As is well known in the art, the FPU of a modern microprocessor is a relatively complex circuit, and requires a significant portion of the integrated circuit chip area for its realization. According to conventional techniques, therefore, the implementation of multiple CPUs, each having its own dedicated FPU, on a single integrated circuit chip will require an extremely large chip size. This large chip size translates, of course, into both low manufacturing yields and high per-chip manufacturing cost.
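The relationship between chip size, yield, and cost noted above can be sketched numerically. The example assumes the simple first-order Poisson yield model, in which the fraction of defect-free dies is exp(-A·D) for die area A and defect density D. This model and every number below are illustrative assumptions rather than figures from the text, and the function names are hypothetical.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under the Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def cost_per_good_die(wafer_cost: float, wafer_area_cm2: float,
                      die_area_cm2: float, defects_per_cm2: float) -> float:
    """Wafer cost divided by the expected number of good dies.

    Simplification: dies per wafer is wafer area divided by die area,
    rounded down, ignoring edge loss and scribe lines.
    """
    dies_per_wafer = wafer_area_cm2 // die_area_cm2
    good_dies = dies_per_wafer * poisson_yield(die_area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for area in (1.0, 2.0, 4.0):
        y = poisson_yield(area, 0.5)
        c = cost_per_good_die(1000.0, 300.0, area, 0.5)
        print(f"area={area} cm^2  yield={y:.3f}  cost per good die={c:.2f}")
```

Under this model a larger die is penalized twice: fewer dies fit on each wafer, and a smaller fraction of them are defect-free, so the cost per good die grows faster than the die area.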