The invention relates to the architecture and design of high-performance multithreaded processor and multi-processor integrated circuits.
Most modern processors embody several pipelined functional units. Typical such units include integer units capable of performing integer arithmetic between register operands, and floating point units capable of performing floating point arithmetic between register operands. There may be dedicated functional units for performing address arithmetic, or, in some machines, integer units may perform these operations. Other functional units may include fetch and store units that operate to retrieve operands from, or store results into, memory. These functional units are referred to herein as resources.
Many modern processors are capable of commanding operations in more than one functional unit simultaneously. Processors having this ability include many VLIW (Very Long Instruction Word) processors and the Itanium (Trademark of Intel Corporation) processors. The process of commanding operations in functional units is referred to as instruction decode and dispatch.
The Itanium processors use an explicitly parallel instruction set wherein instructions are packaged in groups of three, where instructions are not permitted to depend on results of instructions of the same group, and where it is often possible to dispatch multiple instructions of the same group simultaneously. The Itanium processors, and other superscalar machines, have sufficient resources, and sufficiently complex control, that it is possible to dispatch operations from more than one instruction simultaneously.
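The no-dependence rule for instructions of the same group, as stated above, can be illustrated with a minimal sketch. It is not a model of the actual Itanium encoding or dispatch hardware. The function name and the representation of an instruction as a (destination register, source registers) pair are assumptions made only for illustration.

```python
def depends_within_group(group):
    """Return True if any instruction reads a register written earlier in the same group.

    `group` is a sequence of (dest, sources) pairs; this representation is a
    hypothetical simplification, not an actual instruction format.
    """
    written = set()
    for dest, sources in group:
        # A read of a register written earlier in the group is a dependence.
        if written & set(sources):
            return True
        written.add(dest)
    return False


# A group of three mutually independent instructions.
independent = [("r1", ("r2", "r3")), ("r4", ("r5",)), ("r6", ("r7",))]
# The second instruction reads r1, which the first instruction writes.
dependent = [("r1", ("r2",)), ("r4", ("r1",)), ("r6", ())]
```

Under these assumptions, a group for which the function returns False is one whose instructions could, resources permitting, be dispatched simultaneously.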
Much modern software is written to take advantage of multiple processor machines. This software typically is written to use multiple threads. Software is also frequently able to prioritize those threads, determining which thread should receive the most resources at a particular time.
Multithreaded processors are those that have more than one instruction pointer, typically have more than one register set, and are capable of executing more than one instruction stream. For example, machines are known wherein a single pipelined execution unit is timeshared among several instruction streams. These machines appear to software as multiple, independent, processors.
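The timesharing of a single execution unit among several instruction streams can be sketched as follows. The round-robin selection policy, the function name, and the use of a list index as the instruction pointer are assumptions for illustration; the text above does not specify how a particular machine chooses among its streams.

```python
from collections import deque


def timeshare(streams):
    """Interleave instructions from several streams on one execution unit.

    Each stream has its own instruction pointer (here, simply an index).
    Round-robin ordering is an assumed policy, chosen for simplicity.
    """
    pointers = [0] * len(streams)
    # Only streams that still have instructions to issue are kept in the queue.
    ready = deque(i for i, s in enumerate(streams) if s)
    issued = []
    while ready:
        tid = ready.popleft()
        issued.append((tid, streams[tid][pointers[tid]]))
        pointers[tid] += 1
        if pointers[tid] < len(streams[tid]):
            ready.append(tid)
    return issued


order = timeshare([["a0", "a1", "a2"], ["b0"]])
```

Each stream advances independently of the others, which is why such a machine can appear to software as multiple independent processors even though only one instruction is issued at a time.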
Machines of superscalar performance having multiple processors on single integrated circuits are known. Machines of this type include some implementations of the Itanium, IBM Power-4 and PA 8800. Typically, each processor on these integrated circuits has its own set of execution unit pipelines. Both their performance and their die area, and hence the cost of their execution units, are therefore typically much greater than those of a timeshared multithreaded machine.
Many modern machines integrate some system devices onto their processor integrated circuits. These system devices may include memory interface controllers, cache memory subsystems, Direct Memory Access (DMA) controllers, disk interfaces, display adapters, and other Input/Output (I/O) controllers.
The system devices desired on a processor integrated circuit vary with the system in which the integrated circuit is installed. For example, an on-chip display adapter may be of great use in low cost systems, while an external high-performance display adapter may be provided in a higher performance system. Similarly, a low cost system may require a single port of IDE disk interface, while a higher-end system may require dual SCSI disk-interface ports.
The lengthy design cycle and high expense of developing high performance processor integrated circuits render it impractical to design and market a large variety of processor integrated circuit designs, each having system devices tailored to a particular set of applications.
System devices are typically constructed of custom hardware that is not interchangeable with processor hardware on the integrated circuit. Further, each system device is usually a custom design that is useful for only a particular function. Unused system devices present on an integrated circuit consume device area, thereby increasing device cost. Unused devices may also consume power.
Nature of the Problem
It is generally desirable to simplify systems, and to reduce system cost, by increasing integration of system functions on a single VLSI device. It is therefore desirable to minimize the integrated circuit area allocated to particular system devices, while providing the flexibility of having a wide variety of system device types on a processor integrated circuit.
A multiple processor integrated circuit embodies a pool of resources that may be utilized as either components of system devices or components of processor cores. The circuit also has a group of specialty functional blocks of particular utility in constructing particular system devices. The circuit is provided with an allocation control mechanism whereby these resources may be dynamically assigned to groups.
The allocation control mechanism is further capable of configuring each of these resource groups to function as either a system device or a processor core.
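The assignment of pooled resources to groups, and the configuration of each group as a processor core or a system device, can be illustrated with a minimal software sketch. It is not a description of the hardware mechanism itself. The class names, method names, and resource labels below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PROCESSOR_CORE = "processor core"
    SYSTEM_DEVICE = "system device"


@dataclass
class ResourceGroup:
    role: Role
    resources: list = field(default_factory=list)


class AllocationControl:
    """Hypothetical model of an allocation control mechanism over a resource pool."""

    def __init__(self, pool):
        self.free = list(pool)   # resources not yet assigned to any group
        self.groups = {}         # group name -> ResourceGroup

    def assign(self, name, role, wanted):
        """Move the wanted resources from the pool into a new group with a role."""
        if len(set(wanted)) != len(wanted):
            raise ValueError("a resource may appear only once in a group")
        missing = [r for r in wanted if r not in self.free]
        if missing:
            raise ValueError(f"resources not available: {missing}")
        for r in wanted:
            self.free.remove(r)
        self.groups[name] = ResourceGroup(role, list(wanted))
        return self.groups[name]

    def release(self, name):
        """Dissolve a group and return its resources to the pool."""
        group = self.groups.pop(name)
        self.free.extend(group.resources)


ctl = AllocationControl(["int0", "int1", "fp0", "ldst0"])
ctl.assign("core0", Role.PROCESSOR_CORE, ["int0", "fp0"])
ctl.assign("disk", Role.SYSTEM_DEVICE, ["int1", "ldst0"])
ctl.release("disk")
# The same resources may later be regrouped under a different role.
ctl.assign("core1", Role.PROCESSOR_CORE, ["int1", "ldst0"])
```

In this sketch the same two resources serve first as part of a system device and later as part of a processor core, which is the flexibility the pooled arrangement is intended to provide.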
In various embodiments, the system devices that may be constructed from resource groups (hereinafter constructable devices) include at least one disk interface adapter capable of interfacing with external disk drives of the IDE, SCSI, or Fibre Channel types. The constructable devices can also be configured as a network adapter capable of interfacing with an interconnect of the 100 baseT or Gigabit type, or as a display adapter.