The present invention relates in general to microprocessors, and in particular to a multipurpose multiply-add functional unit for a processor core.
Real-time computer animation places extreme demands on processors. To meet these demands, dedicated graphics processing units typically implement a highly parallel architecture in which a number (e.g., 16) of cores operate in parallel, with each core including multiple (e.g., 8) parallel pipelines containing functional units for performing the operations supported by the processing unit. These operations generally include various integer and floating-point arithmetic operations (add, multiply, etc.), bitwise logic operations, comparison operations, format conversion operations, and so on. The pipelines are generally of identical design so that any supported instruction can be processed by any pipeline; accordingly, each pipeline requires a complete set of functional units.
Conventionally, each functional unit has been specialized to handle only one or two operations. For example, the functional units might include an integer addition/subtraction unit, a floating-point multiplication unit, one or more binary logic units, and one or more format conversion units for converting between integer and floating-point formats.
Over time, the number of elementary operations (instructions) that graphics processing units are expected to support has been increasing. New instructions such as a ternary “multiply-add” (MAD) instruction that computes A*B+C for operands A, B, and C have been proposed. Continuing to add functional units to support such operations leads to a number of problems. For example, because any new functional unit has to be added to each pipeline, the chip area required to add just one additional unit can become significant; with 16 cores of 8 pipelines each, a single new unit is replicated 128 times. New functional units also increase power consumption, which may require improved cooling systems. Such factors contribute to the difficulty and cost of designing chips. In addition, to the extent that the number of functional units exceeds the number of instructions that can be issued in a cycle, the processing capacity of the functional units is used inefficiently.
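By way of illustration only, the arithmetic result of the MAD instruction mentioned above can be sketched in C. The function name `mad` and the use of single-precision `float` operands are assumptions made for this sketch; the background does not specify operand formats, rounding behavior, or whether the product is rounded before the addition.

```c
/* Illustrative sketch only: models the result of a ternary multiply-add
 * (MAD), A*B+C, in software. The name `mad` and the float operand type
 * are assumptions, not taken from the description. */
static float mad(float a, float b, float c) {
    /* Multiply the first two operands, then add the third. */
    return a * b + c;
}
```

For example, under these assumptions `mad(2.0f, 3.0f, 4.0f)` evaluates to 10.0f. A hardware unit supporting MAD produces such a result with one instruction rather than a separate multiply followed by an add.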
It would, therefore, be desirable to provide functional units that require reduced chip area and that can be used more efficiently.