1. Field of the Invention
Exemplary embodiments of the invention relate to the implementation of arithmetic computation units for integrated circuits. Specifically, exemplary embodiments discuss the cooperative use of two different types of computation units to implement a more complex computation.
2. Description of Background
To improve computation performance in the face of decreasing benefit from generational silicon technology improvements, designs have moved to implement more complex computation primitives. In general-purpose microprocessors, such computation primitives often take the form of expanded instruction sets implemented on accelerators coupled tightly to a processor core charged with implementing the standard (legacy) set of instructions. Frequently, to improve computation throughput, such accelerators implement a short-vector SIMD (single-instruction multiple-data) computation model, whereby each instruction specifies an operation to be performed across a wide data word, which, depending on the particular instruction, is interpreted as a vector of a small number (1-16) of sub-words. A single instruction may then specify multiple operations on multiple pieces of data.
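By way of illustration only, the short-vector SIMD model described above may be sketched in software as follows. The sketch assumes a 128-bit data word and unsigned sub-words; the width and the names WORD_BITS, split_lanes, join_lanes, and simd_add are hypothetical and are not taken from any particular instruction set.

```python
WORD_BITS = 128  # assumed width of the wide data word (not specified in the text)


def split_lanes(word, lane_bits):
    """Interpret a WORD_BITS-wide integer as unsigned sub-words, lowest lane first."""
    if lane_bits <= 0 or WORD_BITS % lane_bits != 0:
        raise ValueError("lane width must evenly divide the word width")
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask
            for i in range(WORD_BITS // lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack a list of sub-words (lowest lane first) back into one wide word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (i * lane_bits)
    return word


def simd_add(a, b, lane_bits):
    """Lane-wise modular addition: carries never cross a sub-word boundary."""
    mask = (1 << lane_bits) - 1
    sums = [(x + y) & mask
            for x, y in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))]
    return join_lanes(sums, lane_bits)
```

The same pair of 128-bit operands yields different results depending on the sub-word width selected by the instruction: with 8-bit sub-words the word holds sixteen independent lanes, whereas with 128-bit sub-words it holds a single lane.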
The disparate types of operations required in computation accelerators (e.g., arithmetic, logical, and data movement operations) and the wide variety of data types involved (e.g., signed and unsigned integer numbers of different sizes, floating-point numbers of different precisions) have typically required the implementation of many different kinds of functional computation units. For example, one functional unit may handle simple integer operations (such as addition, subtraction, comparison, Boolean logical operations, etc.), another functional unit may be responsible for complex integer operations (multiplications, multiply-adds, additive reductions, etc.), a third functional unit may handle floating-point operations, and still another functional unit may handle data formatting and permutation. Such an implementation is very expensive in terms of design effort, the circuitry required to implement the different functions separately, and power, especially since external constraints (such as the number of computations that can begin or end at a given time) typically render most of this circuitry idle.
Another approach that leverages the external constraints to reduce the need for special purpose functional units would be beneficial.