Continuing advances in semiconductor technology have greatly increased the amount of processing that can be performed by single-chip, general-purpose computing devices. Because inter-chip communication bandwidth has increased relatively slowly, modern high-performance devices must use as much of the potential on-chip processing power as possible. This results in large, dense integrated circuit devices and a large design space of processing architectures.
One way of viewing this design space is in terms of granularity. Designers have the option of building very large processing units, or many smaller ones, in the same space. Traditional architectures are either very coarse grain, such as microprocessors, or very fine grain, such as field programmable gate arrays (FPGAs). Both architectures have advantages and disadvantages.
Microprocessors incorporate a small number of large processing units that operate on wide data words, and each unit is hardwired to perform a defined set of instructions on these data words. Usually each unit is optimized for a different class of instructions, such as integer or floating point, and the units are generally hardwired to operate in parallel. The hardwired nature of these units allows very rapid instruction execution. In fact, a great deal of area on modern microprocessor chips is dedicated to cache memories in order to support a very high rate of instruction issue. Thus, these devices efficiently handle very dynamic instruction streams.
Very fine grain devices, such as FPGAs, incorporate a large number of very small processing elements. These elements are arranged in a configurable interconnect network. The configuration data used to define the functionality of the processing elements and the network can be thought of as a very large, semantically powerful instruction word. Nearly any operation can be described and mapped to hardware.
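The idea that configuration data acts as an instruction word can be illustrated with a small sketch. It assumes each processing element is a lookup table (LUT), a common FPGA building block that the text above does not specifically name. The helper `make_lut` and the configurations shown are hypothetical and chosen only for illustration.

```python
def make_lut(config_bits):
    """Model one fine grain element as a k-input lookup table.

    config_bits holds 2**k entries of 0 or 1. These entries play the
    role of the element's slice of the configuration data.
    (Illustrative assumption: the source does not specify LUTs.)
    """
    n = len(config_bits)
    if n == 0 or n & (n - 1):
        raise ValueError("config length must be a power of two")
    k = n.bit_length() - 1  # number of inputs addressed by the table

    def element(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs, got {len(inputs)}")
        index = 0
        for bit in inputs:  # first input is the most significant bit
            index = (index << 1) | (bit & 1)
        return config_bits[index]

    return element


# The same element type performs different operations depending
# only on the configuration bits loaded into it.
xor2 = make_lut([0, 1, 1, 0])
and2 = make_lut([0, 0, 0, 1])


def half_adder(a, b):
    """Two configured elements wired together: returns (sum, carry)."""
    return xor2(a, b), and2(a, b)
```

Under this model, changing the operation requires no new hardware, only different configuration bits. The interconnect configuration is represented here only by the fixed wiring inside `half_adder`.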