Advances in graphics processing techniques have led to the use of more parallel computational resources for performing graphics-related operations. This includes the adoption of single instruction multiple data (SIMD) architectures, which allow a single instruction to be executed on multiple sets of data to produce multiple results. Although parallel computational resources increase speed by allowing numerous operations to be performed simultaneously, building devices with large numbers of duplicated execution units can incur significant costs. This is especially true of sophisticated processing devices that employ many execution units operating in parallel, where each execution unit is a highly capable component with complex circuitry that occupies a substantial amount of semiconductor area.
As designs for processing devices become more sophisticated, there is an ever-increasing demand for more operations to be executed per unit time, as well as for more capable execution units that can perform more complicated operations. To satisfy these demands, processing devices tend to be designed with ever more parallel execution units, each containing significant circuitry. The result is processing devices that occupy extremely large amounts of semiconductor area.
Although semiconductor processing technology continues to improve, reducing circuit sizes and manufacturing costs, such improvements occur at a relatively slow pace and may even be leveling off as the technology approaches the physical limits of the materials used. Meanwhile, the demand for faster, more capable devices that handle graphics-related operations continues to grow. Thus, there is a significant need for techniques that allow a large number of operations to be performed while balancing this against the semiconductor area consumed by parallel execution units.