Today, digital systems in a variety of applications, including both Digital Signal Processing (DSP hereafter) and graphics accelerators, must perform many complex algorithms. These algorithms often use a wide cross-section of specialized non-additive operations and non-linear functions to achieve their desired results.
These algorithmic requirements place significant strains on how data is processed in these application systems. On one hand, the more arithmetic resources processing the data, the greater the throughput. On the other hand, the more resources there are to control, the wider the instruction controlling these units needs to be in order to provide the flexibility to use these resources optimally.
The wider the instruction word, the greater the system overhead in operating the data processing resources. The system overhead may include, but is not limited to, the interfacing to external memories, the external memories themselves, the instruction cache, and the general layout issue of routing many wires carrying these instruction signals to where they are needed. All of these are significant problems, often greatly increasing the cost of production and operational heat generation, and reducing the general feasibility of such solutions.
Mechanisms and methods are needed to operate multiple data processing resources based upon a narrow instruction that can generate a wide instruction where needed. These methods and mechanisms need to minimize the routing and other overhead associated with moving wide instructions every cycle.