Many different types of programming models exist in the area of digital signal processing. In general, these models differ by their characteristics, such as data types, data lengths, data functions, and the like. Instruction parallelism models are one type of model. An instruction parallelism model is defined by its ability to simultaneously execute different instructions. Instruction parallelism models can be embodied by a very long instruction word ("VLIW") model or a super-scalar model, among others. VLIW models are advantageous in that they are very scalable, they are not affected by "memory wall" concerns, and they save both silicon area and power consumption by offloading complex instruction scheduling to a compiler.
VLIW models use a horizontal approach to parallelism where several scalar instructions are included in a long instruction word that is fetched from memory and executed by functional units in every cycle. More specifically, in each cycle, an instruction word specifies operations to be performed using specific data elements or operands. Exemplary operations may include mathematical operations, logical operations, and the like, depending upon the needs of a particular application. A variety of functional units, processing elements, or execution units perform the operations. Exemplary functional units may include multiply-accumulate ("MAC") units, load/store units, add units, and the like, and may vary from application to application. The data elements or operands are typically stored in register files.
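The per-cycle behavior described above can be illustrated with a minimal sketch. It assumes a single shared register file, one-cycle operations, and three unit types (add, MAC, load); the names Op and execute_word are hypothetical and chosen only for illustration, not taken from any particular processor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """One scalar operation occupying one slot of a long instruction word."""
    unit: str        # "add", "mac", or "load" (assumed unit types)
    dst: int         # destination register index
    src_a: int = 0   # first source register index
    src_b: int = 0   # second source register index
    addr: int = 0    # memory address, used by "load" only


def execute_word(word: List[Op], regs: List[int], memory: List[int]) -> List[int]:
    """Execute every operation of one instruction word in a single cycle.

    All slots read their operands from the register state at the start of
    the cycle; results are written back together at the end of the cycle.
    """
    snapshot = list(regs)
    results = []
    for op in word:
        if op.unit == "add":
            value = snapshot[op.src_a] + snapshot[op.src_b]
        elif op.unit == "mac":
            # Multiply-accumulate: dst = dst + src_a * src_b
            value = snapshot[op.dst] + snapshot[op.src_a] * snapshot[op.src_b]
        elif op.unit == "load":
            value = memory[op.addr]
        else:
            raise ValueError(f"unknown functional unit: {op.unit}")
        results.append((op.dst, value))
    for dst, value in results:
        regs[dst] = value
    return regs
```

In this sketch, a word containing an add, a MAC, and a load completes all three operations in one call, which stands in for one cycle. Reading from a snapshot models the assumption that operations within a word do not see each other's results until the following cycle.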
Instructions from a VLIW model are executed by functional units in a digital signal processor ("DSP"). A scheduler may determine which functional units will execute the instructions. These instructions can be scheduled statically, that is, at compile time, as opposed to dynamically, that is, at run time. Because the instructions may be scheduled at the time of compiling under a VLIW model, a processor can simultaneously execute instructions while minimizing the occurrence of hazards.
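As an illustration of static scheduling, the following sketch packs a sequence of operations into instruction words ahead of execution. It is a simple greedy scheduler written under stated assumptions (one-cycle latency, operands read at the start of a cycle and results written at its end, a fixed number of slots per unit type); the function name schedule and its data layout are hypothetical, and a production compiler would use considerably more elaborate techniques.

```python
from collections import defaultdict


def schedule(ops, slots):
    """Greedily pack operations into instruction words at "compile time".

    ops:   list of (name, unit, dst, srcs) tuples in program order
    slots: dict mapping unit type to the number of such units per word
    Returns a list of words; each word is a list of operation names.
    """
    last_write = {}  # register -> cycle of its most recent write
    last_read = {}   # register -> latest cycle in which it is read
    used = defaultdict(lambda: defaultdict(int))  # cycle -> unit -> slots taken
    words = defaultdict(list)

    for name, unit, dst, srcs in ops:
        earliest = 0
        for src in srcs:
            if src in last_write:
                # Read-after-write: the value is visible one cycle after the write.
                earliest = max(earliest, last_write[src] + 1)
        if dst in last_write:
            # Write-after-write: keep writes to a register in program order.
            earliest = max(earliest, last_write[dst] + 1)
        if dst in last_read:
            # Write-after-read: the same cycle is safe because reads happen first.
            earliest = max(earliest, last_read[dst])

        # Structural constraint: find the first cycle with a free unit.
        cycle = earliest
        while used[cycle][unit] >= slots[unit]:
            cycle += 1

        used[cycle][unit] += 1
        words[cycle].append(name)
        last_write[dst] = cycle
        for src in srcs:
            last_read[src] = max(last_read.get(src, 0), cycle)

    if not words:
        return []
    return [words[c] for c in range(max(words) + 1)]
```

Because every data dependency and unit conflict is resolved before execution, the hardware in this model needs no run-time hazard detection: it simply issues each word in order.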
VLIW architectures typically require processors to have a large number of buses and forwarding paths for delivering information among DSP elements, e.g., register files and functional units. This can be problematic in that it may increase processing time and power consumption. As such, there is a need for modifying the number of buses and the length of connection wires to deliver a faster access time in transporting information while reducing power consumption.
Some previous processors utilize a crossbar switch, i.e., a switch having a plurality of interconnected vertical and horizontal paths, for transferring information. However, these switches are very expensive and consume a considerable amount of power. Other previous architectures utilize a very tight forwarding and sharing scheme for transferring information such that a processor is essentially divided into parts, without permitting forwarding and sharing of information between the parts.