Some modern digital signal processor (i.e., DSP) cores use very long instruction word (i.e., VLIW) architectures. Such architectures assume that instruction scheduling is done in software, either by an assembly programmer or by a compiler. In VLIW approaches, parallelism is statically encoded using variable length execution sets (i.e., VLES). In modern DSPs, each VLES can encode as many as 12 instructions. Furthermore, each VLES may include several prefix words added by the assembler. Each VLES also provides a high code density by encoding each instruction in 16 or 32 bits.
Referring to FIG. 1, a diagram illustrating a conventional order for fetching and dispatching several variable length execution sets is shown. Shading in the blocks identifies instructions belonging to different sets. A new fetch set is read on each cycle (i.e., cycles 1-4) and subsequently dispatched (e.g., cycles 2-7).
A common problem is the time used for VLES dispatch decoding. During a single cycle, a dispatcher determines which instructions belong to the specific VLES being dispatched. In conventional implementations, the dispatcher works on every instruction in each fetch set in parallel to complete the dispatch decoding in a single cycle. Therefore, a large number of parallel decoders is normally implemented. In particular, eight decoders are provided for an 8-word fetch set and 16 decoders are provided for a 16-word fetch set. Implementing multiple parallel decoders uses a significant amount of logic, which increases chip area and power consumption.
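The conventional dispatch step described above can be modeled in software as a rough sketch. The sketch assumes, purely for illustration, that each instruction word carries a flag marking the last word of its execution set; the text does not say how set boundaries are encoded, and the names Word, decode_word and dispatch_one_vles are hypothetical. The point of the model is that the number of decoders equals the fetch set width, regardless of how short the dispatched VLES is.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One instruction word in a fetch set (hypothetical model)."""
    name: str
    last_in_set: bool  # ASSUMED end-of-VLES marker; not specified in the text


def decode_word(word: Word) -> bool:
    """Model of one decoder: report whether this word ends an execution set."""
    return word.last_in_set


def dispatch_one_vles(fetch_set: List[Word], start: int) -> Tuple[List[Word], int, int]:
    """Model one dispatch cycle over a fetch set.

    Returns (words of the dispatched VLES, index of the next undispatched
    word, number of decoders exercised).
    """
    # Conceptually every decoder runs at the same time, one per word,
    # so the decoder count is the fetch set width (8 or 16 in the text).
    end_flags = [decode_word(w) for w in fetch_set]
    decoders_used = len(fetch_set)

    # Pick the first end-of-set marker at or after the start position.
    for i in range(start, len(fetch_set)):
        if end_flags[i]:
            return fetch_set[start:i + 1], i + 1, decoders_used

    # No marker found: in this model the VLES continues into the next fetch set.
    return fetch_set[start:], len(fetch_set), decoders_used


# An 8-word fetch set holding three sets: I0-I2, I3, and I4-I7.
ends = {2, 3, 7}
fetch_set = [Word(f"I{i}", i in ends) for i in range(8)]
vles, next_start, decoders = dispatch_one_vles(fetch_set, 0)
```

In this example the first call dispatches three words yet exercises all eight decoders, which mirrors the area and power cost noted above.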
The VLES dispatching also limits the DSP core frequency. The DSP core frequency is governed by the time used in the dispatch procedure to complete work on each VLES. The core frequency is limited such that a current VLES is completed in a single cycle, which allows the pipeline for a next VLES to start in the next cycle.
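The relationship between dispatch time and core frequency can be illustrated with a short calculation. This is a sketch only: the function name is hypothetical and the delay values are made-up illustrative numbers, not figures from any actual DSP core. It assumes the dispatch of one VLES must fit within exactly one clock cycle, so the clock period cannot be shorter than the dispatch time.

```python
def max_core_frequency_mhz(dispatch_time_ns: float) -> float:
    """Highest clock frequency (MHz) at which one dispatch fits in one cycle.

    Assumes the clock period must be at least the dispatch time, so
    f_max = 1 / t_dispatch. With t in nanoseconds, f in MHz is 1000 / t.
    """
    if dispatch_time_ns <= 0:
        raise ValueError("dispatch time must be positive")
    return 1000.0 / dispatch_time_ns


# Illustrative (hypothetical) dispatch delays in nanoseconds.
slow = max_core_frequency_mhz(2.0)
fast = max_core_frequency_mhz(1.25)
```

Under this assumption, shortening the dispatch time from 2.0 ns to 1.25 ns would raise the frequency ceiling from 500 MHz to 800 MHz.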
It would be desirable to implement an efficient extraction of execution sets from fetch sets.