Processor instruction set requirements generally evolve with changing application domains. In particular, digital signal processing applications can be made to perform better with special instructions. However, the particular instructions needed to achieve such performance improvement typically change over time.
Current methods for accommodating the instruction set requirements of new application domains include either extending the instruction set, which leads to instruction set bloat and increasingly complex processor designs, or regenerating a tailored processor from a processor core with sub-optimal (slow core) baseline performance. Current approaches to quick regeneration require the new instruction(s) to be simple enough to fit within the existing execution pipeline, and thereby limit the complexity of new instructions in terms of latency and resource usage. Accordingly, the current methods cannot accommodate the needs of new instructions that require a complex data path (more pipeline stages), an additional internal state (e.g., a private register file) or many-cycle execution latency with optionally pipelined operation.
There is, therefore, a need in the art for an architecture providing good performance for native instructions while supporting new single or multi-cycle instructions which may include complex data paths, an additional internal state, and optionally pipelined operation, with latencies known to the compiler and used for instruction scheduling.