New application-focused system-on-chip platforms motivate new application-specific processors. Configurable and extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment (compilers, debuggers, simulators and real-time operating systems) satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk.
U.S. Pat. No. 6,282,633, issued Aug. 28, 2001 and entitled, “High Data Density RISC Processor,” U.S. application Ser. No. 09/246,047, filed Feb. 5, 1999 and entitled “Automated Processor Generation System for Designing a Configurable Processor and Software,” U.S. application Ser. No. 09/322,735, filed May 28, 1999 and entitled “System for Adding Complex Instruction Extensions to a Microprocessor,” and U.S. application Ser. No. 09/506,502, filed Feb. 17, 2000 and entitled “Improved Automated Processor Generation System for Designing a Configurable Processor and Software,” all commonly owned by the present assignee and incorporated herein by reference, dramatically advanced the state of the art of microprocessor architecture and design.
More particularly, these previous patents and applications described in detail a high-performance RISC processor, as well as a system that is able to generate a customized version of such a high-performance RISC processor, based on user specifications (e.g. number of interrupts, width of processor interface, size of instruction/data cache, inclusion of MAC or multiplier) and implementation goals (e.g. target ASIC technology, speed, gate count, power dissipation, prioritization). The system generates a Register Transfer Level (RTL) representation of the processor, along with the software tools for the processor (compiler, linker, assembler, debugger, simulator, profiler, etc.), and the set of scripts to transform the RTL representation into a manufacturable geometric representation (usually referred to as synthesis and place and route). The system further includes recursive evaluation tools that allow for the addition of processor extensions to provide hardware support for commonly used functions in accordance with the application to achieve an ideal trade-off between software flexibility and hardware performance.
Generally, as shown in FIG. 1, the processor 102 generated by the system can include a configurable core 104 that is substantially the processor described in U.S. Pat. No. 6,282,633, and an optional set of application-specific processor extensions 106, which extensions may be described by Tensilica Instruction Extension (TIE) language instructions, and/or other high level hardware description language instructions, as detailed in the above-referenced applications. The processor and generation system of the above-referenced patents and applications are embodied in products that are commercially available from Tensilica, Inc. of Santa Clara, Calif.
Although the above system can generate processors that meet the requirements of many and various applications, there are other applications, such as video compression and decompression, data encryption, and signal processing, that can benefit from additional architectural and micro-architectural features. For example, the above system can generate processors capable of recognizing both 16- and 24-bit instructions. In some applications, it would be desirable to extend the amount of parallelism available in the instruction set architecture.
In addition, for some applications, it would be desirable for the system programmer not to have to re-order the location of data in memory. This often requires that the application be able to reference two separate streams from memory. However, the above system requires that the streams be re-ordered so that they could be accessed as a single stream of twice the number of bits. Further, the generated processors required that all local memories be accessed in a fraction of the processor cycle time. This limits the ability to use large memories in the system (as the time to access the memory is proportional to the size of the memory). Furthermore, the above system can not generate processors that can accommodate memories with variable latency. This is often useful when adding additional read or write ports to a memory (as would be required to support multiple load and store units).