Historically, DSP functionality has taken two forms: software programmable processors with arithmetically oriented instruction sets such as those offered by TI, Analog Devices, Motorola, and Agere (Lucent), and dedicated logic hardware functionality specifically performing arithmetic tasks. In recent years, an alternative approach to programmable DSP functionality has arisen where arrays of arithmetically oriented function modules are connected by reprogrammable routing resources, in a manner similar to that utilized in Field Programmable Gate Arrays (FPGAs), creating reprogrammable array DSP solutions. Reprogrammable array DSP solutions are being offered by companies like PACT, Leopard Logic, and Elixent as embeddable cores and by Chameleon as a discrete component. A core is an embeddable block of semiconductor functionality that can be included in a System-On-Chip (SOC) ASIC (Application Specific Integrated Circuit) design. These reprogrammable array DSP solutions always operate independently of any classical software programmable DSP architecture.
Meanwhile, a different evolution in processor architecture has occurred for RISC (Reduced Instruction Set Computer) processors, where synthesizable processor cores are being offered by companies like ARC and Tensilica with the ability to customize instruction set extensions. Variations on these processors are also offered with multiplier-accumulator functions added, enabling DSP applications to be better addressed. However, these processor cores are only customizable at the time the logic function is synthesized, which means some time prior to the construction of actual silicon. Their instruction set cannot be altered or reconfigured once the silicon implementation has been fabricated.
At the same time, it has been shown by companies such as ARC and Tensilica that the ability to create customized instructions can greatly improve the performance of a processor. Unfortunately, since these instructions are not alterable in the field (once the processor has been delivered to the customer), they cannot adapt to the surprises that arise when real-world phenomena are encountered upon powering up the first prototype. Such surprises are even more prevalent for DSPs, since they often deal with real-world phenomena like voice and video, and noisy communications media like cable modems, DSL, and wireless, where unpredictability is inherent.
A research project summary presented at the Instat/MDR Embedded Processor Forum (Apr. 29, 2002) by Francesco Lertora, a System Architect at ST Microelectronics, had some similarities to the present invention. It was entitled “A Customized Processor for Face Recognition” and demonstrated a custom processor based on Tensilica's Xtensa processor core. Here, they coupled the configurable (not field programmable) instruction extensions of the Tensilica processor to a block of FPGA technology on a custom SOC design. To augment the Tensilica processor, they implemented arithmetic functions in the FPGA to perform DSP-type functions. In this example, the FPGA functionality not only performs operations where results are returned to the RISC processor, it also performs some I/O functions directly, essentially functioning at times as a coprocessor.
While not combining a conventional DSP with an FPGA fabric in a tightly-coupled and dedicated manner with the FPGA subordinate to the conventional DSP as embodied in the present invention, this demonstration by ST does reveal some of the benefits of a processor with re-programmable instructions since it was able to considerably accelerate the required functionality. However, ST's chip designers gave in to the temptation to allow the FPGA to perform functions independently. In general, this adds a substantial amount of hardware dependence to the design flow, making it far more difficult for designers to use. DSP designers typically prefer to design in a high-level language like C and not have to deal with hardware dependencies. As soon as the FPGA is allowed to execute tasks in parallel with the conventional software programmable DSP, the overall DSP program must be partitioned into parallel tasks, a complex issue involving intimate knowledge of the hardware.
Another company that has discussed FPGA fabric performing instructions is GateChange. However, the proposed architecture includes an ARM (RISC) processor and also allows the FPGA fabric full co-processing capability, with complete access to the device's I/Os. This certainly does not constrain the FPGA fabric to be fully subordinate to the DSP as in the present invention.
FPGAs have been used for years to construct dedicated DSP functionality, sometimes in conjunction with a conventional DSP but operating as a separate functional element. In recent years, some FPGA suppliers like Xilinx and Altera have added dedicated multiplier functions. These essentially create a heterogeneous fabric where most of the modules are conventional Look-Up Table (LUT) based programmable modules, and some are fixed multiplier functions. This has made these devices more effective in terms of performance and density when arithmetic (DSP) functions are performed in dedicated hardware. These same FPGA suppliers now also offer RISC processors embedded in their FPGA devices. However, their FPGA functionality is not constrained to be subordinate to the processor—in fact their paradigm is just the opposite, with the processor acting as an enhancement to the FPGA function.
In order to reduce cost in volume production, FPGAs are often converted (migrated) to mask-programmed ASIC devices. It is well known that numerous testing and timing problems commonly arise when this conversion is done. These problems can make the conversion process take a very long time and sometimes also result in poor testability in the ASIC. One of the key reasons for these problems is the use of asynchronous functionality in the FPGA. When FPGAs having integral processors are converted to ASICs, a common source of conversion difficulty is the fact that the FPGA functions are not synchronously tied to the processor function and the processor's clocks. If they were, the conversion task would be much simpler and faster; in fact, it could be made fully automatic.
It is a generally accepted fact that for conventional, software programmable DSPs, less than 10% of the code often accounts for more than 90% of the execution cycles. It therefore follows that if a software programmable DSP were created with a field-configurable (field-programmable) instruction set, where dedicated functions with a high degree of parallelism can be applied to perform the functions consuming 90% of the cycles, the overall processor performance could be increased significantly.
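The size of the potential gain can be estimated with Amdahl's law, which is not named in the text above but is the standard way to reason about accelerating only part of a workload. The following is a minimal sketch, assuming that 90% of the execution cycles are spent in the code moved to dedicated functions and that those functions run some factor faster; the function name `overall_speedup` and the acceleration factors tried are illustrative assumptions, not figures from any actual device.

```python
def overall_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the cycles is accelerated.

    hot_fraction -- share of execution cycles spent in the accelerated code (0..1)
    hot_speedup  -- assumed factor by which that code runs faster (> 0)
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0.0:
        raise ValueError("hot_speedup must be positive")
    # The unaccelerated cycles are unchanged; the accelerated ones shrink.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


if __name__ == "__main__":
    # Hypothetical acceleration factors for the code consuming 90% of the cycles.
    for factor in (2, 5, 10, 100):
        print(f"hot code {factor:>3}x faster -> overall {overall_speedup(0.9, factor):.2f}x")
```

Under these assumptions, a tenfold acceleration of the dominant code yields an overall speedup of about 5.3, and the overall gain can approach but never reach 10, because the remaining 10% of the cycles still run at the original speed.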
However, a software programmable DSP with a field programmable instruction set does not exist. It appears that when reprogrammable array DSP solutions are developed, the creators are determined that this technology alone is the solution to the problem and it should be used as a separate functional entity from the conventional software programmable DSP. As offered, reprogrammable array DSP solutions are used for all DSP functions including the large quantity of instructions that normally occupy only 10% of the execution cycles. Unfortunately, this focus ignores the paradigm that exists for DSP development and the fact that DSP programmers—who are typically software engineers with an expertise in math—prefer to work in a software environment without having to be concerned with hardware uniqueness. Reprogrammable array DSP solutions do not fit cleanly into the flow that DSP programmers prefer to use. A software programmable DSP with a field programmable instruction set, on the other hand, would fit well—and increase processor performance significantly at the same time.
Part of the historical vision of programmable hardware, of which the aforementioned reprogrammable array DSP solutions are embodiments, is that the reprogrammable fabric can remain programmable in production. The theory is that this allows adaptability to future changes in functional requirements, even sometimes enabling changes “on-the-fly”. Changes on-the-fly allow the personality of the logic to be altered from moment to moment as different algorithms are required for different tasks, sometimes altering the personality in as little as a clock or two. Unfortunately, the FPGA fabric used in these solutions consumes between 20 and 40 times as much silicon area as the standard-cell ASIC implementations normally used in SOC design. Further, if it is desirable to alter the function of the FPGA fabric on-the-fly and within a clock cycle or two, additional configuration memory must be included in the FPGA fabric to implement a “multi-program” capability, increasing the consumption of silicon area even more. Today, it remains to be seen whether full reprogrammability is economically viable for SOC-class designs, and even more so whether multi-program implementations are.
Eventually, given the realities of very deep submicron design and the forecast by some that Moore's law (for semiconductor density and performance over time) may break down in the future, it is possible that fully-programmable multi-program FPGA fabrics may become viable for SOC volume production. However, in the meantime, there is a need for solutions that take advantage of the flexibility benefits of FPGA technology, while also providing an effective and practical solution for volume production.