Data processors, including specialized data processors such as Digital Signal Processors (DSPs), are commonly used in devices such as cellular telephones, modems, set-top boxes, digital communications equipment in general, music and video equipment, voice and image recognition equipment, and many other systems. These devices may perform arithmetically intensive tasks and may be required to operate according to strict real-time constraints. The heart of any DSP is the execution unit. The execution unit of a DSP is often highly specialized, designed to perform the types of computation common in DSP applications. Nevertheless, no one data processor has yet met the needs of all or even most applications. When available data processors do not meet system requirements, the following alternatives are currently available:
1. Add dedicated hardware to the system in order to perform the required functions. This hardware-intensive solution is less flexible, and more difficult to maintain, than software solutions.
2. Modify the data processor to include additional execution units (e.g. multiply-accumulate units or Galois field multipliers). This solution is (a) less efficient (i.e. it may require a larger circuit area) for those applications that may not require the additional functions; and (b) a costly and time-consuming process that involves adding new functions, creating new instructions, and modifying other parts of the core processor (e.g. the core processor's instruction decoder).
3. Add a loosely-coupled co-processor, such as a member of Intel's x87 family of numeric co-processors, to assist in performing additional computation. Co-processors that are not tightly coupled with the core must receive a program, and need a “start” instruction in order to be activated. When the co-processors finish executing, they synchronize with the core by means such as an interrupt. The core processor and the co-processors may spend significant amounts of time idling while waiting for each other's synchronization signals.
4. Offer a processor that can be configured by ASIC developers. These processors have a fixed selection of hardware resources (such as an ALU, a multiplier, and data paths) and instructions. A number of new instructions can be added to this baseline architecture, but only limited changes can be made to these chips.
5. Add tightly coupled co-processors using a special field in the instruction coding. This field is added to the instructions of the core processor and is passed to the off-core units in order to control them. This approach has the following disadvantages: (a) co-processor support is limited and fixed in advance; and (b) a large portion of the instruction coding space is sacrificed, which can increase overall code size even if the application has no need for the co-processor.
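By way of illustration only, the coding-space cost noted in alternative 5 can be sketched as follows. The 16-bit instruction word, the 4-bit co-processor field, and the function names are hypothetical, chosen for this example; they are not taken from any particular processor or from the alternatives described above.

```python
# Hypothetical widths, assumed only for illustration.
WORD_BITS = 16    # assumed fixed instruction word width
COPROC_BITS = 4   # assumed width of the field reserved for off-core units


def core_encodings(word_bits: int, reserved_bits: int) -> int:
    """Count the distinct encodings left to the core once a field is reserved."""
    if not 0 <= reserved_bits <= word_bits:
        raise ValueError("reserved_bits must lie between 0 and word_bits")
    return 1 << (word_bits - reserved_bits)


def pack(core_bits: int, coproc_bits: int) -> int:
    """Build an instruction word with the co-processor field in the low bits."""
    if not 0 <= core_bits < core_encodings(WORD_BITS, COPROC_BITS):
        raise ValueError("core_bits does not fit in the remaining coding space")
    if not 0 <= coproc_bits < (1 << COPROC_BITS):
        raise ValueError("coproc_bits does not fit in the reserved field")
    return (core_bits << COPROC_BITS) | coproc_bits


def unpack(word: int) -> tuple[int, int]:
    """Split an instruction word into (core part, co-processor field)."""
    return word >> COPROC_BITS, word & ((1 << COPROC_BITS) - 1)
```

Under these assumed widths, `core_encodings(16, 4)` leaves the core 4096 of the 65536 possible encodings, and every instruction carries the four reserved bits whether or not the application uses a co-processor.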
There is thus a widely recognized need for, and it would be highly advantageous to have, off-core execution units, or similarly, off-core logic units, which allow the execution unit of a processor to be customized without changes to the instruction set or the core processor itself. Additionally, it would be highly advantageous to have a very flexible solution that allows the user to tailor the core processor to the application without compromising code size, scalability, and overall system parameters.