The present invention relates generally to digital signal processors. In particular, the present invention relates to a programming model for a high performance digital signal processor which has multiple reconfigurable functional units for executing instructions.
Many different types of programming models exist in the area of digital signal processing. In general, these models differ by their characteristics, such as data types, data lengths, data functions and the like. Instruction parallelism models are one type of model. An instruction parallelism model is defined by its ability to simultaneously execute different instructions. Instruction parallelism models can be embodied by a very long instruction word (“VLIW”) model or a super-scalar model, among others. VLIW models use a horizontal approach to parallelism where several scalar instructions are included in a long instruction word that is fetched and executed every cycle. More specifically, in each cycle, an instruction word specifies operations to be performed using specific operands. Exemplary operations may include mathematical operations, logical operations, and the like, depending upon the needs of a particular application. Functional units which perform the operations may include any type of processing elements, such as, for example, execution units. More specifically, exemplary functional units may include multiply-accumulate (“MAC”) units, load/store units, add units, and the like, and may vary from application to application.
Instructions are processed by a scheduler which determines which functional units should be used for executing each instruction. Scheduling may be done statically, i.e., at compile time, as opposed to dynamically, i.e., at run time. Thus, VLIW models can simultaneously execute instructions while minimizing the occurrence of hazards. Because of this feature, among others, instruction parallelism models are very efficient in telecommunications applications.
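The static scheduling described above can be illustrated with a minimal sketch. It is not taken from any particular compiler: the `Op` record, the `pack_vliw` function, the register names, and the slot counts are all hypothetical, and the greedy in-order packing shown here is only one simple way a compile-time scheduler might group independent operations into long instruction words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical scalar operation handed to the compile-time scheduler."""
    name: str    # label for the operation
    unit: str    # functional unit type it needs, e.g. "load", "add", "mac"
    dest: str    # register written
    srcs: tuple  # registers read


def pack_vliw(ops, slots):
    """Greedily pack ops, in program order, into long instruction words.

    `slots` maps a functional unit type to the number of such units
    available per word (an assumed machine description). An op starts a
    new word when no unit of its type is free, or when it reads or
    rewrites a register written earlier in the current word.
    """
    words = []
    current, free, written = [], dict(slots), set()
    for op in ops:
        if slots.get(op.unit, 0) == 0:
            raise ValueError(f"no functional unit of type {op.unit!r}")
        hazard = op.dest in written or any(s in written for s in op.srcs)
        if hazard or free[op.unit] == 0:
            words.append(current)
            current, free, written = [], dict(slots), set()
        current.append(op)
        free[op.unit] -= 1
        written.add(op.dest)
    if current:
        words.append(current)
    return words


program = [
    Op("ld_a", "load", "r1", ()),
    Op("ld_b", "load", "r2", ()),
    Op("add", "add", "r3", ("r1", "r2")),
    Op("mac", "mac", "r4", ("r3", "r1")),
]
words = pack_vliw(program, {"load": 2, "add": 1, "mac": 1})
packed_names = [[op.name for op in word] for word in words]
# packed_names == [["ld_a", "ld_b"], ["add"], ["mac"]]
```

In this sketch the two loads share one long instruction word because they are independent and two load units are assumed, while the add and the multiply-accumulate each wait for a result produced in the preceding word. Because the grouping is decided before execution, no run-time hazard detection is modeled.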
Developing an instruction set architecture based on a VLIW model has several advantages. First, VLIW models are very scalable, both upward and downward. Scalability refers to the number of operations that can be packed into one long instruction word. The scalability enables the model to serve as a basis for a family of derivative implementations for various high performance digital signal processor (“DSP”) and multimedia applications. Second, “memory walls” are not an issue in the VLIW model. Memory walls refer to the concept that processor speeds are increasing more quickly than memory speeds. In the case of a VLIW model, memory walls are not a concern because the processor executes a large number of instructions simultaneously, rather than executing instructions one after another and repeatedly waiting for information from memory for each consecutive instruction. Third, the VLIW model saves silicon area and power by offloading the complex instruction scheduling scheme to the compiler.
Data parallelism models are a second type of model. A data parallelism model, also known as a vector model, is defined by its ability to simultaneously execute multiple operations of a single instruction, where each operation can be performed with different data. A data parallelism model uses vector instructions specifying vector operations and is embodied in a single instruction multiple data (“SIMD”) model. Data parallelism models are very efficient in block-based applications such as image processing, Moving Picture Experts Group (“MPEG”) systems, Finite Duration Impulse Response (“FIR”) systems, video conferencing, filtering applications, and multimedia applications.
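The semantics of a single vector instruction can be sketched as follows. The `vector_op` helper and the sample data are hypothetical, and the Python loop only models the result of the instruction: in a SIMD processor the per-lane operations would be carried out simultaneously rather than one after another.

```python
import operator


def vector_op(op, a, b):
    """Apply one operation to every pair of lanes, as one vector instruction would."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of lanes")
    return [op(x, y) for x, y in zip(a, b)]


# One "instruction" (multiply) applied to four different data elements.
pixels = [10, 20, 30, 40]
gains = [2, 2, 3, 3]
scaled = vector_op(operator.mul, pixels, gains)
# scaled == [20, 40, 90, 120]
```

The same helper with `operator.add` would model a vector add, which illustrates why block-based workloads, where one operation is repeated over many samples, map well onto this model.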
As noted above, the instruction parallelism models and the data parallelism models are efficient for different types of applications. It would be extremely advantageous to develop a programming model with an instruction set architecture (“ISA”) which incorporates the advantages of both the instruction parallelism and data parallelism models on which many different types of applications may run. More specifically, it would be advantageous to develop an instruction set architecture which permits the incorporation of a vertical programming model, such as a SIMD model, into a horizontal programming model, such as a VLIW model.
Many current programming models, regardless of whether they are horizontal or vertical models, do not provide for efficient code density. That is, current programming models use more memory than necessary in enabling the performance of certain functions. Programming models typically specify fixed-length instruction sets. For example, current standard reduced instruction set computer (“RISC”) processors normally employ fixed-length 32-bit instruction sets. A problem arises when a function can be performed with instructions having fewer than the fixed number of bits. In this case, the function must be carried out according to the programming model specifications, and memory is used that is not necessary to perform the function. For example, assume a programming model provides for a 32-bit instruction set and a programmer desires to retrieve two numbers from memory and add them, which takes three operations (two loads and one add). Assume further that because of the size of the numbers and the operation, this function can be carried out using 16-bit instructions. Because of the specifications of the programming model, this function uses 3 operations × 32 bits, or 96 bits. Outside the limitations of the programming model, however, this function requires only 3 operations × 16 bits, or 48 bits. Thus, current programming models unnecessarily require the use of additional memory and do not take advantage of code density. In addition to having better code density, a DSP which facilitates using only 48 bits for the above example would also use less power in executing the operation, because fetching fewer instruction bits consumes less power.
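The arithmetic in the example above can be checked with a short sketch. The `code_size_bits` helper is hypothetical and simply sums per-instruction widths; the three-instruction sequence is the load, load, add example from the text.

```python
def code_size_bits(widths):
    """Total instruction memory, in bits, for a sequence of instruction widths."""
    return sum(widths)


# Two loads and one add, encoded at a fixed 32 bits each.
fixed_length = code_size_bits([32, 32, 32])
# The same three operations encoded as 16-bit instructions.
compact = code_size_bits([16, 16, 16])
# Fraction of instruction memory saved by the shorter encoding.
saving = 1 - compact / fixed_length
# fixed_length == 96, compact == 48, saving == 0.5
```

Under these assumptions the shorter encoding halves the instruction memory for this function, and the number of instruction bits fetched falls by the same proportion.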
Some programming models include 16-bit instruction sets which are expanded into 32-bit instruction sets when executed. A DSP which takes a 16-bit instruction from memory and dynamically expands it to a 32-bit instruction at run time before sending it to the execution unit has code density equivalent to that of the present scheme, but it suffers from two disadvantages. First, it has to spend extra decode time in order to expand the fetched 16-bit instruction into a native 32-bit instruction. This typically costs an extra pipeline stage, which has several adverse performance implications. Second, this dynamic expansion at run time typically consumes significant power.
Other programming models adopt a variable-length instruction set, but at a cost. Specifically, such programming models include hardware decoder logic that is extremely complex and therefore requires a longer decoding time.
Accordingly, the present invention overcomes problems in the prior art by providing an instruction set architecture for a digital signal processor that has improved code density, improved instruction level parallelism and improved issue bandwidth. The instruction set architecture includes information packets which may include instructions having different characteristics, such as instruction type (for example, scalar or vector) and instruction length (for example, 16-bit or 32-bit). These instructions are received by a scheduler or scoreboard unit which then determines the functional units that are available for executing the instructions. The instructions are then broadcast to a plurality of functional units or processing elements for execution.