This application claims priority to S.N. 98402456.2, filed in Europe on Oct. 6, 1998 (TI-27685EU) and S.N. 98402455.4, filed in Europe on Oct. 6, 1998 (TI-28433EU).
The present invention relates to processing engines, and to the parallel execution of instructions in such processing engines.
It is known to provide for parallel execution of instructions in microprocessors by using multiple instruction execution units, and many different architectures providing such parallel execution are known. Parallel execution increases the overall processing speed. Typically, multiple instructions are held in parallel in an instruction buffer, decoded in parallel and then dispatched to the execution units. Microprocessors are general purpose processing engines that require high instruction throughput to execute the software running on them, and that software can have a wide range of processing requirements depending on the particular application involved. Moreover, complex operating systems have been needed to control the scheduling of instructions for parallel execution.
Many different types of processing engines are known, of which microprocessors are but one example. For example, Digital Signal Processors (DSPs) are widely used, in particular for specific applications. DSPs are typically configured to optimize the performance of the applications concerned and to achieve this they employ more specialized execution units and instruction sets.
The present invention is directed to improving the performance of processing engines such as, for example but not exclusively, digital signal processors.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Combinations of features from the dependent claims may be combined with features of the independent claims as appropriate and not merely as explicitly set out in the claims.
In accordance with a first aspect of the invention, there is provided a processing engine comprising an instruction buffer operable to buffer single and compound instructions pending execution thereof, and a decode mechanism configured to decode instructions from the instruction buffer. The decode mechanism is configured to be responsive to a predetermined tag in a tag field of an instruction, which predetermined tag is representative of the instruction being a compound instruction formed from separate programmed memory instructions. The decode mechanism is operable in response to the predetermined tag to decode at least a first data flow control for a first programmed instruction and a second data flow control for a second programmed instruction.
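As a rough illustration of the tag-driven decode described above, the following Python sketch treats an instruction word as an integer and produces one data flow control for a single instruction, or two for a compound instruction. The 32-bit layout, the field widths and the tag value `SOFT_DUAL_TAG` are hypothetical choices made only for this example; they are not taken from the specification.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical 32-bit layout, chosen only for illustration:
#   bits 31..28  tag field
#   soft dual:   op1[27:22] addr1[21:14] op2[13:8] addr2[7:0]
#   single:      op[27:20]  addr[19:12]  (remaining bits ignored here)
SOFT_DUAL_TAG = 0xA  # hypothetical value of the "predetermined tag"


@dataclass(frozen=True)
class DataFlowControl:
    opcode: int
    address: int


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> list[DataFlowControl]:
    tag = field(word, 31, 28)
    if tag == SOFT_DUAL_TAG:
        # Compound instruction: one data flow control per constituent instruction.
        first = DataFlowControl(field(word, 27, 22), field(word, 21, 14))
        second = DataFlowControl(field(word, 13, 8), field(word, 7, 0))
        return [first, second]
    # Any other tag: treat the word as a single instruction.
    return [DataFlowControl(field(word, 27, 20), field(word, 19, 12))]
```

Because the tag is examined first, the decoder selects the field layout before any operation code is interpreted.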
Thus, an embodiment of the invention provides a decode mechanism responsive to compound instructions formed (e.g., assembled or compiled) by combining separate programmed instructions. In this manner, it is possible to optimize the use of the bandwidth available within the processing engine. Appropriate programmed instructions, such as suitable memory instructions, can thus be assembled, or compiled, to form a compound instruction. By generating a separate control flow for each of the constituent programmed instructions from the compound instruction, those instructions can be performed wholly or partially in parallel with a positive effect on the overall throughput of the processing engine. The control flow generated by the decode mechanism for each of the programmed instructions can be the same as that which would have been generated for the programmed instructions if they had been held as single instructions in the instruction buffer.
A compact and efficient encoding can be enabled in an embodiment of the invention. For example, by ensuring that a memory instruction can only be the first of a pair of instructions in the instruction buffer in the form of a predetermined compound instruction, parallelism of memory access instructions can be provided with efficient encoding, efficient use of integrated circuit real estate and reduced power consumption.
In an embodiment of the invention, the compound instruction is defined as a soft compound memory instruction formed by combining separate programmed memory instructions (e.g. using an instruction preprocessing mechanism such as a compiler or an assembler). In a particular example, the compound instruction is a soft dual memory instruction, that is, a dual memory instruction assembled from separate first and second programmed memory instructions, although in other examples more than two instructions can be assembled into a compound instruction.
Preferably, the decode mechanism is operable to decode, from a compound memory address field in the compound instruction, a first memory address for a first programmed memory instruction and a second memory address for a second programmed memory instruction. Where the compound address field of the compound instruction occupies the same bit positions as the address field of a hard programmed dual memory instruction, this can have a positive effect on instruction throughput: the decoding of the addresses can then be started before the operation codes of the instructions have been decoded, regardless of the format of the first and second instructions of a dual instruction.
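The benefit of a fixed address position can be sketched as an address pre-decode step that never reads the operation code bits. In this Python sketch the layout (tag in bits 31..28, format-specific opcode bits in 27..16, first address in 15..8, second address in 7..0) and the tag values used in the usage check are hypothetical; the point is only that hard and soft dual words with identical address bits yield identical addresses.

```python
from __future__ import annotations

# Hypothetical layout shared by hard and soft dual formats (illustration only):
#   bits 31..28  tag field
#   bits 27..16  opcode bits (arrangement differs between hard and soft duals)
#   bits 15..8   first memory address (Xmem)
#   bits  7..0   second memory address (Ymem)
XMEM_SHIFT, YMEM_SHIFT, ADDR_MASK = 8, 0, 0xFF


def predecode_addresses(word: int) -> tuple[int, int]:
    """Extract both operand addresses without inspecting the opcode bits.

    Because the compound address field sits at the same bit positions in
    every dual format, hardware can perform this step in parallel with
    opcode decode instead of waiting for it.
    """
    return (word >> XMEM_SHIFT) & ADDR_MASK, (word >> YMEM_SHIFT) & ADDR_MASK
```

A hard dual word and a soft dual word that differ in tag and opcode bits but share address bits produce the same pair of addresses.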
To reduce the number of bits required for the compound instruction, the memory addresses in the compound address field can be restricted to indirect addresses, so that the decode mechanism need only be operable to decode indirect addresses for such instructions. As dual instructions support fewer options than single instructions, the size of the post-modification field for each address can be reduced, which reduces the number of bits required for the addresses themselves and also makes it possible to dispense with an indirect/direct indicator bit.
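A compact, indirect-only address code of this kind might be resolved as in the following Python sketch. The 5-bit format (a 3-bit pointer register index plus a 2-bit post-modification code), the register file of eight pointers and the four post-modification options in `POST_MODS` are all hypothetical; they simply show that no direct/indirect flag is needed when every dual address is indirect.

```python
from __future__ import annotations

# Hypothetical 5-bit dual-instruction address code (illustration only):
#   bits 4..2  pointer register index (AR0..AR7)
#   bits 1..0  post-modification code, from a reduced set of four options
# There is no direct/indirect flag: dual addresses are always indirect.
POST_MODS = {0b00: 0, 0b01: +1, 0b10: -1, 0b11: +2}  # hypothetical option set


def resolve_indirect(code: int, ar: list[int]) -> int:
    """Return the effective address and post-modify the pointer in place."""
    reg = (code >> 2) & 0b111
    mod = code & 0b11
    effective = ar[reg]          # address used for this access
    ar[reg] += POST_MODS[mod]    # pointer update applied after the access
    return effective
```

With a richer modifier set and an explicit indirect flag, a single-instruction address field would need more bits; restricting the dual form is what yields the saving.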
A memory access instruction can be constrained to be the first instruction of a pair of instructions in the instruction buffer. In this case a soft dual instruction effectively provides an encoding corresponding to two memory instructions. As a result, the need for a parallel enable field can be avoided, since any memory instruction is implicitly capable of parallelism. This further reduces application code size, which optimizes use of the external interface bandwidth and reduces cache misses.
The decoder for the second instruction of an instruction pair can also be made as a subset of the decoder for the first instruction resulting in a reduction in the integrated circuit real estate required and a reduction in power consumption for the processing engine.
To provide a compact instruction format and to enable the address field to be located at the same position as for a hard compound instruction, the compound instruction can comprise a split operation code field for the first instruction of the predetermined compound instruction. The operation code can be split either side of the address field, for example. The decoder can be responsive to detection of the appropriate tag field to decode the split operation code for the first instruction of the compound instruction.
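Rejoining an operation code that has been split either side of the address field can be sketched as below. The bit positions, the 3+3-bit split of the first operation code, the 6-bit second operation code and the tag value are hypothetical choices for this Python example only.

```python
from __future__ import annotations

# Hypothetical 32-bit soft dual layout (illustration only):
#   bits 31..28  tag field
#   bits 27..25  high 3 bits of the first operation code
#   bits 24..9   compound address field (same position as in a hard dual)
#   bits  8..6   low 3 bits of the first operation code
#   bits  5..0   second operation code
SOFT_DUAL_TAG = 0xA  # hypothetical tag value


def decode_split_opcode(word: int) -> tuple[int, int, int]:
    """Return (first opcode, compound address field, second opcode)."""
    if ((word >> 28) & 0xF) != SOFT_DUAL_TAG:
        raise ValueError("not a soft dual instruction")
    hi = (word >> 25) & 0b111
    lo = (word >> 6) & 0b111
    op1 = (hi << 3) | lo              # rejoin the two halves of the split opcode
    addr_field = (word >> 9) & 0xFFFF
    op2 = word & 0x3F
    return op1, addr_field, op2
```

Splitting the first operation code costs only a shift and an OR at decode time, while leaving the address field free to occupy the same position as in a hard dual instruction.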
To further reduce the number of bits, the compound instruction can comprise a reduced operation code field for at least the first instruction of the predetermined compound instruction, such that the operation code field comprises fewer bits than the operation code field of the first programmed instruction. By restricting the operation codes for memory instructions to lie within a certain range or ranges, the number of bits that need to be provided for the first operation code can be reduced. The decode mechanism can be arranged to be responsive to the predetermined tag to decode a reduced-size operation code for the first instruction of the compound instruction.
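The effect of restricting memory operation codes to a few ranges can be illustrated with a mapping between full and reduced codes. In this Python sketch the two ranges in `RANGES` (0x40..0x5F and 0xC0..0xDF of an 8-bit opcode space, 64 codes in total) are hypothetical; with them, a 6-bit reduced code suffices where the full code needs 8 bits.

```python
from __future__ import annotations

# Hypothetical: memory instructions occupy two ranges of an 8-bit opcode
# space (illustration only). Each entry is (base opcode, number of opcodes).
RANGES = [(0x40, 32), (0xC0, 32)]

# Bits needed to number every opcode in the ranges: 64 codes -> 6 bits.
REDUCED_BITS = (sum(count for _, count in RANGES) - 1).bit_length()


def expand_opcode(reduced: int) -> int:
    """Decode side: map a reduced opcode back to the full opcode."""
    for base, count in RANGES:
        if reduced < count:
            return base + reduced
        reduced -= count
    raise ValueError("reduced opcode out of range")


def compress_opcode(full: int) -> int:
    """Assembler side: map a full opcode to its reduced form."""
    offset = 0
    for base, count in RANGES:
        if base <= full < base + count:
            return offset + full - base
        offset += count
    raise ValueError(f"opcode {full:#x} cannot appear in a compound instruction")
```

The assembler uses `compress_opcode` when building the compound instruction, and the decoder, having recognized the tag, applies `expand_opcode` to recover the full operation code.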
With the various measures mentioned above, the predetermined compound instruction can be arranged to have the same number of bits in total as the sum of the bits of the separate programmed instructions. Reorganization of the fields from the programmed instructions can lead to the predetermined compound instruction having a common overall format with other instructions.
Where each programmed instruction has a data address generation (DAGEN) code field, the individual DAGEN codes of the individual programmed instructions could be combined into a combined DAGEN code field within the compound instruction. This could provide more rapid decoding and execution of the compound instruction. The combined DAGEN code field could form part of a combined address field. Where a combined DAGEN code field is provided, the decode mechanism can be operable to respond to a predetermined DAGEN tag to decode the combined DAGEN field.
The processing engine can be provided with a data fetch controller operable to fetch, in parallel, first and second operands from the addresses identified by the first and second memory addresses, respectively. A data write controller can also be operable to write, in parallel, the results of the first and second data flow operations for the first and second instructions, respectively. Dual read/write operations can also be provided.
In an embodiment of the invention, assembler syntax can differentiate between hard compound and soft compound syntax to provide visibility for available slots for parallelism. A hard compound instruction can be executed in parallel with a non-memory instruction such as a control flow or register instruction as indicated by a parallel enable bit and as long as there are no bus/operator resource conflicts.
In accordance with another aspect of the invention, there is provided a processor, for example, but not necessarily, a digital signal processor, comprising a processing engine as described above. The processor can be implemented as an integrated circuit, for example as an Application Specific Integrated Circuit (ASIC).
A digital signal processing system comprising a processing engine as described above can also be provided with an instruction preprocessing mechanism operable to combine separate programmed memory instructions to form a compound memory instruction. The instruction preprocessor can be in the form of a compiler, assembler, etc., which is operable to compile or assemble compound instructions from programmed instructions. The mechanism can be configured to be operable to determine whether the separate programmed memory instructions may be combined prior to assembly of the compound instruction.
In accordance with a further aspect of the invention, there is provided an instruction preprocessor for a digital signal processing system, the instruction preprocessor being configured to be operable:
to determine programmed memory instructions capable of being combined; and
to assemble a compound memory instruction from said determined programmed memory instructions.
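The two preprocessor steps listed above can be pictured as a single greedy pass over a program. In this Python sketch the `Instr` record, the combinability rule in `can_combine` (both instructions are indirect memory accesses and the second neither reads nor overwrites a register written by the first) and the example mnemonics are hypothetical simplifications. A real assembler would also check operation code ranges, bus and operator resources, and side effects such as pointer post-modification.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    mnemonic: str
    is_memory: bool            # accesses data memory
    indirect: bool             # uses indirect addressing
    dest: str | None           # register written, if any
    srcs: tuple[str, ...]      # registers read


@dataclass(frozen=True)
class SoftDual:
    first: Instr
    second: Instr


def can_combine(a: Instr, b: Instr) -> bool:
    """Step 1: decide whether two programmed instructions may be combined."""
    if not (a.is_memory and b.is_memory and a.indirect and b.indirect):
        return False
    # Simplified dependency check: b must not use or overwrite a's result.
    return a.dest is None or (a.dest not in b.srcs and a.dest != b.dest)


def pair_instructions(prog: list[Instr]) -> list[Instr | SoftDual]:
    """Step 2: assemble soft dual instructions from combinable neighbours."""
    out: list[Instr | SoftDual] = []
    i = 0
    while i < len(prog):
        if i + 1 < len(prog) and can_combine(prog[i], prog[i + 1]):
            out.append(SoftDual(prog[i], prog[i + 1]))
            i += 2
        else:
            out.append(prog[i])
            i += 1
    return out
```

For example, two independent indirect loads followed by a register add become one soft dual instruction plus the add, while a load followed by a store of the loaded register stays as two single instructions.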
It should be understood that in the present context the term “instruction preprocessor” is to be understood broadly to cover any mechanism for preprocessing instructions, that is, compiling and/or assembling instructions, including compilers, assemblers, etc.
The instruction preprocessor may be provided separately, for example on a carrier medium such as a data storage medium (e.g. a disc or solid state memory) or a data transmission medium (e.g. an electrical, optical or other electromagnetic transmission medium, such as a wireless transmission medium).
In accordance with another aspect of the invention, there is provided a method of improving the performance of a processing engine. The method includes:
buffering a compound instruction assembled from separate programmed memory instructions, the compound instruction including a tag field containing a predetermined compound instruction tag; and
responding to the predetermined compound instruction tag in the tag field of an instruction in the instruction buffer to decode, from the compound instruction, at least first data flow control for a first programmed instruction and second data flow control for a second programmed instruction.