The invention relates to a processing device, which includes
a processing unit,
a memory for storing instructions for the processing unit, and
a read unit for reading instructions from the memory in a logic sequence and for applying the instructions to the processing unit so as to be executed in the logic sequence.
A processing device of this kind is known from International Patent Application No. WO 93/14457. During the execution of a program, successive instructions are loaded from the memory into the processing unit to be executed. Contemporary processing units, however, can generally execute instructions faster than the instructions can be read from the memory. Therefore, unless special steps are taken, the memory limits the overall speed of the processing device. This problem is called the “memory bottleneck”.
A number of steps for circumventing the memory bottleneck are known from the prior art. For example, “caching” techniques are known. Caching uses a fast cache memory that holds instructions which, according to a cache strategy, are expected to be executed by the processing unit. The cache memory is comparatively expensive, because it must be fast enough to deliver one instruction per instruction cycle of the processing unit. Furthermore, caching techniques are generally very complex and hence require substantial circuit overhead.
It is also known from the prior art to make the memory much wider than necessary for reading a single instruction, so that a plurality of successive instructions can be read in parallel in one read cycle of the memory. These instructions are stored in a prefetch buffer, which can be read very rapidly, after which they are applied successively to the processing unit. While the processing unit executes the instructions from the prefetch buffer, a new memory read cycle is started for the next plurality of instructions. When N instructions are read from the memory simultaneously, the effective speed of the memory is thus increased, in optimum circumstances, by a factor N, so that the memory no longer needs to limit the speed of the processing device. This technique gives optimum results only if the processing unit executes instructions in a “logic” sequence (that is, a sequence defined by the read unit without being readjusted by the processing unit). This is normally the case. However, the instructions executed by the processing unit may also include branch instructions, which give rise to a different execution sequence. Because of a branch instruction, part of the content of the prefetch buffer (after an outgoing branch or before an incoming branch) is useless, and the memory read that has already been started is useless as well. This again limits the speed of the processor.
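The waste that a branch causes in a wide-memory prefetch scheme can be made concrete with a small model. The sketch below assumes an idealized memory that returns aligned blocks of `n` instructions and counts fetched slots that are never executed; the function name and the model itself are illustrative assumptions, not part of the prior-art device described here.

```python
def wide_fetch_waste(trace, n):
    """Count instruction slots that are fetched but never executed.

    Illustrative model (an assumption): the memory returns aligned blocks
    of n instructions, and a new block is fetched into the prefetch buffer
    whenever the next executed address falls outside the buffered block.
    trace: executed instruction addresses, in execution order.
    """
    used_per_fetch = []  # offsets actually executed from each fetched block
    current_block = None
    for addr in trace:
        block = addr // n
        if block != current_block:
            used_per_fetch.append(set())  # prefetch buffer refilled
            current_block = block
        used_per_fetch[-1].add(addr % n)
    return sum(n - len(used) for used in used_per_fetch)


# Straight-line code uses every fetched slot.
print(wide_fetch_waste(range(8), 4))        # 0
# A branch from address 1 to address 6 wastes slots 2-3 (after the
# outgoing branch) and 4-5 (before the incoming branch).
print(wide_fetch_waste([0, 1, 6, 7], 4))    # 4
```

The model ignores the cost of the memory read already started for the next block, so it understates the loss the text describes.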
It is inter alia an object of the invention to provide a processing device in which the memory bottleneck is removed while using less overhead.
The processing device according to the invention is characterized in that the memory comprises a plurality of independently addressable memory banks, logically successive instructions being stored in different memory banks, and that the read unit is arranged to read a number of instructions from different memory banks in parallel and to replenish this number of instructions, each time that the processing unit starts to execute an instruction, by starting to read an instruction which logically succeeds the instructions being read in parallel from the memory banks at that instant.
Thus, each successive instruction is stored in the next memory bank. To this end, the processing unit preferably uses only fixed-length instructions. As a consequence of the invention, instructions are fetched in a pipelined fashion and, normally, a number of instructions will be in different stages of reading. Consequently, the instructions can be applied successively to the processor via the same bus, at intervals shorter than the time required to read one memory bank. Because one new instruction is addressed for each instruction executed, no more instructions are addressed than strictly necessary. This reduces the risk of memory conflicts, which occur when more than one instruction would have to be read simultaneously from the same memory bank.
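The pipelined reading of interleaved banks can be sketched as a cycle-level simulation for straight-line code. The sketch assumes low-order interleaving (address `a` lives in bank `a % num_banks`), one new read started per cycle (that is, each time the processing unit starts an instruction) and a fixed bank busy time `latency`; these parameters and the function name are hypothetical.

```python
def simulate_sequential_fetch(num_instr, num_banks, latency):
    """Cycle-level sketch of pipelined reading from interleaved banks.

    Assumptions (illustrative only): instruction address a is stored in
    bank a % num_banks; one new read is started per cycle; a bank stays
    busy for `latency` cycles after it is addressed. Returns the cycle in
    which each instruction is delivered, or raises on a bank conflict.
    """
    free_at = [0] * num_banks  # first cycle in which each bank may be addressed again
    delivery = []
    for addr in range(num_instr):
        cycle = addr  # straight-line code: instruction `addr` is addressed in cycle `addr`
        bank = addr % num_banks
        if free_at[bank] > cycle:
            raise RuntimeError(f"bank conflict: bank {bank} busy in cycle {cycle}")
        free_at[bank] = cycle + latency
        delivery.append(cycle + latency)
    return delivery


# Four banks, four-cycle access: one instruction per cycle once the pipeline is full.
print(simulate_sequential_fetch(8, num_banks=4, latency=4))
# [4, 5, 6, 7, 8, 9, 10, 11]
```

With only three banks and the same four-cycle access time, instruction 3 would be addressed while bank 0 is still busy reading instruction 0, and the sketch raises a conflict.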
In an embodiment of the processing device according to the invention, the processing unit can start the successive execution of at most N instructions within the minimum time interval required between the addressing of a memory bank and the application of the instruction thus read to the processing unit, and the read unit is arranged to read N instructions in parallel from the various memory banks. Thus exactly as many instructions are read in parallel as are needed to keep the memory from slowing down the processing unit. As a result, the number of memory banks still engaged in reading, and hence unavailable for new addressing, is minimized. Therefore, there are preferably at least N memory banks.
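N follows from the ratio of the bank access time to the interval at which the processing unit starts instructions, rounded up. The sketch below computes N for hypothetical timing figures and checks that the N reads in flight occupy N distinct banks when there are at least N banks; the numbers and function names are illustrative assumptions.

```python
def parallel_reads_needed(access_time_ns, cycle_time_ns):
    """N: how many instructions the processing unit can start within one
    bank access time (integer ceiling of access time / cycle time)."""
    return -(-access_time_ns // cycle_time_ns)


def banks_in_flight(last_addr, n, num_banks):
    """Banks holding the n instructions in flight, ending at last_addr.
    Assumes low-order interleaving: address a -> bank a % num_banks."""
    return [a % num_banks for a in range(last_addr - n + 1, last_addr + 1)]


n = parallel_reads_needed(20, 5)           # hypothetical 20 ns access, 5 ns cycle
print(n)                                   # 4
print(len(set(banks_in_flight(10, n, num_banks=n))) == n)      # True: N banks suffice
print(len(set(banks_in_flight(10, n, num_banks=n - 1))) == n)  # False: two reads share a bank
```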
The processing unit in an embodiment of the processing device according to the invention is arranged to execute, inter alia, a branch instruction, after which the processing unit should continue with either a branch target instruction or the instruction which logically succeeds the branch instruction, depending on whether a condition to be evaluated is satisfied. The read unit is arranged to buffer instructions in a pipeline unit between their reading from the memory and their application to the processing unit, and to detect the branch instruction in the pipeline unit. In response to the detection of the branch instruction, and in parallel with the reading of one or more instructions which logically succeed the branch instruction, the read unit starts reading the branch target instruction from a further memory bank, provided that the further memory bank does not store any of those one or more succeeding instructions. After execution of the branch instruction, and depending on whether the condition is satisfied, the read unit applies to the processing unit either the branch target instruction and the instructions logically succeeding it, or the instruction which logically succeeds the branch instruction and the instructions logically succeeding that one. Thus, slowing down is also prevented in the case of a branch instruction. It is known per se from WO 93/14457 to prefetch also from the branch target, but not from different memory banks that each store one instruction of a series of logically successive instructions, nor with pipelined reading of the memory banks. Owing to the pipelined reading of the memory banks and the buffering in the pipeline unit, the number of active memory banks is minimized, so that the risk of bank conflicts between reading the branch target instruction and reading the instructions which succeed the branch instruction is minimized as well.
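The condition under which the branch target may be read in parallel, namely that its bank holds none of the instructions following the branch that are being read, can be expressed as a simple bank test. The sketch assumes low-order interleaving and a hypothetical count `n_successors` of fall-through instructions in flight; the function names are illustrative.

```python
def target_bank_is_free(branch_addr, target_addr, n_successors, num_banks):
    """True if the branch target lies in a bank holding none of the
    n_successors instructions that logically follow the branch, so it can
    be read in parallel with them (low-order interleaving assumed)."""
    successor_banks = {(branch_addr + k) % num_banks
                       for k in range(1, n_successors + 1)}
    return target_addr % num_banks not in successor_banks


def next_stream(condition_met, branch_addr, target_addr):
    """After the branch executes: continue at the target if the condition
    is satisfied, otherwise at the instruction after the branch."""
    return target_addr if condition_met else branch_addr + 1


# Eight banks; the four instructions after a branch at address 10 occupy banks 3-6.
print(target_bank_is_free(10, 40, 4, 8))   # True  (address 40 -> bank 0)
print(target_bank_is_free(10, 43, 4, 8))   # False (address 43 -> bank 3)
```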
The read unit in an embodiment of the processing device according to the invention is arranged to address, in parallel with the addressing of a branch target instruction location, an instruction location of at least one instruction which logically succeeds the branch target instruction and to load said at least one instruction into an internal stage or respective internal stages of the pipeline unit if the condition is satisfied.
The processing unit in a further embodiment of the processing device according to the invention can start the successive execution of at most N instructions within the minimum time interval required between the addressing of a memory bank and the application of the instruction read in response thereto to the processing unit. The read unit is arranged to address, in parallel with the branch target instruction, the instruction locations of the N−1 instructions which logically succeed the branch target instruction, and to load these N−1 instructions into the pipeline unit in parallel if the condition is satisfied. The pipeline unit can thus be refilled as quickly as possible with instructions which logically succeed the instruction being executed in the processing unit, and the reading of these instructions is also completed as quickly as possible, so that the memory banks become available again for reading other instructions. To prevent read conflicts, the memory preferably comprises at least 2N memory banks. The larger the number of memory banks, the lower the risk of conflicts in which the branch target instruction, or instructions logically succeeding it, must be read from the same memory bank as the instructions which logically succeed the branch instruction.
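The effect of the bank count can be quantified by counting how many alignments of the N-instruction target block avoid the N banks that hold the instructions after the branch. The sketch assumes low-order interleaving, the branch in bank 0 and its N successors in banks 1 to N; under these assumptions exactly 2N banks leave one conflict-free alignment, and each extra bank adds one more.

```python
def conflict_free_alignments(n, num_banks):
    """Count target-bank alignments for which the branch target and its
    N-1 successors avoid the N banks holding the instructions after the
    branch. Illustrative model: low-order interleaving, branch in bank 0,
    fall-through instructions in banks 1..N."""
    fall_through = {k % num_banks for k in range(1, n + 1)}
    count = 0
    for start in range(num_banks):
        target_banks = {(start + k) % num_banks for k in range(n)}
        if not target_banks & fall_through:
            count += 1
    return count


for banks in (7, 8, 12, 16):
    print(banks, conflict_free_alignments(4, banks))
# 7 0
# 8 1
# 12 5
# 16 9
```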
The read unit in an embodiment of the processing device according to the invention is arranged, after detection of the conditional branch instruction and until the processing unit has executed it, to start reading in parallel from different memory banks, each time the processing unit starts executing an instruction, an instruction which logically succeeds the branch instruction and an instruction which logically succeeds the branch target instruction. After the start of execution of the branch target instruction, instructions are thus available immediately, without delay, each time the execution of a new instruction is started, while the number of active memory banks remains minimal.
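The per-cycle dual-path reading can be sketched as a schedule of address pairs. The sketch assumes that the N instructions after the branch and the target with its N−1 successors are already in flight, and it checks only that the two reads started in the same cycle use different banks; the function name and parameters are hypothetical.

```python
def dual_path_schedule(branch_addr, target_addr, n, cycles_until_resolved, num_banks):
    """Address pairs started per cycle while a conditional branch is unresolved.

    Assumptions (illustrative only): instructions branch+1..branch+n and
    target..target+n-1 are already being read; each cycle one further
    instruction on each path is addressed. Only the pair started in the
    same cycle is checked for a bank conflict (low-order interleaving).
    """
    schedule = []
    for c in range(cycles_until_resolved):
        fall = branch_addr + n + 1 + c   # next instruction after the branch
        tgt = target_addr + n + c        # next instruction on the target path
        if fall % num_banks == tgt % num_banks:
            raise RuntimeError(f"bank conflict in cycle {c}")
        schedule.append((fall, tgt))
    return schedule


print(dual_path_schedule(0, 5, 4, 3, 8))   # [(5, 9), (6, 10), (7, 11)]
```

Because both paths advance by one address per cycle, the bank distance between them stays constant, so a layout that avoids a conflict in the first cycle avoids it in every cycle of the window.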
The read unit in an embodiment of the processing device according to the invention is arranged to detect whether the branch target instruction location lies in the same memory bank as the location of an instruction which logically succeeds the branch instruction and, in the case of coincidence, to address on the basis of supplementary information concerning the branch instruction either the branch target instruction location and the locations of the instructions logically succeeding it, or the location of the instruction which logically succeeds the branch instruction and the locations of the instructions logically succeeding that one. The information indicates how likely it is that the condition will be satisfied. It can be generated, for example, during compilation of the program, or on the basis of recent branch statistics during execution of the instructions. In the case of coincidence, the information ensures that the instruction most likely to be executed is indeed addressed.
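One way to derive the likelihood information from recent branch statistics is a 2-bit saturating counter, a well-known predictor swapped in here for illustration; the text does not prescribe it. The sketch pairs it with the coincidence rule: both paths are read when their next instructions lie in different banks, and only the favoured path otherwise. Low-order interleaving and all names are assumptions.

```python
class TwoBitCounter:
    """Classic 2-bit saturating counter (a swapped-in technique, not taken
    from the source) used as the 'condition likely satisfied' hint."""

    def __init__(self, state=2):
        self.state = state  # 0-1: predict not taken, 2-3: predict taken

    def predict_taken(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def addresses_to_start(fall_addr, target_addr, num_banks, predict_taken):
    """Read both paths if their next instructions lie in different banks;
    on a bank coincidence, read only the path the hint favours."""
    if fall_addr % num_banks != target_addr % num_banks:
        return [fall_addr, target_addr]
    return [target_addr] if predict_taken else [fall_addr]


hint = TwoBitCounter()
for outcome in (False, False):  # the branch was recently not taken twice
    hint.update(outcome)
print(addresses_to_start(11, 19, 8, hint.predict_taken()))  # [11]  (both in bank 3)
print(addresses_to_start(11, 20, 8, hint.predict_taken()))  # [11, 20]
```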
An embodiment of the processing device according to the invention is programmed with a program in which the branch target instruction and the M instructions logically succeeding it are stored in memory banks other than those storing the instruction which logically succeeds the branch instruction and the M instructions logically succeeding that one. These instructions can thus be read as fast as possible, so that the memory banks become available again as soon as possible for other read operations (for example, on behalf of a further branch instruction). When a pipeline unit is used, the instructions thus read can be loaded directly into the pipeline unit in parallel.
The processing device is preferably integrated, together with the memory banks, on a semiconductor substrate. A large number of connection lines for loading the instructions can thus be readily implemented between the memory banks and the processing unit.
The invention also relates to a compiler arranged to generate instructions, including a branch instruction, to be stored in the instruction memory, and to arrange the instructions in the instruction memory in such a manner that a branch target instruction and one or more instructions logically succeeding it are stored in memory banks other than those storing an equal number of instructions succeeding the branch instruction. Bank conflicts can be prevented by taking the bank structure, notably the number of banks in the memory, into account already during compilation. In the processing device according to the invention it can be predicted exactly which instruction will be read when, so conflicts between banks can be avoided by a suitable arrangement of the instructions which are to be read simultaneously. The arrangement can be realized, for example, by padding: inserting “no-operation” instructions, leaving a suitable number of locations unused after an unconditional branch instruction before storing a branch target instruction, or branching over unused instruction locations.
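A compiler pass of this kind can be sketched as a search for the smallest padding that moves the target block to conflict-free banks. The sketch assumes low-order interleaving, M+1 instructions on each path as in the layout described above, and padding placed where it is never executed (for example after an unconditional branch); it also ignores that shifting the target moves the code after it. All names are hypothetical.

```python
def layout_ok(branch_addr, target_addr, m, num_banks):
    """True if the target and its M successors occupy banks other than the
    M+1 instructions following the branch (low-order interleaving)."""
    fall = {(branch_addr + 1 + k) % num_banks for k in range(m + 1)}
    tgt = {(target_addr + k) % num_banks for k in range(m + 1)}
    return not fall & tgt


def nops_before_target(branch_addr, target_addr, m, num_banks):
    """Smallest number of padding slots (e.g. no-operation instructions or
    unused locations) to place in front of the target block so that the
    layout is conflict-free, or None if no padding can achieve this."""
    for pad in range(num_banks):
        if layout_ok(branch_addr, target_addr + pad, m, num_banks):
            return pad
    return None


print(nops_before_target(0, 2, 3, 8))   # 3: target moves to banks 5, 6, 7, 0
print(nops_before_target(0, 2, 3, 6))   # None: fewer than 2(M+1) banks
```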