The present invention relates to a method and an arrangement for prefetching and aligning an instruction stream provided by a memory unit. Modern microprocessors are able to execute multiple instructions in parallel. Such microprocessors usually have a pipelined structure and comprise multiple execution units to execute instructions in parallel. For example, a microprocessor might have a load and store execution unit for performing load and store instructions and an arithmetic logic unit for executing data manipulating instructions. Furthermore, a 32-bit microprocessor might be able to execute instructions with variable lengths, for example, 16-bit instructions and 32-bit instructions.
To supply such a pipelined structure with the respective instructions from memory, a request is usually made to the memory unit. The memory unit has to load the respective number of instructions from the memory and provide the fetch unit with those instructions. As memory systems are usually slow compared to execution units, such an arrangement forms a bottleneck in the execution of instructions. In particular, when a so-called boundary crossing occurs, a memory system cannot retrieve the requested data or instructions within a single access. A memory system is usually organized in lines and columns, and only a single line can be accessed at a time. Therefore, if the start and end addresses of a requested instruction stream do not lie within a single line, only part of the requested information can be retrieved. The rest of the information has to be provided by a second request.
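The boundary-crossing condition described above can be illustrated with a minimal sketch. The line width of 32 bytes and the names `LINE_BYTES`, `crosses_boundary` and `accesses_needed` are assumptions chosen for illustration only; the text does not specify a line size or any such interface.

```python
LINE_BYTES = 32  # assumed line width in bytes; not specified in the text


def crosses_boundary(start_addr, num_bytes, line_bytes=LINE_BYTES):
    """Return True if the requested byte range spans more than one memory line."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    end_addr = start_addr + num_bytes - 1  # address of the last requested byte
    return start_addr // line_bytes != end_addr // line_bytes


def accesses_needed(start_addr, num_bytes, line_bytes=LINE_BYTES):
    """Count the line accesses required when only one line can be read at a time."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    end_addr = start_addr + num_bytes - 1
    return end_addr // line_bytes - start_addr // line_bytes + 1
```

Under these assumptions, an 8-byte request starting at address 28 covers bytes 28 to 35 and therefore touches two lines, whereas the same request starting at address 24 stays within one line.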
It is therefore an object of the present invention to provide a method and apparatus for providing a plurality of aligned instructions from an instruction stream with a minimum of delay.
A data processing unit according to the present invention has a superscalar structure being able to execute a plurality of instructions in parallel, a memory for storing said instructions having a plurality of n-bit input/output ports, a coupling unit for coupling said memory with an instruction fetch unit, and an instruction stream request control unit for addressing said memory to provide an instruction stream at its output port. The coupling unit comprises a shifter having an input, an output and a control input, the input being coupled with the memory, and the control input being coupled with the instruction stream request control unit. The instruction fetch unit comprises a register for storing the instruction stream, a register control unit for dispatching the plurality of instructions from the register, and means for shifting the content of the register.
In another embodiment, a data processing unit has a superscalar structure being able to execute a plurality of instructions in parallel, a memory for storing the instructions having a plurality of n-bit input/output ports, a coupling unit for coupling the memory with an instruction fetch unit, and an instruction stream request control unit for addressing the memory to provide an instruction stream at its output port. The coupling unit comprises a shifter having an input, an output and a control input, the input being coupled with the memory, and the control input being coupled with the instruction stream request control unit. The instruction fetch unit comprises a register for storing the instruction stream, a register control unit for dispatching the plurality of instructions from the register, and means for writing a partial content of the register into the shifter.
With an arrangement such as described in the embodiments, it is possible, with a minimum of additional hardware, to prevent many cases in which a boundary crossing would result in stalling the pipelines. The longer the prefetch buffers are, the lower the probability that a boundary crossing will stall the pipelines.
A method for providing a plurality of instructions from a memory having a plurality of n-bit input/output ports to a processing unit within a data processor having a superscalar structure and being able to execute a plurality of instructions in parallel comprises the steps of:
a) addressing said memory to output an instruction stream at its output ports; and
b) issuing a number of instructions from said instruction stream and buffering at least the instructions of said instruction stream which have not been issued.
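Step b) can be sketched as follows. This is only an illustration under stated assumptions: the instruction stream is modelled as a list of 16-bit halfwords, and the length rule in `length_of` (bit 0 of the first halfword set marks a 32-bit instruction) is hypothetical, since the text does not say how instruction length is encoded. The names `issue_and_buffer` and `length_of` are likewise invented for this sketch.

```python
def length_of(first_halfword):
    """Hypothetical length rule: bit 0 set marks a 32-bit (two-halfword) instruction."""
    return 2 if first_halfword & 1 else 1


def issue_and_buffer(stream, max_issue):
    """Issue up to max_issue complete instructions from a list of halfwords.

    Returns (issued, leftover): the issued instructions as tuples of halfwords,
    and the halfwords that were not issued and must stay buffered.
    """
    issued = []
    pos = 0
    while len(issued) < max_issue and pos < len(stream):
        n = length_of(stream[pos])
        if pos + n > len(stream):
            break  # second half of the instruction has not arrived yet; keep it buffered
        issued.append(tuple(stream[pos:pos + n]))
        pos += n
    return issued, stream[pos:]
```

In this model, the leftover halfwords are exactly the part of the instruction stream that has not been issued, including any instruction whose second half is still missing.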
Furthermore, a method for providing a plurality of instructions from a memory having a plurality of n-bit input/output ports to a processing unit within a data processor having a superscalar structure being able to execute a plurality of instructions in parallel is disclosed. The method comprises the steps of:
a) issuing a number of instructions from a previously stored instruction stream;
b) generating an address and a shift value depending on the previously issued instructions;
c) addressing the memory to output an instruction stream at its output port;
d) combining the instruction stream with the previously not issued instruction stream and aligning the instruction stream;
e) storing the combined instruction stream; and
f) repeating steps a)-e).
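The loop formed by these steps can be modelled in software as a rough sketch. Everything concrete here is an assumption made for illustration: a memory line of four 16-bit halfwords, a prefetch register holding up to eight halfwords, sequential line addresses, an issue width of two, and the hypothetical rule that bit 0 of the first halfword marks a 32-bit instruction. The shift value is modelled as the number of bits still occupied by instructions that were not issued.

```python
HALFWORD_BITS = 16
LINE_HALFWORDS = 4    # assumed: one memory line delivers four halfwords
BUFFER_HALFWORDS = 8  # assumed capacity of the prefetch register


def instr_halfwords(first_halfword):
    """Hypothetical length rule: bit 0 set marks a 32-bit instruction."""
    return 2 if first_halfword & 1 else 1


def pack_line(halfwords):
    """Pack a memory line into one integer, halfword 0 in the lowest bits."""
    value = 0
    for i, h in enumerate(halfwords):
        value |= (h & 0xFFFF) << (i * HALFWORD_BITS)
    return value


def prefetch_loop(memory_lines, issue_width=2):
    """Return the instructions in issue order for a list of memory lines."""
    buf, valid = 0, 0  # register content as a bit vector, and its valid halfword count
    next_line = 0      # address (line index) of the next fetch
    issued = []
    while True:
        # a) issue complete instructions from the previously stored stream
        issued_now = 0
        while issued_now < issue_width and valid > 0:
            n = instr_halfwords(buf & 0xFFFF)
            if n > valid:
                break  # instruction is incomplete until the next line arrives
            issued.append(buf & ((1 << (n * HALFWORD_BITS)) - 1))
            buf >>= n * HALFWORD_BITS
            valid -= n
            issued_now += 1
        # b) the shift value depends on how much was left unissued
        shift = valid * HALFWORD_BITS
        can_fetch = (next_line < len(memory_lines)
                     and valid + LINE_HALFWORDS <= BUFFER_HALFWORDS)
        if can_fetch:
            line = pack_line(memory_lines[next_line])  # c) address the memory
            buf |= line << shift                       # d) combine and align
            valid += LINE_HALFWORDS                    # e) store the combined stream
            next_line += 1
        # f) repeat until nothing can be fetched or issued any more
        if not can_fetch and issued_now == 0:
            break
    return issued
```

In this model, a 32-bit instruction whose two halves lie in different memory lines is issued as soon as the second line has been merged into the register, because the unissued first half remains buffered rather than being requested again.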