The present invention relates to a method and an arrangement for prefetching and aligning an instruction stream provided by a memory unit. Modern microprocessors are able to execute multiple instructions in parallel. Such microprocessors usually have a pipelined structure and comprise multiple execution units for this purpose. For example, a microprocessor might have a load and store execution unit for performing load and store instructions and an arithmetic logic unit for executing data manipulating instructions. Furthermore, a 32-bit microprocessor might be able to execute instructions of variable length, for example, 16-bit instructions and 32-bit instructions.
To supply such a pipelined structure with instructions from memory, a request is usually made to the memory unit. The memory unit has to load the requested number of instructions from the memory and provide them to the fetch unit. As memory systems are usually slow compared to execution units, such an arrangement forms a bottleneck in the execution of instructions. In particular, in the case of a so-called boundary crossing, the memory system cannot retrieve the requested data or instructions within a single access. A memory system is usually organized in lines and columns, and only a single line can be accessed at a time. Therefore, if the start and end addresses of a requested instruction stream do not lie within a single line, the memory system retrieves the requested instructions partly from one memory line and partly from the following memory line, and consequently needs additional cycles until all information is retrieved.
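The boundary-crossing condition described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the line width of 16 bytes, byte addressing, and the names `LINE_BYTES`, `lines_touched` and `crosses_boundary` are hypothetical and are not taken from the description.

```python
# Assumed memory line width in bytes (hypothetical value, chosen for illustration).
LINE_BYTES = 16


def lines_touched(start_addr: int, num_bytes: int, line_bytes: int = LINE_BYTES) -> int:
    """Return the number of memory lines spanned by a fetch request.

    The request covers byte addresses start_addr .. start_addr + num_bytes - 1.
    Since only one line can be accessed at a time, this count is also the
    minimum number of line accesses needed to serve the request.
    """
    if num_bytes <= 0:
        return 0
    first_line = start_addr // line_bytes
    last_line = (start_addr + num_bytes - 1) // line_bytes
    return last_line - first_line + 1


def crosses_boundary(start_addr: int, num_bytes: int, line_bytes: int = LINE_BYTES) -> bool:
    """True if the start and end addresses do not lie within a single line."""
    return lines_touched(start_addr, num_bytes, line_bytes) > 1


if __name__ == "__main__":
    # Two 32-bit instructions (8 bytes) starting at a line-aligned address.
    print(lines_touched(0x100, 8), crosses_boundary(0x100, 8))
    # The same request starting 12 bytes into a line spills into the next line.
    print(lines_touched(0x10C, 8), crosses_boundary(0x10C, 8))
```

Under these assumptions, a request that starts near the end of a line requires two line accesses instead of one, which corresponds to the additional cycles mentioned above.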