The present invention relates to a memory device with support for unaligned access. Such a memory may be incorporated in a microprocessor or microcontroller. Modern microprocessors and microcontrollers provide the capability of loading and storing multiple words in parallel, and the memory unit is therefore designed to input and output multiple words. For example, a memory unit may have a 128-bit-wide bus to read and write four 32-bit words in parallel. In particular, if the memory system is integrated with the microprocessor or microcontroller, for example as a cache sub-system, this allows extremely high data throughput.
FIG. 9 shows such an arrangement according to the prior art. The memory unit consists of four memory blocks 1, 2, 3, and 4. Each memory block 1, 2, 3, and 4 provides a 32-bit-wide interface which is connected to an alignment unit 6. A select logic unit 5 is provided which receives an address at terminal 7 from a central processing unit (not shown). If the address provided at terminal 7 corresponds to a start address within the memory unit that falls on memory block 1, an aligned access to the memory unit takes place. A 128-bit word consisting of the content of memory cells M1, M2, M3, and M4 is fed to the aligner 6, which connects this output directly to terminal 8. In case of an unaligned access to the memory unit, the following scenario takes place. If, for example, the address provided at terminal 7 starts within the memory unit at memory block 3, the 128-bit word consists of the content of memory cells M3, M4, M5, and M6. Only the first two 32-bit words M3 and M4 can be accessed in a first cycle, because the system can access only one memory line during one cycle. In other words, either memory line M1, M2, M3, and M4 or memory line M5, M6, M7, and M8 can be accessed during one cycle, but not both. In this example, the requested 128-bit word is distributed over two different memory lines. During a second cycle, the remaining two 32-bit words M5 and M6 are retrieved from memory blocks 1 and 2 and merged with the first two words in a register. Aligner 6 multiplexes the outputs of memory blocks 1, 2, 3, and 4 to output the aligned 128-bit word at terminal 8 in the correct order, namely M3, M4, M5, and M6.
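The behavior of the prior-art arrangement described above can be illustrated with a small software model. The following is only a sketch under stated assumptions, not a description of the actual circuit: the names read_line and read_128, the representation of memory cells as strings "M1" to "M8", and the use of a zero-based word index as the address are all hypothetical and chosen purely for illustration.

```python
WORDS_PER_LINE = 4  # assumption: four 32-bit memory blocks form one 128-bit memory line


def read_line(memory, line_index):
    """Model one cycle: fetch the four words of a single memory line."""
    base = line_index * WORDS_PER_LINE
    return memory[base:base + WORDS_PER_LINE]


def read_128(memory, start_word):
    """Return (words, cycles) for a 128-bit read starting at word index start_word.

    start_word is a zero-based word index (hypothetical addressing scheme):
    index 0 falls on memory block 1, index 2 on memory block 3, and so on.
    """
    line, offset = divmod(start_word, WORDS_PER_LINE)
    first = read_line(memory, line)           # first cycle
    if offset == 0:
        return first, 1                       # aligned: one line, one cycle
    second = read_line(memory, line + 1)      # second cycle: the "one cycle penalty"
    # Aligner step: keep the tail of the first line, then the head of the second.
    return first[offset:] + second[:offset], 2


# Two memory lines holding cells M1..M4 and M5..M8, as in the example above.
memory = [f"M{i}" for i in range(1, 9)]
aligned = read_128(memory, 0)    # access starting at memory block 1
unaligned = read_128(memory, 2)  # access starting at memory block 3
```

In this model, the access starting at memory block 1 completes in one cycle, whereas the access starting at memory block 3 needs two cycles to assemble M3, M4, M5, and M6.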
A major disadvantage of this arrangement is the above-described "one cycle penalty" that the structure of the memory unit imposes on an unaligned access, as well as a timing disadvantage. Time-critical programs therefore cannot make use of unaligned memory accesses.