The term unaligned memory access is generally used to describe a memory request that requires a memory, e.g., a cache memory, to return data that is not aligned to its read boundaries. For example, if a cache memory is aligned to word boundaries, e.g., 64-bit words, or the data path from a cache to the Load Store Queue (LSQ) is aligned along word boundaries from a cache line, a request for data that crosses such a boundary is considered to be unaligned.
FIG. 1 illustrates a memory aligned to 64-bit word boundaries, in accordance with the conventional art. For example, the first 64-bit word is aligned at address 0x000000. The second 64-bit word is aligned at address 0x000008. The third 64-bit word is aligned at address 0x000010.
A request made to address 0x000006 for 32 bits of data covers bytes 0x000006 through 0x000009. It will therefore generally produce 16 bits of data from the entry addressed 0x000000 and 16 bits of data from the entry addressed 0x000008. Such an unaligned access generally requires two memory accesses to fulfill one load request. It is to be appreciated that unaligned memory accesses generally decrease processor performance.
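The boundary-crossing arithmetic can be illustrated with a minimal sketch. It assumes byte addressing, 64-bit (8-byte) words as in FIG. 1, and that each word is identified by the address of its first byte. The names `WORD_BYTES`, `words_touched`, and `crosses_boundary` are hypothetical and are used for illustration only.

```python
WORD_BYTES = 8  # assumption: 64-bit words, byte-addressed, as in FIG. 1


def words_touched(addr: int, size_bytes: int) -> list[int]:
    """Return the base addresses of every aligned word the access covers."""
    mask = ~(WORD_BYTES - 1)
    first = addr & mask                      # word holding the first byte
    last = (addr + size_bytes - 1) & mask    # word holding the last byte
    return list(range(first, last + WORD_BYTES, WORD_BYTES))


def crosses_boundary(addr: int, size_bytes: int) -> bool:
    """An access is unaligned in the sense used here if it spans more than one word."""
    return len(words_touched(addr, size_bytes)) > 1
```

Under these assumptions, `words_touched(0x6, 4)` returns `[0x0, 0x8]`, i.e., two word reads for a single load, whereas `words_touched(0x8, 4)` returns `[0x8]`.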
An additional problem with unaligned memory accesses occurs when a data bypass is required in a Load Store Queue (LSQ). When a load instruction (LD) is encountered, the cache is accessed and space is allocated in the Load Store Queue (LSQ) to install the data returned by the cache. The load instruction resides in the Load Store Queue (LSQ) until the requested data is consumed.
This data may come from a cache, or it may be allowed to bypass from a store instruction (SD) which writes to the same address. The stores follow a similar path to cache where they are first logged into the Load Store Queue (LSQ) and then moved to the cache at instruction retirement. A store instruction that is older than a load instruction may bypass data to that load instruction, provided that the addresses match.
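The bypass behavior described above, for the simple case in which both accesses are aligned, may be sketched as follows. This is an illustrative software model only, not a description of any particular hardware. The names `StoreEntry` and `load_value`, the use of a smaller `age` value to mean an older instruction, and the choice of the youngest older matching store are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    age: int   # assumption: smaller value means older in program order
    addr: int  # aligned address written by the store
    data: int  # value waiting in the queue until retirement


def load_value(load_age: int, load_addr: int,
               stores: list[StoreEntry], cache: dict[int, int]) -> int:
    """Return bypassed store data if an older store matches, else cache data."""
    older_matches = [s for s in stores
                     if s.age < load_age and s.addr == load_addr]
    if older_matches:
        # Assumption: the most recent of the older stores supplies the data.
        return max(older_matches, key=lambda s: s.age).data
    return cache.get(load_addr, 0)  # no match: data comes from the cache
```

For example, with stores of ages 1, 3, and 7 to address 0x8, a load of age 5 to address 0x8 receives the data of the age-3 store, while a load to any other address falls through to the cache.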
If one of these memory access instructions is unaligned, it is generally necessary to compare not only its aligned address but also the next sequential aligned address in order to determine a match. If only one instruction is unaligned, three addresses need to be compared, e.g., one address for the aligned instruction and two addresses for the unaligned instruction. If both instructions are unaligned, as many as four addresses may need to be compared, e.g., two addresses for the load instruction compared with each of the two addresses for the store instruction.
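The address counts given above can be reproduced with a short sketch. It assumes 8-byte words, byte addressing, and accesses no larger than one word, so that an access touches at most two aligned addresses. The names `aligned_addresses` and `compare_for_bypass` are hypothetical. A match here indicates only that the two accesses share an aligned word; whether the individual bytes actually overlap would require a further check that is not modeled.

```python
WORD_BYTES = 8  # assumption: 64-bit words, accesses of at most 8 bytes


def aligned_addresses(addr: int, size_bytes: int) -> list[int]:
    """Return one aligned address, or two if the access crosses a word boundary."""
    mask = ~(WORD_BYTES - 1)
    first = addr & mask
    last = (addr + size_bytes - 1) & mask
    return [first] if first == last else [first, last]


def compare_for_bypass(load: tuple[int, int],
                       store: tuple[int, int]) -> tuple[int, bool]:
    """Take (addr, size) pairs; return (addresses involved, any aligned match)."""
    load_words = aligned_addresses(*load)
    store_words = aligned_addresses(*store)
    # Every load address is compared against every store address.
    matched = any(lw == sw for lw in load_words for sw in store_words)
    return len(load_words) + len(store_words), matched
```

With an unaligned load `(0x6, 4)` and an aligned store `(0x8, 8)`, three addresses are involved and a match is found on 0x8. With an unaligned load `(0x6, 4)` and an unaligned store `(0xE, 4)`, four addresses are involved.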
Conventional art approaches to mitigate such problems have included letting unaligned stores retire to cache before forwarding, generating exceptions to let software deal with the misalignment, and storing all possible addresses for each instruction. Unfortunately, such conventional approaches are undesirable and can be prohibitively expensive, both in degraded performance and in increased integrated circuit area. In addition, storing all the addresses for unaligned instructions generally requires two entries for each load/store (LD/SD) instruction pair in the Load Store Queue (LSQ). The need to store such addresses limits how many loads or stores can be in flight at the same time.