1. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to loading and storing misaligned data on an out-of-order execution computer system.
2. Background
Loading and storing misaligned data from and to a memory subsystem have been supported in prior art in-order execution computer systems. Data may be misaligned by crossing two data chunks, two cache lines, or even two memory pages. The size of a data chunk, the size of a cache line, and the size of a memory page are architecture dependent. Additionally, the size of a cache line and the size of a memory page may further depend on the manner in which the computer system is configured.
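By way of illustration only, the three kinds of boundary crossing described above can be sketched as follows. The sizes `CHUNK_SIZE`, `CACHE_LINE_SIZE`, and `PAGE_SIZE` and the function names are hypothetical values and names chosen for the example; they are not taken from any particular architecture.

```python
# Hypothetical sizes in bytes; real values are architecture and
# configuration dependent, as noted in the text.
CHUNK_SIZE = 8
CACHE_LINE_SIZE = 32
PAGE_SIZE = 4096


def crosses(address, length, unit):
    """Return True if the bytes [address, address + length) span two units."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_unit = address // unit
    last_unit = (address + length - 1) // unit
    return first_unit != last_unit


def classify_access(address, length):
    """Report the largest boundary the access crosses, if any."""
    if crosses(address, length, PAGE_SIZE):
        return "page"
    if crosses(address, length, CACHE_LINE_SIZE):
        return "cache_line"
    if crosses(address, length, CHUNK_SIZE):
        return "chunk"
    return "within_chunk"
```

Because each assumed size is a multiple of the next smaller one, an access that crosses a page boundary also crosses a cache line boundary and a chunk boundary, so the sketch tests the largest unit first.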
Typically, hardware is provided to perform proper shifting or rotation for loading and storing data that cross two data chunks. Furthermore, hardware is provided to perform proper tracking and merging for loading and storing data that cross either two cache lines or two memory pages, with successive aligned subset loads and stores. Each of the successive aligned subset loads or stores involves data that are within the boundaries of a cache line. Data that are aligned within the boundaries of a cache line are automatically aligned within the boundaries of a memory page. A check is performed for each memory page for access permission, access mode, etc. Since instructions are executed in program order, there is no data synchronization problem.
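As a minimal software sketch of the successive aligned subset approach described above, the following splits an access at cache line boundaries and merges the pieces of a load. It is an illustration under stated assumptions, not a description of any particular hardware: the 32-byte `CACHE_LINE_SIZE`, the function names, and the modeling of memory as a byte string are all hypothetical.

```python
CACHE_LINE_SIZE = 32  # hypothetical; architecture and configuration dependent


def split_access(address, length, line_size=CACHE_LINE_SIZE):
    """Split [address, address + length) into pieces that each stay
    within the boundaries of a single cache line."""
    pieces = []
    remaining = length
    while remaining > 0:
        # Bytes left before the end of the current cache line.
        room = line_size - (address % line_size)
        size = min(room, remaining)
        pieces.append((address, size))
        address += size
        remaining -= size
    return pieces


def load_misaligned(memory, address, length):
    """Perform one subset load per piece and merge the results in order."""
    merged = bytearray()
    for piece_address, piece_size in split_access(address, length):
        merged += memory[piece_address:piece_address + piece_size]
    return bytes(merged)
```

For example, with the assumed 32-byte line, a 4-byte load at address 30 is split into a 2-byte subset load at address 30 and a 2-byte subset load at address 32. A per-page check for access permission and access mode would be applied to each piece; it is omitted here for brevity.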
However, in an out-of-order execution computer system, in order to maximize execution throughput, instructions are to be dispatched for execution as soon as their operand dependencies are resolved, without regard to program order or whether the instructions were speculatively or non-speculatively issued. Thus, it is desirable to be able to dispatch loads and stores to the memory subsystem as soon as their operand dependencies are resolved.
If data loads and stores are dispatched to the memory subsystem in such a manner, a dispatched data load or store may or may not in reality be ready to be executed by the memory subsystem, because predecessor data loads and/or stores may be incomplete. The reason is that the memory subsystem typically requires a number of clock cycles to complete an actual data load or store, and in the same period of time multiple instructions could have been dispatched. Additionally, while speculative data loads can potentially be executed by the memory subsystem, until the executed data loads become non-speculative, the speculatively loaded data must be "shielded" and not committed to processor state, i.e., not made known and available to processor components external to the out-of-order execution "core", such as a register file. On the other hand, speculative data stores cannot actually be executed by the memory subsystem unless the destination memory locations are private to the processor and the memory subsystem has the ability to restore the overwritten data in the event the speculative data stores are purged. If either one of these conditions is not true, then the memory subsystem must execute the speculative data stores only after they become non-speculative, that is, ready to be committed to system state, i.e., made known and available to system components external to the processor, such as a coprocessor. In the meantime, to allow the out-of-order execution "core" to continue execution, including subsequent speculative data loads, the speculative data stores must, to the extent possible, be made to appear to the out-of-order execution "core" to have been executed.
Thus, memory ordering interface circuitry is provided at the "back end" of the out-of-order execution "core", between the out-of-order execution "core" and the memory subsystem, at the "front end" of the memory subsystem, or in a combination thereof, to maintain memory order and thereby ensure data correctness. The order maintaining functions include at least buffering speculative and non-speculative data loads, as well as non-speculative data stores, until they can actually be executed by the memory subsystem; guaranteeing the data correctness of speculatively executed data loads at the time of their commitment to processor state; and buffering speculative data stores until they become non-speculative.
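By way of illustration only, the buffering of speculative data stores can be modeled in software as follows. This is a simplified sketch under stated assumptions, not the circuitry itself: the class `StoreBuffer`, its method names, and the sequence numbers are hypothetical, and every load is assumed to follow all currently buffered stores in program order.

```python
class StoreBuffer:
    """Toy model: hold speculative stores until they become non-speculative,
    while making them appear executed to subsequent loads."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for the memory subsystem
        self.pending = []     # buffered stores, oldest first

    def store(self, seq, address, data):
        """Buffer a store; memory itself is not modified yet."""
        self.pending.append({"seq": seq, "address": address, "data": bytes(data)})

    def load(self, address, length):
        """Read memory, then overlay buffered store data, oldest to youngest,
        so that buffered stores appear to have been executed."""
        result = bytearray(self.memory[address:address + length])
        for entry in self.pending:
            for i, byte in enumerate(entry["data"]):
                offset = entry["address"] + i - address
                if 0 <= offset < length:
                    result[offset] = byte
        return bytes(result)

    def retire(self, seq):
        """Stores up to and including seq are non-speculative: write them
        to memory in program order."""
        while self.pending and self.pending[0]["seq"] <= seq:
            entry = self.pending.pop(0)
            end = entry["address"] + len(entry["data"])
            self.memory[entry["address"]:end] = entry["data"]

    def purge(self, seq):
        """Discard speculative stores with sequence number seq or later."""
        self.pending = [e for e in self.pending if e["seq"] < seq]
```

In this model, a purged store leaves no trace in memory because it was never written, which corresponds to the case where the memory subsystem cannot restore overwritten data.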
Therefore, in the much more complex operating environment of an out-of-order execution computer system, loading and storing of misaligned data cannot be supported in the simple manner of the prior art in-order execution computer systems. Nevertheless, for compatibility reasons, it is still desirable to support loading and storing of misaligned data, notwithstanding the much more complex operating environment. As will be disclosed, the method and apparatus of the present invention advantageously achieve the above discussed and other desired results.