A conventional processor in an information handling system may include several pipeline stages to increase the effective throughput of the processor. For example, the processor may include a fetch stage that fetches instructions from memory, a decoder stage that decodes instructions into opcodes and operands, and an execution stage with various execution units that execute decoded instructions. Pipelining enables the processor to obtain greater efficiency by performing these processor operations in parallel. For example, the decoder stage may decode a fetched instruction while the fetch stage fetches the next instruction. Similarly, an execution unit in the execution stage may execute a decoded instruction while the decoder stage decodes another instruction.
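By way of illustration only, the overlap between stages can be sketched as a simple timing table. The sketch assumes an idealized three-stage pipeline in which every stage takes exactly one cycle and no stalls occur; the function name `pipeline_schedule` and the stage labels are hypothetical and are not taken from any particular processor.

```python
STAGES = ["fetch", "decode", "execute"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [(instruction_index, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s  # instruction i occupies stage s during cycle i + s
            schedule.setdefault(cycle, []).append((i, stage))
    return schedule

schedule = pipeline_schedule(4)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])

# Ideal pipelined cycle count versus one-instruction-at-a-time execution.
pipelined_cycles = len(schedule)        # n + k - 1
unpipelined_cycles = 4 * len(STAGES)    # n * k
```

Under these assumptions, four instructions finish in six cycles rather than twelve, because in cycle 1 the decoder stage handles instruction 0 while the fetch stage handles instruction 1.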
The simplest processors processed instructions in program order, namely the order in which the processor encounters instructions in a program. Processor designers increased processor efficiency by designing processors that execute instructions out of order. Designers found that a processor can process an instruction out of program order provided that the instruction does not depend on a result that is not yet available, such as a result from an earlier instruction. In other words, a processor can execute an instruction out of order provided that the instruction does not exhibit a dependency.
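The dependency condition may be sketched as follows. This is a minimal illustration that considers only register read-after-write dependencies; the `Instr` record, the register names, and the function `ready_to_execute` are assumptions made for the example and do not describe any specific processor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Instr:
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    done: bool = False       # True once the result is available

def ready_to_execute(window, index):
    """True if window[index] reads no register that an older,
    unfinished instruction in the window will write."""
    instr = window[index]
    for older in window[:index]:
        if not older.done and older.dest in instr.srcs:
            return False
    return True

# The window lists instructions in program order.
window = [
    Instr("mul r1, r2, r3", "r1", ("r2", "r3")),
    Instr("add r4, r1, r5", "r4", ("r1", "r5")),  # needs r1 from the mul
    Instr("sub r6, r7, r8", "r6", ("r7", "r8")),  # independent
]
ready = [ready_to_execute(window, i) for i in range(len(window))]
print(ready)
```

In this example the `sub` may execute ahead of the `add`, because only the `add` waits on a result that is not yet available.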
To enable a processor to execute instructions out of order, the processor may include a “load/store queue.” With the load/store queue, load and store instructions are able to be executed in order relative to one another. Entries in the load/store queue may be established for load and store instructions in program order as the instructions are fetched. For example, as a new load or store instruction is fetched, an entry is created for that load or store instruction at the tail end of the load/store queue. The load/store queue thus holds each load and store instruction that is currently in flight until that particular instruction has been committed (i.e., made irrevocable) or nullified through misspeculation.
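The allocation and release behavior described above may be sketched as follows. The sketch assumes a fixed-capacity queue, a sequence number assigned at fetch, commitment from the head in program order, and nullification of the youngest entries on misspeculation; the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class LSQEntry:
    seq: int    # program-order sequence number assigned at fetch (assumed)
    kind: str   # "load" or "store"

class LoadStoreQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()   # head = oldest, tail = youngest
        self.next_seq = 0

    def allocate(self, kind):
        """Create an entry at the tail when a load or store is fetched."""
        if len(self.entries) >= self.capacity:
            return None          # queue full: fetch would have to stall
        entry = LSQEntry(self.next_seq, kind)
        self.next_seq += 1
        self.entries.append(entry)
        return entry

    def commit_oldest(self):
        """Release the head entry once its instruction is committed."""
        return self.entries.popleft() if self.entries else None

    def nullify_from(self, seq):
        """Misspeculation: discard every entry at or younger than seq."""
        while self.entries and self.entries[-1].seq >= seq:
            self.entries.pop()

lsq = LoadStoreQueue(capacity=4)
first = lsq.allocate("load")
second = lsq.allocate("store")
third = lsq.allocate("load")
lsq.nullify_from(third.seq)      # third was fetched down a wrong path
committed = lsq.commit_oldest()  # first leaves the queue on commit
remaining = [e.seq for e in lsq.entries]
```

Because every fetched load and store occupies an entry until one of these two events occurs, the required capacity grows with the number of memory instructions in flight.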
As discussed above, the load and store instructions are stored in the load/store queue after they have been fetched. Once a load or store instruction is ready to be executed, a search may be performed in the load/store queue to ensure that executing that load or store instruction would not cause a violation. For example, if a load instruction is to be executed, a search may be performed in the load/store queue to locate older store instructions (i.e., store instructions fetched before the load instruction in question) to the same or an overlapping address that have not been committed, in order to determine whether the correct data to be loaded has already been stored. In another example, if a store instruction is to be executed, a search may be performed in the load/store queue to locate any younger load instructions (i.e., load instructions fetched after the store instruction in question) to the same or an overlapping address that have already been executed. If such a load instruction is found, it was executed before the store instruction on which it depends and may have loaded stale data, so all of the pipelines are flushed.
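The two searches may be sketched as follows. This is an illustrative model only: it assumes that every entry remaining in the queue is uncommitted, that each entry records a byte address and an access size, and that age is given by a fetch-order sequence number. The names `Entry`, `overlaps`, `older_stores_for_load`, and `violating_loads_for_store` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    seq: int                 # fetch order; smaller means older
    kind: str                # "load" or "store"
    address: int             # first byte accessed
    size: int                # number of bytes accessed
    executed: bool = False

def overlaps(a, b):
    """True if the byte ranges of a and b share at least one byte."""
    return a.address < b.address + b.size and b.address < a.address + a.size

def older_stores_for_load(queue, load):
    """Uncommitted stores fetched before `load` that touch its bytes,
    youngest first."""
    hits = [e for e in queue
            if e.kind == "store" and e.seq < load.seq and overlaps(e, load)]
    return sorted(hits, key=lambda e: e.seq, reverse=True)

def violating_loads_for_store(queue, store):
    """Younger loads to overlapping bytes that have already executed.
    A non-empty result means the pipelines would be flushed."""
    return [e for e in queue
            if e.kind == "load" and e.seq > store.seq
            and e.executed and overlaps(e, store)]

queue = [
    Entry(0, "store", 0x100, 4),
    Entry(1, "load", 0x102, 4, executed=True),   # overlaps the store
    Entry(2, "load", 0x200, 4, executed=True),   # unrelated address
]
forward_from = older_stores_for_load(queue, queue[1])
violations = violating_loads_for_store(queue, queue[0])
flush_needed = bool(violations)
```

In this model both searches scan every entry, so their cost grows with the number of entries in the queue.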
Since the load/store queue has to store all of the load and store instructions from the time they are fetched to the time they are committed, the load/store queue has to be large to accommodate these load and store instructions. Hence, the load/store queue is not currently scalable, because it is required to hold load and store instructions in program order from the time they are fetched to the time they are committed.
If, however, the load/store queue could be made smaller, then the area and power efficiency of the load/store queue could be improved.