1. Field of the Invention
This invention relates to a load store queue applied to, for example, a processor.
2. Description of the Related Art
One of the instruction execution techniques used in a processor is out-of-order execution. In out-of-order execution, subsequent instructions that are independent of the preceding instructions are executed without regard to program order.
In a processor that performs out-of-order execution, a load store queue is used to maintain the data dependency via memory between load instructions and store instructions executed out of order.
Specifically, the load store queue keeps track of the program order of memory access instructions and manages the order of the memory access instructions issued out of order. As a result, a memory access is prevented from passing another memory access on which it depends during out-of-order execution.
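The dependency that has to be preserved can be illustrated with a minimal sketch in Python. The function `run`, the list `program`, the address 0x100, and the initial memory contents are all hypothetical and are introduced only for illustration. The sketch executes a store followed by a load to the same address, first in program order and then with the load passing the store.

```python
# Hypothetical illustration: a load that passes an earlier store to the
# same address observes a stale value.

def run(program, order):
    """Execute memory operations against a plain dict in the given order.

    program: list of ("store", addr, value) or ("load", addr) tuples,
             listed in program order.
    order:   indices into program giving the actual execution order.
    Returns a dict mapping the program index of each load to the value read.
    """
    memory = {0x100: 0}  # assumed initial memory contents
    loaded = {}
    for i in order:
        op = program[i]
        if op[0] == "store":
            _, addr, value = op
            memory[addr] = value
        else:
            _, addr = op
            loaded[i] = memory.get(addr, 0)
    return loaded


program = [("store", 0x100, 42), ("load", 0x100)]

in_order = run(program, [0, 1])      # the load sees the stored value
out_of_order = run(program, [1, 0])  # the load passes the store
```

In this sketch the in-order run reads 42, whereas the run in which the load passes the store reads the stale value 0. This is the kind of passing that the load store queue is intended to prevent.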
FIG. 2 shows the relationship among a load store queue, a processor, and a data cache.
In FIG. 2, a load store queue 11 is provided between a processor (or an execution unit) 12 and a data cache 13. The load store queue 11 receives all the store requests and load requests issued out of order from the processor 12, writes into the data cache 13 the store instructions for which “in order” is guaranteed, and returns load values for which “in order” is guaranteed to the processor (or the execution unit) 12 via a multiplexer (MUX) 14. The load store queue 11 is composed of a table (not shown) for holding store instructions issued out of order and a mechanism for reading and selecting values for the load instructions issued out of order.
The operation of the load store queue, processor, and data cache shown in FIG. 2 will be explained briefly. The operation of the load store queue is divided into the following four processes:
1. (Acquiring In-order Information)
At the time when the processor 12 decodes a store instruction (before the instruction is executed out of order), the load store queue 11 receives decode information (21) about the store instruction in order from the processor 12, thereby securing in-order information about the store instruction.
2. (Processing a Store Request)
At the time when the store address (the address for a store instruction) and store value (the value of a store instruction) have been determined in the processor 12, the load store queue 11 receives the store request (22) from the processor 12 and holds the store address and value in the load store queue 11. The value is held during the time when the store instruction is in an out-of-order state, that is, from when the store request is received until the store instruction is retired with “in order” determined.
3. (Processing a Load Request)
The load store queue 11 receives the load address according to the load request (25) from the processor 12 and processes the load instruction. If the load address corresponds to a store value held in the load store queue 11 (that is, the value of a store instruction which precedes the load instruction and whose “in order” has not been determined), the value of that store instruction is taken out from the values of the store instructions held in the load store queue 11 and is used as the load value. If the corresponding store instruction is not in the load store queue 11, the load value (26) is read from the data cache 13.
In the actual operation, the processor 12 issues the corresponding load request (25) to the data cache 13 and load store queue 11 at the same time. When the store value preceding the load instruction is present in the load store queue 11, the load store queue 11 supplies a hit signal (28) indicating the presence of the data to the multiplexer 14. When receiving the hit signal (28), the multiplexer 14 selects the load value (27) from the load store queue 11. When the preceding store value is not present and the hit signal (28) is not supplied, the multiplexer 14 selects the load value (26) from the data cache 13. In this way, the load value (29) selected by the multiplexer 14 is supplied to the processor 12.
4. (Retiring a Store Instruction)
When the store instruction has been executed and its in-order state has been determined, the processor (or the execution unit) 12 outputs retire information. The load store queue 11 receives the retire information (23) and writes the store instruction held in the load store queue 11 back into the data cache 13 (24). The corresponding entry in the load store queue 11 is then deleted.
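The four processes above can be sketched as a small behavioral model in Python. This is only an illustrative sketch under stated assumptions, not a description of the circuit of FIG. 2. The class `LoadStoreQueueModel`, the function `load`, and all method names are hypothetical. The model assumes that tags are integers that increase in program order, that stores retire oldest first, and that when several preceding stores match the load address, the most recent one supplies the value. The case in which a preceding store address is not yet known is not modeled.

```python
from collections import OrderedDict


class LoadStoreQueueModel:
    """Behavioral sketch of the four processes (hypothetical names)."""

    def __init__(self):
        # tag -> [address, value]; insertion order is decode (program) order.
        self.entries = OrderedDict()

    def decode_store(self, tag):
        # Process 1: secure an entry in order; address and value are unknown.
        self.entries[tag] = [None, None]

    def store_request(self, tag, address, value):
        # Process 2: hold the address and value in the secured entry.
        entry = self.entries[tag]
        entry[0], entry[1] = address, value

    def load_request(self, load_tag, address):
        # Process 3: look for a preceding store to the same address.
        hit, value = False, None
        for tag, (addr, val) in self.entries.items():
            if tag < load_tag and addr == address:
                hit, value = True, val  # assumption: the latest match wins
        return hit, value

    def retire_store(self, tag, data_cache):
        # Process 4: write the oldest store back to the cache and delete it.
        oldest_tag, (addr, val) = next(iter(self.entries.items()))
        assert oldest_tag == tag, "stores are assumed to retire in order"
        data_cache[addr] = val
        del self.entries[tag]


def load(lsq, data_cache, load_tag, address):
    # Models the multiplexer 14: select the queue value when the hit
    # signal is present, otherwise the value from the data cache.
    hit, forwarded = lsq.load_request(load_tag, address)
    return forwarded if hit else data_cache.get(address)


cache = {0x40: 7, 0x80: 5}  # a dict stands in for the data cache 13
lsq = LoadStoreQueueModel()
lsq.decode_store(tag=1)
lsq.store_request(tag=1, address=0x40, value=99)
forwarded = load(lsq, cache, load_tag=2, address=0x40)   # supplied by the queue
from_cache = load(lsq, cache, load_tag=2, address=0x80)  # supplied by the cache
lsq.retire_store(1, cache)
```

In this usage, the load to address 0x40 receives the pending store value 99 from the queue while the cache still holds 7, and the cache is updated only when the store retires.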
FIG. 3 shows the configuration of a general load store queue. This load store queue is configured so as to realize a queue that enables associative retrieval using tag information and addresses.
Specifically, the load store queue 11 of FIG. 3 comprises a table 30 that enables associative retrieval, a pointer 31 indicating the top of the queue, a pointer 32 indicating the bottom of the queue, a plurality of selectors 33, 34, 35, and 36, comparators 37, 38, 39, and 40, and a multiplexer 41.
In the above configuration, the aforementioned four operations will be explained.
1. (Acquiring In-order Information)
In this process, the load store queue 11 receives tag information about a store instruction (or a unique number in the processor given in order) as in-order information during decoding and secures entries in order in the queue.
Specifically, the processor outputs tag information about the store instruction (or store tag (51)) as in-order information during decoding. This store tag (51) is supplied to the selector 34.
The pointer 32 specifies the entry in the table 30 corresponding to the bottom of the queue (52). The selector 34 supplies the tag information to the entry specified by the pointer 32. The tag information is written in the specified entry. At the same time, a valid flag is set (53).
At this time, the entries for the address for and the value of the store instruction in the load store queue 11 remain empty. The pointer 32 is incremented by, for example, “+1” at the input timing of the store tag (51), so that it is updated to indicate the next entry.
2. (Processing a Store Request)
In this process, the load store queue 11 receives tag information about the store instruction and the address for and the value of the store instruction and writes them into the secured entries (whose tag information coincides with the above tag information).
Specifically, the processor outputs tag information about the store instruction (or store tag (54)), the address for the store instruction (store address (55)), and its store value (56) as a store request. The store tag (54) is supplied to the comparator 37. The store address (55) and store value (56) are supplied to the selectors 35 and 36, respectively.
The comparator 37 retrieves the valid flag entry of the table 30 and extracts the valid tag (57). At the same time, the comparator 37 retrieves the entry coinciding with the store tag (54) from the tag entries in the table 30 (58). In this way, the comparator 37 retrieves the entries into which the store address (55) and store value (56) are to be written.
The selectors 35, 36 supply the store address (55) and store value (56) to the retrieved entries (59). In this way, the store address (55) and store value (56) are written into the entries secured in the table 30.
3. (Processing a Load Request)
In this process, the load store queue 11 receives tag information about the load instruction and the address for the load instruction and retrieves the store instruction (retrieved using addresses) whose address coincides with the received address from the preceding store instructions (retrieved using the tag information). If the corresponding store instruction exists, the load store queue 11 outputs a hit signal notifying the existence of the corresponding store instruction and further outputs the value of the store instruction whose address coincides with the received address as the load value (forwarding the value from store to load).
Specifically, the processor outputs tag information about the load instruction (or load tag (62)) and the address for the load instruction (or load address (63)) as a load request. The comparators 38, 39 receive the load tag (62) and load address (63).
The comparator 38 retrieves the valid flag entry in the table 30 and extracts the valid tag (64). At the same time, the comparator 38 retrieves the tag entry in the table 30, thereby retrieving the store instruction (65) preceding the load tag (62).
The comparator 39 retrieves the address entry in the table 30, thereby retrieving the entry for the store instruction coinciding with the load address (63) (66).
On the basis of the output signals of the comparators 38, 39, the comparator 40 checks to see if there is the entry for a store instruction which precedes a load instruction and whose address coincides with that of the load instruction. If the result of the checking has shown that such a store instruction exists, the comparator 40 outputs a hit signal (67). The multiplexer 41 selects the corresponding entry according to the output signal of the comparator 40 and outputs the value of the entry as a load value (68).
4. (Retiring a Store Instruction)
In this process, the load store queue 11 receives tag information about a store instruction to retire as retire information and deletes the entry for the corresponding store instruction from the queue.
Specifically, the processor outputs tag information about a store instruction to retire (or store tag (69)) as retire information. The selector 33 receives the store tag (69).
The pointer 31 specifies the entry in the table 30 to be deleted, which corresponds to the top of the queue (70). The selector 33 supplies the store tag (69) to the entry specified by the pointer 31. As a result, the store address and the store value of the specified entry in the table 30 are outputted to the data cache (71). Then, the entry is deleted. The pointer 31 is incremented by, for example, “+1” at the input timing of the store tag (69), so that it is updated to indicate the next entry.
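The operations of the general load store queue described above can be sketched in Python as follows. This is a behavioral sketch under stated assumptions rather than the circuit of FIG. 3. The names `Entry` and `AssociativeLoadStoreQueue` and the method names are hypothetical. The sketch assumes integer tags that increase in program order, pointers that wrap around the end of the table, and a multiplexer that selects the matching entry with the largest tag. Overflow of the table is not checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    valid: bool = False
    tag: Optional[int] = None
    address: Optional[int] = None
    value: Optional[int] = None


class AssociativeLoadStoreQueue:
    def __init__(self, size=4):
        self.table = [Entry() for _ in range(size)]  # table 30
        self.top = 0     # pointer 31: oldest entry, next to retire
        self.bottom = 0  # pointer 32: next entry to be secured

    def acquire(self, store_tag):
        # Write the tag at the bottom of the queue and set the valid flag.
        self.table[self.bottom] = Entry(valid=True, tag=store_tag)
        self.bottom = (self.bottom + 1) % len(self.table)  # assumed wrap

    def store_request(self, store_tag, address, value):
        # Comparator 37: compare every valid tag entry with the store tag.
        for entry in self.table:
            if entry.valid and entry.tag == store_tag:
                entry.address, entry.value = address, value

    def load_request(self, load_tag, address):
        # Comparator 38: valid entries whose store precedes the load.
        precedes = [e.valid and e.tag < load_tag for e in self.table]
        # Comparator 39: entries whose address equals the load address.
        same_addr = [e.address == address for e in self.table]
        # Comparator 40: entries that satisfy both conditions.
        matches = [e for e, p, a in zip(self.table, precedes, same_addr)
                   if p and a]
        if not matches:
            return False, None
        # Multiplexer 41: choosing the largest tag is an assumption.
        return True, max(matches, key=lambda e: e.tag).value

    def retire(self, store_tag):
        # Pointer 31 selects the entry at the top of the queue.
        entry = self.table[self.top]
        assert entry.valid and entry.tag == store_tag
        write_back = (entry.address, entry.value)  # sent to the data cache
        self.table[self.top] = Entry()             # delete the entry
        self.top = (self.top + 1) % len(self.table)
        return write_back


q = AssociativeLoadStoreQueue(size=4)
q.acquire(10)
q.acquire(11)
q.store_request(11, 0x20, 5)  # store requests may arrive out of order
q.store_request(10, 0x20, 3)
hit, value = q.load_request(load_tag=12, address=0x20)
older_hit, older_value = q.load_request(load_tag=11, address=0x20)
written = q.retire(10)
```

Note that both `store_request` and `load_request` in this sketch examine every entry of the table, which mirrors the associative retrieval performed by the comparators 37 to 40.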
When the store request is issued to the load store queue, the load store queue of FIG. 3 has to write the address (store address) for and the value (store value) of the store instruction into the corresponding entries in the table secured in order. To retrieve the entries into which the store address and the store value are to be written, it is necessary to determine, for all the tag entries in the table 30, whether each coincides with the store tag and then to determine the locations into which the data is to be written. Consequently, retrieval takes a long time, which makes high-speed processing difficult.
Furthermore, when a load request is issued to the load store queue, the load store queue has to retrieve not only a store instruction preceding a load instruction but also an instruction whose address coincides with that of the load instruction. Specifically, with the configuration of FIG. 3, after the entries of the preceding store instructions are read, it is necessary to retrieve the one whose address coincides with that of the load instruction. Consequently, in the load requesting process, too, it is difficult to speed up the process.
Moreover, when a store request or a load request is processed, it is necessary to retrieve all the entries of the table 30 on the basis of the store addresses and load addresses. As a result, when an attempt is made to construct a table with a large number of entries, this makes the processing speed slower and complicates the circuit configuration. Thus, it is difficult to configure the circuit. Accordingly, there has been a need for a load store queue which enables not only high-speed processing to be realized but also a table with large-scale entries to be configured with small-scale circuitry.
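The way the retrieval work grows with the table size can be illustrated with a rough count in Python. The function `comparisons_per_request` is hypothetical, and the count rests on a simplifying assumption that is not stated in this description: one tag comparison per entry for a store request, and one tag comparison plus one address comparison per entry for a load request.

```python
def comparisons_per_request(num_entries):
    """Rough comparison counts for a fully associative table.

    Assumes one tag comparison per entry for a store request, and one tag
    comparison plus one address comparison per entry for a load request.
    """
    return {
        "store_request": num_entries,
        "load_request": 2 * num_entries,
    }


small = comparisons_per_request(8)
large = comparisons_per_request(64)
```

Under this assumption the number of comparisons grows in proportion to the number of entries, which is consistent with the difficulty described above in constructing a table with a large number of entries.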