1. Field of the Invention
The present invention relates to an instruction queue of a microprocessor, used to hold prefetched instructions.
2. Description of the Prior Art
FIG. 1 shows a microprocessor having an instruction queue according to the prior art.
Components of the microprocessor will be explained. A main memory 100 stores instructions and data. An instruction cache 200 temporarily stores some of the instructions stored in the main memory 100 and is accessible at a high speed. An instruction fetch unit 300 fetches an instruction from the main memory 100 or from the cache 200. An instruction decoder 400 decodes an instruction transferred from the fetch unit 300 into an executable instruction. An execution unit 500 executes the executable instruction sent from the decoder 400. A register file 600 stores data required for executing an instruction. A data cache 700 stores part of data stored in the main memory 100 and is accessible at a high speed.
Components of the execution unit 500 will be explained. A branch unit 510 executes a branch instruction. An ALU 520 executes an arithmetic instruction or a logic instruction. A shifter 530 executes a shift instruction. A load unit 540 executes a load instruction. A store unit 550 executes a store instruction. The execution unit 500 loads and stores data with respect to the register file 600 and the data cache 700.
An instruction queue 800 is arranged between the decoder 400 and the execution unit 500. The queue 800 serves as a buffer. Variable-length instructions involve different fetch times, and therefore, the decoder 400 is sometimes unable to supply executable instructions to the execution unit 500 continuously. Accordingly, the queue 800 temporarily stores executable instructions and supplies them continuously to the execution unit 500, to improve the performance of the microprocessor.
FIG. 2 shows an example of the queue 800 according to the prior art.
The instruction queue of FIG. 2 is designed to hold six instructions. This number is only an example, and the capacity may be chosen freely in practice.
Components of the queue 800 will be explained. An instruction memory 810 stores instructions supplied by the decoder 400. A write decoder 820 specifies a write address in the instruction memory 810. A read decoder 840 specifies a read address in the instruction memory 810. A write controller 860 controls a write operation. A read controller 865 controls a read operation. A counter 870 provides the write decoder 820 with write address data. A counter 875 provides the read decoder 840 with read address data. An input buffer 880 holds an instruction from the decoder 400 and sends it to the instruction memory 810 in response to a write enable signal from the write controller 860. An output buffer 885 holds an instruction from the instruction memory 810 and sends it to the execution unit 500 in response to a read enable signal from the read controller 865. A validity memory 890 indicates the validity of each instruction stored in the instruction memory 810. A full-valid-state detector 1000 determines whether or not the instruction memory 810 is full of valid instructions. A full-invalid-state detector 1005 determines whether or not the instruction memory 810 has no valid instruction.
The counters 870 and 875 are initialized to the same value in response to a reset signal. At this time, the validity memory 890 is completely zeroed to indicate that the instruction memory 810 is empty.
A write operation in the initial state will be explained. The decoder 400 provides the queue 800 with a write request and an instruction to write. The write decoder 820 receives write address data from the counter 870 through a line 871 and specifies a write address in the instruction memory 810 through lines 821 to 826. The write controller 860 supplies a write enable signal to the input buffer 880 through a line 862. Then, the instruction is written into the instruction memory 810 at the specified address. At the same time, the write decoder 820 sends "1", indicating the validity of the written instruction, to a corresponding one of flip-flops 891 to 896 of the validity memory 890 through lines 831 to 836. The write controller 860 increments the counter 870 by one through a line 861.
Any instruction from the decoder 400 is written into the instruction memory 810 as long as the memory 810 has a vacancy. When the instruction memory 810 becomes full of valid instructions, the full-valid-state detector 1000 detects it and sends a write prohibition request to the write controller 860. Then, the write controller 860 provides the input buffer 880 with no write enable signal even if the decoder 400 provides an instruction and a write request.
If an instruction is read out of the instruction memory 810, the full-valid-state detector 1000 withdraws the write prohibition request. Then, the write controller 860 provides the input buffer 880 with a write enable signal whenever the decoder 400 sends a write request and an instruction to write.
A read operation will be explained. The execution unit 500 issues a read request. The read decoder 840 receives read address data from the counter 875 through a line 876 and specifies a read address in the instruction memory 810 through lines 841 to 846. The read controller 865 provides the output buffer 885 with a read enable signal through a line 867 so that an instruction is read out of the specified address of the instruction memory 810. At the same time, the read decoder 840 sends "0", indicating that the instruction at the read address is no longer valid, to a corresponding one of the flip-flops 891 to 896 of the validity memory 890 through lines 851 to 856 and OR gates 901 to 906. The read controller 865 increments the counter 875 by one through a line 866.
Any read request is met as long as the instruction memory 810 has valid instructions. When the instruction memory 810 becomes empty, the full-invalid-state detector 1005 detects it and provides the read controller 865 with a read prohibition request.
Upon receiving the read prohibition request, the read controller 865 provides the output buffer 885 with no read enable signal even if the execution unit 500 issues a read request. If a new instruction is written into the instruction memory 810 so that the memory 810 has at least one valid instruction, the full-invalid-state detector 1005 withdraws the read prohibition request. Consequently, the read controller 865 provides the output buffer 885 with the read enable signal whenever the execution unit 500 issues a read request.
If an exception occurs or a branch instruction takes effect, the valid instructions stored in the instruction memory 810 become unnecessary. In this case, a reset signal zeroes the validity memory 890.
As explained above, write and read operations with respect to the instruction memory 810 are carried out independently of each other. The read counter 875 follows the write counter 870, and therefore, instructions are read out of the instruction memory 810 in written order. If the instruction memory 810 is full of valid instructions, any write request is rejected, and if the memory 810 is empty, any read request is rejected.
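The behavior summarized above can be sketched as a small behavioral model. This is only an illustrative sketch, not the circuit of FIG. 2: the class and method names are hypothetical, the counters are assumed to wrap around modulo the queue depth, and the reset is assumed to realign both counters as well as zero the validity memory, as in the initialization described earlier.

```python
class InstructionQueue:
    """Behavioral sketch of a prior-art style instruction queue (names are hypothetical)."""

    def __init__(self, depth=6):
        self.depth = depth
        self.memory = [None] * depth   # models instruction memory 810
        self.valid = [False] * depth   # models validity memory 890
        self.write_ptr = 0             # models write counter 870
        self.read_ptr = 0              # models read counter 875

    def is_full(self):
        # Models full-valid-state detector 1000: every entry is valid.
        return all(self.valid)

    def is_empty(self):
        # Models full-invalid-state detector 1005: no entry is valid.
        return not any(self.valid)

    def write(self, instruction):
        """Store an instruction; return False if the write is prohibited."""
        if self.is_full():
            return False
        self.memory[self.write_ptr] = instruction
        self.valid[self.write_ptr] = True          # mark the entry valid
        # Assumption: the counter wraps around at the queue depth.
        self.write_ptr = (self.write_ptr + 1) % self.depth
        return True

    def read(self):
        """Return the oldest instruction, or None if the read is prohibited."""
        if self.is_empty():
            return None
        instruction = self.memory[self.read_ptr]
        self.valid[self.read_ptr] = False          # mark the entry invalid
        self.read_ptr = (self.read_ptr + 1) % self.depth
        return instruction

    def reset(self):
        """Discard all queued instructions (exception or branch)."""
        self.valid = [False] * self.depth
        # Assumption: both counters return to the same value on reset.
        self.write_ptr = 0
        self.read_ptr = 0
```

With a depth of six, a seventh consecutive write is rejected, and a single read frees one entry so that the next write succeeds. Instructions come out in the order in which they were written.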
To explain the problems of the prior art, the operating conditions of the microprocessor and queue 800 will be explained first.
The fetch unit 300 fetches hit instructions from the cache 200 at a rate of two instructions in two cycles. The fetch unit 300 fetches cache-missed instructions from the main memory 100 at a rate of two instructions in four cycles. The branch unit 510, load unit 540, and store unit 550 of the execution unit 500 each need two cycles to execute an instruction, and the ALU 520 and shifter 530 thereof each need one cycle to execute an instruction. The execution unit 500 provides the queue 800 with a read request only after completely executing a given instruction.
Write and read requests to the queue 800 are never simultaneously made. For example, a write request is made in the first half of a cycle and a read request in the second half thereof. When write and read requests continuously occur, they occur only alternately and never simultaneously.
If the fetch unit 300 fetches hit instructions from the cache 200 continuously, it will be able to provide the decoder 400 with an instruction every cycle. Then, the decoder 400 may provide the queue 800 with a write request every cycle. If instructions to be executed by the ALU 520 or shifter 530 are continuously supplied to the execution unit 500, the execution unit 500 will provide the queue 800 with a read request every cycle because the instructions are executed cycle by cycle.
If load and store instructions each needing two cycles to execute are continuously supplied to the execution unit 500, the execution unit 500 will intermittently provide the queue 800 with read requests. During this period, instructions transferred from the decoder 400 are stored in the queue 800.
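The two cases above can be illustrated with a short cycle-by-cycle simulation. It is a simplified sketch under stated assumptions: the decoder offers one instruction per cycle (continuous cache hits), a write occurs in the first half of a cycle and a read in the second half, and the execution unit requests a read only once its current instruction has finished. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(latencies, depth=6, cycles=8):
    """Return the queue occupancy at the end of each cycle.

    latencies: execution time in cycles of each instruction, in program order.
    """
    queue = deque()
    pending = deque(latencies)   # instructions the decoder has yet to deliver
    busy = 0                     # cycles left before the execution unit is free
    occupancy = []
    for _ in range(cycles):
        # First half of the cycle: the decoder writes if the queue has room.
        if pending and len(queue) < depth:
            queue.append(pending.popleft())
        # The execution unit spends this cycle on its current instruction.
        if busy > 0:
            busy -= 1
        # Second half: a read request is issued only when execution is complete.
        if busy == 0 and queue:
            busy = queue.popleft()
        occupancy.append(len(queue))
    return occupancy


# One-cycle instructions (ALU or shifter): the queue drains every cycle.
print(simulate([1] * 8))   # [0, 0, 0, 0, 0, 0, 0, 0]
# Two-cycle instructions (load or store): the queue gradually fills.
print(simulate([2] * 8))   # [0, 1, 1, 2, 2, 3, 3, 4]
```

In the second run the execution unit reads only every other cycle while the decoder writes every cycle, so the occupancy grows by one entry every two cycles until the queue would eventually become full.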
If the cache 200 does not have an instruction requested by the fetch unit 300, the cache 200 must be refilled. Until the cache 200 is refilled with instructions, the fetch unit 300 is unable to supply instructions to the decoder 400. This causes an idle period of two cycles out of every four.
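The idle period follows directly from the fetch rates stated earlier, as the following calculation sketches. It assumes that the decoder could otherwise accept one instruction per cycle. The constant and function names are hypothetical.

```python
from fractions import Fraction

# Fetch rates from the operating conditions, in instructions per cycle.
HIT_FETCH_RATE = Fraction(2, 2)    # two instructions in two cycles
MISS_FETCH_RATE = Fraction(2, 4)   # two instructions in four cycles


def idle_cycles(window, fetch_rate, demand_rate=Fraction(1)):
    """Cycles in `window` during which no instruction is available.

    Assumption: the decoder can consume `demand_rate` instructions per cycle.
    """
    supplied = fetch_rate * window
    demanded = demand_rate * window
    return max(demanded - supplied, 0) / demand_rate


print(idle_cycles(4, HIT_FETCH_RATE))    # 0
print(idle_cycles(4, MISS_FETCH_RATE))   # 2
```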
If a branch instruction is encountered, the fetch unit 300 must change the instruction fetch address accordingly. The fetch unit 300 then frequently misses the cache 200 and must access the main memory 100. During this operation, a read request from the execution unit 500 is rejected.
During the period between the queue 800 receiving a branch instruction and the execution unit 500 executing it, the queue 800 accumulates instructions sent from the decoder 400. It is highly probable that these instructions will not be executed once the branch instruction is executed.
The fetching of these useless instructions degrades the CPI (clock cycles per instruction) and thus the performance of the microprocessor.
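The effect on CPI can be made concrete with the usual definition, total clock cycles divided by the number of instructions actually executed. The sketch below uses purely hypothetical cycle counts chosen for illustration. They are not measurements of the prior-art microprocessor, and the function name `cpi` is an assumption.

```python
def cpi(executed_instructions, busy_cycles, idle_cycles=0):
    """Clock cycles per executed instruction.

    Cycles in which the execution unit waits count toward the total,
    but discarded instructions do not count as executed.
    """
    return (busy_cycles + idle_cycles) / executed_instructions


# Hypothetical figures: ten one-cycle instructions with no stall,
# then the same ten instructions with four idle cycles after a branch.
print(cpi(10, 10))      # 1.0
print(cpi(10, 10, 4))   # 1.4
```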
As explained above, the prior art frequently misses the cache 200 when executing a branch instruction and must access the main memory 100 until the cache 200 is refilled with the required instructions. This leaves the execution unit 500 idle, with no instructions to execute.
Further, the prior art accumulates useless instructions in the queue 800 while passing the branch instruction from the decoder 400 to the execution unit 500 through the queue 800.
Due to these problems, the performance of the microprocessor of the prior art drops whenever a branch instruction occurs.