(1) Field of the Invention
This invention relates to a multistream instruction processor for issuing instructions from multiple instruction streams to multiple functional units in parallel.
(2) Description of the Related Art
Conventionally, a multistream processor, also called a multithread processor, has been employed to process multiple instruction streams in parallel. Such a processor is fully described in "A Multithreaded Processor Architecture with Simultaneous Instruction Issuing," in Proc. of ISS'91: International Symposium on Supercomputing, Fukuoka, Japan, pp. 87-96, November 1991.
Construction of the multithread processor will be described hereunder with reference to FIG. 1.
As is apparent from the figure, the processor is provided with an instruction cache 200; several pairs of an instruction fetch unit 201 and a decode unit 202; standby stations 203; instruction schedule units 204; functional units 205; and a register set 206.
The instruction cache 200 stores instruction streams; the instruction fetch unit 201 extracts instructions from the cache 200; the decode unit 202 decodes the instructions extracted by the corresponding instruction fetch unit 201; the standby station 203 holds decoded instructions until they are selected by the instruction schedule unit 204; the instruction schedule unit 204 schedules the decoded instructions; the functional unit 205 executes the instructions in accordance with the schedule; and the register set 206 stores the data to be operated on and holds the execution results.
Operation of the processor will now be described in detail. Instructions of different instruction streams are extracted by the fetch units 201 and decoded in parallel by the decode units 202, each decode unit corresponding to one fetch unit. A decoded instruction is scheduled by the instruction schedule unit 204 and delivered to the functional unit 205 unless an instruction decoded by another decode unit 202 competes for the same functional unit. When such competition occurs, a decoded instruction that is not selected remains in the standby station 203 until it is selected by the instruction schedule unit 204. The delivered instruction is then executed by the functional unit 205, which manipulates the register set 206.
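The arbitration described above can be illustrated with a minimal sketch. It is not taken from the cited reference: the function name `schedule_cycle`, the unit names, and the fixed priority by stream number are assumptions made only for illustration. Each stream's standby station is modeled as a queue of the functional units its decoded instructions require.

```python
from collections import deque


def schedule_cycle(standby, free_units):
    """Model one scheduling cycle (hypothetical sketch).

    standby:    dict mapping a stream id to a deque of functional-unit
                names, one entry per decoded instruction held in that
                stream's standby station.
    free_units: iterable of functional-unit names free in this cycle.
    Returns the list of (stream_id, unit) pairs issued in this cycle.
    """
    available = set(free_units)
    issued = []
    # Assumption: lower stream ids win when two streams compete.
    for stream_id in sorted(standby):
        queue = standby[stream_id]
        if queue and queue[0] in available:
            unit = queue.popleft()      # instruction leaves the standby station
            available.remove(unit)      # the unit is taken for this cycle
            issued.append((stream_id, unit))
        # Otherwise the instruction stays in the standby station.
    return issued


standby = {
    0: deque(["alu"]),
    1: deque(["alu"]),          # competes with stream 0 for the same unit
    2: deque(["load_store"]),
}
first = schedule_cycle(standby, ["alu", "load_store"])
second = schedule_cycle(standby, ["alu", "load_store"])
print(first)   # [(0, 'alu'), (2, 'load_store')]
print(second)  # [(1, 'alu')]
```

In this sketch, the instruction of stream 1 loses the competition for the ALU in the first cycle, waits in its standby station, and is issued in the second cycle.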
Prior to the multithread processor, the superscalar processor was used to issue and execute multiple instructions in parallel. Unlike the multithread processor, the superscalar processor comprises a single fetch unit and decode unit pair; therefore, the instructions it executes in parallel all come from a single instruction stream. Also, pipeline interlocks occur frequently, since instructions from a single stream depend on each other. Consequently, the multithread processor has been developed to execute instructions from multiple streams as well as to reduce interlocks. That is, in the architecture of the multithread processor, an instruction from one thread is issued simultaneously with instructions from other threads. Instructions from different threads are independent of each other, and this improves the throughput of the processor by improving the utilization of the functional units.
However, the conventional multithread processor does not overcome interlocks caused by a so-called cache miss, which degrades the efficiency of the processor.
For example, suppose that a cache miss is detected during execution of a load instruction. When the cache is accessed in order to load data from a main memory (not illustrated in the figure) into the register set 206 in accordance with the load instruction, the functional unit (a load/store unit in this case) is interlocked because the requested data is not found in the cache 200 (cache miss), and it remains interlocked until the data in the cache 200 is updated. Subsequently, the decode unit assigned to the same instruction stream freezes, and load instructions from other instruction streams cannot be issued to the load/store unit. Consequently, the efficiency of the functional unit deteriorates, and eventually the processor itself freezes.
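The effect of such an interlock on loads from other streams can be sketched as follows. This is only an illustrative model: the function name `run_loads`, the one-cycle hit latency, and the 20-cycle miss penalty are assumptions, not figures from the conventional processor. A single load/store unit is shared by all streams and stays occupied for the whole miss penalty.

```python
def run_loads(loads, miss_penalty):
    """Model a single, interlocking load/store unit (hypothetical sketch).

    loads:        list of (stream_id, hits_cache) in the order the loads
                  reach the unit.
    miss_penalty: extra cycles the unit stays interlocked on a cache miss
                  (assumed value; a hit is assumed to take one cycle).
    Returns a list of (stream_id, issue_cycle, complete_cycle).
    """
    timeline = []
    unit_free_at = 0
    for stream_id, hits_cache in loads:
        issue = unit_free_at                     # must wait for the unit
        latency = 1 if hits_cache else 1 + miss_penalty
        unit_free_at = issue + latency           # unit is interlocked until then
        timeline.append((stream_id, issue, unit_free_at))
    return timeline


# Stream 0 misses; streams 1 and 2 would hit, but cannot be issued earlier.
timeline = run_loads([(0, False), (1, True), (2, True)], miss_penalty=20)
print(timeline)  # [(0, 0, 21), (1, 21, 22), (2, 22, 23)]
```

Under these assumed latencies, the loads of streams 1 and 2 hit in the cache yet cannot be issued until cycle 21, because the load/store unit is held by the missing load of stream 0.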