A VLIW processor that concurrently executes a very long instruction word (VLIW) containing a plurality of instructions (e.g., operation, load) has been proposed. The VLIW processor analyzes the order relation and the data dependency relation between instructions, and extracts instructions that can be executed simultaneously. Performance is thereby improved through simultaneous execution of instructions without runtime overhead.
However, the number of instructions that can be executed concurrently within one program (instruction stream) is limited. The limit is considered to be about 2 or 3 instructions per cycle on average, which has made further performance improvement difficult.
Thus, in recent years, a VLIW processor that achieves further performance improvement by concurrently executing a plurality of instruction streams has been realized (see PL 1, for example). As shown in FIG. 2, to concurrently execute M instruction streams (instruction addresses 1 to M), the processor described in PL 1 requires instruction caches 1 to M, each of which stores the instructions of a respective instruction stream; instruction buffers 1 to M, which temporarily store the fetched instructions; and an instruction selector, which extracts and selects the instructions to be executed concurrently from the instruction streams. In addition, a program counter (PC) that controls the instruction sequence is required for each instruction stream (not illustrated in FIG. 2). The instruction addresses 1 to M are provided from the PCs. The following describes the case in which the processor executes M instruction streams and executes a very long instruction word containing up to K instructions.
In such a case, according to PL 1, when the instructions fetched from the respective instruction streams are dividable, the instructions of the M instruction streams are divided, and instructions are selected from the instruction streams, taking the priority of the instruction streams into account, and provided to the computing units so that the number of computing units operating simultaneously becomes the maximum (i.e., K). The number of instructions executed concurrently is thereby increased, which improves performance.
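The selection described above can be illustrated with a minimal sketch. It is not the circuit of PL 1: the function name `select_instructions`, the list-based buffers, and the strict priority-order fill are assumptions made for illustration, and the sketch ignores data dependencies and computing-unit types.

```python
from typing import List, Tuple


def select_instructions(
    buffers: List[List[str]], priority: List[int], k: int
) -> List[Tuple[int, str]]:
    """Fill up to k issue slots from per-stream instruction buffers.

    buffers[i] holds the instructions fetched for stream i, in program order.
    priority lists stream indices from highest to lowest priority.
    Selected instructions are removed from their buffers (hypothetical model).
    """
    issued: List[Tuple[int, str]] = []
    for stream in priority:
        free = k - len(issued)
        if free == 0:
            break  # all k computing units already have an instruction
        # Divide the fetched instruction word: take only what still fits.
        take = buffers[stream][:free]
        del buffers[stream][:free]
        issued.extend((stream, ins) for ins in take)
    return issued


buffers = [["add", "load"], ["mul"], ["sub", "store", "or"]]
issued = select_instructions(buffers, priority=[0, 1, 2], k=4)
```

In this example, stream 0 alone would occupy only two of the four slots; the remaining slots are filled from streams 1 and 2, and the unselected instructions of stream 2 stay in its buffer for a later cycle.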
A processor described in NPL 1 requires, as shown in FIG. 3, M program counters (PCs) which provide respective instruction streams (instruction addresses 1 to M), an instruction cache, and an address selector. It should be noted that the program counters are not illustrated in FIG. 3.
The program counters control the instruction sequence of the respective instruction streams. A single instruction cache stores the instructions of the M instruction streams. The address selector, based on instruction stream control information, selects one of the addresses designated by the M PCs and supplies the selected address to the instruction cache.
According to this processor, if a stall occurs in one instruction stream owing to a cache miss, the address selector selects an instruction address designated by a PC corresponding to a different instruction stream, and the instructions of that stream are executed instead, which minimizes the performance degradation due to the stall.
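The switching behavior can be sketched as follows. The source states only that the selection is based on instruction stream control information; the round-robin order, the `stalled` flags, and the function name `select_address` are assumptions for illustration, not the method of NPL 1.

```python
from typing import List, Optional, Tuple


def select_address(
    pcs: List[int], stalled: List[bool], last: int
) -> Optional[Tuple[int, int]]:
    """Pick the next instruction address to send to the shared cache.

    pcs[i] is the address held by the PC of stream i.
    stalled[i] is True while stream i waits on a cache miss.
    last is the stream selected in the previous cycle.
    Returns (stream, address), or None if every stream is stalled.
    """
    m = len(pcs)
    # Assumed policy: round-robin, starting after the last selected stream.
    for offset in range(1, m + 1):
        stream = (last + offset) % m
        if not stalled[stream]:
            return stream, pcs[stream]
    return None


# Stream 0 has stalled, so the selector moves on to stream 1.
choice = select_address([0x100, 0x200, 0x300], [True, False, False], last=0)
```

Under this assumed policy, a stalled stream is simply skipped until its flag clears, so the shared instruction cache keeps receiving addresses as long as at least one stream can proceed.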