1. Field of the Invention
The present invention generally relates to processors, and, more particularly, to a parallel processor that executes a plurality of basic instructions in parallel.
2. Description of the Related Art
Generally, in a conventional computer system, a plurality of basic instructions are executed in parallel by pipeline processing, which improves the performance of the system. Conventionally, a plurality of basic instructions constitute one fixed-length instruction word, and a very-long instruction word (VLIW) technique is employed to execute the basic instructions contained in one instruction word in parallel. Alternatively, a super scalar technique may be employed. In accordance with the super scalar technique, basic instructions are executed in parallel depending on the number of basic instructions contained in each instruction word.
FIG. 1 shows the structure of a conventional parallel processor 10. This parallel processor 10 comprises an instruction fetch unit 1 connected to a memory 7, an instruction issue unit 3 connected to the instruction fetch unit 1, instruction execution units EU0 to EUn each connected to the instruction issue unit 3, and a register unit 5 connected to each of the instruction execution units EU0 to EUn.
The instruction fetch unit 1 fetches an instruction word from the memory 7, and supplies the instruction word to the instruction issue unit 3. The instruction issue unit 3 issues the basic instructions contained in the supplied instruction word to the instruction execution units EU0 to EUn. If the instruction execution units EU0 to EUn are still executing previous basic instructions at this point, the instruction issue unit 3 waits for the end of the execution. When the execution ends, the instruction issue unit 3 supplies the basic instructions to the instruction execution units EU0 to EUn.
The instruction execution units EU0 to EUn execute the basic instructions, and notify the instruction issue unit 3 of the end of the execution. The register unit 5 supplies data to the instruction execution units EU0 to EUn, if necessary, and holds the execution results of the instruction execution units EU0 to EUn. The externally connected memory 7 stores an instruction word string to be executed in the parallel processor 10. The memory 7 also stores the data that the instruction execution units EU0 to EUn need to execute instructions, and the data obtained as the execution results.
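The fetch, issue, and execute flow described above can be illustrated with a minimal sketch. The function name run_program, the per-instruction latency values, and the cycle-counting model are assumptions introduced here for illustration only; they are not taken from FIG. 1. The sketch models only the behavior in which the instruction issue unit waits for all instruction execution units to finish before issuing the next instruction word.

```python
def run_program(instruction_words, num_units=4):
    """Return the total cycle count for a list of instruction words.

    Each instruction word is a list with one entry per execution unit:
    the assumed latency in cycles of the basic instruction issued to that
    unit, or 0 for a do-nothing instruction (NOP). Hypothetical model.
    """
    remaining = [0] * num_units  # cycles left on each execution unit
    cycle = 0

    def wait_until_idle():
        # Advance time until every execution unit has reported completion.
        nonlocal cycle
        while any(r > 0 for r in remaining):
            for i in range(num_units):
                if remaining[i] > 0:
                    remaining[i] -= 1
            cycle += 1

    for word in instruction_words:  # the fetch unit supplies one word
        if len(word) != num_units:
            raise ValueError("instruction word must have one slot per unit")
        wait_until_idle()           # the issue unit waits for previous work
        for i, latency in enumerate(word):
            remaining[i] = latency  # issue one basic instruction per unit

    wait_until_idle()               # let the last word finish
    return cycle


if __name__ == "__main__":
    program = [[1, 3, 0, 0], [2, 1, 1, 0]]
    print(run_program(program))
```

In this model, each instruction word occupies all units for as long as its slowest basic instruction takes, so the example program, whose slowest instructions take 3 and 2 cycles, completes in 5 cycles.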
FIG. 2 shows the formats of instruction words to be supplied to a parallel processor having four instruction execution units EU0 to EU3. As shown in FIG. 2, each instruction word is made up of basic instructions EI and do-nothing instructions NOP. If the number of basic instructions contained in one instruction word to be executed in parallel is smaller than the number of the instruction execution units EU0 to EU3, the proportion of do-nothing instructions is large.
In the conventional parallel processing method of executing a plurality of basic instructions by the VLIW technique, each instruction word has a fixed length. Therefore, if the number of basic instructions to be executed in parallel is smaller than a predetermined number, do-nothing instructions are added to comply with the predetermined length. As a result, in a program having a small number of basic instructions in total, the proportion of do-nothing instructions is large, and the amount of instruction code increases accordingly. This results in problems such as inefficient use of memory, a lower cache hit ratio, and an increased load on the instruction fetch mechanism.
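The padding described above can be illustrated with a short sketch. The names pad_word and nop_fraction, the word width of four slots, and the instruction labels EI0 to EI7 are hypothetical and chosen only for illustration; the sketch shows how fixed-length instruction words accumulate do-nothing instructions when few basic instructions can be placed in each word.

```python
NOP = "NOP"  # placeholder for a do-nothing instruction


def pad_word(basic_instructions, width=4):
    """Pad a group of basic instructions with NOPs to a fixed word width."""
    if len(basic_instructions) > width:
        raise ValueError("too many basic instructions for one word")
    return list(basic_instructions) + [NOP] * (width - len(basic_instructions))


def nop_fraction(words):
    """Return the fraction of instruction slots occupied by NOPs."""
    slots = sum(len(word) for word in words)
    nops = sum(word.count(NOP) for word in words)
    return nops / slots if slots else 0.0


if __name__ == "__main__":
    # Each inner list is a group of basic instructions that may run in parallel.
    groups = [["EI0"], ["EI1", "EI2"], ["EI3"], ["EI4", "EI5", "EI6", "EI7"]]
    words = [pad_word(group) for group in groups]
    for word in words:
        print(word)
    print(nop_fraction(words))
```

In this example, eight basic instructions occupy sixteen slots, so half of the instruction code consists of do-nothing instructions that must still be stored, cached, and fetched.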
The super scalar technique also has a problem in that a large-scale circuit is needed to increase the number of instructions to be executed in parallel.