One approach to improving microprocessor performance is instruction-level parallel processing, which involves the parallel execution of low-level machine operations such as memory loads and stores, integer additions and floating-point multiplications. Processors that implement instruction-level parallelism typically include multiple execution units and are controlled by very long instruction words (VLIWs). Each VLIW specifies the operations that are to be executed in a single cycle and includes multiple operation fields. The source program is typically written in a high-level language without attention to operations that can be performed in parallel. Converting a source program to machine code that exploits instruction-level parallelism therefore involves scheduling the operations that can be executed in parallel. The scheduling function may be performed by a compiler or by the processor itself. When scheduling is performed by the processor, the processor hardware may become complex. When scheduling is performed by the compiler, the processor simply executes the operations contained in the VLIW. Instruction-level parallel processing is described by J. A. Fisher et al. in Science, Vol. 253, Sep. 13, 1991, pp. 1233-1241 and by B. Ramakrishna et al. in the Journal of Supercomputing, Vol. 7, 1993, pp. 9-50.
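Compiler-side scheduling of operations into VLIW words can be illustrated with a minimal greedy packer. This is a sketch under stated assumptions, not a method taken from the cited references: every operation is assumed to have a one-cycle latency, each execution-unit class (the hypothetical labels `"mem"`, `"alu"` and `"fpu"`) is given a fixed number of operation fields per VLIW, and the names `Op` and `schedule` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    unit: str                                  # execution-unit class, e.g. "mem" (hypothetical)
    deps: list = field(default_factory=list)  # ops that must issue in an earlier cycle


def schedule(ops, slots):
    """Greedily pack ops into VLIW words, one word per cycle.

    `slots` maps an execution-unit class to the number of operation fields of
    that class in one VLIW. Every op is assumed to have a one-cycle latency,
    so an op may issue in the cycle after all of its dependences have issued.
    """
    issued = {}          # op name -> cycle in which it was issued
    words = []
    remaining = list(ops)
    cycle = 0
    while remaining:
        free = dict(slots)
        word, waiting = [], []
        for op in remaining:
            ready = all(d in issued and issued[d] < cycle for d in op.deps)
            if ready and free.get(op.unit, 0) > 0:
                free[op.unit] -= 1
                word.append(op.name)
            else:
                waiting.append(op)
        if not word:
            # Nothing could issue, so nothing will ever issue: bad input.
            raise ValueError("no op can issue: missing dependence or unit class")
        for name in word:
            issued[name] = cycle
        words.append(word)
        remaining = waiting
        cycle += 1
    return words


if __name__ == "__main__":
    program = [
        Op("ld_x", "mem"),
        Op("ld_y", "mem"),
        Op("add", "alu", ["ld_x", "ld_y"]),
        Op("mul", "fpu", ["add"]),
        Op("st", "mem", ["mul"]),
    ]
    for i, word in enumerate(schedule(program, {"mem": 2, "alu": 1, "fpu": 1})):
        print(f"cycle {i}: {word}")
```

With two memory fields, one ALU field and one FPU field per word, the dependence chain in this example forces four words, and most of their fields remain empty.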
For maximum utilization of a processor having multiple execution units, each execution unit should perform an operation on every processor cycle. The execution units may be fully utilized during computation-intensive portions of a program, in which case all or nearly all of the operation fields of the VLIW are filled. Other portions of the program may not require all of the resources of the processor; in this case some of the execution units are idle, and one or more operation fields of the VLIW are not filled. The number of unfilled operation fields in a program may be significant, and storing instruction words that contain many unfilled operation fields wastes valuable memory space. To avoid this inefficient use of memory, techniques for storing wide instruction words in a compressed format have been proposed.
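The storage cost of unfilled operation fields can be quantified with a small helper. This sketch assumes one memory word per operation field and a fixed instruction width. The example takes the two instruction words of FIG. 1 to have eight fields each, which matches the field indices W00 to W07 but is an assumption. The helper names are hypothetical.

```python
def unfilled_fields(program, width):
    """Number of empty operation fields in `program`.

    `program` is a list of instruction words, each listing only the operations
    actually scheduled in that cycle; every other field of a `width`-field
    word is assumed to hold a no-op.
    """
    total = 0
    for word in program:
        if len(word) > width:
            raise ValueError("word has more operations than fields")
        total += width - len(word)
    return total


def field_utilization(program, width):
    """Fraction of operation fields that hold a real operation."""
    return 1 - unfilled_fields(program, width) / (width * len(program))


if __name__ == "__main__":
    # Operations of the two instruction words shown in FIG. 1 (8 fields assumed).
    fig1 = [["W00", "W02", "W05", "W06", "W07"], ["W12", "W14"]]
    print(unfilled_fields(fig1, 8))      # 9
    print(field_utilization(fig1, 8))    # 0.4375
```

Stored uncompressed, these two words would occupy 16 memory words, 9 of which hold only no-ops.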
In one prior art approach, compressed instruction words are stored with a mask word. The filled operation fields of the instruction word are stored in consecutive memory locations, or words, and the mask word encodes where each of these memory words is inserted in the expanded instruction word. Since each mask is normally only a few bits wide, two or more masks can be grouped in the same memory word. This approach is illustrated in FIG. 1. An instruction word pair is stored in compressed format in memory as a mask word 20 followed, in consecutive memory locations, by operations W00, W02, W05, W06 and W07 of a first instruction word and operations W12 and W14 of a second instruction word. A mask field 22 in mask word 20 indicates the positions of operations W00, W02, W05, W06 and W07 in a first line 34 of instruction cache 24, and a mask field 26 indicates the positions of operations W12 and W14 in a second line 36 of instruction cache 24.
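The mask-word scheme of FIG. 1 can be sketched as follows. The layout details are assumptions, not taken from the cited patents: eight operation fields per instruction word, bit i of a mask field marks field i as occupied, the two mask fields of a pair occupy the low and high bytes of mask word 20, and an empty field is represented by `None`. The next-address offset fields are omitted. `make_mask`, `compress_pair` and `expand_pair` are hypothetical names.

```python
WIDTH = 8  # assumed number of operation fields per instruction word (W00..W07)
FIELD_MASK = (1 << WIDTH) - 1


def make_mask(word):
    """Bit i of the mask is set when operation field i holds an operation."""
    mask = 0
    for i, op in enumerate(word):
        if op is not None:
            mask |= 1 << i
    return mask


def compress_pair(word0, word1):
    """Store a pair as [mask word, filled fields of word0, filled fields of word1]."""
    mask_word = make_mask(word0) | (make_mask(word1) << WIDTH)  # two mask fields per word
    ops = [op for op in word0 + word1 if op is not None]
    return [mask_word] + ops


def expand_pair(memory, addr):
    """Rebuild the two full-width instruction words (cache lines) stored at addr."""
    mask_word = memory[addr]
    pos = addr + 1
    lines = []
    for mask in (mask_word & FIELD_MASK, (mask_word >> WIDTH) & FIELD_MASK):
        line = [None] * WIDTH
        for i in range(WIDTH):
            if (mask >> i) & 1:
                line[i] = memory[pos]  # next consecutive memory word fills field i
                pos += 1
        lines.append(line)
    return lines


if __name__ == "__main__":
    w0 = ["W00", None, "W02", None, None, "W05", "W06", "W07"]
    w1 = [None, None, "W12", None, "W14", None, None, None]
    packed = compress_pair(w0, w1)
    print(len(packed))                   # 8 memory words instead of 16
    assert expand_pair(packed, 0) == [w0, w1]
```

The compressed pair occupies eight memory words (one mask word and seven operations), and expansion restores each operation to its original field position.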
Because the compressed instruction format has a variable length in memory, the offset to the next instruction address must be recorded somewhere in the instruction itself. The offset must also be stored in the instruction cache to permit correct program counter sequencing and to maintain coherency between the program counter and the main memory code image. The offset to the next instruction address can be stored in mask word 20 as fields 30 and 32 and in instruction cache 24 as fields 38 and 40. An instruction compression and expansion technique similar to that shown in FIG. 1 and described above is disclosed in U.S. Pat. No. 5,057,837 issued Oct. 15, 1991 to Colwell et al. and U.S. Pat. No. 5,179,680 issued Jan. 12, 1993 to Colwell et al.
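Why the offset varies from one instruction pair to the next can be seen by deriving it from the masks. This is a sketch that assumes eight fields per instruction word and the two mask fields of a pair packed into the low and high bytes of the mask word; these layout choices and the helper names are assumptions. The scheme described above stores the offset explicitly (fields 30 and 32, and fields 38 and 40) rather than recomputing it.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pair_length(mask_word, width=8):
    """Memory words occupied by a compressed pair: the mask word plus one per filled field."""
    field_mask = (1 << width) - 1
    m0 = mask_word & field_mask              # mask field of the first instruction
    m1 = (mask_word >> width) & field_mask   # mask field of the second instruction
    return 1 + popcount(m0) + popcount(m1)


def next_pc(pc, mask_word, width=8):
    """Address of the next compressed pair: pc plus the offset the pair must record."""
    return pc + pair_length(mask_word, width)


if __name__ == "__main__":
    fig1_mask = 0b11100101 | (0b00010100 << 8)   # mask fields 22 and 26 as in FIG. 1
    print(pair_length(fig1_mask))                # 8
    pc = 0
    for mw in (fig1_mask, 0b1 | (0b1 << 8), fig1_mask):
        print(pc)                                # 0, then 8, then 11
        pc = next_pc(pc, mw)
```

Because each increment depends on the contents of the instruction, the program counter cannot simply advance by a constant, which is why the offset must travel with the instruction into the cache.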
The major disadvantage of the technique shown in FIG. 1 and described above is that consecutive instructions do not correspond to consecutive instruction cache locations, because they are separated by an address difference that depends on the variable length of each instruction. This introduces artificial aliases for instructions that are physically separated by a distance larger than the instruction cache size. For example, in a 1024-line instruction cache, a code section of 1024 instructions will very likely contain aliases to the same cache locations unless the loader performs proper padding. Such padding is possible only if empty spaces are left in main memory. In the example of FIG. 1, instruction pair #n occupies a cache hole left by the previous instructions. To achieve this, the assembler is forced to leave empty memory areas in order to reach the desired address of the cache hole. In the example of FIG. 1, twelve memory words are wasted to avoid a conflicting address for instruction pair #m.
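The artificial aliasing can be reproduced with a toy model. This sketch assumes a direct-mapped instruction cache in which an instruction whose compressed image starts at memory address a is held in line a mod N, with N the number of cache lines; that mapping, the instruction lengths and the function names are all hypothetical.

```python
def cache_lines(lengths, num_lines, base=0):
    """Cache line of each instruction laid out back to back from `base`.

    Assumes a direct-mapped instruction cache in which an instruction whose
    compressed image starts at memory address a is held in line a % num_lines.
    """
    lines, addr = [], base
    for n in lengths:
        lines.append(addr % num_lines)
        addr += n
    return lines


def aliases(lengths, num_lines, base=0):
    """(i, j) pairs where instruction j maps to the line last used by earlier instruction i."""
    last_user = {}
    conflicts = []
    for j, line in enumerate(cache_lines(lengths, num_lines, base)):
        if line in last_user:
            conflicts.append((last_user[line], j))
        last_user[line] = j
    return conflicts


if __name__ == "__main__":
    compressed = [3, 2, 4, 1, 3, 2, 2, 5]   # hypothetical lengths, in memory words
    print(cache_lines(compressed, 8))        # [0, 3, 5, 1, 2, 5, 7, 1]
    print(aliases(compressed, 8))            # [(2, 5), (3, 7)]
    print(aliases([1] * 8, 8))               # [] with fixed-length instructions
```

Eight fixed-length instructions fit an eight-line cache without conflict. With variable compressed lengths, however, two pairs of instructions collide while lines 4 and 6 stay unused, which is the kind of hole a loader would have to pad around.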
In summary, the technique shown in FIG. 1 and described above has several disadvantages. The instruction cache must have a larger capacity to store the offset to the next instruction address. Program counter sequencing is complicated, because the next instruction address must be computed for each instruction. The variable instruction length introduces artificial aliases in the instruction cache. If the loader pads instructions in main memory to avoid these artificial aliases, holes are created in main memory.
In another prior art approach described by J. P. Hayes in Computer Architecture and Organization, McGraw-Hill, 1978, pp. 309-314, a processor uses two levels of microprogram control. Each instruction fetched from main memory is interpreted by a microprogram stored in a control memory. Each microinstruction is interpreted by a nanoprogram stored in a second control memory. The nanoinstructions directly control the hardware.
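The two levels of control can be modelled as two table lookups. Every opcode, microinstruction name and control-word value below is hypothetical and chosen only to show the indirection. A machine instruction selects a microprogram in the control memory, each microinstruction selects a nanoprogram in the second control memory, and each nanoinstruction is the control word applied to the hardware.

```python
# Hypothetical control-store contents; all names and values are illustrative.
NANO_STORE = {          # nanostore address -> control word driving the hardware
    0: 0b0001,
    1: 0b0010,
    2: 0b0100,
    3: 0b1000,
}
NANO_PROGRAMS = {       # microinstruction -> nanoprogram (list of nanostore addresses)
    "FETCH_OPERAND": [0, 1],
    "ALU_ADD": [2],
    "WRITE_BACK": [1, 3],
}
MICRO_PROGRAMS = {      # machine-instruction opcode -> microprogram
    "ADD": ["FETCH_OPERAND", "ALU_ADD", "WRITE_BACK"],
}


def execute(opcode):
    """Expand one machine instruction into the control words that reach the hardware."""
    signals = []
    for micro in MICRO_PROGRAMS[opcode]:          # level 1: control memory
        for nano_addr in NANO_PROGRAMS[micro]:    # level 2: second control memory
            signals.append(NANO_STORE[nano_addr])
    return signals


if __name__ == "__main__":
    print([bin(s) for s in execute("ADD")])
```

In this sketch nanoinstruction 1 is used by two different microinstructions, so its control word is stored only once in the second control memory.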