VLIW (Very Long Instruction Word) machines use horizontally encoded wide instructions that issue an operation code to each of the functional units of the VLIW machine every machine clock cycle. The compiler that generates the machine level code for the VLIW machine sometimes cannot find enough instruction level parallelism (ILP) to keep all the functional units busy during a given clock cycle, so no-operation codes (NOPs) have to be issued to the idle functional units.
There are two classes of VLIW encodings to handle NOPs: uncompressed and compressed instructions. Uncompressed instructions explicitly encode the NOPs in the instruction word. Therefore the code density, i.e. the proportion of effective operation codes relative to NOPs, can be low if the compiler cannot identify a high level of ILP in the program. The instruction fetch mechanism for uncompressed instructions is simple because the instruction word length is a constant and the next program count address PC_{i+1} is just the present program count address PC_i plus the constant word length, i.e. PC_{i+1} = PC_i + constant.
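By way of illustration only, the fixed-length sequencing and the notion of code density described above can be sketched as follows. The four-slot word width, the operation names, and the helper names `code_density` and `next_pc` are assumptions made for this sketch and do not appear in any particular machine.

```python
NOP = None  # placeholder for a no-operation slot (illustrative)
WORD_LENGTH = 4  # slots per uncompressed instruction word (assumed)


def code_density(program):
    """Fraction of slots that hold a real operation rather than a NOP."""
    total = sum(len(word) for word in program)
    useful = sum(op is not NOP for word in program for op in word)
    return useful / total


def next_pc(pc, word_length=WORD_LENGTH):
    """Fixed-length sequencing: the next PC is the current PC plus a constant."""
    return pc + word_length


# Three uncompressed instruction words for which little ILP was found.
program = [
    ["add", NOP, "load", NOP],
    ["mul", NOP, NOP, NOP],
    ["sub", "store", NOP, NOP],
]

# Addresses of the three words, obtained without decoding any of them.
pcs = [0]
for _ in program[:-1]:
    pcs.append(next_pc(pcs[-1]))

density = code_density(program)  # 5 useful operations out of 12 slots
```

In this sketch the address of every instruction word is known without inspecting its contents, at the cost of storing seven NOP slots out of twelve.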
The present invention addresses the more complex case presented by compressed instructions, wherein the main memory holds the instructions in compressed form and the instruction fetch mechanism must cope with VLIW instructions of variable length.
Compressed instructions do not explicitly encode all the NOPs in the instructions, in order to reduce the space used in the memory system for instruction storage. However, because the number of NOPs removed from an instruction varies, compressed instructions have a variable-length instruction format. Variable-length instructions have always posed problems for sequencing of the program counter because the length of the instruction currently being fetched is uncertain. In addition, compressed instructions need to be decompressed, or expanded, before they can be executed in the functional units of the microprocessor.
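The relationship between NOP removal, variable length, and expansion can be illustrated with a minimal sketch. The encoding used here, a per-instruction presence mask followed by the non-NOP operations, is a hypothetical format chosen for the example; it is not asserted to be the format of any machine discussed herein, and the names `compress`, `expand`, and `compressed_length` are likewise assumptions.

```python
NOP = "nop"
NUM_UNITS = 4  # functional units per instruction (assumed)


def compress(word):
    """Drop NOPs. Bit u of the returned mask is set if unit u has an operation."""
    mask = 0
    ops = []
    for unit, op in enumerate(word):
        if op != NOP:
            mask |= 1 << unit
            ops.append(op)
    return mask, ops


def expand(mask, ops):
    """Reinsert NOPs so that every functional unit receives an operation code."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << unit) else NOP
            for unit in range(NUM_UNITS)]


def compressed_length(mask):
    """Length in slots: one header slot for the mask plus one per operation."""
    return 1 + bin(mask).count("1")


program = [
    ["add", NOP, "load", NOP],
    ["mul", NOP, NOP, NOP],
    ["sub", "store", "shl", "br"],
]
compressed = [compress(word) for word in program]
lengths = [compressed_length(mask) for mask, _ in compressed]
```

Under these assumptions the three instructions occupy 3, 2, and 5 slots respectively, so the address of an instruction cannot be computed until the lengths of all preceding instructions are known.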
In conventional variable length computers, such as the IBM 360 or Digital's VAX, the current instruction length is not known until the instruction is decoded. In the design of fast computers, the technique of pipelining is commonly used, where instruction processing is broken up into pipe-stages such as fetch, decode, execution, and writeback. The fastest clock cycle, or operating frequency, at which the computer can run is limited by the latency of the slowest stage of the pipeline. The problem with variable-length instructions is that the fetch and decode of an instruction need to be combined into a single pipe-stage, since the program counter cannot point to the next instruction until the length of the current instruction is known. As a consequence, the maximum frequency of the microprocessor is reduced because the instruction fetch tends to be one of the critical paths.
Other conventional microprocessors, such as Philips' Trimedia, explicitly encode information describing the instruction in the instruction format. Trimedia actually encodes the length of the next instruction, instruction_{i+1}, into the current instruction, instruction_i. Therefore, the length of the current instruction being fetched is known because it was decoded from the previous instruction. The major problems with this solution are that encoding the length of each instruction takes up bit space in the instruction formats, and that branch targets are required to be instructions of a defined length, since the nature of the previous instruction is unknown.
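The chaining of lengths described above may be sketched as follows. The memory layout, in which each entry holds a next-length field and a payload, the fixed branch-target length of three slots, and the name `fetch_sequence` are hypothetical and serve only to illustrate the sequencing; they are not the actual Trimedia instruction format.

```python
BRANCH_TARGET_LENGTH = 3  # every branch target has this length (assumed value)


def fetch_sequence(memory, entry_pc, count, entry_length=BRANCH_TARGET_LENGTH):
    """Walk `count` sequential instructions starting at a branch target.

    `memory` maps an address to (next_length, payload). The length of the
    instruction at `entry_pc` must be fixed in advance, because no previous
    instruction was decoded to supply it.
    """
    pc, length = entry_pc, entry_length
    trace = []
    for _ in range(count):
        next_length, payload = memory[pc]
        trace.append((pc, length, payload))
        pc += length          # length was already known before this fetch
        length = next_length  # decoded now, used for the following fetch
    return trace


# Hypothetical memory image: A is 3 slots, B is 2 slots, C is 4 slots.
memory = {
    0: (2, "A"),  # A carries the length of B
    3: (4, "B"),  # B carries the length of C
    5: (3, "C"),  # C carries the length of whatever follows it
}
trace = fetch_sequence(memory, entry_pc=0, count=3)
```

The sketch makes both costs visible: every entry spends storage on a next-length field, and the walk can only start at an address whose instruction length is fixed by convention.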
Another VLIW system, the TINKER, is described in Conte, T. et al., "Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings", IEEE: Proceedings of the 29th Annual Symposium on Microarchitecture, Dec. 2-4, 1996; and in Banerjia, Sanjeev et al., "NextPC computation for a banked instruction cache for a VLIW architecture with a compressed encoding", Department of Electrical and Computer Engineering, North Carolina State University. NextPC refers generally to the function for obtaining the program count of the next instruction to be executed. The TINKER architecture attempts to solve the variable-length problem by incorporating logic during the instruction cache (ICACHE) refill to calculate the lengths of the instructions and place the calculated value in a length field in the ICACHE. Therefore, the sequencing is simply PC_{i+1} = PC_i + length_i, which is done prior to refill.
However, the penalty for this solution is that a larger ICACHE or a separate memory array is required to hold the length_i information of all the instructions in the ICACHE. The size of the array depends on the maximum number of instructions that are possible for a single ICACHE line (instruction_max), the number of bits required to encode the maximum length of one instruction (encode_bits), and the number of cache lines in the ICACHE (rows). Therefore, the length field array dimensions will be: rows × (instruction_max × encode_bits), i.e. lines × bits. As an example, the HP Lisard processor, which has an ICACHE of 256 lines, a line size of 32 words (32 instructions max), and a maximum instruction length of 12 words (4 encoding bits), would need an array of 256 × 128, which represents a 12.5% increase in the size of the ICACHE to accommodate encoding overhead.
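The arithmetic of the foregoing example can be checked with a short calculation. The 32-bit word size is an assumption of this sketch; it is not stated above, but it is the word size for which a 128-bit length field per line amounts to 12.5% of a 32-word line. The function name `length_array_overhead` is likewise illustrative.

```python
def length_array_overhead(rows, instruction_max, max_length_words,
                          line_words, word_bits):
    """Return (rows, bits per row, fractional overhead) of a length field array."""
    # Bits needed to encode lengths up to max_length_words (12 -> 4 bits).
    encode_bits = max_length_words.bit_length()
    array_bits_per_row = instruction_max * encode_bits
    line_bits = line_words * word_bits
    return rows, array_bits_per_row, array_bits_per_row / line_bits


rows, width, overhead = length_array_overhead(
    rows=256,             # cache lines in the ICACHE
    instruction_max=32,   # at most one instruction per word of the line
    max_length_words=12,  # longest instruction, in words
    line_words=32,        # words per cache line
    word_bits=32,         # ASSUMED word size; not given in the text
)
```

With these inputs the array is 256 rows by 128 bits, and each 1024-bit cache line gains 128 bits of length information, which is the 12.5% figure given above.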
Accordingly, a need remains for a way to sequence through variable length instructions without experiencing delays in instruction decoding to fetch additional lines of instructions from the instruction cache when a sequence of instructions crosses an instruction cache line boundary and without requiring additional storage capacity to store the length of each variable length instruction along with the instruction.