This invention is concerned with compression and decompression of program words, particularly program words for VLIW processors.
Due to the need for high-performance processing in certain tasks, such as game engines, graphic rendering systems, complex system simulators, multimedia, and real-time digital signal processing, there is a demand for high-speed processors, which can quickly process large amounts of data. Superscalar processors, which can execute more than one instruction at a time, have become necessary components of high-performance devices. Older microprocessor designs, such as complex instruction set computing (CISC) and reduced instruction set computing (RISC), can be used to execute several instructions at once, although this requires complex control circuitry which can be quite expensive.
Very long instruction word (VLIW) processors are able to process multiple individual instructions for multiple individual functional units every clock cycle. VLIW processors have a simpler design than CISC and RISC chips. VLIW chips can be less expensive, use less power, and achieve higher performance than either CISC or RISC chips. The drawback, however, is that while the design of the VLIW chip is simpler than that of its predecessors, creating and compiling code that will enable the VLIW chip to operate efficiently can be difficult. Because each VLIW instruction word encodes several instructions, the instruction words are very long, up to hundreds of bits in length, and require a large amount of program memory for storage as well as a large bus that can transfer the instruction word from off-chip memory to the processor. This is problematic, particularly in smaller, hand-held devices where the physical dimensions of the device limit the size of the processor, bus, and memory that may be employed in the device.
U.S. Pat. No. 5,819,058, "Instruction Compression and Decompression System and Method for a Processor," to Miller et al. describes a system and method for reducing the amount of memory required to store very long instruction words in a VLIW processor. The VLIWs are compressed in a number of ways, including shortening default instructions, compressing bits that are not required to execute instructions, and assigning short codes to longer instructions, which are expanded at execution time.
U.S. Pat. No. 5,878,267, "Compressed Instruction Format for Use in a VLIW Processor and Processor for Processing Such Instructions," to Hampapuram et al. describes software which compresses VLIW instructions which are stored in memory and then decompressed "on the fly" after being read out from the cache. Each instruction consists of several operations. Each operation is compressed according to a compression scheme for that particular operation; the compression scheme assigns a compressed operation length to each operation. Compression is dependent on at least one feature of the operation. Branch targets are uncompressed.
It is an object of the invention to provide an apparatus and method for minimizing program memory size for VLIW architectures.
It is another object of the invention to reduce the program bus size of a VLIW architecture.
It is a further object of the invention to reduce power consumption in CMOS processor designs.
These objects are met by an apparatus and method for dynamic program decompression. A program is converted from a time-sequential sequence of microcodes corresponding to each assembler instruction into horizontal VLIW microcode. (Although for purposes of explanation the VLIW architecture is primarily discussed, this is not meant to imply that the application of the disclosed apparatus and system is limited to VLIW architectures; the apparatus and method may also decompress a generic flow of information.) The horizontal VLIW microcode is then compressed into a bit sequence that is stored in program memory.
The compression algorithm producing the bit sequence takes advantage of regularities occurring in the sequence of values assigned over time to each field of the horizontal VLIW microcode. The operations and operands to be executed at each cycle can be viewed as tracing a trajectory in the space of operations and operands. If this trajectory is considered over time, the information needed to specify a single instruction can be reduced by specifying the relationship between the set of operands (and/or opcodes) to be issued to the processor at a given cycle and those issued at previous cycles, for instance by describing the trajectory in terms of starting points and deltas, rather than expressing the instruction itself.
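The idea of describing one field's trajectory by a starting point and deltas can be illustrated with a minimal sketch. The function names `encode_field` and `decode_field` and the sample read-address sequence are hypothetical and chosen only for illustration; the actual bit-level encoding of the disclosed compression algorithm is not specified in this passage.

```python
from typing import List, Tuple


def encode_field(values: List[int]) -> Tuple[int, List[int]]:
    """Describe one field's values over successive cycles as (start, deltas).

    Illustrative assumption only: the source does not give the real encoding.
    """
    if not values:
        raise ValueError("need at least one value")
    start = values[0]
    # Each delta relates the value at one cycle to the value at the previous cycle.
    deltas = [cur - prev for prev, cur in zip(values, values[1:])]
    return start, deltas


def decode_field(start: int, deltas: List[int]) -> List[int]:
    """Rebuild the per-cycle values from the starting point and the deltas."""
    values = [start]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values


# Hypothetical read-address field sampled over seven cycles.
read_addr = [4, 5, 6, 7, 7, 7, 8]
start, deltas = encode_field(read_addr)
print(start, deltas)
```

In this example the deltas take only the values 0 and 1, so each cycle could in principle be described with a single bit instead of a full address, which is the kind of regularity the passage refers to.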
The resulting bit sequence is then fed to dynamic program decompression devices, or dyprodes. Each of these devices is fed a continuous stream of 1- or 2-bit microcodes, i.e., the bit sequence describing main features of the trajectory of the program. The dyprodes, which are assembled using registers and multiplexers and are driven by a clock, reset signals, and the microcode, use the microcode from either internal or external memory and, where appropriate, input from either internal or external memory to produce an uncompressed field of the program word. By using a series of dyprodes, the entire uncompressed program word can be reconstructed and passed on to the processor for execution.
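A behavioral sketch of a single dyprode may make the data flow easier to follow. It models one register whose next value is selected, as a multiplexer would, by a 2-bit microcode on each clock. The four code meanings (`HOLD`, `STEP`, `LOAD`, `RESTART`), the class name `DyprodeModel`, and the 8-bit field width are assumptions made for illustration; the passage above states only that dyprodes are built from registers and multiplexers and are driven by a clock, reset signals, and 1- or 2-bit microcodes.

```python
from typing import Iterable, Iterator, List

# Hypothetical 2-bit microcode assignments; not taken from the source.
HOLD, STEP, LOAD, RESTART = 0, 1, 2, 3


class DyprodeModel:
    """Software model of one register plus a multiplexer selecting its next value."""

    def __init__(self, reset_value: int = 0, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.reset_value = reset_value & self.mask
        self.reg = self.reset_value

    def clock(self, code: int, memory_input: Iterator[int]) -> int:
        """Advance one clock cycle and return the uncompressed field value."""
        if code == HOLD:
            nxt = self.reg                       # keep the previous value
        elif code == STEP:
            nxt = (self.reg + 1) & self.mask     # previous value plus one
        elif code == LOAD:
            nxt = next(memory_input) & self.mask  # take a new value from memory
        elif code == RESTART:
            nxt = self.reset_value               # return to the reset value
        else:
            raise ValueError(f"unknown microcode {code}")
        self.reg = nxt
        return self.reg


def decompress_field(codes: Iterable[int], memory_values: Iterable[int]) -> List[int]:
    """Run one modeled dyprode over a microcode stream."""
    dyprode = DyprodeModel()
    feed = iter(memory_values)
    return [dyprode.clock(code, feed) for code in codes]


codes = [LOAD, STEP, STEP, HOLD, HOLD, LOAD, STEP, RESTART]
print(decompress_field(codes, [4, 16]))
```

Under these assumptions, eight field values are reconstructed from eight 2-bit codes and two values read from memory. A complete program word would be rebuilt by running one such model per field.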
The use of a dyprode system may reduce the program memory size considerably, as well as reducing the size of the bus connected to an off-chip program memory. Power consumption in CMOS processor designs is also reduced: during cycles in which a processor device is not used, the dyprode freezes controls and read addresses at the values assigned during the last useful operation, which lowers the toggle rate inside the processor's register file and processor devices.
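The effect of freezing on toggle rate can be illustrated by counting bit transitions on a control field. The sketch below compares a frozen field with a baseline in which unused cycles drive an idle value of zero. That baseline, the helper names, and the sample schedule are assumptions for illustration only and do not come from the source.

```python
from typing import List, Optional


def count_toggles(values: List[int]) -> int:
    """Total number of bits that change between consecutive cycles."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def drive_idle_default(schedule: List[Optional[int]], idle: int = 0) -> List[int]:
    """Assumed baseline: unused cycles (None) drive a fixed idle value."""
    return [idle if v is None else v for v in schedule]


def drive_frozen(schedule: List[Optional[int]], initial: int = 0) -> List[int]:
    """Freezing behavior: unused cycles repeat the last useful value."""
    out: List[int] = []
    last = initial
    for v in schedule:
        if v is not None:
            last = v
        out.append(last)
    return out


# Hypothetical 4-bit control field; None marks a cycle where the device is unused.
schedule = [0b1011, None, None, 0b1010, None, 0b1110]
print(count_toggles(drive_idle_default(schedule)))  # 10
print(count_toggles(drive_frozen(schedule)))        # 2
```

For this sample schedule the frozen field toggles 2 bits in total, compared with 10 for the zero-idle baseline. The actual saving depends on the program and on the circuit.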
Dyprodes may be modified to decompress different parts of a program word. Some dyprodes are best suited to decompress opcodes or immediate values while other types of dyprodes decompress register file addresses.