Embedded computing systems are space- and cost-sensitive. Memory is one of the most restricted resources, posing serious constraints on program size. For example, in a high-end hard disk drive application, an embedded processor occupies a silicon area of about 6 mm², while the program memory for that processor takes 20 to 40 mm². Reducing program size, therefore, could result in significant savings in terms of cost, size, weight, and power consumption. In VLIW architectures, where a high-bandwidth instruction prefetch mechanism is required to supply multiple operations per cycle, reducing code size and providing fast decompression are critical to overcoming the communication bottleneck between memory and CPU.
Data compression is a mature field. However, most existing state-of-the-art data compression algorithms cannot be applied to code compression directly. Compressed code must be decompressed during program execution, and because code contains branch, jump, and call instructions that alter the flow of the program, the decompressor must support random access.
Existing statistical code compression algorithms mostly use variable-to-variable or fixed-to-variable coding. With either scheme the decompression procedure is sequential, since the decompressor does not know where the next symbol starts until the current symbol is fully decompressed. Because VLIW machines have long instruction words, existing code compression algorithms, which are very difficult to parallelize, introduce very long delays into instruction execution during decompression.
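The sequential dependency described above can be illustrated with a minimal sketch. The four-symbol prefix code below is hypothetical and chosen only for illustration; it is not taken from any particular compression scheme. The sketch shows that each codeword boundary becomes known only after the preceding codeword has been decoded, and that starting at an arbitrary bit offset can yield a different symbol sequence.

```python
# Hypothetical variable-length prefix code, for illustration only.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: symbol for symbol, bits in CODE.items()}


def encode(symbols):
    """Concatenate the codewords of the given symbols into one bit string."""
    return "".join(CODE[s] for s in symbols)


def decode_sequential(bitstream):
    """Decode left to right.

    Returns the decoded symbols and the bit offset at which each codeword
    ends. An offset is discovered only once the codeword before it has been
    fully consumed, so the loop cannot be split across independent workers
    without extra information about where codewords begin.
    """
    symbols, boundaries = [], []
    current = ""
    for position, bit in enumerate(bitstream):
        current += bit
        if current in DECODE:
            symbols.append(DECODE[current])
            boundaries.append(position + 1)
            current = ""
    if current:
        raise ValueError("bitstream ends inside a codeword")
    return symbols, boundaries


stream = encode("BADCAB")
symbols, boundaries = decode_sequential(stream)
print(stream, symbols, boundaries)

# Starting in the middle of a codeword (bit offset 4 here) decodes without
# error but produces symbols that do not match the original sequence.
misaligned, _ = decode_sequential(stream[4:])
print(misaligned)
```

In this sketch a branch target would have to coincide with a known codeword boundary for decoding to start there correctly, which is why random access requires additional support beyond the code itself.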
VLIW architectures are attractive due to hardware simplicity and advances in compiler techniques. VLIW architectures have been widely adopted in a number of high-performance microprocessors in different application areas, from DSP (Texas Instruments' TMS320C6x family and Lucent Technologies' StarCore) and VSP (e.g., Philips Semiconductors' TriMedia), to general-purpose microprocessors such as Intel's Itanium, which adopts the IA-64 architecture. Even though VLIW processors have gained a lot of popularity recently, code compression techniques targeting VLIW architectures have not been studied as extensively as those targeting RISC architectures.
Traditional VLIW architectures need more program memory because they require a rigid instruction format and/or special pack/unpack operations. To address the program memory issue, modern VLIW processors have flexible instruction formats. Unfortunately, dictionary code compression techniques proposed to date work for rigid instruction formats only. Further, with flexible instruction formats each sub-instruction within the long instruction word need not correspond to a specific functional unit. This eliminates most NOP instructions found in code for rigid VLIW architectures, making the code denser. Some VLIW-based architectures introduced in the last three years, such as StarCore's SC140, use mixed-width instruction sets (supporting both 16-bit and 32-bit instructions). In this case, the processor uses short instructions when the full range of instruction options is not necessary, and reserves the longer instructions for performance-critical software sections requiring the full power of the architecture. These efforts result in VLIW code that is less compressible.
Code density is not the only design goal for embedded systems; power is also a concern. Bus power consumption is a major part of the total system power consumption. Instruction bus power optimization has not, however, been intensively explored, since the instruction bus is normally considered rigid and unchangeable.
For the foregoing reasons, there is a need in the art for code compression techniques that ease the memory constraint problem in embedded systems while minimizing bus power consumption. Such techniques should be applicable to the flexible instruction formats of VLIW processors and should operate with a fast decompression scheme that provides random access during decompression with minimal delay to instruction execution.