As demand for high-performance, low-power embedded architectures increases, development is generally turning away from flexible but inefficient GPAs (general purpose architectures) and highly efficient but inflexible ASICs (application specific integrated circuits), towards a symbiosis of those two worlds efficiently embodied in ASIPs (application specific instruction set processors). The flexibility of ASIPs allows late design changes—even after product roll-out. However, compared to ASICs and GPAs, ASIPs provide a tradeoff of computational performance and flexibility on the one hand and power consumption on the other.
Moreover, designing ASIPs can be far more complex than assembling systems with standard processors. Typically, designing ASIPs comprises an iterative exploration in which hardware and software are explored and modified. Integrated development tool suites deliver the flexibility needed to design a processor tailored to the needs of the intended application. However, the design effort is still significant when designing all aspects of the architecture in detail. For example, with conventional tools considerable effort goes into the definition of an instruction set, which plays an important role as it conveys the data and control flow of a given application.
Moreover, the instruction set should be designed to use memory and power efficiently. A small instruction word width is desirable because chip memory size and power consumption depend on the instruction word width. Thus, an important factor in designing a power-efficient instruction set is the overall instruction word width. Several approaches to compact instruction encoding have been attempted. However, each conventional technique has limitations. Some conventional approaches attempt to minimize the width of the widest instruction word. Other conventional approaches minimize the average instruction width by using statistics profiled from a specific application program that is to be run on the architecture being designed. Still other conventional techniques seek to conserve power by profiling the specific application program and encoding consecutive instruction words similarly, so that the toggling of bits between them is minimized.
Immediate Encoding
One conventional method to produce an efficient instruction set is to encode immediate values in instruction words, so that the bit-fields in the instruction words used for immediate values can be down-sized. This is illustrated in the diagram 140 in FIG. 1 showing a pre-decoding stage 153 and a main decoder 155. In this technique, the fields 152 of the instruction 151 do not hold the immediate value itself, but an address in a lookup-table 154 that contains the actual value 156. For example, instead of using 16 bits in a 32-bit instruction word for conveying an immediate value, an 8-bit address in a lookup-table comprising at most 256 different immediates would be used, thereby reducing the width of the instruction to 24 bits.
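The 32-bit to 24-bit example above can be sketched as follows. This is only an illustrative sketch: the field layout (16 remaining instruction bits followed by an 8-bit table index) and the function names are assumptions made here, not details taken from FIG. 1.

```python
# Hypothetical 24-bit format: [ 16 bits opcode/operands | 8-bit table index ]
IMM_INDEX_BITS = 8


def build_immediate_table(immediates):
    """Collect the distinct immediates; fail if they exceed the table capacity."""
    table = sorted(set(immediates))
    if len(table) > (1 << IMM_INDEX_BITS):
        raise ValueError("more distinct immediates than lookup-table entries")
    return table


def encode(upper16, immediate, table):
    """Pack the 16 non-immediate bits together with the 8-bit table index."""
    return (upper16 << IMM_INDEX_BITS) | table.index(immediate)


def predecode(word24, table):
    """Pre-decoding stage: expand the table index back into a 32-bit word."""
    index = word24 & ((1 << IMM_INDEX_BITS) - 1)
    upper16 = word24 >> IMM_INDEX_BITS
    return (upper16 << 16) | (table[index] & 0xFFFF)


table = build_immediate_table([0x1234, 0x00FF, 0x1234])
word24 = encode(0xABCD, 0x1234, table)
word32 = predecode(word24, table)
```

The ValueError branch corresponds to the flexibility limit of this approach: a program that needs more than 256 distinct immediates cannot be encoded without enlarging the table.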
While potentially reducing the overall instruction width, this approach has several major disadvantages. Overhead is added to the decoding process because an additional immediate decoding stage is required. Further, the size of the lookup-table 154 may have to be increased, thereby leading to a larger die-size. If the table-size is not increased, the flexibility of the architecture is reduced because the number of different immediate values that an application may use is limited to the number available in the lookup-table 154. Therefore, with this approach the application designed to run on the architecture must be taken into account. Further, for this method to be beneficial, the increase in die-size caused by the implementation of the lookup-table 154 has to be compensated by the reduction in program memory. There might also be a slight decrease in system performance due to the table lookup necessary for immediate decoding. These trade-offs have to be weighed carefully to gain an overall advantage from this method. Assessing them is difficult, and the assessment may turn out to be inaccurate.
Improving Code Density Using Compression Techniques
A conventional approach that improves code density using dictionary-based compression techniques, similar to those used in file compression programs, is illustrated in FIG. 2. In this example, certain opcodes (referred to as illegal opcodes) of the original program 165 are mapped to frequently used instruction-sequences 166 (opcodes and specific operands of several instructions). These mappings are stored in a dictionary 168 and the sequences in program code are replaced by the “virtual instructions” 169. Whenever such a virtual instruction 169 is encountered in the compressed program 170, it is looked up in the dictionary 168 and replaced by the original sequence 166.
Thereby, a set of instructions (e.g., instruction sequences 166) may be replaced by a single one (e.g., virtual instruction 169), efficiently compressing the program. The disadvantage of this technique is that it adds overhead to the decoding process, as the illegal instruction has to be intercepted and replaced by the original program code. This will reduce performance of the program significantly. Therefore, this approach is only feasible in an environment where the system's bottleneck is memory, not speed. Furthermore, compression results depend heavily on the specific program structure. If there are not many identical instruction sequences in the code, no relevant compression will be achieved. This is especially true as the dictionary has to be stored as part of the program. Hence, the actual application has to be considered before this method can be applied successfully.
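As a rough illustration of the dictionary scheme described above, the following sketch replaces a repeated instruction sequence with a virtual instruction and expands it again at decode time. The textual instruction format, the name "V0" for the virtual instruction, and the greedy left-to-right matching are assumptions chosen for brevity, not details taken from FIG. 2.

```python
def compress(program, dictionary):
    """Replace each dictionary sequence with its virtual instruction (greedy scan)."""
    out, i = [], 0
    while i < len(program):
        for virtual, seq in dictionary.items():
            if program[i:i + len(seq)] == seq:
                out.append(virtual)
                i += len(seq)
                break
        else:
            # No dictionary sequence starts here; keep the instruction as-is.
            out.append(program[i])
            i += 1
    return out


def decompress(compressed, dictionary):
    """Decode-time expansion: look up virtual instructions in the dictionary."""
    out = []
    for instr in compressed:
        out.extend(dictionary.get(instr, [instr]))
    return out


program = ["ld r1", "add r1,r2", "st r1", "nop", "ld r1", "add r1,r2", "st r1"]
dictionary = {"V0": ["ld r1", "add r1,r2", "st r1"]}  # hypothetical virtual opcode

compressed = compress(program, dictionary)
# The dictionary is stored with the program, so it counts toward the total size.
stored_size = len(compressed) + sum(len(seq) for seq in dictionary.values())
```

In this toy case the seven-instruction program shrinks to three entries, but the stored size including the dictionary is six, which shows why few or short repeated sequences yield little net compression.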
Minimizing Average Instruction Width Using Huffman-Encoding
In information theory, statistical methods are used to compress information symbols in a signal. Frequently used symbols get a shorter codeword, while rarely occurring symbols get a longer codeword. For example, when encoding a text, frequent words like “the” or “and” would get a codeword shorter than the average codeword length, whereas specific names would get a codeword longer than the average length. On average, the total information to be transmitted would be reduced. The compression ratio depends on the prior statistical analysis of the information that is to be transferred. One statistical encoding method is Huffman-encoding.
Huffman-encoding is applied to instruction opcode encoding to minimize the average instruction word width in one conventional technique. However, this technique depends heavily on the knowledge of the actual application that is to be run on the architecture and will deliver optimum results only for this specific application. While not absolutely limiting the flexibility of the architecture (other programs can be run), it may deliver unacceptable results for other application cases.
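The application dependence noted above can be illustrated with a small sketch that derives Huffman code lengths from an opcode frequency profile. The opcode names and frequencies are invented for illustration, and the sketch computes only code lengths, not the actual bit patterns.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a {symbol: frequency} map."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(w, n, {sym: 0}) for n, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def average_length(lengths, freqs):
    """Average opcode width in bits when the given profile is executed."""
    return sum(freqs[s] * lengths[s] for s in freqs) / sum(freqs.values())


profile_a = {"add": 50, "ld": 25, "st": 15, "mul": 10}  # hypothetical profile
profile_b = {"add": 10, "ld": 15, "st": 25, "mul": 50}  # a different program

lengths = huffman_code_lengths(profile_a)
avg_a = average_length(lengths, profile_a)
avg_b = average_length(lengths, profile_b)
```

With these assumed numbers, the encoding tuned to the first profile averages 1.75 bits per opcode on that profile, below the 2 bits of a fixed-width encoding for four opcodes, but 2.65 bits on the second profile, which is worse than the fixed-width encoding.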
Optimization of Bit-Toggling Using Statistical Profiling
Another approach to designing power-efficient instruction sets is also based on statistical analysis of the application intended to be executed on the architecture. The basic idea is to profile the program for consecutive instructions and to encode them in a similar way, so that the number of bits whose state has to be switched is minimal when loading the next instruction into the decoder. This reduces power dissipation because it reduces the number of bit lines that have to be switched in the decoder, and because the static RAM cells used for register memories may consume more power when they are switched from one state to another than when they are idle.
The problem is usually approached by mapping the instruction encodings to a finite state machine with weighted edges and using one of several methods (for example, heuristics) to determine the optimum solution. As mentioned, this method also depends on statistical profiling, and is therefore most efficient only for the application for which it was optimized. In general, while some power efficiency can be achieved using this method, the gain may be orders of magnitude lower than the one obtained by instruction width optimization. This is because reducing instruction width may reduce the size of the memory, which is one of the most power-hungry parts of the chip. Reducing bit-toggling, on the other hand, only slightly reduces power consumption within the decoding unit and registers, and these units are usually only minor contributors to overall power consumption.
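The weighted-edge formulation can be sketched on a toy example. The sketch below counts how often opcode pairs occur consecutively in a profiled trace and then searches for the opcode encoding with the fewest weighted bit toggles. Exhaustive search is substituted here for the heuristics mentioned above and is practical only for very small opcode sets; the trace and opcode names are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def transition_weights(trace):
    """Count how often each unordered pair of distinct opcodes is consecutive."""
    return Counter(frozenset((a, b)) for a, b in zip(trace, trace[1:]) if a != b)


def toggles(encoding, weights):
    """Total bit flips between consecutive opcodes, weighted by pair frequency."""
    total = 0
    for pair, weight in weights.items():
        a, b = tuple(pair)
        total += weight * bin(encoding[a] ^ encoding[b]).count("1")
    return total


def best_encoding(opcodes, weights, bits):
    """Exhaustively try every assignment of distinct codes (tiny inputs only)."""
    best = min(
        permutations(range(1 << bits), len(opcodes)),
        key=lambda perm: toggles(dict(zip(opcodes, perm)), weights),
    )
    return dict(zip(opcodes, best))


trace = ["ld", "add", "st", "ld", "add", "st", "mul", "ld", "add", "st"]
opcodes = ["ld", "add", "st", "mul"]
weights = transition_weights(trace)

naive = {"ld": 0, "add": 1, "st": 2, "mul": 3}
optimized = best_encoding(opcodes, weights, bits=2)
naive_cost = toggles(naive, weights)
optimized_cost = toggles(optimized, weights)
```

For this assumed trace, the sequential encoding costs 13 bit toggles and the best 2-bit encoding costs 10. The result holds only for the profiled trace, which mirrors the application dependence described above.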