1. Technical Field of the Invention
This invention is related to a method and apparatus for compressing and decompressing object code instructions that are included in a software program that executes on a computer system. In particular, the compressing of object code instructions for a computer system provides for lower power consumption by the computer, more efficient transferal of compressed object code instructions from the memory storage devices, and a reduction in the number and size of power-consuming memory storage devices. The decompression apparatus of the invention advantageously uses a decompression engine to achieve the energy consumption savings incorporated into the compressed object code instructions. The invention is embodied in a compression method that compresses object code instructions for a computer system, a computer system for implementing the compression method, a computer program product bearing software instructions that implement the compression method, a decompression method that decompresses the compressed object code instructions and a decompression engine that decompresses the compressed object code instructions.
2. Description of the Related Art
The following references provide useful background information on the indicated topics, all of which relate to the invention, and are incorporated herein by reference:
M. Keaton and P. Bricaud, Reuse Methodology Manual for System-On-A-Chip Designs, Kluwer Academic Publishers (1998);
TI's 0.07 Micron CMOS Technology Ushers In Era of Gigahertz DSP and Analog Performance, Texas Instruments (1998);
T. M. Kemp, R. K. Montoye, J. D. Harper, J. D. Palmer and D. J. Auerbach, A Decompression Core for PowerPC, IBM Journal of Research and Development, vol. 42(6), pp. 807-812 (November 1998);
Y. Yoshida, B. Y. Song, H. Okuhata and T. Onoye, An Object Code Compression Approach to Embedded Processors, Proceedings of the International Symposium on Low Power Electronics and Design, pp. 265-268 (August 1997);
T. Okuma, H. Tomiyama, A. Inoue, E. Fajar and H. Yasuura, Instruction Encoding Techniques for Area Minimization of Instruction ROM, International Symposium on System Synthesis, pp. 125-130 (December 1998);
A. Wolfe and A. Chanin, Executing Compressed Programs on an Embedded RISC Architecture, Proceedings of 25th Annual International Symposium on MicroArchitecture, pp. 81-91 (December 1992);
C. Lefurgy, P. Bird, I. Cheng and T. Mudge, Code Density Using Compression Techniques, Proceedings of the 30th Annual International Symposium on MicroArchitecture, pp. 194-203 (December 1997);
S. Y. Liao, S. Devadas and K. Keutzer, Code Density Optimization for Embedded DSP Processors Using Data Compression Techniques, Proceedings of the 1995 Chapel Hill Conference on Advanced Research in VLSI, pp. 393-399 (1995);
D. A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, vol. 40, pp. 1098-1101 (September 1952);
L. Benini, A. Macii, E. Macii and M. Poncino, Selective Instruction Compression for Memory Energy Reduction in Embedded Systems, IEEE/ACM Proceedings of International Symposium on Low Power Electronics and Design, pp. 206-211 (1999);
B. P. Dave, G. Lakshminarayana, and N. K. Jha, COSYN: Hardware-Software Co-Synthesis of Embedded Systems, Proceedings of Design Automation Conference, pp. 703-708 (1997);
I. Hong, D. Kirovski, G. Qu, M. Potkonjak and M. Srivastava, Power Optimization of Variable Voltage Core-Based Systems, Proceedings of Design Automation Conference, pp. 176-181 (1998);
T. Ishihara and H. Yasuura, Voltage Scheduling Problem for Dynamically Variable Voltage Processors, IEEE/ACM Proceedings of International Symposium on Low Power Electronics and Design, pp. 197-201 (1998);
C. Ta Hsieh, M. Pedram, G. Mehta and F. Rastgar, Profile-Driven Program Synthesis for Evaluation of System Power Dissipation, IEEE Proceedings of 34th Design Automation Conference, pp. 576-581 (1997);
V. Tiwari, Logic and System Design for Low Power Consumption, Ph.D thesis, Princeton University (November 1996);
Q. Qiu, Q. Wu and M. Pedram, Stochastic Modeling of a Power-Managed System: Construction and Optimization, IEEE/ACM Proceedings of International Symposium on Low Power Electronics and Design, pp. 194-199 (1999);
L. Benini, A. Bogliolo, G. Paleologo and G. De Micheli, Policy Optimization for Dynamic Power Management, IEEE Transactions on CAD, vol. 18, no. 6, pp. 813-33 (June 1999);
W. Fornaciari, D. Sciuto and C. Silvano, Power Estimation for Architectural Explorations of HW/SW Communication on System-Level Buses, HW/SW Codesign Workshop, Rome (May 1999);
M. R. Stan and W. P. Burleson, Bus-Invert Coding for Low Power I/O, IEEE Transactions on VLSI (March 1995);
M. R. Stan and W. P. Burleson, Limited-Weight Codes for Low Power I/O, International Workshop on Low Power Design (April 1994);
T. Givargis and F. Vahid, Interface Exploration for Reduced Power in Core-Based Systems, International Symposium on System Synthesis (December 1998);
Jue-Hsien Chern, et al., Multilevel Metal Capacitance Models for CAD Design Synthesis Systems, IEEE Electron Device Letters, vol. 13, no. 1, pp. 32-34 (January 1992);
P. G. Howard and J. S. Vitter, Practical Implementations of Arithmetic Coding, invited paper in Images and Text Compression (Kluwer Academic Publishers, Norwell, Mass.).
There will now be provided a discussion of various topics to provide a proper foundation for understanding the invention.
The advent of new VLSI technologies as well as the advent of state-of-the-art design techniques like core-based System-on-a-Chip (hereinafter "SOC") design methodologies, such as those described by Keaton and Bricaud in Reuse Methodology Manual for System-on-a-Chip Designs, has made multi-million gate chips a reality. SOC designs are especially important to low-power devices like personal digital assistants, cellular phones and digital cameras. Obviously, since the amount of available energy in a low-power device is limited, these devices have to wisely budget energy consumption in order to enable the user to increase the number and/or length of telephone calls, to shoot more pictures, etc., between recharging phases. From the viewpoint of a system designer, the reduction of energy/power consumption is a major design goal. The physically important factor, power per square millimeter, must be kept at reasonable levels to avoid overheating, malfunctions and electromigration. Keeping power per square millimeter at reasonable levels leads to longevity of the device. Due to the various problems related to high energy and power consumption, designers have come up with diverse approaches at all levels of abstraction, starting from the physical level up to the system level. Experience shows that a high-level method may provide additional degrees of freedom that result in a more optimized design. However, a major drawback in system-level optimization is the complexity of the design space as a result of the vast number of possible parameters. In order to conduct efficient system-level optimizations, powerful design space explorations are needed. In the case of system-level power optimization, a tool is required that delivers fast and reliable power estimates for various chosen system parameters in order to evaluate the impact of any optimization step.
Code compression has become an increasingly popular technique, mainly as a method to reduce chip area in embedded computers. Most methods targeted for embedded systems use a run-time decompression unit to decode compressed instructions on-the-fly. Wolfe and Chanin were the first to propose such a scheme, wherein Huffman codes were used to encode cache blocks. A hardware decompression unit is interposed between the cache and main memory to decompress cache blocks to their original size before they are inserted into the cache. Kemp, et al. at IBM developed a similar technique using sophisticated Huffman tables. Other techniques use a table to index sequences of frequently appearing instructions using a hardware decompression module, as proposed by Lefurgy, et al., or decompress the compressed object code instructions completely in software, as proposed by Liao, et al. Okuma, et al. proposed an encoding technique that takes into account fields within instructions.
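By way of illustration only, the Huffman coding on which several of the above schemes rely can be sketched as follows. The sketch encodes the bytes of a block with a prefix-free code built from byte frequencies. The function names `huffman_codes`, `encode` and `decode` and the sample block are hypothetical and are not taken from any of the cited references, which use their own table layouts and hardware decoders.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Map each distinct symbol to a prefix-free bit string (Huffman, 1952)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares the dict payloads
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Recover the symbols; works because no code is a prefix of another."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical 8-byte block in which one byte value dominates.
block = [0x8C, 0x8C, 0x8C, 0x8C, 0x00, 0x00, 0x27, 0xBD]
codes = huffman_codes(block)
bits = encode(block, codes)
```

For this sample block the encoded stream is 14 bits long, against 64 bits uncompressed, and `decode(bits, codes)` returns the original block. A run-time decompression unit performs the equivalent of `decode` in hardware.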
Most of the previous work has focused on memory optimization. Yoshida, et al. proposed a logarithmic-based compression scheme that can result in power reduction as well. A recent approach proposed by Benini, et al. investigated the impact of code compression on the power consumption of a system with no cache. However, the impact of code compression on other system parts, like caches and CPUs, was not investigated.
Various approaches have been proposed to minimize power consumption of diverse system parts. Stan and Burleson describe a bus-invert technique that reduces bus power consumption. If the Hamming distance of two consecutive data words is greater than half the word size, the inverted data is sent. Givargis and Vahid have developed a set of mathematical formulas for rapidly estimating bit switching activities on a bus with a given size and encoding scheme. Combined with the capacitance estimation formulas by Chern, et al., the mathematical formulas can rapidly estimate and optimize bus power consumption. Fornaciari, et al. proposed another bus power optimization approach using various bus power encoding schemes. At the architectural-level for single system components (i.e., not considering any trade-offs between various system parts), Hsieh et al. investigated high performance microprocessors, and derived specific software synthesis algorithms to minimize power. In addition, Tiwari has investigated the power consumption at the instruction-level for different CPU and DSP architectures and derived specific power optimizing compilation strategies.
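The bus-invert rule described above can be sketched, for illustration only, as follows. The sketch assumes an 8-bit bus that initially holds zero and compares each new word with the value currently driven on the bus, which is the usual formulation of the technique. The function names and the sample word sequence are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_flag) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    bus = 0  # assumed initial state of the data lines
    out = []
    for w in words:
        w &= mask
        # Invert when more than half of the lines would otherwise toggle.
        if 2 * hamming_distance(bus, w) > width:
            bus = ~w & mask
            out.append((bus, 1))
        else:
            bus = w
            out.append((bus, 0))
    return out


def bus_invert_decode(encoded, width=8):
    """Undo the inversion using the extra invert line."""
    mask = (1 << width) - 1
    return [(~v & mask) if inv else v for v, inv in encoded]


def transitions(values, start=0):
    """Total number of bit toggles when driving values one after another."""
    total, prev = 0, start
    for v in values:
        total += hamming_distance(prev, v)
        prev = v
    return total


# Hypothetical worst-case pattern: every word is the complement of the last.
words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
```

For this pattern the unencoded bus toggles 24 lines in total, whereas the encoded data lines do not toggle at all and the single invert line toggles 3 times. The cost of the technique is the one additional bus line that carries the invert flag.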
Other approaches focus on a whole system in order to optimize for low power consumption. For example, Dave, et al. introduced a co-design methodology that optimizes for power and performance at the task-level. Hong, et al. and Ishihara, et al. exploit the technique of variable voltage scaling in order to minimize power consumption. Qiu, et al. and Benini, et al., among others, have explored system power management approaches.
The invention has been made in view of the above circumstances and has an object to overcome the above problems and limitations of the prior art.
Additional objects and advantages of the invention will be set forth in part in the description that follows and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
An object of the invention is to provide a method for compression of object code instructions for an embedded computer.
It is a further object of the invention to provide a computer system adapted to providing a method for compression of object code instructions for an embedded computer.
It is a further object of the invention to provide a computer program product bearing software instructions adapted to command a computer system to compress object code instructions for an embedded computer.
It is a further object of the invention to provide an apparatus for decompression of compressed object code instructions prior to their execution by a central processing unit of an embedded computer.
According to the invention, uncompressed object code instructions can be advantageously decomposed into predetermined instruction classes. Each predetermined instruction class is compressed differently from each other predetermined instruction class.
According to the invention, certain instruction classes are used to derive a mathematical model used for compression and decompression of object code instructions.
According to the invention, the decompression engine can decompress multiple instructions simultaneously from different predetermined instruction classes.
Preferably, the invention provides a method for compressing uncompressed object code instructions from an executable program for an embedded computer, wherein the uncompressed object code instructions are compressed to reduce power consumption, the method comprising the decomposition of uncompressed object code instructions into at least four predetermined instruction classes, excluding certain uncompressed object code instructions in order to derive a mathematical model to use for compressing predetermined uncompressed object code instructions, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein the uncompressed object code instructions are compressed using the derived mathematical model, and building a decoding table for the compressed object code instructions in accordance with the derived mathematical model, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein an address offset is added to each object code instruction following its compression; and patching each address offset that was added to a compressed instruction.
According to the invention, a decompression table that uses indexing is constructed for certain instruction classes known as fast dictionary instructions.
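As a minimal sketch only, an index-based decompression table of this kind may be pictured as follows. This passage does not state how fast dictionary instructions are selected, so the sketch assumes, purely for illustration, that the most frequently occurring instruction words are chosen. The function names, the `("dict", index)` and `("raw", word)` stream format, and the sample instruction words are all hypothetical.

```python
from collections import Counter


def build_fast_dictionary(instructions, size=4):
    """Assumed policy: keep the `size` most frequent instruction words."""
    ranked = Counter(instructions).most_common(size)
    return [word for word, _ in ranked]


def compress(instructions, table):
    """Replace table hits by their index; leave other words untouched."""
    index_of = {word: i for i, word in enumerate(table)}
    return [
        ("dict", index_of[w]) if w in index_of else ("raw", w)
        for w in instructions
    ]


def decompress(stream, table):
    """A table hit is decompressed by a single indexed look-up."""
    return [table[v] if kind == "dict" else v for kind, v in stream]


# Hypothetical 32-bit instruction words; the first value dominates.
program = [0x01000000, 0x9DE3BF98, 0x01000000, 0x81C7E008,
           0x01000000, 0x9DE3BF98, 0x81E80000, 0x01000000]
table = build_fast_dictionary(program, size=2)
stream = compress(program, table)
```

With a two-entry table, each table hit can be represented by a one-bit index in place of a 32-bit word, and `decompress(stream, table)` restores the original program.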
According to the invention, predetermined bit sequences are appended to the compressed object code instructions in order to identify the instruction class for decompression.
According to the invention, a second decompression table is built using non-branching object code instructions and table-based mathematical encoding.
According to the invention, address offsets in branching instructions are patched in order to properly point into compressed address space.
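The need for patching can be illustrated with the following sketch, which is not the patching procedure of the invention. It assumes fixed 4-byte uncompressed instructions, byte-aligned compressed instructions of known size, and relative branch offsets measured from the branch instruction itself; it also ignores the possibility that patching an offset changes the compressed size of the branch. All names and the sample sizes are hypothetical.

```python
def compressed_addresses(sizes):
    """Start address of each instruction in compressed address space."""
    addrs, pos = [], 0
    for s in sizes:
        addrs.append(pos)
        pos += s
    return addrs


def patch_branch_offsets(branches, sizes, word_bytes=4):
    """Recompute relative branch offsets for the compressed layout.

    branches maps the index of a branch instruction to its original byte
    offset (target address minus branch address) in uncompressed space.
    """
    new_addr = compressed_addresses(sizes)
    patched = {}
    for idx, offset in branches.items():
        # Uncompressed instructions are word_bytes wide, so the original
        # offset identifies the target instruction by index.
        target_idx = idx + offset // word_bytes
        patched[idx] = new_addr[target_idx] - new_addr[idx]
    return patched


# Five instructions whose compressed sizes are 1, 2, 4, 1 and 3 bytes.
sizes = [1, 2, 4, 1, 3]
# Instruction 0 branches forward 12 bytes; instruction 4 branches back 12.
branches = {0: 12, 4: -12}
patched = patch_branch_offsets(branches, sizes)
```

In this example the compressed instructions start at addresses 0, 1, 3, 7 and 8, so the forward branch offset of 12 becomes 7 and the backward branch offset of -12 becomes -7. Without such patching, each branch would land at an address that no longer corresponds to the start of its target instruction.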
Preferably, the invention provides a computer system adapted to compressing uncompressed object code instructions from an executable program for an embedded computer, wherein the uncompressed object code instructions are compressed to reduce power consumption, the computer system including a processor and a memory including software instructions adapted to enable the computer system to perform the steps of decomposing the uncompressed object code instructions into at least four predetermined instruction classes, excluding certain uncompressed object code instructions in order to derive a mathematical model to use for compressing predetermined uncompressed object code instructions, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein the uncompressed object code instructions are compressed using the derived mathematical model, and building a decoding table for the compressed object code instructions in accordance with the derived mathematical model, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein an address offset is added to each object code instruction following its compression; and patching each address offset that was added to a compressed instruction.
According to the invention, the computer system can download compressed object code to a memory resident on an embedded computer system, and then dynamically debug the downloaded compressed object code.
Preferably, the invention provides a computer program product for enabling a computer system to compress uncompressed object code instructions from an executable program for an embedded computer, wherein the uncompressed object code instructions are compressed to reduce power consumption, the computer program product including software instructions for enabling the computer system to perform predetermined operations, and a computer readable medium bearing the software instructions, the predetermined operations including decomposing the uncompressed object code instructions into at least four predetermined instruction classes, excluding certain uncompressed object code instructions in order to derive a mathematical model to use for compressing predetermined uncompressed object code instructions, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein the uncompressed object code instructions are compressed using the derived mathematical model, and building a decoding table for the compressed object code instructions in accordance with the derived mathematical model, compressing uncompressed object code instructions from at least one of the plurality of predetermined instruction classes, wherein an address offset is added to each object code instruction following its compression, and patching each address offset that was added to a compressed instruction.
Preferably, the invention provides an embedded computer for executing compressed object code instructions, wherein the object code instructions have been compressed to reduce power consumption, the embedded computer including a central processing device, a storage device, a memory cache device, a decompression engine interposed between the memory cache device and the central processing device, and an interface bus of a predetermined bit width interconnecting the central processing device, the storage device, the memory cache device and the decompression engine allowing communication therebetween, wherein compressed object code instructions are decompressed by the decompression engine prior to their transmittal to the central processing device.
According to the invention, the decompression engine includes a fast dictionary look-up table device, a branch control device, a decoding device and a controller for coordinating the decompression of compressed object code instructions. The controller generates signals for use by the various devices during the decompression of compressed object code instructions.
Preferably, the invention further provides a circuit for decompressing compressed object code instructions that have been compressed to reduce power consumption, the circuit comprising an input buffer circuit that receives compressed object code instructions, a first decoding circuit having an input connected to an output of the input buffer circuit, a second decoding circuit having an input connected to the output of the input buffer circuit, a third decoding circuit having an input connected to the output of the input buffer circuit, an output buffer circuit having an input connected to an output from each of the first, second and third decoding circuits; and a controller circuit controlling the first decoding circuit, the second decoding circuit, the third decoding circuit and the output buffer circuit, wherein the controller circuit coordinates the decompression of compressed object code instructions.
According to the invention, the input buffer circuit includes a memory storage device that stores compressed object code instructions, a multiplexing circuit and a decoder to control the multiplexing circuit.
According to the invention, the output buffer circuit includes a memory storage device connected to the first and second decoding devices, a multiplexing circuit connected to the memory storage device and the third decoding device, and a second memory storage unit connected to the multiplexing circuit.
Preferably, the invention further provides a circuit for decompressing compressed object code instructions that have been compressed to reduce power consumption, the circuit comprising an input buffer circuit for receiving and distributing compressed object code instructions transferred from a memory storage device, a first decoding circuit for decompressing compressed fast dictionary instructions, a second decoding circuit for decompressing compressed branching object code instructions, a third decoding circuit for decompressing non-branching object code instructions, an output buffer circuit for receiving and ordering the output of the first, second and third decoding circuits; and a controller circuit controlling the first decoding circuit, the second decoding circuit, the third decoding circuit and the output buffer circuit, wherein the controller circuit coordinates the decompression of compressed object code instructions.
According to the invention, the controller circuit generates signals to control the decompression of the compressed object code. The controller generates signals to properly order the decompressed instructions prior to transmittal to a central processing unit, and also signals the central processing unit when it cannot accept more instructions for decompression.
The above and other objects and advantages of the invention will become apparent from the following detailed description and with reference to the accompanying drawing figures.