1. Technical Field
The present invention relates to a method and system for compressing data in general, and in particular to a method and system for compressing executable code. Still more particularly, the present invention relates to a method and system for compressing executable code in the context of a Reduced Instruction Set Computer (RISC) architecture.
2. Description of the Prior Art
Reduced Instruction Set Computer (RISC) architectures simplify hardware implementation and compiler design by giving all instructions the same size and a few simple formats. The price of these advantages is the large size of executable code written in these instruction sets. Large code reduces instruction cache effectiveness on modern processors and makes less efficient use of memory resources. It also increases program-loading time when code is shipped over a network or retrieved from a slow mechanical device such as a disk.
Currently, network computers, embedded controllers, set-top boxes, handheld devices and the like receive executables over a network or possibly through slow phone links or communication channels. Furthermore, these devices may have very limited memory capacity, and when memory is constrained, large programs may not fit in the memory available to run them. Therefore, for devices having RISC processors to be competitive in their segment of the marketplace, they may require highly efficient code compression that mitigates the disadvantage of large executable sizes. Traditionally, however, it has been difficult to compress executable code for RISC processors.
The difficulty in compressing executable code for RISC processors is partly due to the relatively high frequency of register use in instruction encodings. A typical RISC architecture such as the International Business Machines' PowerPC processor implements 32 integer and 32 floating point registers. The instruction set encodes these registers using 5-bit codes to express a register number from 0 to 31. This encoding poses problems when compressing an executable for two reasons. First, the encoding is dense since all possible values for a register code are valid, resulting in a high-entropy encoding that is difficult to compress. Second, compilers for RISC processors use all registers. Therefore, all possible register codes may appear uniformly throughout the code, making it difficult for a conventional compressor to find frequent patterns and produce effective compression.
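The effect of dense, uniformly used register codes can be illustrated with a short entropy calculation. The following is a minimal sketch in Python and is not part of the described method; the function name `entropy_bits` and both register streams are hypothetical illustrations, not measurements taken from real programs.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Return the Shannon entropy, in bits per symbol, of an observed sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical stream in which all 32 register codes occur equally often,
# as the text says may happen when a compiler uses every register.
uniform_stream = list(range(32)) * 4

# Hypothetical stream in which one register code dominates (for contrast only).
skewed_stream = [3] * 96 + list(range(32))

uniform_bits = entropy_bits(uniform_stream)
skewed_bits = entropy_bits(skewed_stream)

print(f"uniform register use: {uniform_bits:.3f} bits per 5-bit code")
print(f"skewed register use:  {skewed_bits:.3f} bits per 5-bit code")
```

Under these assumptions the uniform stream carries a full 5 bits of information per 5-bit code, so a statistical coder has no redundancy to remove from that field, while the skewed stream needs well under 5 bits per code.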
The problem stated above is substantial because register fields occupy a large portion of the instruction code, and because RISC instruction sets use registers as the primary data operands. An instruction may contain one, two or three register fields, consuming between 5 and 15 bits of a 32-bit instruction code. In a typical program, register codes account for 20% to 40% of the total code size.
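As an illustration of how much of an instruction word the register fields occupy, the sketch below decodes the register fields of a single three-register PowerPC instruction. It assumes the PowerPC convention of numbering bits from the most significant bit (bit 0) and uses the word 0x7C642A14, taken here to be the encoding of `add r3,r4,r5`; the helper name `field` is hypothetical.

```python
def field(word, start, width):
    """Extract `width` bits beginning at bit `start` of a 32-bit word.

    Bits are numbered PowerPC-style: bit 0 is the most significant bit.
    """
    shift = 32 - start - width
    return (word >> shift) & ((1 << width) - 1)


# Assumed example word: add r3,r4,r5
ADD_R3_R4_R5 = 0x7C642A14

opcode = field(ADD_R3_R4_R5, 0, 6)   # primary opcode
rt = field(ADD_R3_R4_R5, 6, 5)       # target register
ra = field(ADD_R3_R4_R5, 11, 5)      # first source register
rb = field(ADD_R3_R4_R5, 16, 5)      # second source register

# Three 5-bit register fields out of a 32-bit instruction.
register_bits = 3 * 5
register_fraction = register_bits / 32

print(opcode, rt, ra, rb)
print(f"register fields: {register_bits} of 32 bits ({register_fraction:.1%})")
```

For this one instruction the register fields take 15 of 32 bits. An instruction with a single register field would use 5 of 32 bits, which matches the range of 5 to 15 bits stated above.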
Compressing literals poses a problem similar to that of compressing registers. Literals are data constants that appear in instructions. For example, literals may specify the values of branch addresses, pointer offsets, and constant data values. Literals complicate compression because they cover a wide range of integers. Combined, registers and literals contribute between 50% and 75% of the size of a typical executable on a RISC processor.
Therefore, there is a need for a method and system to increase the effectiveness of compressing register and literal encodings for RISC processors such as the PowerPC family. The present invention solves these problems in a novel and unique manner, which is not previously known in the art.