1. Technical Field
The present invention relates to a method and system for compressing data in general, and in particular to a method and system for compressing executable code in the context of Reduced Instruction Set Computer (RISC) architectures.
2. Description of the Prior Art
Reduced Instruction Set Computer (RISC) architectures simplify processor and compiler design by giving all instructions the same size and restricting them to a few simple formats. The price paid for these advantages is the relatively large size of program code written in these instruction sets. Large code size reduces instruction cache effectiveness. It also increases program-loading time when code is shipped over a network or retrieved from a slow mechanical device such as a disk.
Network computers, embedded controllers, set-top boxes, hand-held devices and similar equipment currently receive executables over a network, slow telephone links or other communication channels. These devices may have very limited memory, and when memory is constrained, large programs may not fit in the memory available on the device. Therefore, for RISC processors to be competitive in the marketplace, highly efficient code compression is needed to mitigate the disadvantage of large executables.
Programs written in RISC instruction sets have traditionally been difficult to compress. There are several reasons for this.
First, RISC instructions contain redundant fields. These fields pollute the data space and reduce the effectiveness of the data model that a traditional compressor builds during compression, thereby reducing the compression ratio. Furthermore, a traditional compressor still encodes these redundant fields even though they convey no information.
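As an illustration only, removing a redundant field amounts to deleting the bits that carry no information and packing the remaining bits together. The following minimal sketch assumes a hypothetical per-opcode mask table (passed in as `redundant_masks`) that marks the bits known to be constant for a given primary opcode; the helper names and any mask values are assumptions, not taken from this description.

```python
def drop_bits(word: int, mask: int, width: int = 32) -> tuple[int, int]:
    """Delete the bits set in `mask` from `word` and pack the surviving bits.

    Bits are scanned from most to least significant, so the surviving bits
    keep their relative order. Returns (packed_value, packed_width).
    """
    packed, kept = 0, 0
    for bit in range(width - 1, -1, -1):
        if not (mask >> bit) & 1:
            packed = (packed << 1) | ((word >> bit) & 1)
            kept += 1
    return packed, kept


def filter_instruction(word: int, redundant_masks: dict[int, int]) -> tuple[int, int]:
    """Strip redundant bits from a 32-bit instruction word.

    The top 6 bits are taken as the primary opcode, as in PowerPC.
    `redundant_masks` is a hypothetical table mapping opcode -> mask of bits
    that carry no information; opcodes absent from it are left unchanged.
    """
    opcode = word >> 26
    return drop_bits(word, redundant_masks.get(opcode, 0))
```

The decompressor would need the same table to re-insert the constant bits, which is why such a filter only works for bits whose value is fully determined by the opcode.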
Second, because the instructions typically take several formats, it is often difficult to discern patterns in the instructions within program code.
Third, RISC instructions use registers extensively, and register fields are very difficult to compress because they are densely coded and have high information entropy.
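The entropy claim can be checked empirically. The following minimal sketch is an illustrative measurement, not part of the described method: it extracts the RT/RS register field (PowerPC bits 6-10, with bit 0 as the most significant bit, i.e. `(word >> 21) & 0x1F`) and computes the Shannon entropy of a sequence of field values in bits. A 5-bit field whose values are used uniformly approaches the 5-bit maximum, leaving a conventional compressor little to gain.

```python
import math
from collections import Counter


def rt_field(word: int) -> int:
    """Return the RT/RS field of a PowerPC instruction (bits 6-10, MSB = bit 0)."""
    return (word >> 21) & 0x1F


def field_entropy(values) -> float:
    """Empirical Shannon entropy, in bits per symbol, of a sequence of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `rt_field(0x80610008)` extracts register 3 from the encoding of `lwz r3,8(r1)`.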
Fourth, literal values vary widely, with no discernible pattern and few frequently recurring values.
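One way to see why grouping literals helps is the following illustration: if every literal in a group lies within a narrow range, each one can be stored as an offset from the group minimum using far fewer bits than the 16-bit immediate field of a PowerPC D-form instruction. The sketch below uses a hypothetical helper name and shows only this offset idea; it is not the encoding claimed here.

```python
def scope_offsets(literals: list[int]) -> tuple[int, int, list[int]]:
    """Encode the literals of one group as offsets from the group's minimum.

    Returns (base, bits_per_offset, offsets). Literals in the range
    [base, base + span] need only span.bit_length() bits each (at least 1).
    """
    base = min(literals)
    offsets = [v - base for v in literals]
    span = max(offsets)
    return base, max(span.bit_length(), 1), offsets
```

For the literals `[100, 103, 101]` this yields a base of 100 and 2-bit offsets `[0, 3, 1]`.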
Therefore there is a need for compressing instructions in Reduced Instruction Set Computer (RISC) architectures such as the PowerPC family. Traditional compressors in the prior art treat the instructions in a program as a stream of bits and try to find patterns within this stream to construct a more compact representation of the program (e.g., Ziv-Lempel compression, Huffman encoding, etc.). However, consecutive instructions within a program may not exhibit patterns that such a compressor can readily exploit. Therefore a need exists for:
removing redundancy from instructions;
processing the program to expose more patterns to a traditional compressor so that it compresses more effectively; and
devising a coding scheme for registers and literals that exposes more patterns.
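A common way to expose such patterns is to split the instruction stream into separate field streams so that like values sit next to each other. This sketch is an assumed illustration, not the specific processing of this invention. It assumes every word uses the PowerPC D-form layout (6-bit opcode, two 5-bit register fields, 16-bit immediate), and the function names are hypothetical. Each resulting stream could then be handed separately to a general-purpose compressor.

```python
def split_d_form(words: list[int]) -> dict[str, list[int]]:
    """Split D-form instruction words into per-field streams."""
    streams: dict[str, list[int]] = {"opcode": [], "rt": [], "ra": [], "imm": []}
    for w in words:
        streams["opcode"].append(w >> 26)        # bits 0-5
        streams["rt"].append((w >> 21) & 0x1F)   # bits 6-10
        streams["ra"].append((w >> 16) & 0x1F)   # bits 11-15
        streams["imm"].append(w & 0xFFFF)        # bits 16-31
    return streams


def merge_d_form(streams: dict[str, list[int]]) -> list[int]:
    """Reassemble instruction words from per-field streams (inverse of split)."""
    return [
        (op << 26) | (rt << 21) | (ra << 16) | imm
        for op, rt, ra, imm in zip(
            streams["opcode"], streams["rt"], streams["ra"], streams["imm"]
        )
    ]
```

Here, `lwz r3,8(r1)` (`0x80610008`) and `stw r3,12(r1)` (`0x9061000C`) produce an `rt` stream of `[3, 3]`, a repetition that is harder to see in the interleaved bit stream.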
The present invention solves these problems with a novel technique not previously known in the art.
In view of the foregoing, it is therefore an object of the present invention to provide an improved method and system for compressing executable code in the context of a Reduced Instruction Set Computer (RISC) architecture.
It is another object of the present invention to provide an improved method and system for performing data compression by removing redundancy from the instructions, which is achieved by expanding the existing architecture to include artificial instructions encoded without redundancy.
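An assumed example of what such an artificial instruction might look like follows. It is not taken from this description. The PowerPC extended mnemonic `slwi rA,rS,n` is encoded as `rlwinm rA,rS,n,0,31-n`, so its MB and ME fields are fully determined by SH. An expanded instruction set could give this case its own artificial opcode and omit MB and ME. The opcode value `SLWI_ART = 64` and the function names are hypothetical. The value lies outside the 6-bit primary-opcode range so that it cannot collide with a real opcode.

```python
RLWINM = 21    # PowerPC primary opcode of rlwinm (M-form)
SLWI_ART = 64  # hypothetical artificial opcode, outside the 0-63 range


def expand(word: int) -> tuple[int, ...]:
    """Rewrite an instruction into a (hypothetical) expanded instruction set.

    rlwinm with MB = 0, ME = 31 - SH and Rc = 0 (i.e. slwi) becomes an
    artificial instruction carrying only RS, RA and SH; every other word is
    kept as (primary opcode, remaining 26 bits).
    """
    op = word >> 26
    rs, ra = (word >> 21) & 0x1F, (word >> 16) & 0x1F
    sh, mb, me, rc = (word >> 11) & 0x1F, (word >> 6) & 0x1F, (word >> 1) & 0x1F, word & 1
    if op == RLWINM and mb == 0 and me == 31 - sh and rc == 0:
        return (SLWI_ART, rs, ra, sh)
    return (op, word & 0x03FFFFFF)


def contract(inst: tuple[int, ...]) -> int:
    """Map an expanded instruction back to its original 32-bit encoding."""
    if inst[0] == SLWI_ART:
        _, rs, ra, sh = inst
        return (RLWINM << 26) | (rs << 21) | (ra << 16) | (sh << 11) | ((31 - sh) << 1)
    op, rest = inst
    return (op << 26) | rest
```

Because `contract` inverts `expand`, the rewrite is lossless, and the 10 bits of MB and ME never reach the compressor for this instruction.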
It is yet another object of the present invention to provide an improved method and system that achieves compression ratios of 3.5 to 4, compared with the 2 to 2.6 range of conventional compression.
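Assuming the usual definition of compression ratio $R$ as original size divided by compressed size, the two ranges translate into the following fractions of the original program size:

```latex
R = \frac{S_{\text{orig}}}{S_{\text{comp}}}, \qquad
\frac{S_{\text{comp}}}{S_{\text{orig}}} = \frac{1}{R}

R \in [3.5,\ 4] \;\Rightarrow\; \frac{1}{R} \in [0.250,\ 0.286], \qquad
R \in [2,\ 2.6] \;\Rightarrow\; \frac{1}{R} \in [0.385,\ 0.500]
```

That is, the compressed program would occupy roughly 25% to 29% of its original size, rather than roughly 38% to 50%.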
It is still yet another object of the present invention to provide an improved method and system that uses compression techniques exploiting the semantics of an instruction set architecture, in combination with traditional compression techniques, to produce highly compressed code.
In accordance with a method and system of the present invention, a compression scheme is disclosed for program executables that run on a reduced instruction set computer (RISC) architecture such as the PowerPC. Initially, the RISC instruction set is expanded to produce code that facilitates the removal of redundant fields, and the program to be compressed is rewritten using this expanded instruction set. Next, a filter is applied to remove redundant fields from the expanded instructions. The expanded instructions are then clustered into groups such that instructions belonging to the same cluster show similar bit patterns. The grouping could be based on the instruction formats; for example, branch instructions may be grouped in one cluster, load/store instructions in another, and so on. However, some instructions do not fit this model and may require a more sophisticated clustering mechanism. Within each cluster, scopes are created such that register usage patterns within each scope are similar, and further scopes are created such that the literals within each scope are drawn from a narrow range of integers. A conventional compression technique such as Huffman encoding is then applied to each scope within each cluster. Huffman encoding is also applied to the opcodes of all instructions, so that these opcodes function as anchors within the compressed stream. Dynamic programming techniques are then used to find the best combination of encodings across all scopes within all clusters. As a result, scopes and clusters that use the same encoding scheme may be combined to reduce the size of the resulting dictionary.
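The clustering and per-cluster Huffman steps above can be sketched as follows. This is a simplified illustration under stated assumptions. Clusters are formed by the 6-bit primary opcode alone, although the description notes that real clustering may need a more sophisticated mechanism. A single register field stands in for a scope, and the dynamic-programming step that merges encodings is omitted. All function names are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def huffman_code(symbols) -> dict:
    """Build a Huffman code (symbol -> bit string) from a symbol sequence."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:                      # a lone symbol still needs one bit
        (sym,) = freq
        return {sym: "0"}
    tie = count()                           # tie-breaker so dicts are never compared
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def cluster_by_opcode(words: list[int]) -> dict[int, list[int]]:
    """Assumption: one cluster per 6-bit primary opcode."""
    clusters: dict[int, list[int]] = defaultdict(list)
    for w in words:
        clusters[w >> 26].append(w)
    return dict(clusters)


def register_tables(words: list[int]) -> dict[int, dict]:
    """Build a separate Huffman table for the RT/RS field of each cluster."""
    return {
        op: huffman_code([(w >> 21) & 0x1F for w in members])
        for op, members in cluster_by_opcode(words).items()
    }
```

Keeping one table per cluster lets each table reflect that cluster's register usage. The cost is a larger dictionary, which is the trade-off the dynamic-programming step described above is meant to balance.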
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.