1. Field of the Invention
The present invention relates to techniques for compression of codes.
The invention has been developed with particular attention paid to its possible application to systems of an embedded type. Reference to the above possible application must not, however, be interpreted as in any way limiting the scope of the invention, which is altogether general.
2. Description of the Related Art
In the implementation of processors and, in particular, processors for embedded systems, it is necessary to take into account various limitations. Amongst these, the main limitations are represented by costs, power absorption and area of silicon occupied.
The limitations in terms of cost are dictated by the desire to use said processors as control devices in digital systems of a consumer type (for instance, for smart-card applications), where high volumes of production must be accompanied by low costs per unit produced.
The aspect of power absorption is becoming increasingly important, also in view of the application of such processors to devices such as cell-phone devices, the so-called xe2x80x9cportable digital assistantxe2x80x9d (PDA), etc., where it is important to prolong battery life as much as possible.
As regards the area on silicon occupied or, in general, the die size, if applications of the smart-card type are, in particular, considered, it emerges clearly that it is imperative to arrive at small areas of silicon occupation.
On the other hand, current applications require an increasingly high level of flexibility in order to enable all the functional requirements to be met. This fact means that it is necessary to load increasingly substantial portions of software code into the system memories, in order to execute them on the corresponding processor (for example, DSP or CPU). This fact, in turn, results in the tendency to use memories that are costly and are likely to absorb a lot of power.
There, thus, exists the need to provide solutions for reducing the size of the program memory of a device of the type described above.
A solution of this type is the one known by the commercial name IBM CodePackT(trademark). In this connection, it is useful to consult the corresponding user manual entitled IBM CodePack(trademark) Power PC Code Compression Utility, User""s manual version 4.1.
The algorithm in question has been developed with the chief aim of improving performance of the Power PC core. The architecture of the core in question, which has been designed as a typical RISC (Reduced Instruction Set Computer) architecture, incorporates instructions of a fixed length to simplify the design of high-performance processors. Almost always, instructions of a fixed length provide a code density that is smaller than that of a CISC (Complete Instruction Set Computer) design, but for the majority of applications of an embedded type, for which cost is a highly significant factor, code density is an important design parameter.
The compression scheme adopted in the IBM solution referred to above is based upon the observation that the Power PC instructions used in real applications are not evenly distributed over all the 32 bits used for representing an individual instruction. Certain instruction configurations or patterns appear with a much higher frequency than others. From the conceptual standpoint, these instructions can be represented by means of special bit sequences, which are shorter than the original instructions, and are replaced with the original instructions by the decompression controller when the instructions are sent to the processor core.
It is possible to devise numerous variants of this basic scheme. An improvement that proves effective in the case of 32-bit PowerPC instructions is that of compressing separately the 16 bits occupying a xe2x80x9chighxe2x80x9d position and the 16 bits occupying the xe2x80x9clowxe2x80x9d position of the 32 bits that make up the instruction.
In order to increase code density, the instructions most commonly used must be replaced by shorter bit sequences, which are designed to be expanded subsequently by the decompression controller, so as to re-constitute the original Power PC instructions. In order to maximize instruction density, sequences of a variable length, in terms of number of bits, are used, the sequences being located in the memory in end-to-end conditions. The compression software processes the image of the non-compressed code one instruction at a time, examining the instruction, so as to identify matches in the framework of the contents of the decoding table.
As has already been said, the above known technique is applied separately to the 16 xe2x80x9chighxe2x80x9d-position bits and to the 16 xe2x80x9clowxe2x80x9d-position bits in the context of the instruction, and the n most frequent xe2x80x9chighxe2x80x9d values are generally different from the n most frequent xe2x80x9clowxe2x80x9d values.
In the framework of the known solution considered herein, it is found, however, that both with reference to the xe2x80x9chighxe2x80x9d bits and with reference to the xe2x80x9clowxe2x80x9d bits, if a 16-bit string is encountered such that it does not recur with sufficient frequency for it to receive a specific slot in the table, the string of bits in question must be reproduced in xe2x80x9cliteralxe2x80x9d format on 16 bits. This requires the addition of three bits (usually xe2x80x9c111xe2x80x9d) precisely for identifying the string as a bit string that cannot be compressed.
A situation arises which is somewhat incongruent whereby the application of a mechanism designed to carry out a compression means that, in order to represent the 16 original bits, it is necessary to use as many as 19 bits. The situation is rendered even more incongruent when the conditions of non-compressibility considered previously arise both for the 16 xe2x80x9chighxe2x80x9d bits and for the 16 xe2x80x9clowxe2x80x9d bits that make up the instruction. The instruction, which originally comprised 32 bits, ends up, in fact, being represented by 38 bits. Consequently, the xe2x80x9ccompressedxe2x80x9d instruction is, in actual fact, not compressed at all, in so far as it has as many as six bits more than the original instruction.
The format for storing the compressed instruction envisages, in order, the tags, i.e., the fields that identify the type of instruction and the procedure used for compression, respectively, of the xe2x80x9chighxe2x80x9d bits and of the xe2x80x9clowxe2x80x9d bits and then the index values, i.e., in practice, the representation in compressed form (also in this case, in order, for the xe2x80x9chighxe2x80x9d bits and the xe2x80x9clowxe2x80x9d bits).
If all the compressed instructions are designed to be followed in sequence, this relatively simple coding scheme is satisfactory. The decompression controller is, in fact, simply pointed to the start of the image of the compressed instruction, and then fetches and decompresses the flow of the instruction, as required by the processor.
However, there exist application instruction flows that contain a significant number of branchings.
The above flows require the decompression controller to locate the start of the compressed instructions at any location of the addressing area represented by the image of the compressed instruction and render the operation of this scheme more critical.
In fact, to succeed in locating the start of any instruction, it is necessary to have available a mechanism capable of translating the address of the desired instruction into an address that points towards the start of the corresponding complex compressed instruction.
The scheme used in the known solution to which reference is being made envisages splitting the instructions into groups and providing a mechanism that will enable location of the start of the group that contains the desired instruction.
Next, the group is processed by the decoding mechanism until the desired instruction is reached. This mechanism requires, however, an addressing mechanism implemented by means of an index table, which is usually referred to as address-translation table (ATT).
An embodiment of the present invention provides a solution for compression that is able to overcome the drawbacks outlined previously.
According to an embodiment of the present invention, a process converts binary words from a non-encoded format to a compressed encoded format, in which, in the compressed encoded format, binary words are, at least in part, represented by encoded bit sequences that are shorter than the respective binary word in the non-encoded format. The encoded bit sequences are selected according to a statistical recurrence of the respective words in the non-encoded format. Associated with binary words with higher recurrence are encoded bit sequences including numbers of bits that are accordingly smaller. A correspondence between binary words in non-encoded format and the encoded bit sequences associated therewith is established by indices of at least one encoding vocabulary. The process includes: arranging said indices in an orderly sequence; organizing said sequence of indices in groups of vectors; and splitting each group of vectors into a given number of vectors. In addition, the process includes at least one of the following operations: encoding said vectors independently of one another, and calculating and storing in a table, for each group of vectors, a starting address of a compressed block, or else differences, expressed in bytes, with respect to a last complete address appearing in said table.
Some embodiments of the invention also regard the corresponding system, as well as the corresponding computer program product, which can be loaded directly into the memory of a digital processor and which contains portions of software code that are able to implement the process when the product is executed on a digital processor.
The process provides a compression technique that is able to operate directly on the binary code generated by the compiler for compressing it and generating a new binary file, the size of which is approximately 40% smaller than the size of the original file.
The compressed file can thus be loaded into the program memory of the processor.
The decoder may preferably be made in the form of a small ASIC, which can be located between the system bus and the instruction cache memory (for example, in the memory control unit or MCU or, if a hierarchy of caches with at least two cache levels is present, between the first and the second levels, so as to mask its latency better), or else directly at the output terminal of the external memory.