Today it is quite common for everyday devices and systems to incorporate computer technology. Personal digital assistants, pagers with integrated data and message services, smart phones, television remote controls, automotive engine controllers, and the like all rely upon microprocessors and/or microcontrollers to perform numerous and varied functions. These microprocessors and/or microcontrollers are commonly referred to as embedded processors. In most of these devices, the embedded processor executes a predefined, stored program.
As demands for more powerful, smaller, lighter, less expensive, and/or more energy efficient devices have risen, system designers have been tasked with packing more features into ever smaller components. These features are commonly controlled by a program (an instruction code) contained within an embedded processor. Since the size of the program used in embedded processors has quickly become a significant constraint on the miniaturization of electronic devices, reducing the program size has become a primary goal of system designers. A reduced code size often results in a reduction of a device's cost, size, weight, and/or power consumption. Additionally, as the profit margin on semiconductor devices ("chips") erodes, designers may be tasked with providing more devices per given wafer area. Thus, miniaturization is today a primary goal of system designers.
Numerous approaches have been proposed for reducing the length of a program used in an embedded processor. One approach was proposed in 1996 by Peter L. Bird and Trevor N. Mudge in their paper, "An Instruction Stream Compression Technique" (hereafter "Bird and Mudge"). In their approach, Bird and Mudge analyze a program for patterns of frequently used sequences of instructions. This analysis is performed for all sequences within basic instruction blocks. A basic instruction block is a sequence of instructions within a program in which no jumps exist. For purposes of this description, "jumps" in a program shall refer to any deviation in the sequential processing of a sequence of lines of code including branches, conditional branches, sub-routines, and the like. The basic instruction block is always entered at the top of the sequence and exited only at the bottom of the sequence. Since jumps commonly occur in programs, basic instruction blocks may not be prevalent in a given program. Additionally, since basic instruction blocks are often quite short, the number of available patterns in a given program is often reduced. Hence the utility of the Bird and Mudge approach is often quite limited.
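The basic-block restriction described above can be illustrated with a minimal sketch. The mnemonics, the `JUMPS` set, and the function names below are hypothetical and are not taken from Bird and Mudge; the sketch assumes that a jump instruction terminates a block and is not itself part of any block, consistent with the definition given above.

```python
from collections import Counter

# Hypothetical mnemonics treated as "jumps" for this sketch only.
JUMPS = {"jmp", "beq", "bne", "call", "ret"}

def basic_blocks(program):
    """Split a list of instruction strings into jump-free basic blocks.

    Assumption: a jump ends the current block and is excluded from it.
    Jump targets, which would also start a new block, are ignored here.
    """
    blocks, current = [], []
    for instruction in program:
        if instruction.split()[0] in JUMPS:
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(instruction)
    if current:
        blocks.append(current)
    return blocks

def repeated_sequences(blocks, length=2):
    """Count sequences of `length` instructions that never cross a block boundary."""
    counts = Counter()
    for block in blocks:
        for start in range(len(block) - length + 1):
            counts[tuple(block[start:start + length])] += 1
    # Only sequences seen more than once are candidates for compression.
    return {seq: n for seq, n in counts.items() if n > 1}

program = [
    "load r1, a", "add r1, r2", "store r1, b", "beq r1, L1",
    "load r1, a", "add r1, r2", "jmp L2",
    "store r1, b", "load r1, a",
]
blocks = basic_blocks(program)
candidates = repeated_sequences(blocks)
```

In this small program the two jumps yield three short blocks, and only one two-instruction pattern repeats within them, which suggests how short basic blocks limit the number of available patterns.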
Additionally, under the Bird and Mudge approach, after the program is analyzed and patterns of basic instruction blocks identified, those patterns with the highest frequency of occurrence are assigned an opcode and are stored in Read Only Memory (ROM). The opcode is then placed in a directory which identifies the specific location in memory of the associated instruction sequence. The program 100 is then reassembled and consists of original lines of code 110 interleaved with opcodes 112, as shown in FIG. 1.
During an instruction fetch cycle, the decoder within a Central Processing Unit (CPU) checks the line of code of the incoming instruction. If no opcode 112 exists, the line of code is an uncompressed instruction which is executed in the regular manner. If an opcode 112 exists, the opcode 112 references the memory location at which the actual code sequence resides. The actual code sequence is then recalled from memory and executed.
In order to keep track of the location of the compressed instructions corresponding to an opcode, Bird and Mudge utilize a look-up table 114, wherein the opcode 112 identifies the location of the first instruction of the compressed sequence in the look-up table 114. The look-up table 114 also provides the location in memory 116 of the second instruction (if one should exist) of the compressed sequence and the number of remaining instructions 118, as shown in FIG. 1. For example, when the embedded processor encounters opcode three 120 during an instruction processing cycle, the embedded processor proceeds to the opcode three location 122 in the look-up table 114. The embedded processor executes the first instruction 124 associated with opcode three 120 and then proceeds to the memory location 128 of the second instruction (in this example, memory address 07002). Upon executing a second instruction 129, the processor proceeds in sequential order through memory 130 until the number of instructions 126 indicated in the look-up table 114 have been executed (in this case four instructions). The processor then resumes normal instruction processing in the original program code (thus, in this example the processor returns to the third instruction 136).
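The look-up sequence described above may be sketched as follows. This is an illustrative model only: the table layout, the instruction names `I1` through `I5`, and the interpretation of the count as the number of instructions fetched from memory after the first instruction are assumptions, and the address 7002 merely echoes the example address.

```python
# Hypothetical memory holding the tail of a compressed sequence.
memory = {7002: "I2", 7003: "I3", 7004: "I4", 7005: "I5"}

# Look-up table (assumed layout):
# opcode -> (first instruction, address of second instruction, remaining count)
lookup_table = {3: ("I1", 7002, 4)}

def expand(program):
    """Return the instructions in the order the processor would execute them."""
    executed = []
    for line in program:
        if isinstance(line, int):            # an opcode rather than a line of code
            first, address, remaining = lookup_table[line]
            executed.append(first)           # first instruction comes from the table
            for offset in range(remaining):  # the rest are read sequentially from memory
                executed.append(memory[address + offset])
        else:
            executed.append(line)            # uncompressed instruction, executed as-is
    return executed

trace = expand(["A", "B", 3, "C"])
```

The `remaining` loop variable plays the role of the counter that the processor would need to maintain while executing a compressed sequence.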
While the Bird and Mudge approach is effective, it has numerous disadvantages. First, this approach requires that space on the chip be allocated to track the number of instructions associated with an opcode, and the number of instructions executed or which remain to be executed. Thus, some sort of counter must be included in the processor (or the processor's normal routines interrupted to keep track of the instruction count). Additionally, the second table requires the allocation of additional space on the chip. Thus, this approach requires more space and more power, and inhibits miniaturization.
Another disadvantage of the Bird and Mudge approach is that it only works for instruction sequences that are contained within a basic instruction block (i.e., this approach does not work for instruction sequences which contain jumps). Since many programs have numerous jumps and conditional branches, the application of Bird and Mudge is often extremely limited. Additionally, the Bird and Mudge approach cannot be used with arguments (wherein an argument is a portion of an instruction which references another value). Designers prefer a sequence of instructions which corresponds to a given code word, where the instructions can be parameterized. Ideally, the arguments in a sequence of instructions can be suitably replaced such that the instructions can be flexibly configured to function with specific variables. For example, a sequence of instructions in an aircraft's embedded processor which utilizes the wind speed to determine the desired landing speed is preferably parameterizable such that the value of the wind speed may be suitably inserted into any calculations which require wind speed.
In summary, the Bird and Mudge approach unnecessarily allocates memory to hold the needed tables and does not allow for jumps, parameterization, or the like. Thus, Bird and Mudge do not disclose a desirable approach.
Another approach for reducing the size of the instruction set in embedded processors was proposed in 1997 by Darko Kirovski, Johnson Kin, and William H. Mangione-Smith in their paper, "Procedure Based Program Compression" (hereafter, "KKMS"). In the KKMS approach, the entire program is compressed. At run time, decompression of the entire program is accomplished in real-time, i.e., each procedure is decompressed by the processor as needed. Each procedure is compressed as an entity (including jumps and arguments contained within a given procedure) and stored in a dedicated region of Random Access Memory (RAM). Inter-procedure calls and global references are stored in a software cache which is accessed via a Directory Service. As procedures are needed by the processor for a given operation, the Directory Service is consulted, and a linking tool is utilized to identify the location of the desired procedure and where to return after the procedure has been implemented. The procedure is then called into a pcache (i.e., a cache of volatile memory commonly provided on the processor chip; the pcache commonly holds frequently executed instructions), decompressed, and executed. Basically, this Directory Service approach utilizes a 10-step process to retrieve compressed procedures. This process is as follows:
1. A Source (which could be a previously run procedure, or the like) invokes the linking tool with a request for a desired procedure;
2. If the desired procedure is already in the pcache (i.e., was previously called into the pcache and has not been subsequently overwritten in whole or in part) then skip to step 9;
3. The target address of the desired procedure in the compressed memory is determined by consulting the Directory Service, which also provides the size of the compressed code;
4. A determination is made as to whether the pcache has enough contiguous free space to hold the desired procedure after decompression; if so, then go to step 8;
5. A determination is made as to whether the pcache has enough fragmented space to hold the desired procedure; if so, then go to step 7;
6. Procedures are marked for deletion from the pcache until enough free space is available to hold the desired procedure;
7. Fragmented space in the pcache is coalesced into a contiguous block;
8. The desired procedure is decompressed and assigned to a location within the pcache;
9. In the pcache, at the end of the decompressed procedure, a return identifier is placed which identifies the Source such that after execution of the desired decompressed procedure the processor knows where to resume its operations; and
10. The desired procedure is executed.
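The ten steps above can be approximated by the following sketch. It is not the KKMS implementation: the `PCache` class and its methods are hypothetical, `zlib` stands in for whatever compression KKMS actually uses, the directory is assumed to also record the decompressed size, and eviction of the oldest resident procedure first is an assumed policy.

```python
import zlib

class PCache:
    """Toy model of a procedure cache (all names here are hypothetical)."""

    def __init__(self, size, directory):
        self.size = size
        self.directory = directory   # name -> (compressed bytes, decompressed size)
        self.resident = {}           # name -> [start, code]; insertion order = age

    def free_total(self):
        return self.size - sum(len(code) for _, code in self.resident.values())

    def contiguous_gap(self, need):
        """Return the start of the first gap of at least `need` bytes, or None."""
        cursor = 0
        for start, code in sorted(self.resident.values(), key=lambda e: e[0]):
            if start - cursor >= need:
                return cursor
            cursor = start + len(code)
        return cursor if self.size - cursor >= need else None

    def coalesce(self):
        """Slide resident procedures together so free space is one block."""
        cursor = 0
        for entry in sorted(self.resident.values(), key=lambda e: e[0]):
            entry[0] = cursor
            cursor += len(entry[1])

    def call(self, source, name):
        """Step 1: Source requests `name`; returns (code, return identifier)."""
        if name not in self.resident:                        # step 2
            compressed, need = self.directory[name]          # step 3
            if need > self.size:
                raise ValueError("procedure larger than pcache")
            start = self.contiguous_gap(need)                # step 4
            if start is None:
                while self.free_total() < need:              # steps 5 and 6
                    del self.resident[next(iter(self.resident))]
                self.coalesce()                              # step 7
                start = self.contiguous_gap(need)
            self.resident[name] = [start, zlib.decompress(compressed)]  # step 8
        return self.resident[name][1], source                # steps 9 and 10

bodies = {"A": b"a" * 40, "B": b"b" * 40, "C": b"c" * 50}
directory = {n: (zlib.compress(b), len(b)) for n, b in bodies.items()}
cache = PCache(100, directory)
cache.call("main", "A")
cache.call("main", "B")
code, return_to = cache.call("A", "C")   # no room: A is evicted, B is relocated
```

Even in this simplified model, the third call triggers a directory look-up, an eviction, a relocation, and a decompression before any instruction of the requested procedure can run.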
As can be seen from the above procedure, the KKMS approach requires extensive processing time to identify a procedure, allocate pcache space, decompress the procedure, and execute it. As a trade-off, the KKMS approach provides a much smaller instruction set which allows for a smaller RAM and thereby probably reduces power demands. However, as shown by the above 10-step process, the KKMS approach probably significantly slows down the processing speed of the program. In order to execute a procedure under the KKMS approach, a linking tool must be accessed, which then identifies the location of a procedure, determines whether a pcache has sufficient vacancies to hold the uncompressed procedure (if not, the linking tool frees up space), calls the procedure, decompresses the procedure, identifies a return address, and then executes the procedure. Thus the KKMS approach is comparable to being a travel agent and trying to obtain rooms at a hotel for a major convention, kicking out those guests who are not as important as the convention goers (the least important guests are evicted first), relocating other guests to other rooms so that a contiguous wing of the hotel is reserved for the convention, telling the convention goers where they are staying, having the convention, and doing all the above at the exact moment the convention is desired to begin. One can truly appreciate the delays and inefficiencies of such an approach. Thus, the KKMS approach is not preferred because it is too slow.
Another approach for reducing the size of the instruction set in embedded processors was proposed by Charles Lefurgy, Peter Bird, I-Cheng Chen, and Trevor Mudge in their paper, "Improving Code Density Using Compression Techniques", copyright 1997, IEEE (hereafter, "LBCM"). In the LBCM approach, 8, 12, or 16 bit code words may be utilized instead of only 8 bit (one byte) code words. Thus, the LBCM approach utilizes the Bird and Mudge approach with the addition of a pseudo-variable length code word. The LBCM approach divides the code word into segments of nibbles (i.e., 4 bits), thereby allowing greater code compaction at the expense of somewhat slower procedure execution.
However, the LBCM approach suffers many of the deficiencies of the Bird and Mudge approach; namely, relative branches are not compressed, only the instruction sequences within a basic instruction block are compressed, and arguments are not included in the decompressions. Additionally, utilizing a 4-bit variable length code word presents unique hurdles in CPU processing. Normally, CPU instructions are aligned on 8-bit boundaries. Utilizing 4-bit boundaries may require the CPU to determine and modify the location within which an instruction starts. Thus, this approach imposes unnecessary addressing requirements upon the CPU which may decrease the CPU's processing speed and thereby limit the application's capabilities.
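The alignment hurdle may be illustrated with a minimal sketch of nibble-aligned packing. The functions `pack_nibbles` and `fetch` are hypothetical, the code word values are arbitrary, and the sketch assumes the decoder is told each code word's length; how LBCM actually signals the length is not described here.

```python
def pack_nibbles(codewords):
    """Pack (value, bit_length) code words of 8, 12, or 16 bits into bytes."""
    nibbles = []
    starts = []                       # nibble offset at which each code word begins
    for value, bits in codewords:
        if bits not in (8, 12, 16):
            raise ValueError("code word must be 8, 12, or 16 bits")
        starts.append(len(nibbles))
        for shift in range(bits - 4, -1, -4):   # most significant nibble first
            nibbles.append((value >> shift) & 0xF)
    if len(nibbles) % 2:
        nibbles.append(0)             # pad the final byte
    data = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return data, starts

def fetch(data, nibble_offset, bits):
    """Read a code word that may begin in the middle of a byte."""
    value = 0
    for i in range(nibble_offset, nibble_offset + bits // 4):
        byte = data[i // 2]
        nibble = byte >> 4 if i % 2 == 0 else byte & 0xF
        value = (value << 4) | nibble
    return value

data, starts = pack_nibbles([(0xAB, 8), (0xCDE, 12), (0xF012, 16)])
third = fetch(data, starts[2], 16)
```

Here the third code word begins at nibble offset 5, i.e., in the lower half of a byte, so the fetch logic must shift and mask on every access rather than simply reading whole bytes.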
In "Code Generation and Optimization for Embedded Digital Signal Processors", Ph.D. Dissertation, Massachusetts Institute of Technology, 1996, Stan Yi-Huang Liao (hereafter "Liao") proposed two approaches for reducing the size of the instruction set in embedded processors: one approach without hardware assistance and one approach with hardware assistance. The approach without hardware assistance basically analyzes a program for common sequences of instructions. These common sequences are then entered and stored in a table (identified as a Dictionary in Liao). Each common sequence is appended with a return instruction, and each occurrence of the common sequence in the program is replaced with a call to the corresponding Dictionary entry. Thus, this non-hardware assisted approach basically utilizes a subroutine.
The hardware assisted approach also has a Dictionary (or a table entry) which is not appended with a return instruction. Instead, a hardware mechanism (for example, a counter) is told at the beginning of the instruction sequence how many instructions are contained in a specific procedure. The hardware mechanism then counts the number of instructions executed from the table and returns to the source program when the pre-identified number of instructions have been executed. Thus, the hardware assisted approach, when compared to the non-hardware assisted approach, reduces the code size by eliminating the return instruction.
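The size difference between the two Liao approaches can be worked through with a short sketch. The function below is hypothetical and assumes that a call, a return, and each ordinary instruction all occupy one instruction slot, and that conveying the instruction count to the hardware mechanism costs no additional slot.

```python
def compressed_size(occurrences, length, hardware_assisted):
    """Instruction slots used after replacing `occurrences` copies of a
    `length`-instruction sequence with references to one Dictionary entry."""
    # One call (or equivalent reference) replaces each occurrence in the program.
    program_cost = occurrences
    # The Dictionary entry holds the sequence, plus a return instruction
    # in the approach without hardware assistance.
    dictionary_cost = length if hardware_assisted else length + 1
    return program_cost + dictionary_cost

original = 5 * 4   # five copies of a four-instruction sequence: 20 slots
software = compressed_size(5, 4, hardware_assisted=False)   # 5 calls + (4 + 1)
hardware = compressed_size(5, 4, hardware_assisted=True)    # 5 calls + 4
```

Under these assumptions the hardware assisted approach saves exactly one slot per Dictionary entry, namely the eliminated return instruction.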
However, the Liao approaches are not optimal. In the hardware assisted approach, only basic blocks are analyzed and compressed, and additional hardware is needed. Additionally, neither Liao approach allows arguments and/or parameters to be compressed.
Additional approaches to reducing the size of the instruction code in embedded processors were proposed by Michael Kozuch and Andrew Wolfe in 1994 in their paper, "Compression of Embedded System Programs", and by Andrew Wolfe and Alex Chanin in 1992 in their paper, "Executing Compressed Programs on An Embedded RISC Architecture". Both of these approaches utilize a compression mechanism which requires decompression of program parts at run-time. Additionally, each contains an area in which decompressed program parts may be temporarily stored, and each utilizes procedures commonly known in the art to compress hard disk drive space or to send files over the Internet. Thus, these approaches, like KKMS, use memory space to identify where additional code sequences are located and to determine when to execute a jump. As a result, these approaches are often CPU intensive and undesirable in many embedded processors.
In summary, numerous approaches have been proposed which reduce the size of the program in embedded processors. All of these approaches, however, require a trade-off in memory size and/or CPU processing speed (i.e., either larger memory is needed or a slower processing speed occurs). Additionally, many of the prior art approaches do not allow for jumps, parameters, or arguments to be compressed. Thus, a compression scheme which allows any program to be compressed without significantly decreasing the processing speed of the embedded processor is needed.