The invention relates to a method for inserting a flexible number of useful instructions into delay slots and, in particular, to a method of automatically and dynamically determining the number of additional nop (no operation) instructions to be inserted into delay slots for the execution of a multi-cycle instruction, substantially without placing actual nop instructions into the program itself, while still permitting useful instructions to be placed into the delay slots. The method of the present invention reduces the amount of memory required to store the program and also reduces power consumption, since the number of fetch operations is reduced. These benefits are achieved while still enabling the delay slots to be used.
Microprocessors were introduced around twenty-five years ago and have proliferated rapidly throughout many different types of technology. Advances in real-time microprocessor technology, especially in the communications industry, have boosted mass production of sophisticated devices such as cellular telephones, answering machines and audio systems. More efficient methods for production of these devices are continuously being sought, in order to increase the performance of the technology while reducing the costs of development and production.
Microprocessors execute machine code instructions which are derived from program code written by a human programmer or a code generator. Most of the instructions of current microprocessors are executed within a single clock cycle. Some instructions, however, require more than one clock cycle for execution and are termed multi-cycle instructions. Typical multi-cycle instructions include conditional branch instructions and other program-flow instructions. The clock cycles which elapse before a multi-cycle instruction takes effect are wasted. These clock cycles are called delay slots.
FIG. 1 illustrates an instruction sequence 10 of a background art program which demonstrates the time wasted by the insertion of three required empty cycles 12 for a multi-cycle instruction 14. Each empty cycle 12 does not result in the performance of a single-cycle instruction 16, but is inserted only to permit multi-cycle instruction 14 to be performed. Clearly, instruction sequence 10 represents a relatively inefficient method for enabling multi-cycle instruction 14 to be performed.
Pipeline architecture for microprocessors was developed to execute more instructions in parallel for greater efficiency. Pipelined microprocessors are capable of running a few instructions simultaneously, such that the microprocessor is not idle during the vacant time slots. Other instructions can be inserted into the delay slots by the human programmer in order to use the previously wasted time required for execution of the multi-cycle instruction. The microprocessor fetches these delay-slot instructions individually during the delay slots, loads these instructions into the pipeline and then executes the instructions simultaneously. These inserted instructions are usually not related to the multi-cycle instruction for which the delay slots were originally generated. Instead, these instructions perform other tasks such as control duties, loading of registers for the following instructions and so forth.
FIG. 2 illustrates the program of FIG. 1, rewritten to be performed by a microprocessor having a pipelined architecture according to the background art. Now an instruction sequence 18 of the program features three instructions 20 to be executed in cycles 4, 5 and 6 during the delay slots for multi-cycle instruction 14. Thus, the program of FIG. 2 is executed more efficiently than that of FIG. 1.
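The difference between an unfilled and a filled set of delay slots can be pictured with a small cycle-count model. The following Python sketch is illustrative only: the function name `total_cycles`, the block of six single-cycle instructions, and the assumption that the multi-cycle instruction issues in one cycle followed by its delay slots are all assumptions made for this example and are not taken from the figures.

```python
def total_cycles(single_cycle_count, delay_slots, filled_slots):
    """Cycles needed for a block containing one multi-cycle instruction.

    single_cycle_count: single-cycle instructions in the block, including
        any that are moved into delay slots.
    delay_slots: delay slots required by the multi-cycle instruction.
    filled_slots: how many of those slots hold useful instructions.
    """
    if not 0 <= filled_slots <= delay_slots:
        raise ValueError("filled_slots must be between 0 and delay_slots")
    if filled_slots > single_cycle_count:
        raise ValueError("cannot fill more slots than there are instructions")
    # Instructions left outside the delay slots each take one cycle.
    outside = single_cycle_count - filled_slots
    # One issue cycle for the multi-cycle instruction, then its delay slots,
    # which elapse whether or not they hold useful work.
    return outside + 1 + delay_slots


# Hypothetical block: six single-cycle instructions, three delay slots.
empty = total_cycles(6, 3, 0)   # all slots left empty
filled = total_cycles(6, 3, 3)  # all slots hold useful instructions
print(empty, filled)
```

Under these assumptions, each delay slot that receives a useful instruction shortens the block by exactly one cycle.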
However, the pipelined microprocessor architecture currently has a number of disadvantages. For example, both time and the memory space which holds the program are wasted by multi-cycle instructions if useful instructions are not inserted into the delay slots. If only some, but not all, of the delay slots are filled by useful instructions, the unused delay slots currently must be loaded with nop (no operation) instructions. Nop instructions require memory space but do not perform any useful function. Since such nop instructions are frequently required, the delay-slot problem is merely reduced but not solved. Thus, there is a tradeoff between the requirement for additional memory space and the amount of time which is wasted.
In addition to the problems of wasted time and memory space, the requirements of the programmer must also be considered. The programmer should fill as many delay slots as possible with useful instructions in order to optimize performance, but finding useful instructions is rarely simple. The process of inserting useful instructions into all of the delay slots is time consuming, difficult to document and difficult to maintain. Programmers spend a good deal of time seeking useful instructions to place in the delay slots. Furthermore, high-level language compilers such as C compilers must also attempt to fill delay slots with useful instructions. Even with an optimization algorithm, such compilers often cannot use all delay slots, thereby wasting additional space required to store the program.
Program-flow instructions are an example of such multi-cycle instructions and occur, on average, at a rate of 1 program-flow instruction for every 18 single-cycle instructions in typical communication applications. This rate can be used to calculate the expected amount of wasted memory, knowing that an average program-flow instruction generates three required nop instructions, according to Equation 1 below:

$$\frac{3\ \text{nop instructions}}{18\ \text{single-cycle instructions}} = \frac{1}{6} \approx 16.7\% \qquad (1)$$
Equation 1 shows that a program which is 18 Kb in size, for example, is wasting 3 Kb of memory, not including memory wasted by the other types of multi-cycle instructions. Such memory wastage reduces the efficiency of operation of the associated device, as well as increasing costs of production. Thus, multi-cycle instructions cause a three-fold problem: wasted time, wasted program memory, and additional time spent by the programmer in attempting to fill the delay slots with useful instructions.
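The arithmetic behind the 3 Kb figure can be checked with a few lines of Python. This is a sketch under the assumption, taken from the text, that three nop instructions accompany every 18 single-cycle instructions and that the ratio is applied directly to the stated program size; the function name `wasted_nop_memory` and its parameter names are hypothetical.

```python
from fractions import Fraction


def wasted_nop_memory(program_size, nops_per_flow=3, instructions_per_flow=18):
    """Memory occupied by nop instructions, in the same unit as program_size.

    Applies the nop-to-instruction ratio from the text (3 in 18) directly to
    the program size. Other multi-cycle instructions are not counted.
    """
    return program_size * Fraction(nops_per_flow, instructions_per_flow)


wasted_kb = wasted_nop_memory(18)  # the 18 Kb example from the text
print(wasted_kb)
```

Using `Fraction` keeps the 1/6 ratio exact, so the 18 Kb example evaluates to exactly 3 with no floating-point rounding.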
The relatively high rate at which multi-cycle instructions occur has highlighted the shortcomings of the pipelined microprocessor architecture. Currently, two different methods for handling multi-cycle instructions are available in the background art. The first method involves not using any delay slots, thereby wasting time but saving memory, since the microprocessor can run nop instructions without having them explicitly included in the program. The second method requires all of the delay slots to be used by inserting either an actual instruction or a nop instruction. The memory space required is increased if nop instructions are inserted. However, if the actual instructions are efficiently inserted, the amount of time required to execute the program is reduced.
The first background art method is selected when high performance of the execution of the program (with regard to time) is not required. The performance of the execution of the program, and hence the amount of time required for the program to be executed, is traded for memory economy and ease of programming by the programmer. The second background art method is designed for high performance applications which must be executed efficiently. Programmers who are interested in rapid, efficient execution of a program must therefore insert useful instructions into all of the delay slots. This is a tedious task. If all of the delay slots are filled with useful instructions, performance improves and the amount of time required to execute the program is reduced. If not all of the delay slots are filled, however, nop instructions must be inserted into each of the remaining unused delay slots, which incurs a penalty of additional memory required to store the program. Thus, neither background art method for handling multi-cycle instructions provides all three advantages: speed, economical and efficient use of memory, and ease of programming.
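The tradeoff between the two background art methods can be made concrete with a simple cost model. The Python sketch below is an illustration under stated assumptions, not a description of any particular microprocessor: it assumes every instruction, including a nop, occupies one word of program memory, that a block holds `n` single-cycle instructions and one multi-cycle instruction, and that the multi-cycle instruction issues in one cycle followed by its delay slots. The names `Cost`, `method_one` and `method_two` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    memory_words: int  # instruction words stored in program memory
    cycles: int        # clock cycles needed to execute the block


def method_one(n, slots):
    """Delay slots unused: no nops are stored, but every slot is wasted time."""
    return Cost(memory_words=n + 1, cycles=n + 1 + slots)


def method_two(n, slots, useful):
    """Every slot stored explicitly: useful instructions plus nop padding."""
    if not 0 <= useful <= min(slots, n):
        raise ValueError("useful must be between 0 and min(slots, n)")
    nops = slots - useful  # explicit nops stored in program memory
    return Cost(memory_words=n + 1 + nops, cycles=(n - useful) + 1 + slots)


# 18 single-cycle instructions per program-flow instruction, three delay slots.
print(method_one(18, 3))
print(method_two(18, 3, 3))  # all slots usefully filled
print(method_two(18, 3, 1))  # one useful instruction, two stored nops
```

In this model the second method matches the memory footprint of the first only when every slot is usefully filled; each unfilled slot costs one stored word while saving no time.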
There is thus a need for, and it would be useful to have, a method for more efficiently handling delay slots by having only useful instructions explicitly inserted into delay slots by the human programmer or higher language compiler, such that implicit nop instructions are inserted to complete the number of delay slots remaining in a substantially automatic process during operation of the microprocessor and such that memory space associated with the microprocessor is used more efficiently, while improving performance and reducing development time and costs, and while providing an optimal balance between the requirement for additional memory space to hold such useful instructions and the amount of time which is wasted during program execution by such implicit nop instructions.
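The stated need, in which only useful instructions are stored and the remaining delay slots are completed implicitly at run time, can be pictured with the following Python sketch. It is a toy model built on assumptions: the representation of a multi-cycle instruction as a tuple carrying its delay-slot count and the number of explicit delay-slot instructions is hypothetical, as are the names `issue_stream` and `IMPLICIT_NOP`. It does not describe the mechanism of the invention, which is not set out in this section.

```python
IMPLICIT_NOP = "nop (implicit)"


def issue_stream(program):
    """Yield what is issued in each cycle for a stored program.

    program: list of entries. A plain string is a single-cycle instruction.
    A tuple (name, delay_slots, explicit) is a multi-cycle instruction whose
    next `explicit` stored entries occupy its first delay slots.
    """
    i = 0
    while i < len(program):
        entry = program[i]
        i += 1
        if isinstance(entry, str):
            yield entry
            continue
        name, delay_slots, explicit = entry
        yield name
        # Explicit delay-slot instructions are fetched from program memory.
        for _ in range(explicit):
            yield program[i]
            i += 1
        # Remaining slots are completed without any stored instruction.
        for _ in range(delay_slots - explicit):
            yield IMPLICIT_NOP


# Hypothetical program: the branch has three delay slots, one filled explicitly.
stored = ["add", ("branch", 3, 1), "load", "sub"]
issued = list(issue_stream(stored))
print(len(stored), len(issued))
```

In this example four entries are stored but six cycles are issued, so the two unused delay slots consume time without consuming program memory or fetch operations.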