The present invention is concerned with a system, apparatus and method for processing instructions in a data processing system, and in particular with the provision of instructions to the processor unit. The invention is concerned with the processing of instructions especially in embedded and real-time systems where performance is a concern.
In conventional computer systems, instructions are stored in main storage and fetched from there by a memory management system for execution by a central processor unit, or possibly by some special function unit, such as a floating-point processor. In some systems, some instructions may be retained after their use in a cache memory which can be accessed more quickly than the main storage, so that such instructions can be reused later in the execution of the same program. This improves the execution performance of the computer system by reducing the time taken to fetch the instructions for processing by the central processing unit.
In systems having caching, the number of cycles taken to retrieve an instruction depends on whether it is already in the cache or not. If it is not (a “cache miss”), the instruction must be fetched from main memory, and this can leave the processor “stalled” for one or more processor cycles, thus returning the processing performance of the system to the same level as it would have been without the cache.
A particular instance of this loss of performance is in the case of a branch or jump instruction to an address which is not in the cache. There are three possible branch or jump cases. In the first, a backward jump within a function, the addressed instructions are quite likely to be in the cache, so performance is typically not badly affected. In the second, a forward jump within a function, programming means are available to arrange the instructions within the function to optimize performance by, for example, placing the fastest case first; in this case, performance is typically at the discretion of the programmer. In the third case, a jump to a new function, there can be a considerable performance cost when the address to be jumped to is not cached. This is so even in systems where the caching subsystem only stalls the processor until the first word of a cache line has been fetched, rather than the whole cache line; the delay can still be considerable in comparison with the instruction execution rate of a modern processor.
The performance problem outlined above is exacerbated by the increasing tendency to develop code using compiled high-level languages. Formerly, much code was developed in low-level languages close to the level of the machine instructions of the processor. This gave programmers the opportunity to trim their code strictly to reduce the numbers of instructions to a bare minimum. However, such development methods required the luxury of considerable programmer time, and so there developed the tendency to use compiled high-level languages, in which a single programmer instruction can represent many machine instructions. This tendency is to be found even for code that is performance-sensitive, such as code for embedded processors: device controllers, real-time processors, and the like. Such generated code often does not contain large amounts of iteration, can be very large, and can contain many jumps between procedures or functions. It can also contain redundant instructions, such as data type tests for data types that might never be encountered during any real execution of the program. In such code, there can be significant performance costs, not all of which can be adequately handled by typical automated code optimization. In a modern embedded processor, which is very fast, the memory subsystem typically uses caching to reduce the amount of time the processor must wait for instructions and data. However, as described above, caching does not solve the problem of the jump to an uncached address.
One way of dealing with this particular problem is to look ahead in the code and prefetch one or more instructions from the jumped-to code, but this involves the use of extra instruction cycles to look ahead and fetch the jumped-to instructions into the cache. It also increases the program's consumption of memory bandwidth. The cached instructions also take up space in the cache, which is of limited size. In some cases, this use of resources will all be wasted as the jump might depend on some condition that is not encountered, and thus might not be taken during this particular execution of the program. Cache size presents a problem particularly in embedded systems, and it is the combination of the cache size problem with the problem of processor stalls that forms the background to the present invention.
Accordingly, in a first aspect, the present invention provides a data processing system, comprising means for identifying one or more jump instructions to jump to functions having known prolog instructions, means, responsive to said means for identifying, for replacing said jump instructions with one or more modified jump instructions, means for storing said known prolog instructions, means responsive to said modified jump instructions for retrieving said known prolog instructions from said means for storing, and means for supplying said known prolog instructions for processing.
A data processing system as described is advantageous in embedded systems, such as device controllers, but is also advantageous in any general purpose data processing system, where the logic module may be either a hardware logic module or a software module.
Preferably a data processing system as described further comprises means for identifying known epilog instructions, means, responsive to said means for identifying, for replacing a first instruction of said known epilog instructions with one or more modified instructions, means for storing said known epilog instructions, means responsive to said modified instructions for retrieving said known epilog instructions from said means for storing, and means for supplying said known epilog instructions for processing.
Preferably in such a data processing system, said means for identifying and said means for replacing form part of a compiler or of a preprocessor.
Preferably also, in such a data processing system, said means for storing, said means for retrieving and said means for supplying form part of a logic module, which may be a hardware logic module.
In a second aspect, the present invention provides apparatus for supplying instructions to a processor unit, comprising means for identifying one or more jump instructions to jump to functions having known prolog instructions, means for storing said known prolog instructions, means (responsive to said means for identifying) for retrieving said known prolog instructions from said means for storing, and means for supplying said known prolog instructions for processing by said processor unit.
Preferably, the apparatus as described further comprises means for identifying one or more instructions to execute known epilog instructions, means for storing said known epilog instructions, means, responsive to said means for identifying, for retrieving said known epilog instructions from said means for storing, and means for supplying said known epilog instructions for processing by said processor unit.
Preferably said apparatus is a logic module, which may be a programmable logic module.
In a third aspect, the present invention provides a method for supplying instructions to a processor unit, comprising the steps of identifying one or more first instructions to jump to functions having known prolog instructions, storing, by a logic module, said known prolog instructions, replacing said first instructions with one or more modified jump instructions, identifying, by said logic module, said modified jump instructions, retrieving, by said logic module, said known prolog instructions, and supplying said known prolog instructions by said logic module to said processor unit.
Preferably, the method as described further comprises the steps of identifying one or more second instructions to execute known epilog instructions, storing, by a logic module, said known epilog instructions, replacing said second instructions with one or more modified instructions, identifying, by said logic module, said modified instructions, retrieving, by said logic module, said known epilog instructions, and supplying said known epilog instructions by said logic module to said processor unit.
Preferably, the method of the third aspect of the present invention is further characterised in that the steps of identifying and replacing said first or said second instructions are carried out by a compiler or a preprocessor.
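By way of illustration only, the identifying and replacing steps carried out by a compiler or preprocessor might be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the `Instr` record, the opcode names `JUMP` and `JUMP_WITH_PROLOG`, the function `rewrite_jumps`, and the textual instruction strings are all hypothetical and do not appear in the description above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical instruction record: an opcode, an optional jump target,
# and an optional identifier of a known prolog.
@dataclass(frozen=True)
class Instr:
    op: str
    target: Optional[str] = None
    prolog_id: Optional[int] = None


def rewrite_jumps(program, functions, known_prologs):
    """Replace 'JUMP X' with 'JUMP_WITH_PROLOG X, P' when X begins with known prolog P.

    program:       list of Instr making up the calling code
    functions:     maps a function label to its list of instruction strings
    known_prologs: maps a prolog id to the tuple of instruction strings it stands for
    """
    rewritten = []
    for instr in program:
        if instr.op == "JUMP" and instr.target in functions:
            body = functions[instr.target]
            # Identify: does the jumped-to function start with a known prolog?
            match = next(
                (pid for pid, prolog in known_prologs.items()
                 if tuple(body[:len(prolog)]) == prolog),
                None,
            )
            if match is not None:
                # Replace: emit the modified jump carrying the prolog id.
                instr = Instr("JUMP_WITH_PROLOG", instr.target, match)
        rewritten.append(instr)
    return rewritten


# Illustrative data only (assumed instruction spellings).
known_prologs = {0: ("push fp", "mov fp, sp", "sub sp, 16")}
functions = {
    "f": ["push fp", "mov fp, sp", "sub sp, 16", "add r0, r1"],
    "g": ["add r0, r1", "ret"],
}
program = [Instr("JUMP", "f"), Instr("JUMP", "g"), Instr("ADD")]
result = rewrite_jumps(program, functions, known_prologs)
```

In this sketch only the jump to `f` is rewritten, since `g` does not begin with a stored prolog; jumps to functions without a known prolog are left in their normal form.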
The present invention advantageously exploits the fact that compiler generated prolog code for many functions forms standard blocks with limited variations. In most cases, for example, the prolog code sets up a local environment, a “stack frame”, a set of base registers, or in some way establishes the addressability of some local storage elements. Thus, it is possible to analyze and extract the patterns, which may then be stored in a separate logic module ready for immediate use, so that the processor unit is not stalled waiting for these standard groups of instructions. Typically, by the time these instructions have been executed, the next instructions will have been fetched from the main memory of the system, ready to continue the normal processing of the main body of the function or procedure. In order to cause the logic module to intercept the jump instruction and supply the known prolog instructions, a compiler or preprocessor is used to modify the jump instruction from its normal form of “jump to X” to the form of “jump to X with prolog P”. The logic module is arranged to recognize this modified instruction and to respond by avoiding the normal fetch operation for the prolog instructions, thus causing the next fetch to come into effect for the remainder (or the first part of the remainder) of the instructions. It also supplies the known prolog instructions directly to the processor unit for processing.
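The behaviour of the logic module on encountering a “jump to X with prolog P” instruction might be modelled, purely for illustration, as in the following sketch. The class `LogicModule`, its `supply` method, the dictionary standing in for main memory, and the instruction strings are all assumptions introduced here; the model shows only the order and source of the instructions supplied to the processor unit and does not attempt to model cycle timing or the overlap of the fetch with prolog execution.

```python
class LogicModule:
    """Toy model of the logic module: stores known prologs and handles modified jumps."""

    def __init__(self, known_prologs):
        # Hypothetical internal storage: prolog id -> tuple of instruction strings.
        self.known_prologs = dict(known_prologs)

    def supply(self, target, prolog_id, main_memory):
        """Yield (source, instruction) pairs in the order the processor receives them."""
        prolog = self.known_prologs[prolog_id]
        # The prolog is supplied from the module's own storage,
        # so no main-memory fetch is needed for these instructions.
        for instr in prolog:
            yield ("module", instr)
        # The fetch from main memory begins past the prolog,
        # at the remainder of the jumped-to function.
        for instr in main_memory[target][len(prolog):]:
            yield ("memory", instr)


# Illustrative data only (assumed instruction spellings).
module = LogicModule({0: ("push fp", "mov fp, sp", "sub sp, 16")})
main_memory = {"f": ["push fp", "mov fp, sp", "sub sp, 16", "add r0, r1", "ret"]}
delivered = list(module.supply("f", 0, main_memory))
```

Here the first three entries of `delivered` are tagged as coming from the module and the remaining two from memory, reflecting that the normal fetch is avoided for the prolog instructions and takes effect only for the remainder of the function.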
In systems where cache size and use is a limiting factor, the present invention can be advantageously extended to reduce cache-dependence for epilog code by applying the same technique: that is, the start of the epilog code can be signalled to the logic module, which can then supply instructions directly from its own internal storage to the processor, thus reducing the need for storage space in the cache.