1. Field of the Invention
This invention involves methods for improving CPU performance by removing the overhead associated with accessing alternate algorithms in a computer program.
2. Description of Related Art
In certain performance-critical regions of code, there is a need to dynamically switch between different implementations of a function with minimal overhead. An example would be an operating system or a disk driver selecting among different encryption algorithms for a disk file system. The implementations of these encryption algorithms would typically be located in functions of modular code such as a DLL (Dynamically Linked Library). The operating system or disk driver would determine, or call some other code to determine, which algorithm it needs. It then executes the implementation of this algorithm by loading the module and/or calling the function. In this example the process is time consuming because of two main factors that incur overhead.
The first factor is the calls into the functions of modular code. The locations of the functions in modular code are fixed, and the application needs to resolve the addresses of these functions upon loading/linking the module. This process of loading and linking the module involves redirecting the application to an import/export table, a stub of code that knows the location of the functions, or some other means of resolving the address. In addition, these redirections introduce possible memory stalls, cache misses, and other factors that add latency. Because of these levels of indirection, calls into functions that reside in modular code must go through paths that add overhead to each call.
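The indirection described above can be sketched in C as follows. This is only an illustrative model, not the loader of any particular operating system: the names `module_exports`, `import_slot_encrypt`, `resolve_symbol`, `load_module`, and `app_encrypt` are hypothetical, and the XOR routine merely stands in for a real encryption function.

```c
#include <stddef.h>
#include <string.h>

typedef void (*cipher_fn)(unsigned char *buf, size_t len);

/* Hypothetical function living in the "module" (stand-in for a DLL export). */
static void xor_encrypt(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5A;
}

/* Hypothetical export table published by the module. */
struct export_entry {
    const char *name;
    cipher_fn   addr;
};

static const struct export_entry module_exports[] = {
    { "xor_encrypt", xor_encrypt },
};

/* Hypothetical import slot in the application, filled in at load time. */
static cipher_fn import_slot_encrypt = NULL;

/* Look up a function by name in the export table. */
static cipher_fn resolve_symbol(const char *name)
{
    size_t n = sizeof module_exports / sizeof module_exports[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(module_exports[i].name, name) == 0)
            return module_exports[i].addr;
    }
    return NULL;
}

/* Model of the load/link step: resolve the address once. Returns 0 on success. */
static int load_module(void)
{
    import_slot_encrypt = resolve_symbol("xor_encrypt");
    return import_slot_encrypt != NULL ? 0 : -1;
}

/* Every call from the application goes through the import slot,
 * i.e. an extra memory load followed by an indirect call. */
static void app_encrypt(unsigned char *buf, size_t len)
{
    import_slot_encrypt(buf, len);
}
```

In this model the application never calls `xor_encrypt` directly. Each call first reads `import_slot_encrypt` from memory, which is the kind of extra step that can incur the stalls and cache misses mentioned above.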
The second factor that incurs overhead is the dynamic switching to the appropriate implementation of the algorithm. Execution of the switching involves conditional branches that introduce uncertainty about the flow of instructions. This uncertainty may cause the processor to mispredict branches and to execute instructions that add latency. All of these factors add up and may lead to unacceptable application performance.
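A minimal sketch of such dynamic switching, written in C under assumed names (`cipher_kind`, `current_cipher`, `encrypt_dispatch`, and the two toy routines are hypothetical and not taken from any real driver), might look like this:

```c
#include <stddef.h>

/* Hypothetical identifiers for the available implementations. */
enum cipher_kind { CIPHER_XOR, CIPHER_ADD };

/* Toy stand-ins for two different encryption algorithms. */
static void encrypt_xor(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5A;
}

static void encrypt_add(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}

/* Selection made elsewhere, for example by the OS or disk driver. */
static enum cipher_kind current_cipher = CIPHER_XOR;

/* The selection is re-tested on every call, so every call
 * contains a conditional branch the processor must predict. */
static void encrypt_dispatch(unsigned char *buf, size_t len)
{
    if (current_cipher == CIPHER_XOR)
        encrypt_xor(buf, len);
    else
        encrypt_add(buf, len);
}
```

Even when `current_cipher` never changes after start-up, the test in `encrypt_dispatch` is still executed on every call, which is the recurring cost the paragraph above describes.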
It is therefore important to solve the problem of overhead when switching between different implementations. Once an implementation has been selected and it has been determined that the application will always use this implementation from that point on, the ability to switch dynamically between implementations is no longer needed. An optimization can then be made to call the implementation directly, without the overhead of the dynamic switching process. This invention accomplishes that by providing a method of placing a stub of code at the entry point of the time-critical function. The stub determines which implementation of the function is to be called in the current environment. The stub then patches the application so that the application calls directly into this function without the usual overhead.
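The select-once-then-patch idea can be approximated in portable C as shown below. This is a sketch under stated assumptions, not the patching mechanism of the invention: portable C cannot rewrite call instructions, so the example substitutes a function pointer that initially targets the stub and is overwritten on first use, which still leaves one indirect call in place. All names (`encrypt_entry`, `encrypt_stub`, `environment_wants_xor`, `stub_runs`, `impl_xor`, `impl_add`) are hypothetical.

```c
#include <stddef.h>

typedef void (*cipher_fn)(unsigned char *buf, size_t len);

/* Toy stand-ins for two alternate implementations. */
static void impl_xor(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5A;
}

static void impl_add(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}

/* Hypothetical description of the current environment. */
static int environment_wants_xor = 1;

/* Counts stub executions, only to make the one-time behaviour visible. */
static int stub_runs = 0;

static void encrypt_stub(unsigned char *buf, size_t len);

/* Entry point used by the application; it starts out pointing at the stub. */
static cipher_fn encrypt_entry = encrypt_stub;

/* The stub selects an implementation once, redirects the entry point
 * to it (a stand-in for patching the caller), and forwards this call. */
static void encrypt_stub(unsigned char *buf, size_t len)
{
    stub_runs++;
    cipher_fn chosen = environment_wants_xor ? impl_xor : impl_add;
    encrypt_entry = chosen;
    chosen(buf, len);
}
```

After the first call through `encrypt_entry`, later calls reach the chosen implementation without executing the selection logic again. A real implementation of the method described above would go further and patch the calling code itself, which this sketch does not attempt.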
One related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 5,121,003, entitled “Zero Overhead Self-Timed Iterative Logic”. This related art method uses a third phase to store data, which allows domino logic gates to be cascaded and pipelined without intervening latches. The inputs to this system must have strictly monotonic transitions during the logic evaluation phase, and the precharge signal must be active during only the precharge phase. Furthermore, the pipelined system can feed its output back to the input to form an iterative structure. Such a feedback pipeline is viewed as a “loop” or “ring” of logic which circulates data until the entire computation is complete.
The present invention differs from the above prior cited art in that the prior invention appears to be a hardware design technique for use in self-timed (as distinct from clocked) logic. As such, the prior cited art does not solve the issue of reducing the overhead incurred when linking two software modules together dynamically. The method of the present invention solves the problem of reducing overhead incurred in dynamically linked software modules, whereas the prior cited art does not.
Another related art method to which the method of the present invention generally relates is described in U.S. Pat. No. 5,513,132, entitled “Zero Latency Overhead Self-timed Iterative Logic Structure And Method”. In this related art method, a novel third phase of CMOS domino logic is identified and used in the logic system of the invention to store data. The use of this third phase, in addition to the normally used precharge and logic evaluation phases, provides a logic structure of cascaded domino logic gates which are pipelined without intervening latches for memory storage. The memory storage function of the conventional latches is provided by the third logic phase. The novel approach requires that the functional inputs to this system have strictly monotonic transitions during the logic evaluation phase, and requires that the precharge signal be active during only the precharge phase. Embodiments of the pipelined system according to the invention are structured so that the output of the pipeline is fed back to the input of the pipeline to form an iterative structure. Such a feedback pipeline is viewed as a “loop” or “ring” of logic. The logic ring circulates data until the entire computation is complete. A method for using the logic structure is also described.
The present invention differs from the above prior cited art in that the cited prior invention appears to be a hardware design technique for use in self-timed (as distinct from clocked) logic. As such, this prior cited art also does not solve the issue of reducing the overhead incurred when linking two software modules together dynamically. The prior cited art is a hardware (H/W) technique for designing asynchronous logic, whereas the method of the present invention is a software technique to reduce overhead.