The present invention pertains to service installation by modifying a base function image to route a call of the base function to a service function. The service function adds a service semantic and can call a pass function. The pass function enables a service-free semantic for the base function.
Complex software systems from applications to operating systems include thousands, even millions, of lines of code. Understanding these complex software systems presents a challenge to the original software developers as well as third party software developers and users. The ability to instrument and extend these software systems promotes efficient software development, effective software use, and innovative software research.
Original developers build, debug, and optimize a system. When building and testing a system, the original developers frequently install a service on the system for instrumenting the system. Instrumentation profiles the interaction of various components of the system, times the execution of components, or otherwise measures the system for revision or optimization. Instrumentation can help isolate problems during debugging. After releasing a first generation of a system, developers may extend the software system to include new functions. Still later, when a complex software system has undergone revisions, developers may face legacy problems, having to support earlier versions of the complex system while providing enhanced functionality in later versions.
After a complex software system ships, third party developers and users may not have access to source code for the system. Nevertheless, like the original developers, the third party developers and users may want to instrument the software system for profiling, timing, optimizing, or debugging. Moreover, third party developers and users frequently want a slightly different software system. In this situation, rather than create a whole system from scratch, third party developers and users may prefer to change or extend an existing software system. Given the complexity of such systems and the lack of access to source code, however, third party developers and users must work with a binary version.
Most current software systems use function calls between components. A function performs a task or operation called a semantic. A function call is a request to a component to perform the semantic for a function. A function call typically transfers execution to the called function while saving the necessary information to allow execution to resume at the calling point when the called function has completed execution. One effective way to instrument or extend software systems involves interception of function calls. Techniques for intercepting function calls include source code replacement, binary code replacement, dynamic link library (DLL) redirection, DLL replacement, breakpoint trapping of function calls, and inline redirection.
Source code replacement involves replacing function calls in source code with calls to, e.g., instrumentation functions. This requires access to source code, which renders it impracticable for many software systems. Binary code replacement entails replacing function calls in a binary of the software system with calls to, e.g., instrumentation functions. While this does not require source code access, it requires the ability to identify all applicable call sites. To facilitate identification of call sites, an application may need to be linked with substantial symbolic information.
When a software system uses load-time dynamic linking, DLL redirection involves modifying an import table in a binary file to reference an instrumentation library. DLL redirection fails, however, to intercept function calls that are bound dynamically at run time. DLL replacement involves replacing a DLL with an instrumented version. While this guarantees an instrumented semantic for the library, it penalizes use of the non-instrumented semantic for the library.
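The limitation of DLL redirection can be illustrated with a minimal sketch. It assumes a toy model in which a library is a Python dict of exported names and an application's import table maps each imported name to a library; `NET_DLL`, `INSTR_DLL`, `load`, and the `send` functions are hypothetical names for illustration only, not a real loader API.

```python
# Hypothetical model: a "DLL" is a dict of exported name -> function, and an
# application's import table is resolved once, at load time.
CALL_LOG = []

def real_send(msg):
    return f"sent:{msg}"

def instrumented_send(msg):
    # Instrumentation layer: record the call, then forward to the real export.
    CALL_LOG.append(msg)
    return real_send(msg)

NET_DLL = {"send": real_send}            # exports of the original library
INSTR_DLL = {"send": instrumented_send}  # exports of the instrumentation library

def load(import_table):
    """Load-time binding: resolve each imported name in its listed library."""
    return {name: lib[name] for name, lib in import_table.items()}

# DLL redirection: edit the import table so "send" now references the
# instrumentation library instead of the original one.
import_table = {"send": INSTR_DLL}
bound = load(import_table)

static_result = bound["send"]("a")     # statically bound call: intercepted
dynamic_result = NET_DLL["send"]("b")  # name looked up at run time: NOT intercepted
```

Only the call that went through the edited import table reaches the instrumentation; the call resolved directly against the original library at run time bypasses it.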
Breakpoint trapping involves insertion of breakpoints into an image after it has been loaded into memory space. When execution reaches a breakpoint, an exception is thrown and caught by the instrumentation system. While effective, breakpoint trapping has a very high performance cost.
Inline redirection involves intercepting function calls and rerouting them to instrumentation. Inline redirection is potentially effective and efficient, but the various existing implementations of inline redirection have numerous shortcomings.
Inline redirection falls into the family of techniques known as code patching. Code patching has been used both to instrument and to extend the functionality of software systems. To intercept execution, an unconditional branch, or jump, is inserted into the desired interception point in a base function. Code overwritten by the unconditional branch is moved to a code patch. The code patch includes a call to instrumentation code (or the instrumentation code itself), the moved instructions, and a jump to the first instruction in the base function following the inserted unconditional branch. A code patch can be inserted at the beginning, middle, or end of a base function, but works in a relatively fixed manner: it executes and then transfers execution to the base function. A code patch lacks flexibility when working with the semantic for the base function. For example, a code patch does not preserve the semantic for a base function as a sub-routine and does not facilitate invoking the semantic for the base function an arbitrary number of times. Moreover, integration of a code patch with a base function is potentially very complicated. The code patch must ensure consistency in the context (registers, stack pointer, etc.) of the base function before and after instrumentation. A code patch typically saves register values and a stack pointer using hardware specific functions. To simplify state management, code patches are typically only prepended to base functions.
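The fixed control flow of a prepended code patch can be sketched with a toy instruction model. This assumes instructions are tuples (`emit` appends to a trace, `jmp` transfers control, `ret` ends execution) and that the inserted branch replaces exactly one instruction; `run` and `apply_code_patch` are hypothetical helpers, not part of any real tool.

```python
def run(memory, func, log):
    """Tiny interpreter: emit appends to the trace, jmp transfers control
    without returning, ret ends execution and returns the trace."""
    pc = 0
    while True:
        op = memory[func][pc]
        if op[0] == "emit":
            log.append(op[1])
            pc += 1
        elif op[0] == "jmp":
            func, pc = op[1], op[2]
        elif op[0] == "ret":
            return log
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")

def apply_code_patch(memory, base, instrumentation):
    """Overwrite the first base instruction with a jump to a code patch that
    runs the instrumentation, the moved instruction, and a jump back."""
    moved = memory[base][0]
    memory["patch"] = [("emit", instrumentation), moved, ("jmp", base, 1)]
    memory[base][0] = ("jmp", "patch", 0)

memory = {"base": [("emit", "b0"), ("emit", "b1"), ("ret",)]}
apply_code_patch(memory, "base", "instr")
trace = run(memory, "base", [])
```

The trace is always instrumentation first, then the base semantic exactly once. Nothing in this layout lets the instrumentation invoke the original base semantic as a separate sub-routine.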
Static binary rewriting tools take as input a software system binary and an instrumentation script. The instrumentation script is applied in a pass over the software binary, inserting code between instructions, basic blocks, or functions. The output is a new, instrumented software binary. This instrumented binary is relatively static: instrumentation cannot conveniently be applied to an image at an arbitrary point in execution. Moreover, while static binary rewriters allow insertion of instrumentation around instructions, e.g., through free register discovery, the task of maintaining state consistency becomes very complicated. Static binary rewriters can use a standard system utility to save and restore state. Like code patching techniques, however, static binary rewriters do not preserve the semantic for a base function as a sub-routine and do not facilitate invocation of the semantic for the base function an arbitrary number of times.
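The input/output shape of a static rewriter can be sketched as follows, assuming a binary is modeled as a list of instruction tuples and a script is a function returning instructions to insert before each one; `rewrite` and `count_calls` are hypothetical names for illustration.

```python
def rewrite(binary, script):
    """Produce a NEW instruction list; `script` maps each instruction to the
    instrumentation instructions to insert before it."""
    out = []
    for insn in binary:
        out.extend(script(insn))
        out.append(insn)
    return out

def count_calls(insn):
    # Hypothetical script: insert a counter increment before every call.
    return [("inc", "calls")] if insn[0] == "call" else []

original = [("mov", "r1"), ("call", "f"), ("add", "r1"), ("call", "g"), ("ret",)]
instrumented = rewrite(original, count_calls)
```

The result is a separate, fixed artifact: changing the instrumentation means running the rewriter again and replacing the binary, rather than attaching instrumentation to a running image.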
The present invention pertains to service installation for modifying a base function to introduce therein an additional service provided by a service function. A function semantic relates to the task or operation that the function performs. The base function provides a base function semantic and the service function provides a service function semantic, neither of which is specified by the present invention. A pass function bypasses any installed service as necessary to provide a service-free base function semantic. After installing a service on a base function, calling the base function provides a service-installed semantic for the base function. On the other hand, calling the pass function provides a service-free semantic for the base function.
Service installation according to the present invention creates little overhead, correctly intercepts both statically and dynamically bound invocations, and is flexible. Using techniques of the present invention, inline redirection of any function can be selectively enabled for each process individually at load time based on the needs of the instrumentation.
The present invention pertains to the base, service, and pass functions, a library of functions for attaching a service installation section to a binary file, a library of functions for installing and removing services, techniques for creating a pass function, service state management techniques that exploit a uniform calling convention, and various techniques for using a pass function, as well as various applications of the above techniques, functions, and library.
The instructions for a function in computer memory form a function image. According to one aspect of the present invention, computer memory stores data representing a base function image, a service function image, and a pass function image. The base function image comprises instructions that provide a base function semantic. The service function image comprises instructions that provide a service function semantic. Independent from service installation, the pass function image provides a service-free semantic for the base function.
The service function provides some instrumentation, redirection, or extension for the base function semantic. The service function can provide a layer for profiling parameters of base function calls, a layer for redirecting base function calls, a layer for timing execution of the base function, a layer for redirecting exceptions, or a layer for instrumenting or extending the base function in some other way. The service function typically includes at least one call to a pass function. A call to the pass function can occur before and/or after execution of any other instructions in the service function. Alternatively, the service function can conditionally bypass the pass function.
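Two of the layers listed above, timing and parameter profiling, can be sketched at the level of callables. This is a sketch under the assumption that a pass function is modeled as an ordinary Python callable with the base function's signature; the factory names are hypothetical.

```python
import time

def make_timing_service(pass_fn, timings):
    """Timing layer with the base function's signature: time the
    service-free semantic by calling the pass function."""
    def service(*args, **kwargs):
        start = time.perf_counter()
        result = pass_fn(*args, **kwargs)  # service-free base semantic
        timings.append(time.perf_counter() - start)
        return result
    return service

def make_profiling_service(pass_fn, profile):
    """Profiling layer: record call parameters, then call the pass function."""
    def service(*args, **kwargs):
        profile.append((args, kwargs))
        return pass_fn(*args, **kwargs)
    return service

def base_add(a, b):
    return a + b

timings, profile = [], []
timed_add = make_timing_service(base_add, timings)
profiled_add = make_profiling_service(base_add, profile)
timed_result = timed_add(2, 3)
profiled_result = profiled_add(4, 5)
```

In both layers the pass function is called after the service's own setup work, and its result is returned unchanged, so callers still observe the base function semantic.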
The pass function provides a service-free semantic for the base function with an unconditional branch to an instruction in the base function image. For example, before service installation, a pass function image includes an unconditional branch instruction to the beginning of the base function image. After service installation, in which an unconditional branch instruction replaces one or more instructions at the beginning of the base function image, the pass function image includes any replaced instructions and an unconditional branch to the instruction that logically follows the replaced instructions in the base function. The pass function can be allocated statically (prior to run time) or dynamically (at run time). In one embodiment of the present invention, the pass function is callable from a user module.
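The two states of the pass function image can be sketched with a toy instruction model. It assumes fixed-length instructions, so the branch to the service replaces exactly one base instruction, and tuple instructions (`emit`, `jmp`, `call`, `ret`) interpreted by a hypothetical `run` helper; the function names `base`, `pass`, and `service` are illustrative.

```python
def run(memory, func, log):
    """Tiny interpreter. emit: append to trace; jmp: transfer control;
    call: run a function as a sub-routine; ret: return the trace."""
    pc = 0
    while True:
        op = memory[func][pc]
        if op[0] == "emit":
            log.append(op[1])
            pc += 1
        elif op[0] == "jmp":
            func, pc = op[1], op[2]
        elif op[0] == "call":
            run(memory, op[1], log)
            pc += 1
        elif op[0] == "ret":
            return log

memory = {
    "base": [("emit", "b0"), ("emit", "b1"), ("ret",)],
    # Before installation: a single branch to the beginning of the base function.
    "pass": [("jmp", "base", 0)],
    "service": [("emit", "before"), ("call", "pass"), ("emit", "after"), ("ret",)],
}
before_base = run(memory, "base", [])
before_pass = run(memory, "pass", [])

# Installation: move the first base instruction into the pass function, follow
# it with a branch to the next base instruction, then redirect the base entry.
memory["pass"] = [memory["base"][0], ("jmp", "base", 1)]
memory["base"][0] = ("jmp", "service", 0)

after_base = run(memory, "base", [])
after_pass = run(memory, "pass", [])
```

After installation, calling `base` yields the service-installed semantic (service work wrapped around the base work), while calling `pass` still yields exactly the service-free base semantic.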
According to a second aspect of the present invention, a service installation system includes a library of functions for installing a service on a base function. The service installation system includes a construct function and an install function. The service installation system works on functions with fixed or variable length instructions.
The construct function includes instructions for creating a pass function. In one embodiment, the construct function is a macro used to statically allocate the pass function. To create a statically allocated pass function, the construct function accepts as parameters a pass function prototype (the name of the pass function) and the name of the base function. In an alternative embodiment, a statically allocated pass function is created by explicitly allocating an array of instructions for a pass function.
The install function includes instructions for replacing one or more instructions in the base function with an unconditional branch instruction to a service function. The install function also includes instructions for making a pass function include the replaced base function instructions followed by a jump to the logically subsequent instruction in the base function. Thus, the install function gives a base function a service-installed semantic and gives a pass function a service-free base function semantic. According to one embodiment, the install function makes a pass function conform to a service-installed base function by enumerating the instructions of the base function and copying the first one or more instructions to the pass function before installing the unconditional branch in the base function. In the pass function, an unconditional branch to the logically subsequent base function instruction follows the last copied instruction.
When instructions come from a fixed length instruction set, the first instruction of the original base function becomes the first instruction of the pass function. The unconditional branch instruction in the pass function transfers control to the second instruction in the modified base function (whose first instruction is an unconditional branch to the service function). On the other hand, when instructions come from a variable length instruction set, the install function determines the size of the unconditional branch instruction to the service function. One or more instructions from the beginning of the base function, including every instruction that will be wholly or partly overwritten by the unconditional branch instruction to the service function, are copied to the pass function. The unconditional branch instruction from the pass function to the base function transfers control to the instruction following the last copied instruction of the base function.
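For a variable length instruction set, the core calculation is how many leading instructions must be copied so that the branch overwrites only whole instructions. The sketch below assumes instruction lengths are already known from a disassembler and uses a 5-byte branch, the size of an x86 `jmp rel32`, as an example; `plan_copy` is a hypothetical helper.

```python
def plan_copy(insn_lengths, branch_size):
    """Return (count, resume_offset): the number of leading base instructions
    to copy into the pass function, and the byte offset in the base function
    where the pass function's branch resumes execution."""
    covered = 0
    for count, length in enumerate(insn_lengths, start=1):
        covered += length
        if covered >= branch_size:
            return count, covered
    raise ValueError("base function is shorter than the branch instruction")

# x86 prologue example: push ebp (1 byte), mov ebp, esp (2 bytes),
# sub esp, 0x10 (3 bytes); the 5-byte branch spans part of the third one.
prologue_plan = plan_copy([1, 2, 3], 5)

# Fixed-length case: a 4-byte branch replaces exactly one 4-byte instruction.
fixed_plan = plan_copy([4, 4, 4], 4)
```

Here the prologue plan copies three instructions and resumes at offset 6, so one byte after the branch is never executed through the base entry. A real installer must also relocate any position-relative instructions it copies; this sketch ignores that.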
When a pass function created by the construct function is available, the install function accepts as parameters references to the pass function and the service function. In another embodiment, the install function takes references to the base function and the service function and returns a reference to a dynamically allocated pass function. In addition to the construct and install functions, the service software library can include functions for locating references to base functions and removing service installations.
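The two install signatures described above can be contrasted in a sketch. It reuses a toy model in which each function image is a list of instruction tuples, assumes fixed-length instructions, and uses hypothetical names (`install_with_pass`, `install_dynamic`, `pass#N`); it is not the library's actual interface.

```python
import itertools

_pass_ids = itertools.count(1)

def _patch(memory, base, service, pass_name):
    # Fixed-length sketch: move the first base instruction to the pass
    # function, branch back to the next one, and redirect the base entry.
    memory[pass_name] = [memory[base][0], ("jmp", base, 1)]
    memory[base][0] = ("jmp", service, 0)

def install_with_pass(memory, pass_name, service):
    """Static variant: the pass function already exists, and its single
    branch names the base function, so only pass and service are passed in."""
    image = memory[pass_name]
    if len(image) != 1 or len(image[0]) != 3 or image[0][0] != "jmp" or image[0][2] != 0:
        raise ValueError("pass function is not in its initial form")
    _patch(memory, image[0][1], service, pass_name)

def install_dynamic(memory, base, service):
    """Dynamic variant: allocate a pass function at run time and return it."""
    pass_name = f"pass#{next(_pass_ids)}"
    _patch(memory, base, service, pass_name)
    return pass_name

memory = {
    "f": [("emit", "f0"), ("ret",)],
    "g": [("emit", "g0"), ("ret",)],
    "pass_f": [("jmp", "f", 0)],  # statically allocated before run time
}
install_with_pass(memory, "pass_f", "svc")
pass_g = install_dynamic(memory, "g", "svc")
```

In the static variant the caller already holds the pass function; in the dynamic variant the returned name is the only handle to the service-free semantic.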
According to a third aspect of the present invention, to attach a service software library to a software system, an application binary for the software system is modified to include a service installation section. An import table in the service installation section references the service software library. At link time, the functions of the service software library are made available to the software system. In addition to the construct and install functions, the service software library can include functions for editing an import table or attaching and removing data payloads to a service installation section.
A fourth aspect of the present invention concerns statically allocated pass functions. Using a link symbol for a base function, a statically allocated pass function can be created to guarantee a service-free semantic for the base function. Given a pass function prototype and the name of a base function, a macro statically allocates a pass function. In an alternative embodiment, a statically allocated pass function is created by explicitly allocating an array of instructions for a pass function. The statically allocated pass function initially includes an unconditional branch instruction to the beginning of the base function. The statically allocated pass function can be called from a service function and/or a user module. While the instructions of the statically allocated pass function change when service installation occurs, a statically allocated pass function is callable by a user module to provide a service-free base function semantic before and/or after service installation. Similarly, a statically allocated pass function is callable to provide a service-free base function semantic before and/or after service removal.
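The claim that a statically allocated pass function stays callable across installation and removal can be checked in a sketch. It assumes the same toy instruction model, fixed-length instructions, and hypothetical `install` and `remove` helpers; removal restores the moved instruction and returns the pass function to its initial single branch.

```python
def run(memory, func, log):
    """Tiny interpreter: emit, jmp (no new frame), call (sub-routine), ret."""
    pc = 0
    while True:
        op = memory[func][pc]
        if op[0] == "emit":
            log.append(op[1])
            pc += 1
        elif op[0] == "jmp":
            func, pc = op[1], op[2]
        elif op[0] == "call":
            run(memory, op[1], log)
            pc += 1
        elif op[0] == "ret":
            return log

def install(memory, pass_name, service):
    if len(memory[pass_name]) != 1:
        raise ValueError("a service is already installed")
    base = memory[pass_name][0][1]  # initial pass image: jmp base, 0
    memory[pass_name] = [memory[base][0], ("jmp", base, 1)]
    memory[base][0] = ("jmp", service, 0)

def remove(memory, pass_name):
    if len(memory[pass_name]) == 1:
        raise ValueError("no service installed")
    base = memory[pass_name][-1][1]            # trailing branch names the base
    memory[base][0] = memory[pass_name][0]     # restore the moved instruction
    memory[pass_name] = [("jmp", base, 0)]     # pass reverts to its initial form

memory = {
    "base": [("emit", "b0"), ("emit", "b1"), ("ret",)],
    "pass": [("jmp", "base", 0)],
    "service": [("emit", "svc"), ("call", "pass"), ("ret",)],
}
states = {"initial": (run(memory, "base", []), run(memory, "pass", []))}
install(memory, "pass", "service")
states["installed"] = (run(memory, "base", []), run(memory, "pass", []))
remove(memory, "pass")
states["removed"] = (run(memory, "base", []), run(memory, "pass", []))
```

The pass function's instructions differ in each state, but calling it always yields the service-free base semantic.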
A fifth aspect of the present invention concerns dynamically allocated pass functions. A dynamically allocated pass function is created at run time during service installation on a base function. As an install function installs a service, the install function dynamically allocates a pass function to provide a service-free base function semantic. The install function takes as parameters, for example, references to a base function and a service function. The install function returns a reference to the dynamically allocated pass function. Where necessary, a pointer locating function finds a reference to the base function.
A sixth aspect of the present invention concerns using a standard calling convention to maintain state (e.g., register values, stack pointer, etc.) consistency for a service system. A base function, pass function, and service function have the same call signature. A start function calls a service-installed base function. The call pushes a call frame on the call stack. The base function transfers execution to a service function without pushing a new call frame. The service function calls a pass function at least one time before and/or after executing other instructions. On a call to the pass function, a new call frame is pushed on the call stack. The pass function transfers execution to the base function. When the base function completes execution, the call frame from the call to the pass function pops from the call stack. Later, when the service function completes execution, the call frame from the original call to the base function pops from the call stack. In this way, the service uses the standard calling convention to maintain state consistency without the need for complex register-usage analysis common to existing binary rewriters.
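The frame sequence described above can be traced with a toy interpreter that keeps an explicit call stack. This sketch assumes that only `call` pushes a frame (a return address), `jmp` reuses the current frame, and `ret` pops one; the initial frame stands for the start function's call to the base function. The function and opcode names are hypothetical.

```python
def run_with_stack(memory, entry):
    """Execute `entry` as if called from a start function. Returns a trace of
    (event, call-stack depth) pairs and the stack depth at the end."""
    stack = [("start", None)]  # frame pushed by the start function's call
    func, pc = entry, 0
    trace = []
    while stack:
        op = memory[func][pc]
        if op[0] == "emit":
            trace.append((op[1], len(stack)))
            pc += 1
        elif op[0] == "jmp":
            func, pc = op[1], op[2]        # no new frame
        elif op[0] == "call":
            stack.append((func, pc + 1))   # push a frame holding the return address
            func, pc = op[1], 0
        elif op[0] == "ret":
            func, pc = stack.pop()
            if pc is None:                 # returned to the start function
                break
    return trace, len(stack)

# Service-installed state (fixed-length sketch): b0 has moved to the pass function.
memory = {
    "base": [("jmp", "service", 0), ("emit", "b1"), ("ret",)],
    "pass": [("emit", "b0"), ("jmp", "base", 1)],
    "service": [("emit", "before"), ("call", "pass"), ("emit", "after"), ("ret",)],
}
trace, final_depth = run_with_stack(memory, "base")
```

The service runs in the start function's frame, the base instructions run in the frame pushed by the call to the pass function, and the stack is empty again at the end, so ordinary call and return sequencing keeps the state consistent.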
A seventh aspect of the present invention concerns calling the pass function as a sub-routine to provide a service-free base function semantic. For example, a service function typically makes at least one call to the pass function before and/or after executing any other instructions. The pass function provides a service-free base function semantic as a sub-routine of the service function. Alternatively, the service function conditionally bypasses the pass function.
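Calling the pass function as a sub-routine, including more than once or not at all, can be sketched at the level of callables. This assumes a pass function is modeled as a Python callable with the base function's signature; the retry and cache services, and the `flaky_read` and `base_square` base functions, are hypothetical illustrations.

```python
def make_retry_service(pass_fn, attempts=3):
    """Invoke the service-free semantic as a sub-routine up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    def service(*args):
        last_error = None
        for _ in range(attempts):
            try:
                return pass_fn(*args)
            except OSError as err:
                last_error = err
        raise last_error
    return service

def make_cache_service(pass_fn):
    """Conditionally bypass the pass function when the result is already known."""
    cache = {}
    def service(key):
        if key not in cache:
            cache[key] = pass_fn(key)
        return cache[key]
    return service

read_calls = []
def flaky_read(path):
    read_calls.append(path)
    if len(read_calls) < 3:
        raise OSError("transient failure")
    return f"data:{path}"

square_calls = []
def base_square(n):
    square_calls.append(n)
    return n * n

read_value = make_retry_service(flaky_read)("x")  # pass function runs three times
square = make_cache_service(base_square)
squares = [square(4), square(4), square(5)]       # second call bypasses the pass
```

Because the pass function behaves like an ordinary sub-routine, the service decides how many times the base semantic runs, which a prepended code patch cannot do.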