The present invention relates generally to load-time dynamic linking of a library to a compiled application in a reversible process, then load-time dynamic linking of another library to the compiled application. For example, an instrumentation library of an automatic distributed partitioning system is linked to an application for profiling the application, then a second instrumentation library is linked to the application for distributing the application in a distributed computing environment.
Fueled by the growing importance of the Internet, interest in the area of distributed systems (two or more computers connected by a communications medium) has increased in recent years. Programmers desiring to take advantage of distributed systems modify existing application programs to perform on distributed systems, or design applications for placement on distributed systems.
A distributed application is an application containing interconnected application units (“units”) that are placed on more than one computer in a distributed system. By placing units on more than one computer in a distributed system, a distributed application can exploit the capabilities of the distributed system to share information and resources, and to increase application reliability and system extensibility. Further, a distributed application can efficiently utilize the varying resources of the computers in a distributed system.
Various types of modular software, including software designed in an object-oriented framework, can conceivably be distributed throughout a distributed system. Object-oriented programming models, such as the Microsoft Component Object Model (“COM”), define a standard structure of software objects that can be interconnected and collectively assembled into an application (which, being assembled from component objects, is herein referred to as a “component application”). The objects are hosted in an execution environment created by system services, such as the object execution environments provided by COM. This system exposes services for use by component application objects in the form of application programming interfaces (“APIs”), system-provided objects and system-defined object interfaces. Distributed object systems such as Microsoft Corporation's Distributed Component Object Model (DCOM) and the Object Management Group's Common Object Request Broker Architecture (CORBA) provide system services that support execution of distributed applications.
In accordance with object-oriented programming principles, the component application is a collection of object classes which each model real world or abstract items by combining data to represent the item's properties with functions to represent the item's functionality. More specifically, an object is an instance of a programmer-defined type referred to as a class, which exhibits the characteristics of data encapsulation, polymorphism and inheritance. Data encapsulation refers to the combining of data (also referred to as properties of an object) with methods that operate on the data (also referred to as member functions of an object) into a unitary software component (i.e., the object), such that the object hides its internal composition, structure and operation and exposes its functionality to client programs that utilize the object only through one or more interfaces. An interface of the object is a group of semantically related member functions of the object. In other words, the client programs do not access the object's data directly, but instead call functions on the object's interfaces to operate on the data. Polymorphism refers to the ability to view (i.e., interact with) two similar objects through a common interface, thereby eliminating the need to differentiate between two objects. Inheritance refers to the derivation of different classes of objects from a base class, where the derived classes inherit the properties and characteristics of the base class.
An application containing easily identifiable and separable units is more easily distributed throughout a distributed system. One way to identify separable units is to describe such units with structural metadata about the units. Metadata is data that describes other data. In this context, structural metadata is data describing the structure of application units. Further, application units are desirably location-transparent for in-process, cross-process, and cross-computer communications. In other words, it is desirable for communications between application units to abstract away the location of the application units. This location transparency allows application units to be distributed flexibly.
The partitioning and distribution of applications are problematic and complicated by many factors.
To partition an application for distribution, a programmer typically determines a plan for distributing units of the application based on past experience, intuition, or data gathered from a prototype application. The application's design is then tailored to the selected distribution plan. Even if the programmer selects a distribution plan that is optimal for a particular computer network, the present-day distribution plan might be rendered obsolete by changes in network topology. Moreover, assumptions used in choosing the distribution plan might later prove to be incorrect, resulting in an application poorly matched to its intended environment.
Generally, to distribute an application, one can work externally or internally relative to the application. External distribution mechanisms work without any modification of the application and include network file systems and remote windowing systems on a distributed system. Although external distribution mechanisms are easy to use and flexible, they often engender burdensome transfers of data between nodes of the distributed system, and for this reason are far from optimal. Internal distribution mechanisms typically modify the application to be distributed in various ways. Internal distribution mechanisms allow optimized application-specific distribution, but frequently entail an inordinate amount of extra programmer effort to find an improved distribution and modify the application. Further, internal systems frequently provide ad hoc, one-time results that are tied to the performance of a particular network at a particular time.
An automatic distributed partitioning system (ADPS) works internally relative to an application to partition application units, and works automatically or semi-automatically to save programmer effort in designing distributed applications.
In the 1970's, researchers postulated that the best way to create a distributed application was to use a compiler in a run-time environment to partition the application, and to provide the exact same code base to each of plural distributed machines as used on a single machine to execute the distributed application. After analyzing the structure of procedures and parameters in the source code of an application, metadata describing the structure of an application were generated from the application source code. Using this metadata, these ADPSs profiled the application and generated a communication model for the application. A compiler was again used to generate from application source code a final application for distribution. The Interconnected Processor System (ICOPS) is an example of an ADPS designed in the 1970's. The Configurable Applications for Graphics Employing Satellites (CAGES) also supported creation of distributed applications, but required re-compilation of application source code to generate a version of an application for distribution. A more recent example of an ADPS is the Intelligent Dynamic Application Partitioning (IDAP) System. ICOPS, CAGES, and IDAP suffer from numerous drawbacks relating to the universality, efficiency, and automation of these systems.
ICOPS, CAGES, and IDAP require time-consuming compilation of application source code to generate an instrumented version of an application. To generate versions for profiling an application and distributing the application, two compilations may be required. No ADPS provides a mechanism for quickly and flexibly generating instrumented applications from ordinary applications. More specifically, none provides a flexible mechanism for dynamically linking different instrumentation packages to an application.
For a number of reasons, including flexibility and modularity, a software application typically contains references to functions held in external libraries. When the application is compiled, the external references are compiled into the executable version of the application. In order for the application to run, the external references in the application must be resolved through a process of linking the external references to function code held in an external library. Numerous techniques for linking are known in the art.
To illustrate, suppose a compiled application contains references to functions in an external library. The external library is compiled. A header file describes the contents of the library. As is known in the art, using static linking, code for the appropriate functions in the external library is inserted into the compiled application to resolve the external references. One disadvantage of this system is redundant storage of code where multiple applications reference a particular function. Another disadvantage is inability to adapt easily to changes in library functions.
As is known in the art, using dynamic linking, a compiled application with references to functions in an external library maintains pointers to the functions in the external library. For example, in the Microsoft Windows® operating system, a compiled application can have references to functions in a dynamic link library (DLL) that contains compiled function code for dynamic linking. In this way, multiple references to the function do not require multiple copies of the function code in memory, and changes within library functions do not mandate re-linking of the compiled application. Techniques for dynamic linking include run-time dynamic linking, in which external references are fully resolved at run-time, and load-time dynamic linking, in which external references are resolved to an intermediate level such as an import table at link time, and are fully resolved at load-time. Load-time dynamic linking is alternatively called static binding or static linking to an import table.
As is known in the art, run-time dynamic linking can be accomplished by loading a library and a function into memory at run-time, then executing the function. For example, in the Microsoft Windows® operating system, the LoadLibrary and GetProcAddress functions load a library and retrieve a function address within the library, respectively. With this information, a function can be invoked. To use run-time dynamic linking without modifying an application binary, code fragments containing loading and invoking instructions can be forcefully injected into the address space of an application through a technique such as DLL injection. The injected code fragments can be invoked through one of several techniques known in the art. One disadvantage of this approach is that a special loader is needed to inject the code into the application binary. This special loader adds complexity to the linking operation, and causes unnecessary overhead during execution if it cannot be detached.
Using load-time dynamic linking, a compiled application is linked to an intermediate level that contains references to a dynamic link library. At load time, the compiled application and libraries listed in the intermediate level are loaded into the address space for the application. For example, in the Microsoft Windows® operating system, an import table includes a list of dynamic link libraries and, for each dynamic link library, a list of functions. At link time, the references to external functions in a compiled application file are linked to the import table. If the application calls the Windows® MessageBox function, this reference is replaced with the name of the library containing that function, “User.dll,” and an ordinal number representing the location of the MessageBox function within the library. At load time, the Windows® operating system replaces the “library.ordinal” references with addresses that are valid for use in function calls. Although load-time dynamic linking is simple and fast, the intermediate level that references the external libraries typically has a fixed structure, limiting the flexibility of the system.
An entry-point function is an optional function of a library that the operating system calls to perform operations defined by the function. An entry-point function is called at times specified by an operating system, for example, when the library is loaded into or unloaded from an application's address space, or when the library attaches to or detaches from a thread. In the Microsoft Windows® operating system, an entry-point function can be specified for a particular dynamic link library prior to linking.
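The lookup-by-name step of run-time dynamic linking can be illustrated with a minimal Python sketch. It is not taken from the systems discussed here: it assumes a CPython interpreter and uses the standard `ctypes` module, where `ctypes.pythonapi` is a handle to the already-loaded interpreter library, so the library-loading step (the analog of LoadLibrary, which `ctypes.CDLL` would perform) is skipped and only the analog of GetProcAddress is shown.

```python
import ctypes

# ctypes.pythonapi is a handle to the CPython runtime library that is already
# mapped into this process, so no explicit library load is needed here.
runtime = ctypes.pythonapi

# Resolve the exported function by name at run time (analogous to GetProcAddress).
get_version = getattr(runtime, "Py_GetVersion")

# Describe the C signature: const char *Py_GetVersion(void).
get_version.restype = ctypes.c_char_p
get_version.argtypes = []

# Invoke the function through the address obtained above.
version = get_version().decode()
print(version)
```

Nothing about `Py_GetVersion` is fixed in the calling code until the name is looked up, which is the property that distinguishes run-time from load-time dynamic linking.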
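The two stages of load-time dynamic linking described above can be modeled with a short Python sketch. This is a simplified illustration, not the Windows loader: the tables `EXPORT_INDEX` and `LIBRARIES`, the ordinal value 1, and the functions `link_time` and `load_time` are all hypothetical names introduced for the example.

```python
# Hypothetical index consulted at link time: function name -> (library, ordinal).
EXPORT_INDEX = {"MessageBox": ("User.dll", 1)}

# Hypothetical loaded libraries: library name -> {ordinal: callable}.
LIBRARIES = {
    "User.dll": {1: lambda text: "MessageBox shows: " + text},
}


def link_time(function_names):
    """Replace each symbolic reference with a (library, ordinal) import entry."""
    return [EXPORT_INDEX[name] for name in function_names]


def load_time(import_table):
    """Replace each (library, ordinal) entry with a directly callable target."""
    return [LIBRARIES[library][ordinal] for library, ordinal in import_table]


# Link time: the application's reference to MessageBox becomes "User.dll", ordinal 1.
import_table = link_time(["MessageBox"])

# Load time: the loader swaps the library.ordinal entry for a usable address.
resolved = load_time(import_table)
result = resolved[0]("hello")
print(import_table, result)
```

The import table produced at link time has a fixed shape, a list of library and ordinal pairs, which is the rigidity noted at the end of the paragraph above.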
The present invention pertains to linking a selected library to a compiled application using variations of load-time dynamic linking. At some time prior to linking, one or more libraries are selected for linking to a compiled application. An association is made between the selected libraries and any external libraries referenced within the compiled application. At link time, the selected libraries and the external libraries link to the compiled application. In this way, one or more selected libraries with names of arbitrary length link to a compiled application along with the external libraries. At load time, the application, selected library, and any external libraries load.
According to one aspect of the invention, a list includes references to any libraries to be linked to the compiled application. While the list initially includes references to any external libraries, a reference to a selected library is added to the list. In an illustrated embodiment, a compiled application stored in Common Object File Format (COFF) includes a data section of executable code and an import table. This import table lists references to any external libraries having functions referenced within the executable code of the data section. A new import table is created which includes a reference to the selected library as well as the original import table.
In one embodiment of the present invention, a pointer references the list of libraries. Originally, the pointer references the list of external libraries. When the reference to the selected library is added to the list of external libraries, the pointer references the modified list. By archiving the state of the pointer before adding the selected library to the list, the original state of the list can be restored at a later time. In this way, the process of linking the application to the selected library is made reversible. The application can be re-linked to add or remove a selected library. Alternatively, the selected library changes without re-linking the application by overwriting an entry of the modified list to include a reference to a second selected library instead of a reference to the original selected library. In the illustrated embodiment, an application stored in COFF format includes a COFF header. The COFF header includes a pointer to the import table. Before creating a new import table, the state of the COFF header pointer is stored, for example in a special structure designed to archive the pointer. After the new import table is created, the COFF header pointer references the new import table instead of the old import table. The libraries referenced by the COFF header pointer link to the application and load. The COFF header pointer can be restored to its original state from the archived pointer, and the application re-linked without the selected library. Alternatively, the entry in the new import table referencing the selected library can be overwritten with a binary rewriter to reference a second selected library.
According to another aspect of the present invention, a data record stores data accessible through a function of the selected library. For example, when the selected library pertains to instrumentation for an automatic distributed partitioning system, the data record stores information related to profiling the application or configuring the application during distribution. Alternatively, the data record stores a list of additional libraries to be loaded by a function of the selected library. The data record allows arbitrary appended data to be accessed by the selected library to direct the selected library or enable some functionality of the selected library.
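The reversible pointer scheme can be sketched in Python as follows. This is a toy model under stated assumptions, not the COFF layout: the `Binary` class, its integer "offsets", the functions `attach`, `swap`, and `detach`, and the library names are all hypothetical and chosen only to mirror the steps described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Binary:
    """Toy stand-in for a compiled application file (layout is hypothetical)."""
    tables: Dict[int, List[str]]        # offset -> import table (library names)
    import_table_ptr: int               # header pointer to the active import table
    archived_ptr: Optional[int] = None  # saved original pointer, if rewritten


def attach(binary: Binary, selected: str) -> None:
    """Create a new table headed by `selected` and point the header at it."""
    if binary.archived_ptr is None:
        binary.archived_ptr = binary.import_table_ptr  # archive before rewriting
    original = binary.tables[binary.archived_ptr]
    new_offset = max(binary.tables) + 1
    binary.tables[new_offset] = [selected] + original
    binary.import_table_ptr = new_offset


def swap(binary: Binary, second: str) -> None:
    """Overwrite the selected-library entry in place, without re-linking."""
    if binary.archived_ptr is None:
        raise ValueError("no selected library is attached")
    binary.tables[binary.import_table_ptr][0] = second


def detach(binary: Binary) -> None:
    """Restore the header pointer from the archive, undoing attach()."""
    if binary.archived_ptr is not None:
        binary.import_table_ptr = binary.archived_ptr
        binary.archived_ptr = None


app = Binary(tables={0: ["kernel32.dll", "user32.dll"]}, import_table_ptr=0)
attach(app, "profiler.dll")
after_attach = list(app.tables[app.import_table_ptr])
swap(app, "distributor.dll")
after_swap = list(app.tables[app.import_table_ptr])
detach(app)
after_detach = list(app.tables[app.import_table_ptr])
print(after_attach, after_swap, after_detach)
```

Because the original table is never modified, restoring the archived pointer is enough to return the model to its original state.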
At load time, if the selected library loads before the application or external libraries, the selected library can load an arbitrary number of other libraries, modify functions already loaded, or perform other operations affecting the application before the application starts execution. For example, if a reference to the selected library heads a list of libraries, the selected library loads before other libraries on the list. In the illustrated embodiment, a reference to a selected library heads the new import table, followed by the original import table. At load time, an entry-point function for the selected library can load other libraries, modify functions such as operating system functions, or perform other operations before the application or external libraries load.
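The effect of placing the selected library at the head of the import table can be illustrated with a small Python model. It is a sketch only: the loader function `load`, the entry-point function `instrument_entry`, the `DATA_RECORD` payload, and the library names are hypothetical, and a real loader does far more than append names to a list.

```python
# Hypothetical data record appended to the application for the selected library.
DATA_RECORD = {"extra_libraries": ["helper.dll"]}


def instrument_entry(loaded):
    """Entry-point of the selected library: load the libraries named in the record."""
    loaded.extend(DATA_RECORD["extra_libraries"])


def load(import_table, entry_points):
    """Load libraries in table order, calling each entry-point as its library loads."""
    loaded = []
    for library in import_table:
        loaded.append(library)
        entry = entry_points.get(library)
        if entry is not None:
            entry(loaded)  # runs before any later table entry is loaded
    return loaded


# The selected library heads the new import table, followed by the original entries.
new_import_table = ["instrument.dll", "kernel32.dll", "user32.dll"]
order = load(new_import_table, {"instrument.dll": instrument_entry})
print(order)
```

In this model the entry-point runs before any of the original libraries load, so the additional library it brings in precedes them in the load order.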
Additional features and advantages of the present invention will be made apparent from the following detailed description of an illustrated embodiment, which proceeds with reference to the accompanying drawings.