1. Field of the Invention
The present invention relates to the modification of executable computer programs.
2. State of the Art
Many instances arise in which it is desirable to modify the behavior of an executable computer program. Such modifications may of course be made using the program source code, by writing additional source code and performing another program build to create a derivative executable computer program. Often, however, it is desirable to modify the behavior of an executable computer program using only the executable code, or object code, also referred to as the executable image. One instance in which such modification is performed is in program monitoring, or profiling, as described in U.S. Pat. Nos. 5,193,180 and 5,335,344, incorporated herein by reference. In the modification method described in the foregoing patents, however, new instructions and new data are interspersed with old code and old data in accordance with a detailed control program. The control program requires detailed, instruction-level knowledge of the executable program. Specifying which instructions and data are to be added, and where, is a painstaking process that is difficult to automate.
Object code modification unavoidably requires a detailed knowledge of the file format of the executable file to be modified. One prevalent file format is the Common Object File Format (COFF), common to both the Unix and PC worlds. A newer format, the roots of which may be traced back to COFF, is the Windows NT™ Portable Executable (PE) format. Current Windows programs are typically of this format. The present invention will therefore be described, in an exemplary embodiment, with reference to the PE format. To enable an understanding of the present invention, the PE format will be described in considerable detail. (The PE format is publicly documented, for example in the Microsoft Developers Network (MSDN) CD-ROM, as well as on the MSDN Web site, at http://premium.microsoft.com/msdn/library/techart/pefile.htm and elsewhere.) The principles of the invention, however, are applicable to various file formats commonly used on various hardware platforms (Windows, Unix, Macintosh, etc.).
Referring to FIG. 1, the PE format calls for an executable to have a code section, a data section, and a resource section. All code segments are combined into a single section. The data section may contain different types of data, including, for example, import data (.idata) and export data (.edata). (The location of various types of information within the executable is set forth in the data directory of the PE optional header, described below.) The executable may also have other optional sections, for example a relocation section. Although the foregoing arrangement is typical,
A header portion of a PE executable includes a PE file header, a PE file "optional" header (which, despite its name, is required), and a section header table. In addition, the PE executable includes an MS-DOS stub program, which precedes the PE headers in the file. (This stub runs under DOS and typically just informs the user that the main program is not a DOS program and cannot be run under DOS.)
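As a minimal sketch of how a tool might skip the MS-DOS stub to reach the PE headers, the following Python assumes the standard layout in which the MS-DOS header begins with the bytes `MZ` and stores, at offset 0x3C (the field conventionally named `e_lfanew`), the file offset of the `PE\0\0` signature that immediately precedes the PE file header. The function name `find_pe_header` is hypothetical.

```python
import struct

DOS_MAGIC = b"MZ"          # signature at the start of the MS-DOS header
PE_SIGNATURE = b"PE\0\0"   # signature immediately preceding the PE file header
E_LFANEW_OFFSET = 0x3C     # where the MS-DOS header stores the PE header offset


def find_pe_header(image: bytes) -> int:
    """Return the file offset of the PE file header (just past the signature).

    Hypothetical helper: checks the MS-DOS header, follows e_lfanew past
    the DOS stub, and verifies the PE signature found there.
    """
    if image[:2] != DOS_MAGIC:
        raise ValueError("not an MS-DOS/PE image: missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("PE signature not found at e_lfanew")
    return pe_offset + 4
```

The returned offset is where the fixed-size PE file header begins; everything before it (the MS-DOS header and stub) can be ignored by a tool that only manipulates the PE structures.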
Referring to FIG. 2, the structure of the PE file header is shown. The PE file header is of fixed size and contains high-level information used by the system or application to determine how to treat the file. The NumberOfSections field indicates how many section headers and section bodies the executable contains and may be used to extract information from the executable. The section headers are laid out sequentially in the section header table, and the corresponding section bodies are laid out sequentially following the section header table.
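A minimal Python sketch of reading the fixed-size PE file header and using NumberOfSections to locate the sequentially laid-out section headers follows. It assumes the standard 20-byte header layout (Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics), the standard 40-byte section header size, and that the section header table immediately follows the optional header; the class and function names are illustrative.

```python
import struct
from typing import List, NamedTuple


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


# Seven little-endian fields, 20 bytes in total, in PE file header order.
FILE_HEADER_FORMAT = "<HHIIIHH"
FILE_HEADER_SIZE = struct.calcsize(FILE_HEADER_FORMAT)
SECTION_HEADER_SIZE = 40  # each section header is a fixed 40 bytes


def parse_file_header(image: bytes, offset: int) -> FileHeader:
    """Decode the PE file header starting at `offset` (just past the PE signature)."""
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, image, offset))


def section_header_offsets(image: bytes, offset: int) -> List[int]:
    """File offsets of each section header, one per NumberOfSections.

    The section header table follows the optional header, whose size is
    given by SizeOfOptionalHeader.
    """
    header = parse_file_header(image, offset)
    table_start = offset + FILE_HEADER_SIZE + header.size_of_optional_header
    return [table_start + i * SECTION_HEADER_SIZE
            for i in range(header.number_of_sections)]
```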
Referring to FIG. 3, the optional PE header contains most of the meaningful information about the executable image. The standard fields have the same names as corresponding fields in COFF. The AddressOfEntryPoint field indicates the location of the entry point for the application within the code section. Immediately preceding the module entry point within the code section is an Import Address Table (IAT), a series of jump instructions and associated virtual jump-to addresses that, during loading of the executable by the operating system, are "fixed-up" to contain physical addresses of imported functions that may be called by the module.
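The optional header begins with a Magic value that distinguishes the 32-bit (PE32, 0x10B) layout from the 64-bit (PE32+, 0x20B) layout, and in both layouts AddressOfEntryPoint sits at offset 16 as a relative virtual address. The Python sketch below reads only those two fields; it assumes `opt_offset` is the file offset immediately after the PE file header, and it does not attempt to locate the Import Address Table. The function name is hypothetical.

```python
import struct
from typing import Tuple

PE32_MAGIC = 0x10B       # Magic value for a 32-bit (PE32) optional header
PE32_PLUS_MAGIC = 0x20B  # Magic value for a 64-bit (PE32+) optional header
ENTRY_POINT_OFFSET = 16  # AddressOfEntryPoint offset in both layouts


def read_entry_point(image: bytes, opt_offset: int) -> Tuple[str, int]:
    """Return the optional-header kind and the entry point's relative virtual address."""
    (magic,) = struct.unpack_from("<H", image, opt_offset)
    if magic == PE32_MAGIC:
        kind = "PE32"
    elif magic == PE32_PLUS_MAGIC:
        kind = "PE32+"
    else:
        raise ValueError(f"unrecognized optional header magic 0x{magic:X}")
    (entry_rva,) = struct.unpack_from("<I", image, opt_offset + ENTRY_POINT_OFFSET)
    return kind, entry_rva
```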
The additional (non-COFF) fields provide loader support for the operating system. The ImageBase field specifies the preferred base address in the address space of a process to which the executable image is mapped. (In the case of a Windows C++ compiler, the default value for executables is 0x00400000; DLLs use a different default address.) The FileAlignment field dictates the minimum size of section bodies within the image file prior to loading, whereas the SectionAlignment field dictates the minimum amount of space a section can occupy when loaded. The SizeOfImage field is obtained by determining how many bytes each section requires, rounding up to the next page boundary, rounding the result up to the next SectionAlignment boundary, and summing the individual requirements of all sections. The SizeOfHeaders field indicates the total size of the header portion of the file, that is, where the section bodies begin in the file.
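The rounding in the SizeOfImage description, and the FileAlignment rounding applied to section bodies on disk, both reduce to a single round-up helper. This Python sketch follows the per-section procedure as stated above, with an assumed 4 KB page size and SectionAlignment as defaults; note that the PE specification's SizeOfImage also counts the headers, which this simplified sketch omits. Function names are illustrative.

```python
from typing import Iterable


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment must be > 0)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (value + alignment - 1) // alignment * alignment


def size_of_image(section_sizes: Iterable[int],
                  section_alignment: int = 0x1000,
                  page_size: int = 0x1000) -> int:
    """Sum each section's requirement: round up to a page, then to SectionAlignment."""
    total = 0
    for size in section_sizes:
        pages = align_up(size, page_size)
        total += align_up(pages, section_alignment)
    return total
```

For example, a 0x1234-byte section occupies 0x1400 bytes on disk with a FileAlignment of 0x200 (`align_up(0x1234, 0x200)`), but 0x2000 bytes once loaded with a SectionAlignment of 0x1000.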
Located at the end of the optional header structure is an array of data directory location entries, indicating where to find other important components of executable information in the file. Included in the directory location entries array are entries for an export directory, an import directory, a resource directory, a base relocation directory, etc., corresponding to predefined sections of the executable. The field NumberOfRvaAndSizes identifies the length of the data directory array. Each data directory location entry specifies the size and relative virtual address of a directory located within a corresponding section. Typically, a data directory is the first structure within the section body.
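A minimal Python sketch of reading the data directory array follows. It assumes a PE32 (32-bit) optional header, in which NumberOfRvaAndSizes sits at offset 92 and the 8-byte (VirtualAddress, Size) entries begin at offset 96; a PE32+ header places these at 108 and 112 instead. The index constants follow the standard ordering (export 0, import 1, resource 2, base relocation 5); the function name is hypothetical.

```python
import struct
from typing import List, Tuple

# Indices of a few well-known entries in the data directory array.
EXPORT_DIRECTORY = 0
IMPORT_DIRECTORY = 1
RESOURCE_DIRECTORY = 2
BASERELOC_DIRECTORY = 5

PE32_NUM_RVA_OFFSET = 92   # NumberOfRvaAndSizes within a PE32 optional header
PE32_DATA_DIR_OFFSET = 96  # first data directory entry within a PE32 optional header
DATA_DIR_ENTRY_SIZE = 8    # VirtualAddress (DWORD) + Size (DWORD)


def read_data_directories(image: bytes, opt_offset: int) -> List[Tuple[int, int]]:
    """Return (VirtualAddress, Size) pairs from a PE32 optional header."""
    (count,) = struct.unpack_from("<I", image, opt_offset + PE32_NUM_RVA_OFFSET)
    directories = []
    for i in range(count):
        entry_offset = opt_offset + PE32_DATA_DIR_OFFSET + DATA_DIR_ENTRY_SIZE * i
        rva, size = struct.unpack_from("<II", image, entry_offset)
        directories.append((rva, size))
    return directories
```

An entry of (0, 0) conventionally means the corresponding directory is absent.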
Referring to FIG. 4, section headers are of fixed length. The VirtualAddress field identifies the virtual address in the process's address space to which the section is loaded. The actual address is formed by adding the value of this field to the ImageBase virtual address in the optional header structure. (However, if the image file is a Dynamic Link Library component, or DLL, it may be loaded to a location different from the requested location, necessitating relocation.) The SizeOfRawData field indicates the size of the section body, rounded up to the next multiple of FileAlignment. Once the image is loaded into a process's address space, the section body occupies space rounded up to a multiple of SectionAlignment. The characteristics, or attributes, field defines the section characteristics as shown in FIG. 5.
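The fixed-length section header can be decoded with a single struct format. This Python sketch assumes the standard 40-byte layout (an 8-byte Name followed by VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, PointerToRelocations, PointerToLinenumbers, NumberOfRelocations, NumberOfLinenumbers, and Characteristics) and keeps only the fields discussed above; the class and function names are illustrative.

```python
import struct
from typing import NamedTuple


class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int      # relative to ImageBase
    size_of_raw_data: int     # on-disk size, a multiple of FileAlignment
    pointer_to_raw_data: int  # file offset of the section body
    characteristics: int      # attribute flags (FIG. 5)


# Name[8], then six DWORDs, two WORDs, and a DWORD of flags: 40 bytes total.
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"


def parse_section_header(image: bytes, offset: int) -> SectionHeader:
    (raw_name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _ptr_relocs, _ptr_linenums, _num_relocs,
     _num_linenums, characteristics) = struct.unpack_from(
        SECTION_HEADER_FORMAT, image, offset)
    name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
    return SectionHeader(name, virtual_size, virtual_address,
                         size_of_raw_data, pointer_to_raw_data, characteristics)


def load_address(image_base: int, section: SectionHeader) -> int:
    """Preferred load address: ImageBase plus the section's VirtualAddress."""
    return image_base + section.virtual_address
```

With the default executable ImageBase of 0x00400000, a section whose VirtualAddress is 0x1000 would be loaded at 0x00401000, absent relocation.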
Of the predefined sections, the most complex is the resource section. Resources may include such things as cursors, bitmaps, icons, menus, dialogs, strings, fonts, etc. Referring to FIG. 6, a simple resource tree structure is shown. At the root of the tree is a type directory having one entry for each type of resource the file contains (regardless of how many resources of each type it contains). In the example of FIG. 6, one type entry might be for menus and the other for string tables. Each of the entries in the root-level type directory points to a node in the second level of the tree. These nodes are also directories, used to identify the name of each resource within a given type. For an application having multiple menus defined, for example, there would be an entry for each one at the second level of the tree. Resources can be identified by name or by integer. If by name, the Name field is used to point to a name structure containing the name, in Unicode for example. Otherwise, the Name field represents the integer ID of the resource.
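The name-or-integer encoding described above can be sketched in Python. Each directory node is assumed to have the standard 16-byte header whose last two 16-bit fields count the named and ID entries, followed by 8-byte entries made of a Name field and an offset field. The sketch assumes the standard convention that a set high bit in the Name field marks an offset to a Unicode name string (otherwise the low bits are the integer ID), and a set high bit in the offset field marks a pointer to a lower-level directory rather than a leaf. Offsets are relative to the start of the resource section, passed in as `rsrc`; all names are hypothetical.

```python
import struct
from typing import List, NamedTuple

HIGH_BIT = 0x80000000
LOW_31_BITS = 0x7FFFFFFF


class ResourceEntry(NamedTuple):
    is_named: bool      # True: name_or_id is an offset to a Unicode name string
    name_or_id: int     # integer type/name/language ID, or name-string offset
    is_directory: bool  # True: offset points to a lower-level directory node
    offset: int         # offset relative to the start of the resource section


def decode_resource_entry(name_field: int, offset_field: int) -> ResourceEntry:
    return ResourceEntry(
        is_named=bool(name_field & HIGH_BIT),
        name_or_id=name_field & LOW_31_BITS,
        is_directory=bool(offset_field & HIGH_BIT),
        offset=offset_field & LOW_31_BITS,
    )


def read_resource_directory(rsrc: bytes, dir_offset: int) -> List[ResourceEntry]:
    """Decode all entries of one directory node in the resource tree."""
    # The 16-byte node header ends with the named-entry and ID-entry counts.
    named_count, id_count = struct.unpack_from("<HH", rsrc, dir_offset + 12)
    entries = []
    for i in range(named_count + id_count):
        name_field, offset_field = struct.unpack_from(
            "<II", rsrc, dir_offset + 16 + 8 * i)
        entries.append(decode_resource_entry(name_field, offset_field))
    return entries
```

Applied to the root node, this yields the type entries (for example, ID 4 for menus and ID 6 for string tables); following each entry whose `is_directory` flag is set descends to the name level and then to the language level.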
Level three of the tree establishes a one-to-one correspondence between the individually identified resources and their respective language IDs. For example, the value 0x09 designates English as the primary language. Each level-three node points to a leaf node containing an image resource data entry structure of the type shown in FIG. 7.
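At the leaf level, the sketch below decodes the 16-byte resource data entry (OffsetToData, Size, CodePage, Reserved) and splits a Windows language ID into its primary language (low 10 bits) and sublanguage (high 6 bits), so that 0x0409 yields primary language 0x09, English. It assumes OffsetToData is the relative virtual address of the raw resource bytes; the function names are illustrative.

```python
import struct
from typing import NamedTuple, Tuple


class ResourceDataEntry(NamedTuple):
    offset_to_data: int  # relative virtual address of the raw resource bytes
    size: int            # size of the resource data in bytes
    code_page: int
    reserved: int


def parse_data_entry(rsrc: bytes, offset: int) -> ResourceDataEntry:
    """Decode the leaf structure pointed to by a level-three entry."""
    return ResourceDataEntry(*struct.unpack_from("<IIII", rsrc, offset))


def split_langid(langid: int) -> Tuple[int, int]:
    """Split a language ID into (primary language, sublanguage)."""
    return langid & 0x3FF, langid >> 10
```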
Of the various data sections, most relevant to the present invention are the export data section (.edata) and the import data section (.idata). Functions may be "exported" from a module by "publishing" a list of exported function entry points. The export data section includes an image export directory structure of the type shown in FIG. 8. The AddressOfFunctions field is an offset to a list of exported function entry points. The AddressOfNames field is the address of an offset to the beginning of a null-separated list of exported function names. The AddressOfNameOrdinals field is an offset to a list of ordinal values for the same exported functions. The three Address fields are relative virtual addresses into the address space of a process once the module has been loaded. Before the file is loaded, the corresponding file position can be determined by subtracting the section header virtual address (VirtualAddress) from the given field address, adding the section body offset (PointerToRawData) to the result, and then using this value as an offset into the image file.
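The address arithmetic in the last sentence above (subtract the section's VirtualAddress, add its PointerToRawData) can be sketched in Python. The sketch assumes the caller has already parsed the section headers into the hypothetical `Section` tuple, and it selects the section whose raw data on disk contains the given relative virtual address.

```python
from typing import Iterable, NamedTuple


class Section(NamedTuple):
    virtual_address: int      # section header VirtualAddress
    pointer_to_raw_data: int  # section header PointerToRawData
    size_of_raw_data: int     # section header SizeOfRawData


def rva_to_file_offset(rva: int, sections: Iterable[Section]) -> int:
    """Translate a relative virtual address into an offset in the image file."""
    for s in sections:
        # Only addresses backed by raw data on disk have a file offset.
        if s.virtual_address <= rva < s.virtual_address + s.size_of_raw_data:
            return rva - s.virtual_address + s.pointer_to_raw_data
    raise ValueError(f"RVA 0x{rva:X} is not backed by any section's raw data")
```

Applied to the AddressOfFunctions, AddressOfNames, or AddressOfNameOrdinals values, this gives the position in the file at which the corresponding list can be read without loading the module.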
Similarly, a module may "import" a function from another module. The names of all imported modules and functions are listed in the .idata section. The function names and the names of the modules to which they belong are ordered such that a function name appears first, followed by the module name and then by the rest of the function names, if any, as shown in FIG. 9.