Linkers for producing executable programs are known. Generally speaking, a linker acts to link a number of object code modules to form a single executable program. Object code modules are usually generated from program source code modules, these modules being written in a high level language. An assembler/compiler reads each source code module and assembles and/or compiles the high level language of the source code module to produce an object code module. The assembler/compiler also generates a number of relocations which are used to combine the object code modules at link time in a linker.
The ELF (Executable and Linking Format) standard defines a convention for naming relocation sections belonging to a given section, e.g. .rela.abc is the relocation section of section .abc. Standard relocations under the ELF format allow an offset in section data to be defined where patching is to occur and a symbol whose value is to be patched. A type field also exists which is used to describe the appropriate method of encoding the value of the symbol into the instruction or data of the section data being patched. According to the existing arrangements, the relocation type definitions are usually created on an ad hoc basis for each instruction set targeted. The 32-bit ELF standard allows only 256 distinct relocation types, so the same type values are reassigned different semantics for each instruction set.
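By way of illustration only, the fields of a standard 32-bit ELF relocation entry with addend (offset, symbol, type) can be unpacked as in the following sketch. The layout of three 32-bit words and the split of the info word into a 24-bit symbol index and an 8-bit type follow the 32-bit ELF specification; the 8-bit type field is the source of the 256-type limit. The function names decode_rela and relocation_section_name are hypothetical, and little-endian byte order is assumed.

```python
import struct

# Elf32_Rela: r_offset (unsigned 32), r_info (unsigned 32), r_addend (signed 32).
# Little-endian encoding is an assumption of this sketch.
ELF32_RELA = struct.Struct("<IIi")


def decode_rela(entry: bytes) -> dict:
    """Split one 12-byte relocation entry into its fields."""
    r_offset, r_info, r_addend = ELF32_RELA.unpack(entry)
    return {
        "offset": r_offset,            # where in the section data to patch
        "symbol_index": r_info >> 8,   # which symbol's value is patched in
        "type": r_info & 0xFF,         # 8 bits, hence at most 256 types
        "addend": r_addend,
    }


def relocation_section_name(section: str) -> str:
    """Naming convention: .rela.abc is the relocation section of .abc."""
    return ".rela" + section


# A made-up entry: patch offset 0x10 using symbol 5, relocation type 2.
sample = ELF32_RELA.pack(0x10, (5 << 8) | 2, -4)
decoded = decode_rela(sample)
```

The meaning of the type value 2 in this sample is deliberately left undefined, since under the existing arrangements it depends on the instruction set targeted.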
The existing linkers each have to be set up to understand the particular semantics used for the type definitions in the relocations for each instruction set. Moreover, the relocation operations which can be defined by the relocations are limited in the existing ELF standard.
A more flexible relocation format has been developed by the Applicants which, in particular, provides independence from target architecture and allows for user written optimisations of code to be accomplished at link time.
Also, in known systems all functions present in the object code modules are taken through the linking process even when they are not in fact used in the final linked program being generated. An example of this would be standard functions in the computer language in which the program source modules are being written. Unnecessary inclusion of un-used functions makes the linked program being generated inefficient because of its excessive size.
Eliminating the code of such functions is a well-known linker relaxation (optimisation) technique called USE (uncalled subroutine elimination). One known method of performing this is for the linker to know which instruction sequences are used by the assembler/compiler to cause a branch to a function. A program is stored as code sequences most of which have been generated by the compiler to correspond with functions in a high level language (such as the C language). The program will be loaded into memory such that each sequence/function runs from a start address to an end address. One of these start addresses will be the “entry point” into the program, that is the first instruction executed by the microprocessor when the program starts. A branch instruction gives a target address within a stored program and a condition under which the branch is taken. Starting from the entry point(s) for the program being generated, the linker can follow all the possible paths (assuming both possible conditions at every branch) through these branches and thereby identify uncalled functions. This method has the disadvantage that the linker needs to be re-coded for new architectures.
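The branch-following method described above can be sketched as a simple reachability search, given here purely as an illustration. The table branch_targets, which maps each function to the functions its branch instructions can reach, is an assumption of this sketch: producing it requires decoding the branch instructions of the particular architecture, which is the part that must be re-coded for each new architecture. The function names are hypothetical.

```python
def reachable_functions(branch_targets, entry_points):
    """Follow every possible branch, starting from the entry point(s)."""
    seen = set()
    worklist = list(entry_points)
    while worklist:
        fn = worklist.pop()
        if fn in seen:
            continue
        seen.add(fn)
        # Both outcomes of every conditional branch are assumed possible,
        # so every target of fn is followed.
        worklist.extend(branch_targets.get(fn, ()))
    return seen


def uncalled_functions(branch_targets, entry_points):
    """Functions never reached from any entry point may be eliminated."""
    return set(branch_targets) - reachable_functions(branch_targets, entry_points)


# Hypothetical program: "main" is the entry point; "helper" is never called.
example = {"main": ["f"], "f": ["g"], "g": [], "helper": ["g"]}
dead = uncalled_functions(example, ["main"])
```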
Another known method makes use of the fact that the linker repeatedly reads the relocations of the object code module, each read being known as a pass. The linker pass number is stored with a symbol whenever the value of that symbol is referenced by a relocation. The linker also ensures that the entry point symbol always has the current pass number stored with it, even though usually no relocation references its value. A link time conditional instruction is then used to evaluate whether the symbol's value has been used in the current pass. Based on this condition, a function labelled by the symbol can be eliminated if it is no longer referenced. This is done by using a conditional relocation to test the condition. With each pass more functions may be eliminated until the program becomes stable. For example, if function “A”, which is not at the entry point, calls function “B”, which in turn calls function “C”, then on the second pass function A may be skipped. This results in its reference to function B being missed, but B and C are nevertheless traversed on successive passes until B and then C become un-referenced. This method has the disadvantage that if A calls B and B calls A, but no other function calls either of them, they will not be eliminated: A is never skipped on the first pass, so its reference to function B is detected and B will be traversed, which will update the pass number of A, and so on for all successive passes.
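The behaviour of the pass-number method, including its failure on mutually calling functions, can be illustrated with the following simplified model. It is a sketch only and does not reproduce any actual linker: it assumes that every function is traversed on the first pass, and that a function is traversed on a later pass only if its symbol was stamped with the number of the preceding pass. The names simulate_passes, calls and last_ref are hypothetical.

```python
def simulate_passes(calls, entry, max_passes=10):
    """Return the functions still traversed once the program is stable."""
    last_ref = {}       # symbol -> number of the last pass that referenced it
    kept = set(calls)   # assumption: nothing is skipped on the first pass
    for current in range(1, max_passes + 1):
        # The entry point symbol always carries the current pass number.
        last_ref[entry] = current
        # Each traversed function stamps the symbols it references.
        for fn in kept:
            for callee in calls[fn]:
                last_ref[callee] = current
        # Only symbols stamped on this pass are traversed on the next one.
        next_kept = {fn for fn in calls if last_ref.get(fn) == current}
        if next_kept == kept:
            return kept   # stable: no further function was eliminated
        kept = next_kept
    return kept


# A calls B calls C, none called from the entry point: all are eliminated,
# one per pass (A, then B, then C).
chain = simulate_passes({"main": [], "A": ["B"], "B": ["C"], "C": []}, "main")

# A and B call only each other: each keeps the other's stamp current,
# so neither is ever eliminated.
cycle = simulate_passes({"main": [], "A": ["B"], "B": ["A"]}, "main")
```

In this model the chain reduces to the entry point alone, whereas the mutually calling pair survives indefinitely, which is the disadvantage noted above.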
It would be advantageous to provide a linker optimisation which includes the discarding of un-used functions and which mitigates the problems of the prior art.
A related problem is the use of library functions, which are standard functions in the programming language in which the program source modules are being written and which are often present in object code modules. These functions can have a “full” version and a “reduced” version, either of which may be required depending on the data values being acted on by the function. The assembler/compiler can see whether any given use of the function is a suitable candidate for the reduced version, but it processes one module at a time and so cannot see all the program code modules taken as a whole. This means that it cannot see whether any module requires the full version. This is important because, if one does, the full version should be used even when compiling those modules which do not require its extra capabilities; otherwise the final executable program to be generated will contain both versions. It is an inefficient use of space for the executable to contain both versions, and it is not necessary, because a module which only requires the reduced version of the function can equally well use the full version. (The reverse is not true, so if the full version is required anywhere it must be present in the executable program rather than the reduced version.)
It would be advantageous to allow the linker to survey every use of each version of such a function so that it can make transformations of the uses of the reduced version, to use the full version.
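The decision such a survey would support can be sketched as follows, as an illustration of the reasoning only and not of any particular linker mechanism. The input format, a list of (module, version) pairs recording which version each module's use of the function was compiled to request, and the name choose_version are assumptions of this sketch.

```python
def choose_version(uses):
    """Pick the single version of a function to keep in the executable.

    uses: iterable of (module_name, version), version being "full" or
    "reduced" as requested by that module's use of the function.
    """
    uses = list(uses)
    # If any use anywhere needs the full version, every use is transformed
    # to the full version, since a reduced-version use can equally well
    # call the full one. The reverse substitution is not valid.
    needs_full = any(version == "full" for _, version in uses)
    chosen = "full" if needs_full else "reduced"
    return {module: chosen for module, _ in uses}


# Hypothetical modules: one use requires the full version.
mixed = choose_version([("a.o", "reduced"), ("b.o", "full"), ("c.o", "reduced")])
# No use requires the full version, so the reduced version suffices.
small = choose_version([("a.o", "reduced"), ("c.o", "reduced")])
```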