It is known practice to develop large and complex computer programs by developing a number of small program modules, each of which may, for example, perform a single discrete function, and subsequently joining all of these small program modules together to form the complete, single required program. This is advantageous because a number of modules may be developed in parallel, and development and testing of the individual, smaller modules is considered to be much easier.
The modules are generally written in source code which is normally a high level language which can be generated by a human and which is in a human readable form. An assembler/compiler reads each source code module and assembles and/or compiles the high level language of the source code module to produce an object code module. The assembler also generates a number of relocations which are used to combine the object code modules at link time in a linker. A linker acts to combine a number of object code modules to form a single executable program.
It is known for the linker to modify parts of the individual program modules during linking in order to optimise the operation and/or performance of the final linked program. This optimisation is not possible before linking as information from other program modules is often required. To enable the linker to perform such optimisation, relocation instructions are included in each program module.
The ELF (Executable and Linkable Format) standard defines a convention for naming the relocation section belonging to a given section, e.g. .rela.abc is the relocation section of section .abc. Standard relocations under the ELF format allow an offset in the section data to be defined at which patching is to occur, together with a symbol whose value is to be patched in at that offset. A type field also exists which describes the appropriate method of encoding the value of the symbol into the instruction or data of the section data being patched.
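By way of illustration only, the following Python sketch shows how a relocation entry of this kind might be decoded and applied. The field layout follows the 64-bit ELF entry with explicit addend (r_offset, r_info, r_addend). The names Rela, parse_rela64 and apply_relocation, the relocation type R_DEMO_ABS32 and the symbol values are assumptions introduced for this example and are not taken from any particular toolchain or processor.

```python
import struct
from dataclasses import dataclass


@dataclass
class Rela:
    offset: int  # position in the section data where patching occurs
    symbol: int  # index of the symbol whose value is patched in
    type: int    # selects how the value is encoded at the offset
    addend: int  # constant added to the symbol value


def parse_rela64(raw: bytes) -> Rela:
    # A 64-bit ELF relocation entry with addend holds r_offset, r_info and
    # r_addend. Little-endian byte order is assumed for this sketch.
    r_offset, r_info, r_addend = struct.unpack("<QQq", raw)
    # r_info packs the symbol index (upper 32 bits) and the type (lower 32 bits).
    return Rela(r_offset, r_info >> 32, r_info & 0xFFFFFFFF, r_addend)


# Hypothetical relocation type: store the value as a 32-bit absolute word.
R_DEMO_ABS32 = 1


def apply_relocation(section: bytearray, rel: Rela, symbol_values: list) -> None:
    value = symbol_values[rel.symbol] + rel.addend
    if rel.type == R_DEMO_ABS32:
        struct.pack_into("<I", section, rel.offset, value & 0xFFFFFFFF)
    else:
        raise ValueError(f"unsupported relocation type {rel.type}")


# Usage: patch symbol 2 (value 0x1000) plus addend 8 into offset 4.
raw_entry = struct.pack("<QQq", 4, (2 << 32) | R_DEMO_ABS32, 8)
rel = parse_rela64(raw_entry)
section = bytearray(8)
apply_relocation(section, rel, [0, 0, 0x1000])
```

In this sketch the type field is the only thing that determines how the value is written, which mirrors the role described for it above. A real linker would support many processor-specific types.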
When performing testing and/or debugging operations it is known to use a lister. A lister takes an object code sequence as an input and displays a number of files containing useful information in a human-readable form. One useful piece of information is the original source code listing. To produce this, the lister implements a conversion process known as disassembling. The source code is useful because machine-readable object code is represented simply as hexadecimal numbers and is therefore extremely difficult, if not impossible, for a human operator to read. A further use of the lister is to check that the correct variables are being used for a particular program operation. A lister may be used for any object code sequence. This could be, for example, individual object code modules, executable programs (after linking) or library files.
With known disassembly techniques, it is often the case that an instruction in the original source code is expressed in terms of an operand having a value, the value being derived from an expression formed from a number of terms. A simple example would be the instruction “BRA((FOO−$)−x>>1)” where FOO is a label, the value of which is unknown at the time of assembling. The value of FOO would be provided during linking of the program modules. When known listers convert this instruction in its object code form back into source code it is only possible to provide the final value of the expression, so in the example above the output from the lister would be “BRA Y”, Y being equal to the value of the expression ((FOO−$)−x>>1). This is inconvenient during testing or debugging because, if an error occurs, it is not possible to determine whether the value of one of the original variables was incorrect or whether an error has occurred elsewhere.
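The loss of information can be illustrated with a short Python sketch. It assumes that the shift in the example expression is applied last, to the whole difference, and that $ denotes the address of the current instruction. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def evaluate_operand(foo: int, pc: int, x: int) -> int:
    # Assumed reading of the expression: ((FOO - $) - x) >> 1,
    # with pc standing in for $.
    return ((foo - pc) - x) >> 1


def conventional_listing(operand: int) -> str:
    # A conventional lister sees only the final operand value.
    return f"BRA {operand}"


# Two different sets of inputs that yield the same encoded operand.
listing_a = conventional_listing(evaluate_operand(foo=0x120, pc=0x100, x=0))
listing_b = conventional_listing(evaluate_operand(foo=0x124, pc=0x100, x=4))

print(listing_a)
print(listing_b)
```

Both calls produce the same listing line, so the listing alone cannot reveal which values of FOO and x were actually used, nor which of them might have been wrong.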
Another problem with existing disassembly techniques is as follows. In order to generate object code sequences from source code modules, an assembler reads source code instructions in the source code module, and also acts on so-called assembler directives in the source code module. The assembler directives act to assist or control the conversion of the source code instructions to an object code sequence. With conventional disassembly techniques, when the source code is generated from the object code sequence, these assembler directives are not generated. Thus, it is not possible to assess whether an error has occurred in a directive itself rather than in the source code instructions, nor whether the disassembled source code is the same as the original source code. This in turn makes it more difficult to locate any incorrect code.
It is an aim of embodiments of the present invention to provide improved disassembly techniques which mitigate the problems identified above.