When a software application executes, its application data is stored in a memory of a computer system in accordance with program instructions written by a software developer. FIG. 1a illustrates an exemplary configuration of a memory of a computer system for the storage of a data structure. The memory 102 is organised as a sequential list of memory locations, each including a byte of storage 104 and a memory address 106. The memory address 106 of a memory location is a unique identifier of that memory location. Memory addresses 106 are numbered sequentially for the memory 102 in the computer system. A software application allocates one or more blocks of memory (such as memory blocks 108 and 110) for the storage of application data, where a block of memory comprises one or more contiguous memory locations. The size of a block of memory is determined by the type of data to be stored in the memory 102 and by the architecture of the computer system. For example, the storage of an integer number field in memory 102 may occupy more than one memory location (i.e. more than one byte). Application data fields can be allocated and referenced in the memory 102 by low level machine instructions such as machine code.
Using high level programming languages such as “C” or “C++”, the allocation of, and reference to, memory for application data is simplified by using data structures. A data structure is a definition of a unit of application data and can include data fields and other, nested, data structures. By way of example, FIG. 1a includes a representation of a data structure 112 defined in a high level programming language, including a first integer field 114 and a second integer field 116. The data structure 112 includes a name 118 which is used to refer to the entire data structure 112. Also, the first and second data fields 114 and 116 include names 120 and 122 respectively which are used by a software application in a high level language to reference the fields within the data structure. The data structure 112 is used by a software application written in a high level programming language to allocate a block of memory of sufficient size to store data corresponding to all of the fields of the data structure 112, and to refer to the fields within the data structure 112. Such a data structure can be defined in a header file of an application (known as a “.h” file).
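By way of illustration only, a data structure such as data structure 112 might be written in "C" as shown below. The identifiers example_record, first_field, second_field and make_record are hypothetical stand-ins for name 118 and names 120 and 122; the sketch simply shows a single allocation sized for all fields and field references made by name.

```c
#include <stdlib.h>

/* Hypothetical C rendering of data structure 112: the tag name stands in
 * for name 118 and the member names stand in for names 120 and 122. */
struct example_record {
    int first_field;   /* first integer field 114 */
    int second_field;  /* second integer field 116 */
};

/* Allocates one block large enough for every field of the structure and
 * sets the fields by name. Returns NULL on allocation failure; the caller
 * releases the block with free(). */
struct example_record *make_record(int first, int second)
{
    struct example_record *rec = malloc(sizeof *rec);
    if (rec == NULL)
        return NULL;
    rec->first_field = first;
    rec->second_field = second;
    return rec;
}
```

The source never states a memory address or a byte count here; those are supplied by the compiler from the structure definition.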
FIG. 1a further illustrates how the data structure 112 is stored in the memory 102 during application runtime. In the particular example of FIG. 1a, the integer fields 114 and 116 each occupy four bytes of memory (i.e. four memory locations) when stored in the memory 102. A first block of memory 108 corresponds to the first data field 114, and a second block of memory 110 corresponds to the second data field 116. When stored in the memory 102, the data structure 112 can be referenced using a data structure memory address 124. The data structure memory address 124 is the memory address of the first memory location in the first block of memory 108, which is also the first memory location in the memory 102 corresponding to the data structure 112. Each of the individual data fields 114 and 116 of the data structure 112 can be referenced as a block of data of a particular size at a particular "offset" from the data structure memory address 124. Thus the first field 114, which is stored in the block of memory 108, is referenced as follows:
ADDRESS: data structure memory address 124 + offset of zero bytes
SIZE: four bytes
Similarly the second field 116, which is stored in the block of memory 110, is referenced as follows:
ADDRESS: data structure memory address 124 + offset of four bytes
SIZE: four bytes
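The ADDRESS and SIZE references above can be reproduced in "C" with the standard offsetof and sizeof operators. The following is a minimal sketch, assuming a four-byte int as in FIG. 1a; the structure and function names are hypothetical. It reads the second field from raw bytes using only a base address, an offset and a size, which is the same information a low level reference uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical definition of data structure 112. */
struct example_record {
    int first_field;   /* offset 0, size sizeof(int) */
    int second_field;  /* offset sizeof(int) on typical compilers */
};

/* Reads the second field given only the data structure memory address
 * (base) plus the field's offset and size; memcpy avoids alignment and
 * aliasing problems when the bytes come from an arbitrary buffer. */
int read_second_field(const unsigned char *base)
{
    int value;
    memcpy(&value, base + offsetof(struct example_record, second_field),
           sizeof value);
    return value;
}
```

With a four-byte int, offsetof yields 0 for first_field and 4 for second_field, matching the offsets of zero and four bytes listed above.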
Thus FIG. 1a illustrates how application data is stored in the memory 102 of the computer system, and how the application data can be allocated and referenced by both an application in a high level language and low level machine instructions.
Prior to executing a software application written in a high level programming language, it is necessary to convert the software application to low level machine instructions. The conversion of the software application to low level machine instructions can be performed by a compiler, such as a “C” compiler, or a runtime interpreter, and includes the conversion of high level instructions for the allocation and reference of data structures to low level memory allocations and memory references.
FIG. 1b is a block diagram illustrating the processes involved in compiling a software application which is written in a high level programming language, and which generates a memory dump at runtime. An application executable 130 is compiled from source files 132 and a header file 136 using a compiler 138. The application executable 130 is a binary file containing instructions which are executable on a first computer system 140. At runtime, the application executable 130 is loaded as an application runtime 144 into a memory 142 of the first computer system 140. Application runtime 144 executes using a processor of the first computer system 140 (not shown). Application runtime 144 allocates memory 142 for the storage of application data 146.
Source files 132 contain application source instructions written in a high level programming language such as "C". Header file 136 supplements source files 132 and includes data structure definitions 148. The data structure definitions 148 are used by the source instructions in the source files 132 to define how application data 146 is allocated and referenced by application runtime 144. An example data structure 150 is illustrated and includes data fields 152, such as numeric or memory pointer fields. Data structure 150 also includes a data structure "eyecatcher" field 151, which is explained below.
The compiler 138 processes the source files 132 and the header file 136 to generate the application executable 130. During compilation, the compiler 138 converts high level instructions for the allocation of, and reference to, data structure 150 into low level machine instructions for the allocation of, and reference to, memory 142. The low level machine instructions are stored in the application executable 130. The conversion to low level machine instructions involves generating low level memory allocation instructions for allocating blocks of memory 142 at runtime. Such low level allocation instructions provide a memory address for each block of memory allocated in memory 142. Also, the compiler calculates, for each of the fields 152 of data structure 150, an offset from a memory address of a block of allocated memory and a size of the field. Thus, where a reference is made in the source files 132 to one of the fields 152 in the data structure 150, a corresponding reference is generated in the application executable 130 using an offset and size corresponding to the field. In this way, high level source instructions in source files 132 are converted to low level machine instructions in application executable 130 for the allocation of, and reference to, data structure 150.
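The conversion performed by compiler 138 can be pictured with a small, hypothetical example: a reference to a field by name resolves to the block's memory address plus the offset the compiler has calculated for that field. The sketch below writes both forms explicitly; it is an illustration of the principle, not the output of any particular compiler, and all names are assumptions.

```c
#include <stddef.h>

/* Hypothetical structure standing in for a definition from header file 136. */
struct example_record {
    int first_field;
    int second_field;
};

/* High level reference: the compiler resolves the member name itself. */
int *second_by_name(struct example_record *rec)
{
    return &rec->second_field;
}

/* Hand-written equivalent of the low level reference the compiler emits:
 * block address plus the precomputed offset of the field. */
int *second_by_offset(struct example_record *rec)
{
    return (int *)((char *)rec + offsetof(struct example_record, second_field));
}
```

Both functions return the same address; only the second makes the offset visible, and that offset is fixed into the application executable at compile time.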
At runtime, application executable 130 is loaded into memory 142 for execution as application runtime 144. Application data 146 is stored in memory 142 by application runtime 144 using the low level machine instructions generated by compiler 138. The first computer system 140 is configured to generate a dump file 162. Dump file 162 can be generated in response to a request by a user of the first computer system 140, or alternatively in response to a problem in the execution of application runtime 144, such as an unrecoverable error. The dump file 162 is a binary file containing a sequential list of bytes 164, where each byte corresponds to one byte in memory 142 at the particular time when the dump file 162 was generated. The dump file 162 also contains a base memory address 165 representing a base address in the memory 142 from which the sequential list of bytes 164 originates. Thus, using the base memory address 165 and the offset of a particular byte in the dump file 162 from the beginning of the list of bytes 164, it is possible to determine the memory address in memory 142 from which the particular byte originates. Alternatively, the dump file 162 may include a runtime memory address for each byte in the list of bytes 164. Dump file 162 can therefore be considered a snapshot of all, or a subset, of the memory 142. Alternatively, the dump file 162 can be a memory dump stored in the memory of the first computer system 140.
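The mapping between a byte in the dump file and its original runtime address is simple arithmetic on the base memory address 165. The sketch below assumes a hypothetical in-memory view of the dump (the dump_view type and function names are illustrative; real dump formats vary) and converts in both directions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of dump file 162: base memory address 165 and the
 * sequential list of bytes 164. */
struct dump_view {
    uint64_t base_address;      /* runtime address of bytes[0] */
    const unsigned char *bytes; /* copied bytes of memory 142 */
    size_t length;              /* number of bytes in the dump */
};

/* Runtime address in memory 142 from which bytes[index] originates. */
uint64_t dump_index_to_address(const struct dump_view *d, size_t index)
{
    return d->base_address + (uint64_t)index;
}

/* Inverse mapping: stores the dump index and returns 1 if the runtime
 * address lies inside the dump, otherwise returns 0. */
int dump_address_to_index(const struct dump_view *d, uint64_t address,
                          size_t *index)
{
    if (address < d->base_address || address - d->base_address >= d->length)
        return 0;
    *index = (size_t)(address - d->base_address);
    return 1;
}
```

For example, with a base address of 0x1000, the byte at index 5 of the dump originates from runtime address 0x1005.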
Dump file 162 can be used for the analysis of the memory 142 during or after execution of application runtime 144. In particular, it is useful to identify a series of bytes within dump file 162 which correspond to application data 146 in order to determine a particular operating state of application runtime 144 when the dump file 162 was generated. For example, it is useful to identify a series of bytes within dump file 162 which correspond to the fields 152 of data structure 150. However, it is difficult to identify such a series of bytes because there is usually no way to determine which bytes in the dump file 162 correspond to the fields 152 of data structure 150.
One way to aid the identification of bytes in dump file 162 corresponding to the fields 152 of data structure 150 is to use a data structure eyecatcher field 151 within data structure 150. The data structure eyecatcher field 151 has a particular defined value and can be placed as the first field within the data structure 150. Then, when analysing the dump file 162, searching for the particular defined value of the data structure eyecatcher field 151 will identify the first of a series of bytes within dump file 162 corresponding to the fields 152 of data structure 150. The data structure eyecatcher field 151 can be used by a dump file analysis tool 168 executing on a second computer system 170. Alternatively, the dump file analysis tool 168 can execute on the first computer system 140 on which application runtime 144 executes. The dump file analysis tool 168 is able to read the dump file 162 and identify a data structure eyecatcher field 151 corresponding to data structure 150. However, once a series of bytes corresponding to data structure 150 has been identified, it is then necessary for the dump file analysis tool 168 to determine the location of each of the fields 152 of the data structure 150. This entails calculating an offset and a size for each of the fields 152 of data structure 150, such as offset 172 and size 174. In order to calculate offset 172 and size 174, the dump file analysis tool 168 must have access to the definition of data structure 150 from header file 136. Thus, with the header file 136 and the provision of a data structure eyecatcher field 151 within data structure 150, dump file analysis tool 168 is able to determine the location of a series of bytes within the dump file 162 corresponding to the fields 152 of data structure 150, and the particular location of each of the fields 152.
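The two steps performed by the dump file analysis tool 168, locating the eyecatcher and then reading a field at an offset derived from the header definition, might be sketched as follows. The eyecatcher value, the ds150 layout and the function names are hypothetical. The sketch also assumes the dump was produced with the same byte order and type sizes as the analysing program, which is exactly the dependency discussed next.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical defined value of eyecatcher field 151. */
static const unsigned char EYECATCHER[4] = { 'D', 'S', '1', '5' };

/* Hypothetical layout of data structure 150 as read from header file 136:
 * eyecatcher field 151 first, then two of the fields 152. */
struct ds150 {
    unsigned char eyecatcher[4];
    int32_t count;
    int32_t flags;
};

/* Returns the index of the first eyecatcher at or after 'start', or
 * (size_t)-1 if none is found. A plain loop is used because memmem()
 * is not part of standard C. */
size_t find_eyecatcher(const unsigned char *dump, size_t len, size_t start)
{
    for (size_t i = start; i + sizeof EYECATCHER <= len; i++)
        if (memcmp(dump + i, EYECATCHER, sizeof EYECATCHER) == 0)
            return i;
    return (size_t)-1;
}

/* Reads the 'count' field using the offset and size taken from the header
 * definition (offset 172 and size 174 in the text). The caller must ensure
 * that struct_index + sizeof(struct ds150) does not exceed the dump length. */
int32_t read_count(const unsigned char *dump, size_t struct_index)
{
    int32_t value;
    memcpy(&value, dump + struct_index + offsetof(struct ds150, count),
           sizeof value);
    return value;
}
```

Note that offsetof(struct ds150, count) is evaluated on the machine running the tool, so the result is only correct when that machine lays out the structure as the first computer system did.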
There are significant drawbacks to this approach for analysing the dump file 162. Firstly, it is necessary to use the header file 136 to calculate the offset 172 and size 174 of each of the fields 152 of data structure 150. This will not be possible in an environment where the header file 136 is not available, such as when an application is executing in a production computer system where only application executable 130 in binary format is available. Secondly, the offset 172 and size 174 may only be valid for the data structure 150 in a particular version of the application. If the definition of data structure 150 is changed in a second version of the application, the offset 172 and size 174 for each of the fields 152 must be recalculated in accordance with the new definition of data structure 150. Thus, offset 172 and size 174 are version specific. Thirdly, the calculation of offset 172 and size 174 must be undertaken on a computer system which is architecturally compatible with the computer system on which the application runtime 144 is executing. This is because the definition of data types can differ between computer systems of different architectures. For example, the PowerPC 32-bit architecture (known as PPC32) (PowerPC is a registered trade mark of International Business Machines Corporation) defines a memory pointer data field as occupying four bytes of memory. In contrast, the PowerPC 64-bit architecture (known as PPC64) defines a memory pointer data field as occupying eight bytes of memory. Thus, if the first computer system 140 has a PPC32 architecture, a memory pointer field in the fields 152 of data structure 150 would correspond to four bytes within application data 146. On subsequent generation of the dump file 162, such a memory pointer field would occupy four bytes of dump file 162.
If the second computer system 170 has a PPC64 architecture, the dump file analysis tool 168 running on the second computer system 170 would erroneously consider the memory pointer field within the dump file 162 to occupy eight bytes. Thus, the dump file analysis tool 168 would calculate an incorrect offset 172 and/or size 174 for a memory pointer field of data structure 150, and for any field that follows it, when the architecture of computer system 170 differs from that of computer system 140. It is therefore necessary to execute the dump file analysis tool 168 on a second computer system 170 with the same architecture as that of the first computer system 140, or alternatively to execute the dump file analysis tool 168 on the first computer system 140.
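The effect of pointer size on offset 172 and size 174 can be shown with a small layout calculation. The sketch below is a simplification, assuming natural alignment (4-byte integers aligned to 4, pointers aligned to their own size); real ABIs have further rules. The field_kind enumeration and the layout function are hypothetical. It computes field offsets for a structure made of an integer, a pointer and a second integer under a four-byte (PPC32-like) and an eight-byte (PPC64-like) pointer.

```c
#include <stddef.h>

/* Kinds of field in a hypothetical description of data structure 150. */
enum field_kind { FIELD_INT32, FIELD_POINTER };

/* Computes each field's offset under simplified natural alignment for a
 * target with the given pointer size. Writes offsets[0..n-1] and returns
 * the total size, padded to the largest alignment used. */
size_t layout(const enum field_kind *kinds, size_t n, size_t pointer_size,
              size_t *offsets)
{
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        size_t size = (kinds[i] == FIELD_POINTER) ? pointer_size : 4;
        size_t align = size;                            /* natural alignment */
        offset = (offset + align - 1) / align * align;  /* round up */
        offsets[i] = offset;
        offset += size;
        if (align > max_align)
            max_align = align;
    }
    return (offset + max_align - 1) / max_align * max_align;
}
```

For the layout integer, pointer, integer, a four-byte pointer gives offsets 0, 4 and 8 with a total size of 12 bytes, whereas an eight-byte pointer gives offsets 0, 8 and 16 with a total size of 24 bytes. A tool applying the second layout to bytes written under the first would therefore read the final integer from the wrong location.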
These drawbacks prevent the development of a generic dump file analysis tool which can execute on any architecture of computer system 170 for any version of an application runtime 144, and without access to the header file 136. It would therefore be advantageous to provide a mechanism for such a generic dump file analysis tool to be developed.