Exemplary embodiments relate to discovery of code segments, and particularly to using special instrumentation for dynamic discovery of code segments during program execution.
One approach to binary code instrumentation is static instrumentation, in which executable modules are examined and modified before they are executed. The major problem for static instrumentation is correctly identifying all code and data segments.
A code segment may be defined as a set of contiguous instructions that starts at an entry point. Between two addresses, there may be several segments, each of which is confirmed as either code or data. Typically, a code segment extends from an entry point to the first unconditional branch instruction. An entry point may be a memory address corresponding to a point in the code of a computer program which is intended as the destination of a long jump. A memory address is an identifier for a memory location at which a computer program or a hardware device can store a piece of data.
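The extent of a code segment as defined above can be illustrated with a minimal sketch. It assumes the instructions have already been decoded into a list; the `Insn` record and the `segment_end` function are hypothetical names introduced only for illustration, and the sketch further assumes that the terminating unconditional branch is counted as part of the segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for an already-decoded instruction (illustration only).
struct Insn {
    std::uint32_t address;      // address of the first byte of the instruction
    std::uint8_t  length;       // encoded length in bytes
    bool unconditional_branch;  // true for instructions such as jmp or ret
};

// Returns the address one past the last byte of the code segment that starts
// at insns[entry_index]. The segment runs through the first unconditional
// branch; if none is found, it runs to the end of the decoded list.
std::uint32_t segment_end(const std::vector<Insn>& insns, std::size_t entry_index) {
    assert(entry_index < insns.size());
    for (std::size_t i = entry_index; i < insns.size(); ++i) {
        if (insns[i].unconditional_branch) {
            return insns[i].address + insns[i].length;
        }
    }
    const Insn& last = insns.back();
    return last.address + last.length;
}
```

A real instrumenter would have to decode variable-length instructions itself before applying such a rule; here the decoding is taken as given.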
Static binary code instrumentation relies on various techniques for finding all potential entry points and separating code from data. The code contains instructions, while the data may be constants within the instructions. These techniques include using debugging information and import/export tables; branch following; scanning relocation tables; and liveness analysis.
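Of the techniques listed above, branch following may be sketched as a worklist traversal. The sketch below is a simplified illustration under stated assumptions: the program is modeled as a map from address to a hypothetical `DecodedInsn` record, only direct branch targets are known, and the names `DecodedInsn` and `follow_branches` are not taken from any particular tool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical decoded-instruction record (illustration only).
struct DecodedInsn {
    std::uint8_t  length;         // encoded length in bytes
    bool          has_target;     // true if the instruction has a direct target
    std::uint32_t branch_target;  // direct branch or call target, if has_target
    bool          falls_through;  // false for unconditional jmp and ret
};

// Starting from known entry points, marks every address reachable through
// fall-through and direct branch targets as confirmed code.
std::set<std::uint32_t> follow_branches(
        const std::map<std::uint32_t, DecodedInsn>& program,
        const std::vector<std::uint32_t>& known_entry_points) {
    std::set<std::uint32_t> confirmed;
    std::vector<std::uint32_t> worklist(known_entry_points);
    while (!worklist.empty()) {
        const std::uint32_t addr = worklist.back();
        worklist.pop_back();
        const auto it = program.find(addr);
        // Skip addresses that do not decode or that were already visited.
        if (it == program.end() || !confirmed.insert(addr).second) continue;
        const DecodedInsn& insn = it->second;
        assert(insn.length > 0);
        if (insn.has_target) worklist.push_back(insn.branch_target);
        if (insn.falls_through) worklist.push_back(addr + insn.length);
    }
    return confirmed;
}
```

In this sketch, an address that is reached only through an indirect call or a table of pointers is never placed on the worklist, so it remains unconfirmed.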
During static instrumentation, potential entry points are analyzed and classified as confirmed code or confirmed data. However, in certain situations these techniques do not provide reliable results. For example, a code segment inside a binary module may look like a string or other data, and the code segment may not have complete debug information (as seen in Example 1). As a result, the potential entry point is not confirmed as either code or data, so the potential code segment is not instrumented as code, which causes runtime crashes.
Example 1 is below:
#include <stdio.h>

int count;

// Define a class FOO with a constructor
class __declspec( dllexport ) FOO
{
public:
  // Define a constructor
  FOO( unsigned i ) : m_value(i)
  {
    // Print a message if the constructor is called
    printf("FOO #%u: %u\n", count, i );
    count += 1;
  }
public:
  unsigned m_value;
};

// Now define a global variable of type FOO.
// This should cause the constructor for the global variable
// to be called when the DLL is loaded.
//
FOO fool = 0x706C65;
//
// When optimizations are on and the frame pointer is omitted,
// the first few instructions in the initializer are:
// 1000C3E0 68 65 6C 70 00  push  706C65h
// 1000C3E5 B9 04 0C 01 10  mov   ecx,offset fool (10010C04h)
// 1000C3EA E8 11 4C FF FF  call  FOO::FOO (10001000h)
// 1000C3EF C3              ret
// Memory contents at the same address:
// 0x1000C3E0 68 65 6c 70 help
// 0x1000C3E4 00 b9 04 0c ....
// 0x1000C3E8 01 10 e8 11 ....
// 0x1000C3EC 4c ff ff c3 L...
// 0x1000C3F0 00 00 00 00 ....
// The first instruction looks very much like a NULL-terminated string.
// In the absence of reliable debug info, it is treated as data.
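The ambiguity in Example 1 can be reproduced with a short sketch that reads the same five bytes in two ways: as a printable, NULL-terminated string and as an x86 `push imm32` instruction (opcode 68h followed by a little-endian 32-bit immediate). The helper names `as_c_string` and `as_push_imm32` are hypothetical and serve only to illustrate the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Interprets the bytes as a printable, NULL-terminated ASCII string.
// Returns an empty string if the bytes do not form a non-empty string of
// printable characters followed by a terminator.
std::string as_c_string(const std::uint8_t* bytes, std::size_t size) {
    assert(bytes != nullptr);
    std::string text;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] == 0) return text;  // terminator found
        if (bytes[i] < 0x20 || bytes[i] > 0x7E) return std::string();
        text.push_back(static_cast<char>(bytes[i]));
    }
    return std::string();  // no terminator within the buffer
}

// Interprets the bytes as "push imm32": opcode 0x68 followed by a
// little-endian 32-bit immediate. Returns false if the bytes do not match.
bool as_push_imm32(const std::uint8_t* bytes, std::size_t size, std::uint32_t& imm) {
    assert(bytes != nullptr);
    if (size < 5 || bytes[0] != 0x68) return false;
    imm = static_cast<std::uint32_t>(bytes[1])
        | static_cast<std::uint32_t>(bytes[2]) << 8
        | static_cast<std::uint32_t>(bytes[3]) << 16
        | static_cast<std::uint32_t>(bytes[4]) << 24;
    return true;
}
```

Applied to the bytes 68 65 6C 70 00 from Example 1, the first helper yields the string "help" and the second yields the immediate 706C65h, so both readings are consistent with the memory contents and a static analysis cannot choose between them from the bytes alone.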