When application software that is running on an embedded device crashes in a production environment, it can be very difficult to determine why the crash occurred. The crash may be hard or impossible to replicate in a debug environment. The typical strategy is to save as much information as possible on the current state of the system for later analysis.
Most debugging toolsets include features that can analyze this saved data. This typically includes building a call stack of the calls leading up to the crash. However, generating this listing depends on having accurate stack memory and register contents at the precise time that the crash occurred.
A stack-based processor architecture uses a section of memory called the stack to store temporary data. A register called the stack pointer, or SP, points to the current location in that memory. Temporary data is therefore stored at an offset from the SP. When another function is called, the SP is adjusted to the top of the stack (or the bottom if the stack grows down) so that the called function doesn't overwrite data that the current function still intends to use. The section of the stack used by one specific function is referred to as its frame. The size of a frame can grow and shrink during the execution of a function depending on its storage requirements at each point in time.
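The frame behavior described above can be illustrated with a minimal sketch. It assumes a toy 16-word stack that grows down; `memory`, `push_frame`, `pop_frame`, and the frame sizes are hypothetical names and values chosen for illustration, not part of any real toolchain.

```python
# Toy model of a downward-growing stack (all names and sizes are illustrative).
STACK_SIZE = 16
memory = [0] * STACK_SIZE
sp = STACK_SIZE  # empty stack: the SP sits just past the highest address


def push_frame(size):
    """Reserve `size` words for a function by moving the SP down."""
    global sp
    sp -= size
    return sp  # lowest address of the new frame


def pop_frame(size):
    """Release a frame of `size` words when its function returns."""
    global sp
    sp += size


# The caller reserves a 4-word frame and stores a temporary at SP + 1.
caller_base = push_frame(4)
memory[sp + 1] = 0xAA

# Calling another function first moves the SP past the caller's frame,
# so the callee's temporaries land in different words.
callee_base = push_frame(3)
memory[sp + 1] = 0xBB

# The caller's temporary is untouched while the callee runs.
caller_value_during_call = memory[caller_base + 1]

# Returning restores the SP to the caller's frame.
pop_frame(3)
sp_after_return = sp
```

Because the callee's frame starts where the caller's frame ends, the two offsets `SP + 1` refer to different words even though the code that computes them is identical.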
When a function is called, the location to return to and the function's parameters are passed to it. This is done either by placing that data in specific registers or by placing it onto the stack. Registers that the called function intends to use are copied to the stack before they are used so that their values can be restored before returning, so even values passed in registers will often be located on the stack at some point.
The compiler knows how a function intends to use the stack, allowing it to generate DWARF (Debugging With Attributed Record Formats) debug information that describes, for each point in a function, where the return PC is located, what registers have been pushed onto the stack, and how the SP has been adjusted since the start of the function.
The current method of generating a call stack listing is to first read the program counter (PC) register. The function that corresponds to that PC, where the return PC is located, and what the return SP is can then all be looked up in the DWARF information using the PC value as a key. The value of the return PC identifies the calling function. The process can then be repeated for that function. The result is a list of all calling functions leading up to the current function.
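The lookup-and-repeat procedure can be sketched as follows. This is a simplified illustration, not a DWARF parser: `UNWIND_TABLE` is a hypothetical stand-in for the compiler's debug information with one fixed rule per function (real DWARF rules can differ at each instruction), the stack is assumed to grow down, and the function names, addresses, and the dictionary used for stack memory are all invented for the example.

```python
# Hypothetical unwind table standing in for DWARF call frame information.
# Each entry: (start_pc, end_pc, function name,
#              offset of the saved return PC from the SP, frame size in words).
UNWIND_TABLE = [
    (0x1000, 0x1100, "main", 0, 2),
    (0x1100, 0x1200, "read_sensor", 1, 3),
    (0x1200, 0x1300, "parse_packet", 0, 2),
]


def lookup(pc):
    """Return the table entry whose address range contains pc, or None."""
    for entry in UNWIND_TABLE:
        if entry[0] <= pc < entry[1]:
            return entry
    return None


def unwind(pc, sp, stack):
    """Walk outward from the current frame and return the function names."""
    names = []
    while True:
        entry = lookup(pc)
        if entry is None:
            break  # the PC does not belong to any known function
        _, _, name, ret_offset, frame_size = entry
        names.append(name)
        ret_addr = sp + ret_offset
        if ret_addr not in stack:
            break  # no saved return PC: outermost frame reached
        pc = stack[ret_addr]  # the return PC identifies the caller
        sp += frame_size      # the caller's SP sits above this frame
    return names


# Saved stack contents: address -> word (only the return PCs matter here).
saved_stack = {
    0x2000: 0x1150,  # parse_packet returns into read_sensor
    0x2003: 0x1020,  # read_sensor returns into main
}

call_stack = unwind(pc=0x1234, sp=0x2000, stack=saved_stack)
```

In this sketch the walk starts in `parse_packet` and recovers `read_sensor` and then `main`. Every step depends on the PC and SP values carried over from the previous step.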
Generating this listing depends on having accurate stack memory and registers available at the precise time that the crash occurred. If register information is incomplete or inaccurate, the debugger will not know where on the stack to start looking, or what function's frame description it should look up to find the previous frame.
Crashes are often caused by a branch taken on an invalid pointer, either due to a logic error or due to memory corrupted by a heap overflow, an uninitialized variable, improperly configured Direct Memory Access (DMA), or other problems. In the absence of Operating System (OS) or other system protections, this type of error can lead to invalid opcodes being executed by the processor and to corruption of the registers and perhaps the stack, which can make it impossible for debugging toolsets to rebuild the call stack using standard approaches. This, in turn, makes it very difficult to determine the chain of events that led up to the crash.
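As a rough illustration of how an overflow can turn a valid branch target into an invalid one, the sketch below models memory as a flat list of words in which a buffer sits directly before a stored handler address. The layout, the address range in `VALID_CODE`, and the helper `unchecked_copy` are assumptions made for the example only.

```python
# Toy flat memory: a 4-word buffer followed immediately by a handler slot.
memory = [0] * 8
BUFFER_START = 0
BUFFER_LEN = 4
HANDLER_SLOT = BUFFER_START + BUFFER_LEN

VALID_CODE = range(0x1000, 0x1300)  # hypothetical range of code addresses
memory[HANDLER_SLOT] = 0x1200       # a valid branch target


def unchecked_copy(dest, data):
    """Copy words without a bounds check, like a copy with a bad length."""
    for i, word in enumerate(data):
        memory[dest + i] = word


target_before = memory[HANDLER_SLOT]

# Writing one word more than the buffer holds overwrites the handler slot.
unchecked_copy(BUFFER_START, [0x41] * (BUFFER_LEN + 1))

target_after = memory[HANDLER_SLOT]
target_is_valid = target_after in VALID_CODE
```

A branch through the handler slot after the copy would jump to an address outside the code region. On a device without memory protection, the processor would attempt to execute whatever bytes are stored there.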