Many modern computer systems, including their resident operating systems and application programs, have software bugs, or may have or develop other problems that cause maloperation. When maloperation occurs, it is desirable to identify any causative bugs or other problems so that repairs may be made to prevent future maloperation. Symbolic debuggers may be used by trained service personnel to examine variables of a maloperating computer system to help those personnel identify any causative bug or other problem.
Software bugs may exist in the operating system itself or in application programs. Viruses may enter the system and cause various levels of damage, leading to maloperation. Other problems that can cause maloperation include hardware defects, malfunctions of attached devices and networked computer systems, and maliciously or accidentally incorrect user input.
Debugging is often an iterative process of information gathering, making changes or performing other experiments, and testing. A symbolic debugger is a tool highly useful for at least the information gathering phase of this process, and occasionally useful for performing experiments and testing.
Symbolic kernel debuggers are symbolic debuggers having command sets and functions particularly suited to solving software bugs and problems associated with an operating system kernel or software drivers, such as input/output device drivers that run in kernel mode.
Symbolic kernel debuggers are often used to diagnose intermittent bugs in systems. Debugging intermittent problems requires that the failure state of the intermittent bug be reproduced; the debugger may then be used to examine information relevant to the bug. Reproducing intermittent bugs can be difficult and time-consuming, sometimes requiring extensive repetition of test sequences to induce the failure state.
Symbolic Debuggers and Symbol Resolution
Symbolic debuggers, including symbolic kernel debuggers, typically incorporate a collection driver, a user interface, and a symbol resolution system.
The symbol resolution system reads a module-specific symbol file, typically generated by an assembler, compiler or linker when the module was created, to obtain a list of known symbols and relative or absolute addresses corresponding to those symbols. The symbol resolution system uses this list to translate symbolic requests by service personnel into memory addresses having variables to be read or function entry points to be called or intercepted. The symbol file may also specify a variable type associated with each symbol, which may control the way the user interface displays memory data.
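By way of illustration only, the translation of a symbolic request into a memory address might be sketched as follows. The symbol file layout (one `name offset type` entry per line), the symbol names, and the module base address are assumptions made for this example and do not represent any actual symbol file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    offset: int      # address relative to the module's load base
    type_name: str   # variable type, usable to choose a display format


def parse_symbol_file(text):
    """Parse lines of the hypothetical form 'name offset_hex type'."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, offset_hex, type_name = line.split()
        table[name] = Symbol(name, int(offset_hex, 16), type_name)
    return table


def resolve(table, module_base, name):
    """Translate a symbolic request into an absolute memory address."""
    return module_base + table[name].offset


# Hypothetical symbol file contents for one module.
SYMBOL_FILE = """
# name            offset  type
process_list_head 0x1a40  list_entry
tick_count        0x2008  uint32
"""

table = parse_symbol_file(SYMBOL_FILE)
address = resolve(table, 0x80400000, "process_list_head")
print(hex(address))  # 0x80401a40
```

Because only relative offsets are stored in this sketch, the same table serves regardless of where the module is loaded; the type name is carried along so that a user interface could select a display format.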
Symbolic kernel debuggers may have the ability to parse system information, such as process lists and input/output buffers, and to display relevant information. The symbol resolution system is often used by these debuggers to locate those lists and buffers, since they may appear at different locations in memory for each version of the kernel or driver software.
It is essential that the version of the symbol file used by the symbol resolution system correspond to the version of the module running on the machine being debugged. If this is not the case, the debugger may read different locations than those intended, which can result in user confusion or a crashed debugger. Since operating system modules are updated frequently, with service packs and in-line updates as well as with new operating system releases, locating, storing, and ensuring use of the correct symbol file can be a difficult exercise. Kernel-mode driver modules may also be released, patched, and updated by hardware vendors, further complicating the logistical problem of ensuring use of the correct symbol file.
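One possible way of guarding against a mismatched symbol file is sketched below. It assumes, purely for illustration, that both the running module and each candidate symbol file carry a build identifier made up of a timestamp and an image size; the identifier fields, file names, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildId:
    timestamp: int    # hypothetical build timestamp recorded in the module
    image_size: int   # hypothetical size of the module image in bytes


class SymbolMismatch(Exception):
    """Raised when no candidate symbol file matches the running module."""


def select_symbol_file(module_id, candidates):
    """Return the path whose build identifier equals that of the module."""
    for path, build_id in candidates.items():
        if build_id == module_id:
            return path
    raise SymbolMismatch(f"no symbol file matches {module_id}")


# Hypothetical symbol store holding files for two releases of one driver.
candidates = {
    "symbols/driver_v1.sym": BuildId(timestamp=946684800, image_size=65536),
    "symbols/driver_v2.sym": BuildId(timestamp=978307200, image_size=69632),
}

running = BuildId(timestamp=978307200, image_size=69632)
chosen = select_symbol_file(running, candidates)
print(chosen)  # symbols/driver_v2.sym
```

Refusing to proceed on a mismatch, rather than falling back to the nearest version, avoids reading unintended locations on the target machine.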
The collection driver typically gathers data as requested by service personnel and may, but need not, also have the ability to alter selected memory locations. The collection driver runs on the target machine, the machine being diagnosed. Some symbolic kernel debuggers are known to utilize a serial port of the target machine to communicate with a diagnosis machine upon which runs the user interface and symbol resolution system.
The user interface typically interfaces the collection driver and the symbol resolution system to a keyboard and display for interaction with service personnel users. The user interface may include system-specific code for reading linked lists, including process lists, and displaying information from those lists to the user. Extraction of data from linked lists typically requires multiple calls from the user interface to the collection driver.
Coherency of Data
It is known that many operating system variables and data structures change as the system runs. Many of these data structures are of length greater than the word length of the machine; changes made to these structures must take place over several processor operations. If these data structures are examined by a debugger after the first operation of a change, but before the last operation of the change, the data captured or viewed by the debugger may not accurately reflect the state of the system. Similarly, if a debugger begins to view or capture a data structure prior to a change, but completes capture after the change, the data captured or viewed will be incoherent in that it does not accurately reflect the state of the system.
Incoherent data may cause confusion to service personnel attempting to interpret it. Since no indication of incoherency exists, it can be difficult to determine whether a problem indicated by the data is because the data is incoherent, or because the data indicates a problem with the system. Incoherent data may cause a debugger to display erroneous information. If incoherent data is followed as part of a linked list, the debugger may crash or attempt an illegal operation. For example, if links of a doubly linked list are examined after update to the forward links, but before update to the reverse links, the reverse links are incoherent and could result in a debugger crash if the debugger follows them. Debugger crashes may not only require that the debugger be restarted, but may require extensive work to reproduce the failure state of an intermittent bug.
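The doubly linked list example above may be illustrated with a small deterministic simulation. The node class and helper functions are hypothetical and model only the ordering of pointer updates; in this sketch the mid-update reverse traversal silently omits a node, whereas in a real system following such links could also reach invalid memory.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None


def forward_names(head):
    """Follow forward links from the head and collect node names."""
    names, node = [], head
    while node is not None:
        names.append(node.name)
        node = node.next
    return names


def backward_names(tail):
    """Follow reverse links from the tail and collect node names."""
    names, node = [], tail
    while node is not None:
        names.append(node.name)
        node = node.prev
    return names


# Initial coherent list: A <-> C
a, c = Node("A"), Node("C")
a.next, c.prev = c, a

# Insert B between A and C. First operation: update the forward links.
b = Node("B")
b.next = c
a.next = b

# A debugger examining the list at this instant sees inconsistent views.
forward = forward_names(a)     # ['A', 'B', 'C']
backward = backward_names(c)   # ['C', 'A'], node B is missing

# Second operation: update the reverse links; the list is coherent again.
b.prev = a
c.prev = b
repaired = backward_names(c)   # ['C', 'B', 'A']

print(forward, backward, repaired)
```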
It is desirable that captured data accurately reflect system state, or be “coherent,” so that service personnel may diagnose the system without confusion, without wasting effort following false leads, and without losing time to restarting crashed debuggers and reproducing bugs.
Some existing symbolic kernel debuggers ignore the problem of incoherent data. A debugger believed to ignore incoherency when taking a system snapshot is Microsoft i386 KD running under LiveKD by SysInternals, as distributed with the book Inside Windows 2000, 3rd ed., by David Solomon and Mark Russinovich, Microsoft Press, 2000. Other debuggers, such as Microsoft's i386 KD without LiveKD, enforce partial coherency by stopping execution of all programs, except for the debugger, on the target machine until debugging is complete. Stopping execution renders the target machine temporarily unusable, disrupting any real-time control functions or network services provided by that machine. Stopping execution prevents the operating system from making changes to data structures while the debugger is capturing or displaying those structures, thereby preventing apparent incoherency resulting from updates to these structures as the debugger is reading them. Some incoherency may nevertheless exist unless execution is stopped at a time when no changes to data structures are in progress.
It can be useful to obtain a snapshot of coherent state information about a system, allow that system to continue execution for some time, and obtain a second snapshot from that system. This permits service personnel to observe how the system state changes with time, which can yield useful clues about system bugs and other problems. In particular, multiple snapshots can be useful in identifying memory and resource leaks and performance problems.
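As a minimal sketch of how two snapshots might be compared, the following example reports per-process counters that grew between the first and second snapshot. The process names, the choice of a handle count as the resource measured, and the values are all hypothetical.

```python
def growth(first, second):
    """Return {process: increase} for counters that grew between snapshots."""
    return {
        name: second[name] - first[name]
        for name in second
        if name in first and second[name] > first[name]
    }


# Hypothetical open-handle counts per process, captured some time apart.
first = {"server.exe": 120, "logger.exe": 40, "shell.exe": 15}
second = {"server.exe": 480, "logger.exe": 40, "shell.exe": 14, "new.exe": 3}

suspects = growth(first, second)
print(suspects)  # {'server.exe': 360}
```

A process whose count rises steadily across successive snapshots is a candidate for a resource leak; a process that appears in only one snapshot is ignored here because no change can be computed for it.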
Linked Process Lists
Linked lists may be used by an operating system to store information about processes. It is known that Windows NT 4.0 and Windows 2000 store process information in a linked process list. Each node of this list may incorporate a further linked thread list as well as additional information about the process that may be useful in debugging a system. For example, in addition to pointers to a thread list, a process list node may include process name, execution priority and execution privileges.
Each node of the thread list contains list pointers to security tokens, context switches, I/O request lists, and wait blocks that can be of interest to service personnel investigating a software bug or other problem. Each node of the thread list of a process node may be linked to additional linked lists of the system.
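The nesting of a thread list within each process list node, and the number of collection driver calls needed to extract it, can be sketched as follows. The memory layout, field offsets, addresses, and the `read_word` function standing in for a collection driver call are assumptions made for this example and do not reflect the actual Windows NT or Windows 2000 structures.

```python
# Simulated target memory, keyed by address.
# Hypothetical process node layout: +0 next process, +4 thread list head, +8 name.
# Hypothetical thread node layout:  +0 next thread,  +4 thread id.
MEMORY = {
    0x1000: 0x2000, 0x1004: 0x5000, 0x1008: "init",
    0x2000: 0,      0x2004: 0x6000, 0x2008: "shell",
    0x5000: 0,      0x5004: 1,
    0x6000: 0x6100, 0x6004: 2,
    0x6100: 0,      0x6104: 3,
}

calls = 0


def read_word(address):
    """Stand-in for one call to the collection driver."""
    global calls
    calls += 1
    return MEMORY[address]


def walk(head):
    """Yield each node address of a singly linked list ending in 0."""
    node = head
    while node != 0:
        yield node
        node = read_word(node)  # next pointer is stored at offset 0


def processes(head):
    """Return (process name, [thread ids]) for every process node."""
    result = []
    for proc in walk(head):
        name = read_word(proc + 8)
        thread_head = read_word(proc + 4)
        threads = [read_word(thread + 4) for thread in walk(thread_head)]
        result.append((name, threads))
    return result


listing = processes(0x1000)
print(listing)  # [('init', [1]), ('shell', [2, 3])]
print(calls)    # 12
```

Even this small listing of two processes and three threads takes twelve separate reads, and the target system may modify the lists between any two of them.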
Prior Debuggers
Microsoft Kernel Debugger i386 KD (KD) is a symbolic debugger tailored for Windows NT and Windows 2000 kernel and kernel-mode driver debugging. KD is designed for operation through a serial port of a target machine. Two machines are required: the target machine, on which the system being debugged is located and which has a collection driver, and an analysis machine having a symbol resolution system and a user interface. Symbol files matching the system being debugged must be present on the analysis machine. Matching symbol files are not automatically located, although they are verified as correctly matching the target system. When KD is in use, all other threads on the target machine are stopped until debugging is complete, severely impacting operation of that machine. KD can, however, alter system variables and allow the system to resume operation when debugging is complete.
Statement of the Problem
Collection drivers for symbolic kernel debuggers must run with high privileges in kernel mode. Code run with those privileges poses security and bug risks, so it is desirable that it be small, with few versions.
Since the locations of kernel variables, including process lists, can vary from release to release, it is desirable for a debugger to derive this information from symbol files at run time rather than embedding this information in a version-specific collection driver. It is also desirable to place other system-specific information in a command plug-in of the user interface instead of the collection driver.
It is therefore desirable to have a way of specifying system and version specific information, including list format and structure information, to be collected by a collection driver. The collection driver then interprets this specification.
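One way such a specification might look is sketched below: a small, version-independent collector walks a linked list according to a descriptor supplied by the caller. The descriptor fields, the 32-bit little-endian layout, the use of address 0 as the list terminator, and the node limit are assumptions made for this illustration only.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ListSpec:
    head_address: int       # address of the first node
    next_offset: int        # offset of the next pointer within a node
    fields: tuple           # (field name, offset, struct format) triples
    max_nodes: int = 1024   # guard against cycles or corrupt links


def collect(memory, spec):
    """Generic collector: walk the list exactly as the specification directs."""
    records = []
    node = spec.head_address
    while node != 0 and len(records) < spec.max_nodes:
        records.append({
            name: struct.unpack_from(fmt, memory, node + offset)[0]
            for name, offset, fmt in spec.fields
        })
        (node,) = struct.unpack_from("<I", memory, node + spec.next_offset)
    return records


# Simulated target memory holding two hypothetical nodes.
memory = bytearray(64)
struct.pack_into("<II", memory, 16, 32, 7)  # node at 16: next=32, priority=7
struct.pack_into("<II", memory, 32, 0, 9)   # node at 32: next=0,  priority=9

# Version-specific knowledge lives in the specification, not in the collector.
spec = ListSpec(head_address=16, next_offset=0,
                fields=(("priority", 4, "<I"),))

records = collect(memory, spec)
print(records)  # [{'priority': 7}, {'priority': 9}]
```

In this arrangement a new kernel or driver release would require only a different specification, derived for example from symbol files, while the privileged collection code remains unchanged.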