Troubleshooting a real-time system running in a production environment has always been a challenge, both because configurations and traffic properties vary and are difficult to replicate in development labs, and because few debugging tools are available for use in production environments.
One prior approach to debugging and resolving issues in a production environment involves running debug images in that environment. However, this approach is undesirable because of the time it takes to set up and run a debug image, so it cannot be performed in real time. Another approach is to replicate a similar setup in a development lab, where engineers attempt to reproduce the problem and use enhanced debug tools. Again, this approach suffers the drawback of delay, and often the problem is difficult to reproduce. Yet another approach has been the exchange of logs, traces, and memory dumps among customer support engineers and development engineers, which is perhaps the most time-consuming way to solve problems experienced in production environments.
A source-level debugger is often used while troubleshooting in development labs. Many real-time operating systems include a debug agent that, in conjunction with a debugger running on a host machine, facilitates source-level debugging. An example of such a system is VxWorks, which runs the Wind DeBug (WDB) Agent to communicate with a GNU debugger (GDB) application running on a Sun workstation. However, this approach is service-impacting and difficult to use in a production environment because it is intrusive and requires the CPU of the machine being debugged to be halted. Also, source-level debugging arrangements such as the VxWorks one require the host machine to be connected to the system being debugged, which may make it difficult to debug an active system remotely.
High-availability real-time systems are characterized by minimal downtime, achieved through redundancy built into the system architecture. The above limitations of traditional debugging methods become more significant in a high-availability environment because of the intrusive nature of these methods.