The present invention relates to system maintenance and diagnosis, and more particularly to techniques for non-intrusive gathering of diagnostic data in a monitored system.
When a monitored system encounters a failure or error, diagnostic data is typically collected and stored to a disk for diagnostic analysis (also referred to as dumping diagnostic data to a disk). The diagnostic data may then be communicated to a diagnosis site for analysis and resolution of the error. The diagnostic data that is collected and stored (also referred to as diagnostic data dumps) can be quite large, amounting to thousands of files and several gigabytes of data. As a result, the process of gathering and storing diagnostic data can be time-consuming.
This can have several negative impacts on the system. For example, when a process or thread encounters an error in the system, the failing process or thread may remain in a suspended state until the diagnostic data gathering has completed. As a result, an end user who sends a request to the failing process or thread may receive no response while it is suspended. In addition, if the failing process or thread holds a lock on system resources that are shared with other processes or sessions, those other processes or sessions may be blocked for the duration of the diagnostic data gathering. Furthermore, due to the size of the diagnostic data being gathered and stored, the task of gathering and storing it can be very resource-intensive and may negatively impact the system as a whole. The situation is further aggravated if failures are encountered by a large number of processes or threads in the system, which may then collectively exhaust system resources.