Processes executing on a compute device may exhibit several kinds of faults or undesirable behavior, such as ending unexpectedly, entering an infinite loop, or acting as a “noisy neighbor” that degrades the performance of other processes executing on the same compute device. A compute device may monitor a process in various ways to detect or address such problems. For example, the compute device can perform process status monitoring, in which it determines whether the process ID of the monitored process is valid and checks the process state of the monitored process to determine whether the process is running normally. However, process status monitoring provides only limited failure detection capability and does not detect faults such as an infinite loop.
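By way of illustration only, process status monitoring of the kind described above may be sketched as follows. The sketch assumes a Linux-like compute device that exposes `/proc/<pid>/stat`; the function names and the set of states treated as unhealthy are hypothetical choices made for this example and are not taken from any particular implementation.

```python
import os

# Hypothetical policy: zombie (Z), dead (X), and stopped (T) count as unhealthy.
UNHEALTHY_STATES = {"Z", "X", "T"}

def pid_is_valid(pid: int) -> bool:
    """Return True if a process with this ID currently exists (POSIX)."""
    if pid <= 0:
        return False  # non-positive values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def parse_state(stat_text: str) -> str:
    """Extract the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is parenthesized and may itself contain spaces or ')',
    # so the state is the first field after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]

def check_status(pid: int) -> str:
    """Classify a monitored process as 'invalid', 'faulty', or 'ok'."""
    if not pid_is_valid(pid):
        return "invalid"
    try:
        with open(f"/proc/{pid}/stat") as stat_file:
            state = parse_state(stat_file.read())
    except OSError:
        return "invalid"  # the process exited between the two checks
    return "faulty" if state in UNHEALTHY_STATES else "ok"
```

A process caught in an infinite loop typically remains in the running state (`R`), so a check of this kind would be expected to report it as healthy, which is consistent with the limitation noted above.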
To enable monitoring for certain faults, an application executed as a process may be instrumented so that the compute device can monitor the process, for example by using a heartbeat scheme. However, such monitoring methods may be complex and lack a standard mechanism, may be tailored solutions that change over generations of products and impose a heavy maintenance burden, may require that every critical process be instrumented, and may fail to detect performance impacts. Another monitoring technique that a compute device may employ is a system watchdog that monitors a process. However, a system watchdog captures only catastrophic failures and kernel lockups, and results in a system reset.
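A heartbeat scheme of the kind mentioned above may, as one minimal sketch, take the following form. The class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical and serve only to illustrate the idea: each instrumented process periodically reports a heartbeat, and the monitor flags any process whose last heartbeat is older than a timeout.

```python
import time
from typing import Callable, Dict, List

class HeartbeatMonitor:
    """Tracks the last heartbeat time reported by each instrumented process."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the monitor can be exercised without waiting
        self.last_beat: Dict[str, float] = {}

    def beat(self, name: str) -> None:
        """Called by the instrumented process to record that it is still making progress."""
        self.last_beat[name] = self.clock()

    def stale(self) -> List[str]:
        """Return the names of processes whose last heartbeat exceeds the timeout."""
        now = self.clock()
        return sorted(name for name, t in self.last_beat.items()
                      if now - t > self.timeout_s)

# Usage with a simulated clock.
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: now[0])
monitor.beat("worker-a")
monitor.beat("worker-b")
now[0] = 4.0
monitor.beat("worker-b")   # worker-b keeps reporting; worker-a has gone silent
now[0] = 6.0
missed = monitor.stale()   # only worker-a has exceeded the 5-second timeout
```

Only processes that call `beat` are visible to such a monitor, which reflects the drawback noted above that every critical process must be instrumented.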