Computing systems are currently in wide use. Some computing systems are server computing systems which host services that can be accessed by client computing systems. By way of example, a client computing system may generate a request and send it to the server computing system. The server computing system may then receive the request and execute a plurality of different workflows in responding to the request. Each workflow may include a plurality of different tasks that are to be executed in sequence to carry out the workflow.
Each such request has an associated latency. The latency corresponds to the time that elapses between receiving the request and providing a response to the request. The latency can be dependent on a wide variety of different factors. Each of the different workflows executed in responding to a request may have its own latency, and those individual latencies may vary widely. Similarly, there may be multiple instances of the same workflow that are being executed concurrently.
Therefore, it can be very difficult to identify and reduce the sources of latency in a server computing system. This difficulty can be exacerbated by the large volume of activity in the system and by limited access to the computing resources that are available to monitor latency.
In addition, there may be certain events that greatly contribute to the latency, but they may be relatively rare. Therefore, manual attempts to reproduce the workflows and events that may occur in a server computing system, and to collect stack trace information, may be extremely difficult.
To address these issues, some current computing systems will identify, over a predetermined interval, which processes are consuming most of the active work cycles of the computing system. However, it is not uncommon for the execution of processes to be suspended, because they may be waiting on the execution of other processes. These types of suspended time intervals can be very difficult to identify.
For instance, assume that process A begins executing and is then suspended because it is waiting for process B to complete work before it is able to continue. Even though process A is not doing work during the time that it is suspended, current attempts to capture latency will attribute the suspended time to the latency of process A, even though it is not process A that is actually contributing to the latency.
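By way of illustration only, the misattribution in the process A and process B example can be sketched in a few lines of Python. The `Segment` record, the function names, and the millisecond figures are all hypothetical and are not drawn from any particular system. The second function is included only as a contrast, to show what the first function hides; it is not presented as the approach of any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One contiguous slice of a process's timeline (hypothetical record)."""
    process: str
    duration_ms: int
    waiting_on: Optional[str] = None  # None means the process is actively working


def naive_attribution(segments: List[Segment]) -> Dict[str, int]:
    """Charge every slice to the process that owns it, suspended or not."""
    totals: Dict[str, int] = defaultdict(int)
    for seg in segments:
        totals[seg.process] += seg.duration_ms
    return dict(totals)


def wait_aware_attribution(segments: List[Segment]) -> Dict[str, int]:
    """Charge suspended slices to the process that is being waited on."""
    totals: Dict[str, int] = defaultdict(int)
    for seg in segments:
        responsible = seg.waiting_on or seg.process
        totals[responsible] += seg.duration_ms
    return dict(totals)


# Process A works for 5 ms, is suspended for 40 ms waiting on process B,
# and then works for another 5 ms (illustrative numbers only).
timeline = [
    Segment("A", 5),
    Segment("A", 40, waiting_on="B"),
    Segment("A", 5),
]

print(naive_attribution(timeline))       # {'A': 50}
print(wait_aware_attribution(timeline))  # {'A': 10, 'B': 40}
```

In this sketch, the naive accounting reports 50 ms for process A, even though process A performed only 10 ms of work and the remaining 40 ms elapsed while it was waiting on process B.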
Other attempts have been made to log timestamps at certain points during execution. This can help to surface time spent waiting, but the code needed to log a timestamp must then be inserted manually at each specific point, in every workflow where it is desired. Each timestamp must also be associated with enough information to tell which particular part of the workflow was being executed at that point. If this technique is expanded to accommodate other parts of the same workflow, or additional workflows, the timestamping code must be added at even more points in the workflows. This can be cumbersome and error prone.
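A minimal sketch of this kind of manual timestamp logging, again in Python and for illustration only, may look as follows. The workflow `handle_request`, its tasks `load_profile` and `render_page`, and the helper names are hypothetical. The clock is passed in as a parameter only so that the sketch can be exercised with a fake clock.

```python
import time
from typing import Callable, List, Tuple

# Each entry records (workflow name, point name, timestamp in seconds).
TimestampLog = List[Tuple[str, str, float]]


def log_timestamp(log: TimestampLog, workflow: str, point: str,
                  clock: Callable[[], float] = time.perf_counter) -> None:
    """Record when a hand-labeled point in a workflow was reached."""
    log.append((workflow, point, clock()))


def load_profile() -> None:
    """Placeholder task (hypothetical)."""


def render_page() -> None:
    """Placeholder task (hypothetical)."""


def handle_request(log: TimestampLog,
                   clock: Callable[[], float] = time.perf_counter) -> None:
    # Every call below was inserted by hand, and each one needs a label
    # that identifies which part of the workflow was executing.
    log_timestamp(log, "handle_request", "start", clock)
    load_profile()
    log_timestamp(log, "handle_request", "after_load_profile", clock)
    render_page()
    log_timestamp(log, "handle_request", "end", clock)


def elapsed_between(log: TimestampLog, workflow: str,
                    first: str, second: str) -> float:
    """Return the time between two labeled points of one workflow run."""
    times = {point: t for wf, point, t in log if wf == workflow}
    return times[second] - times[first]


log: TimestampLog = []
handle_request(log)
print(elapsed_between(log, "handle_request", "start", "end"))
```

Even in this small sketch, one workflow with two tasks requires three hand-placed logging calls and three hand-chosen labels. Covering a further task, or a further workflow, means adding and labeling still more calls, and a mistyped or reused label can go unnoticed until the log is analyzed.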
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.