1. Field
This invention generally relates to a computing environment. More particularly, the invention relates to sampling technology.
2. General Background
Time-based or hardware event-based sampling technology is typically utilized in application profiling tools to determine the specific usage of resources. A current approach is to sample by periodically generating interrupts. At each interrupt, the current process/thread, the instruction being executed and, optionally, the data address being accessed may be identified and recorded. At a later time, the collected data is aggregated, and reports are generated showing sample distribution by address, symbol, process, etc. A variety of tools are based on this technology. The full execution context of the sample is typically not recorded and is therefore not available in reports.
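The aggregation step described above may be illustrated with a minimal sketch. The record layout (`Sample`), the field names, and the sample values are hypothetical and chosen only for illustration; an actual profiling tool would populate such records from its interrupt handler.

```python
from collections import Counter
from typing import NamedTuple


class Sample(NamedTuple):
    """One record captured at a sampling interrupt (hypothetical layout)."""
    pid: int       # process that was running at the interrupt
    tid: int       # thread that was running at the interrupt
    address: int   # address of the instruction being executed
    symbol: str    # symbol the address resolves to


def aggregate(samples, key):
    """Count samples grouped by the named Sample field."""
    return Counter(getattr(s, key) for s in samples)


# Illustrative data only; these values are not taken from any real run.
samples = [
    Sample(100, 1, 0x4005D0, "parse"),
    Sample(100, 1, 0x4005D4, "parse"),
    Sample(100, 2, 0x400710, "render"),
    Sample(200, 7, 0x4005D0, "parse"),
]

by_symbol = aggregate(samples, "symbol")
by_process = aggregate(samples, "pid")

# A report is simply the grouped counts, largest first.
for name, count in by_symbol.most_common():
    print(f"{name:10s} {count}")
```

The same records may be grouped by any recorded field, which is how a single collection pass may yield reports by address, symbol, or process.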
Attempts have been made to improve this technology by obtaining call stacks at the time of the sample. Existing tools may either attempt to walk the call stack directly or invoke functions on a separate (sampler) thread to obtain the call stack of the interrupted thread. The call stack of the interrupted thread is walked and inserted into a tree, and a counter in the leaf node of the collected call stack is incremented. Reports are generated that indicate the number of occurrences, or samples, of each call stack and the accumulated samples representing the time spent in the methods or functions called by the reported method or function.
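The call stack tree described above may be sketched as follows. The class name `Node`, the helper `record`, and the example stacks are hypothetical; the sketch assumes each sampled stack is already available as a list of function names, outermost caller first.

```python
class Node:
    """One function in the call tree (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.samples = 0     # samples whose call stack ended at this node
        self.children = {}   # callee name -> Node

    def child(self, name):
        """Return the child node for a callee, creating it if needed."""
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]

    def total(self):
        """Samples at this node plus samples accumulated from all callees."""
        return self.samples + sum(c.total() for c in self.children.values())


def record(root, stack):
    """Walk one sampled stack into the tree and increment its leaf counter."""
    node = root
    for frame in stack:
        node = node.child(frame)
    node.samples += 1


root = Node("<root>")
# Illustrative stacks only, outermost caller first.
for stack in (
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "parse"],
    ["main", "render"],
):
    record(root, stack)


def report(node, depth=0):
    """Print each node with its own samples and its accumulated samples."""
    print(f"{'  ' * depth}{node.name}: self={node.samples} total={node.total()}")
    for child in node.children.values():
        report(child, depth + 1)


report(root)
```

In this sketch the per-node counter corresponds to the number of occurrences of a call stack, and `total()` corresponds to the accumulated samples attributed to a function together with everything it calls.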
In the case of busy or executing threads, the simple approach of identifying where time is spent while busy has proven to be valuable and scalable. A determination is made as to where the processor is spending time and the path taken to get there. At any instant in time, at most one thread executes on a given processor. Accordingly, sampling the currently executing threads is limited to at most one thread per processor per sample. Providing reports for these busy threads via sampling and finding ways to reduce the code paths has helped improve performance.
However, trying to identify bottlenecks or determine why workloads are not scaling may involve hundreds or thousands of threads that are not executing. Getting information on all threads, or on all threads that are not executing, may take a significant amount of time. Simply taking a dump of all threads for an application may provide insight into causes of scalability problems. However, a full application thread dump is typically obtained by stopping the application and taking a snapshot of it. Such a dump is therefore not suitable for analysis in a production environment. Also, a given thread dump may not give insight into a given bottleneck.
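As a rough illustration of what a dump of all threads contains, the following sketch captures the call stack of every thread in a process, including threads that are blocked rather than executing. It relies on CPython's `sys._current_frames()`, used here only as a stand-in for a tool-specific dump mechanism; the helper `dump_all_threads` and the worker threads are hypothetical.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return {thread name: ["function (file:line)", ...]}, outermost first."""
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    for ident, frame in sys._current_frames().items():
        stack = traceback.extract_stack(frame)
        dump[names.get(ident, str(ident))] = [
            f"{fs.name} ({fs.filename}:{fs.lineno})" for fs in stack
        ]
    return dump


gate = threading.Event()          # workers block here, i.e. are not executing
started = threading.Barrier(4)    # three workers plus the main thread


def wait_for_work():
    started.wait()                # signal that this worker is running
    gate.wait()                   # then block until released


workers = [
    threading.Thread(target=wait_for_work, name=f"worker-{i}") for i in range(3)
]
for w in workers:
    w.start()

started.wait()                    # all workers are inside wait_for_work
snapshot = dump_all_threads()     # one snapshot covering every thread

gate.set()                        # release the workers and clean up
for w in workers:
    w.join()

for name, stack in snapshot.items():
    print(name, "->", stack[-1])
```

The cost of such a snapshot grows with the number of threads and the depth of their stacks, and a single snapshot shows only one moment in time, which is consistent with the limitations noted above.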