The simplest way to determine how much of a resource each thread has used is to issue a system call to the OS kernel. For example, central processing unit (CPU) usage can be obtained by means of getrusage( ) in Linux and GetThreadTimes( ) in Windows (registered trademark). However, a system call requires a transition to a privileged mode and is therefore expensive. It is thus unsuitable for uses such as recording time stamps at the entry and exit of every method. Moreover, apart from the cost, there is a risk that the system call triggers additional processing in the OS kernel, such as rescheduling, which disturbs the inherent behavior of the application.
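As an illustration of the system-call approach, the following is a minimal sketch in C. It does not use getrusage( ) itself. Instead it swaps in the POSIX call clock_gettime( ) with CLOCK_THREAD_CPUTIME_ID, which on Linux is typically also serviced by entering the kernel. The helper name thread_cpu_time_ns is hypothetical and chosen only for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not from the source): CPU time consumed so far by
 * the calling thread, in nanoseconds, or -1 if the clock is unavailable.
 * Every call is expected to cross into the kernel, which is the cost
 * discussed in the text. */
static int64_t thread_cpu_time_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Calling such a helper at the entry and exit of every method would add one kernel transition per call, which is why this approach is considered too expensive for that use.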
Many recent CPUs include a “resource usage counter” that is readable from user level. For example, a CPU such as the Pentium (registered trademark) provides a 64-bit internal counter called the “time stamp counter (TSC),” which holds the number of clock cycles counted since reset and is readable by means of the RDTSC instruction. However, this counter belongs to the CPU and is incremented no matter which thread is being executed. Accordingly, the amount of resource usage for each thread cannot be determined from the counter alone.
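Reading such a counter from user level can be sketched as follows. This is an illustrative sketch only, assuming a GCC or Clang toolchain: on x86 it uses the __rdtsc( ) intrinsic from <x86intrin.h>, which issues RDTSC. The AArch64 branch is an addition not drawn from the text and reads an analogous user-readable counter. The function name read_cpu_counter is hypothetical.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* GCC/Clang header declaring __rdtsc() */

/* Reads the 64-bit time stamp counter via RDTSC; no kernel entry. */
static inline uint64_t read_cpu_counter(void)
{
    return __rdtsc();
}
#elif defined(__aarch64__)
/* Assumption, not from the source: the AArch64 generic timer's virtual
 * count register serves as a comparable user-readable counter. */
static inline uint64_t read_cpu_counter(void)
{
    uint64_t value;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(value));
    return value;
}
#else
#error "this sketch only covers x86 and AArch64"
#endif
```

The value returned keeps advancing regardless of which thread is running, so the difference between two readings taken by one thread may include time during which other threads held the CPU.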
By contrast, a package for Linux called “Estime” has provided a solution to this problem (as disclosed in “Estime: a High-Resolution Virtual Timer for Linux”). According to this method, when the OS kernel dispatches a thread, the amount of CPU usage accumulated by the thread up to that point and the value of the TSC at the time of the dispatch are written into a memory area mapped into user space. From these values and the most recent value of the TSC, the thread can calculate the amount of CPU usage it has accumulated at an arbitrary point in time, without entering the OS kernel. This method is innovative in that it enables low-cost measurement of the amount of resource used “for each of the threads.” However, it pays no attention to measurement “for each of the CPUs.”
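The calculation described above can be sketched as follows. This is not Estime's actual data layout or API. The structure dispatch_record, its field names, and the function thread_cycles_used are hypothetical, and the sketch assumes that usage is expressed in TSC cycles.

```c
#include <stdint.h>

/* Hypothetical layout of the user-mapped area that the kernel is assumed
 * to update each time it dispatches the thread. */
struct dispatch_record {
    uint64_t used_at_dispatch; /* cycles the thread had consumed at its last dispatch */
    uint64_t tsc_at_dispatch;  /* TSC value sampled at that dispatch */
};

/* Usage so far = usage recorded at dispatch + cycles elapsed since dispatch.
 * tsc_now is a TSC reading taken by the thread itself at user level. */
static uint64_t thread_cycles_used(const volatile struct dispatch_record *rec,
                                   uint64_t tsc_now)
{
    uint64_t used = rec->used_at_dispatch;
    uint64_t base = rec->tsc_at_dispatch;
    return used + (tsc_now - base);
}
```

For example, with 1000 cycles recorded at a dispatch that occurred at TSC value 5000, a reading of 5300 yields 1300 cycles used. A real implementation would also have to cope with the thread being rescheduled between reading the record and reading the TSC, which this sketch ignores.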
When a large-scale program such as a Web application is investigated by sampling, in many cases no particular method is executed extremely frequently, and the result is a flat profile in which the respective methods are executed roughly evenly. It is therefore difficult to apply the conventional technique of tuning a “hot method.”
In order to understand and improve the behavior of such an application, it is necessary to take the execution paths of the methods into account and to find the critical path that consumes the resource. However, when measuring the amount of resource usage is expensive, the overhead of the measurement itself becomes dominant and the behavior of the application is changed. Moreover, a measurement that considers the execution path is meaningless unless it can be performed for each thread. In addition, if the amount of resource usage can be measured for each CPU, it becomes possible to detect a situation in which, on a specific CPU, code takes a long time to execute or cache misses occur frequently.