Performance of workloads in a computing system, such as a database management system (DBMS), can be improved if the system has an intrinsic capability to perform time-based profiling of central processing unit (CPU) utilization and of the wait latency incurred by execution of workloads imposed on the computing system. In addition, a determination of how often the “hot” routines (e.g., the most frequently used routines) in a function profile output are invoked can also be advantageous in determining how best to optimize the runtime performance of the DBMS. For example, some functions invoked by a DBMS may result in particularly high CPU utilization because they are called very frequently, while other functions that are called less often may result in extended runtimes for other reasons (such as, for example, nested programming loops). In the former case, there may be a need to optimize the context of the caller functions (e.g., to reduce unnecessary calls to a callee function, or to refactor a design to reduce unnecessarily repetitive calls). The latter case may indicate a need to investigate one or more hot program lines of a callee function in order to debug slow-running pieces of logic.
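By way of illustration only, the distinction drawn above between routines that are hot because they are called very frequently and routines that are hot because each call is expensive can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular implementation: the names `RoutineStats` and `classify_hot_routines`, the 10% CPU-share cutoff, and the 10,000-call cutoff are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class RoutineStats:
    """Hypothetical per-routine record from a function profile output."""
    name: str
    cpu_ms: float  # total sampled CPU time attributed to the routine
    calls: int     # number of invocations observed in the same interval


def classify_hot_routines(stats, hot_share=0.10, calls_threshold=10_000):
    """Label each hot routine by the likely reason it is hot.

    A routine is treated as hot when its share of total sampled CPU time
    is at least `hot_share`. Both thresholds are illustrative assumptions.
    """
    total_cpu = sum(s.cpu_ms for s in stats)
    report = {}
    for s in stats:
        # Skip routines that are not hot (or when no CPU time was sampled).
        if total_cpu == 0 or s.cpu_ms / total_cpu < hot_share:
            continue
        if s.calls >= calls_threshold:
            # Former case: hot because it is invoked very often.
            report[s.name] = "frequent calls: review caller context"
        else:
            # Latter case: hot although rarely invoked.
            report[s.name] = "expensive per call: inspect callee lines"
    return report
```

In such a sketch, a routine with a large CPU share and millions of invocations would be flagged for review of its callers, whereas a routine with a comparable CPU share and only a handful of invocations would be flagged for line-level investigation of the callee itself.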