Computer systems, such as clients (e.g., workstations, personal computers, or remote computing terminals) and servers (e.g., mainframes or database servers), may process a wide variety of workloads. Generally, a workload may be described as an amount of processing performed by a computer system for a specific computing task (e.g., creating spreadsheets, sending email, etc.). Different types of workloads correspond to computing tasks being performed for different types of applications. Therefore, different types of workloads are likely to utilize computing resources differently. For example, the amounts of CPU, GPU, memory, and storage used to process a workload for a game application may be very different from the amounts of the same resources used to process a workload for a relational database management system, which may in turn be very different from the amounts of the same resources used to process a workload for a webmail service. In this example, the rates at which the resources are used and the patterns describing their usage may differ for each type of workload as well.
The utilization of computing resources by a computer system to process a workload may be described with various metrics. Examples of such metrics include percentage of CPU utilized, reads in per second, writes out per second, and disk latency. Depending on the type of workload being processed by a computer system, certain metrics describing resource utilization may be more relevant than others. For example, suppose a type of workload being processed by a computer system is characterized by heavy usage of GPU resources (e.g., for a video editing application or for a gaming application). In this example, a metric describing the percentage of available GPU resources being utilized by the computer system will be especially relevant to a user of the computer system, since this metric allows the user to determine whether the amount of unused GPU resources available on the computer system is sufficient to meet the user's needs (e.g., to continue to process the workload, to process additional GPU-intensive workloads, etc.).
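By way of illustration only, the idea that certain metrics are more relevant to certain types of workloads may be sketched as a simple lookup. The sketch below is a minimal example under stated assumptions: the workload type names, the metric names, the `RELEVANT_METRICS` table, and the `select_relevant` function are all hypothetical and are not drawn from any particular tool or operating system.

```python
# Hypothetical mapping from workload type to the metrics assumed to be
# most relevant to it. Names and groupings are illustrative only.
RELEVANT_METRICS = {
    "gaming": ["gpu_percent", "cpu_percent", "memory_percent"],
    "relational_database": ["disk_latency_ms", "reads_per_sec", "writes_per_sec"],
    "webmail": ["cpu_percent", "memory_percent"],
}


def select_relevant(workload_type, sample):
    """Return only the metrics in `sample` that are relevant to `workload_type`.

    `sample` is a dict of metric name -> measured value. If the workload
    type is unknown, every metric in the sample is returned.
    """
    names = RELEVANT_METRICS.get(workload_type, sorted(sample))
    return {name: sample[name] for name in names if name in sample}


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the function.
    sample = {
        "cpu_percent": 35.0,
        "gpu_percent": 92.0,
        "memory_percent": 61.0,
        "disk_latency_ms": 4.2,
    }
    print(select_relevant("gaming", sample))
```

In this sketch, a GPU-intensive workload surfaces the GPU utilization metric, whereas a database workload would surface disk-related metrics instead.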
Without these metrics, if the computing resources of a computer system are utilized to such an extent that they are nearly exhausted (e.g., when utilization of GPU resources approaches 100% of available GPU resources), a user of the computer system may experience degradation in performance without warning. For example, the user may notice that the workload is being processed by the computer system at a slower-than-usual rate, or the application for which the workload is being processed may even crash. Thus, to prevent and diagnose performance issues, users of computer systems will often find it helpful to monitor certain metrics on their computer systems based on the types of workloads being processed.
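As a non-limiting illustration of how monitoring such metrics may provide advance warning, the following minimal sketch flags resources whose utilization is approaching exhaustion. The `near_exhaustion` function, the resource names, and the 90% default threshold are assumptions made for this example only.

```python
def near_exhaustion(utilization, threshold=90.0):
    """Return the names of resources at or above `threshold` percent utilization.

    `utilization` is a dict of resource name -> percent of the available
    resource currently in use (0.0 to 100.0). The threshold is an
    illustrative default, not a recommended value.
    """
    return sorted(name for name, pct in utilization.items() if pct >= threshold)


if __name__ == "__main__":
    # Made-up readings: GPU utilization is approaching 100%.
    readings = {"gpu": 97.5, "cpu": 40.0, "memory": 72.0}
    for resource in near_exhaustion(readings):
        print(f"warning: {resource} utilization is near exhaustion")
```

A user presented with such a warning could act before performance degrades, for example by deferring additional GPU-intensive workloads.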
Many operating systems running on computer systems include tools that may help users monitor such metrics and manage their workloads. For example, macOS Activity Monitor offers users a table that displays processes currently running on the macOS operating system, the amounts of available computing resources being utilized by each process, and information indicating which of the processes are using the majority of available computing resources. From this table, a user may request to quit an application that is using too many computing resources. Windows Task Manager also offers users a similar type of table for processes running on the Windows operating system.
However, since these types of tools present users with a list of several processes running on an operating system, some users may have difficulty keeping track of the usage of resources by any particular process. Additionally, keeping track of these metrics is complicated in situations in which the workloads being processed by computer systems frequently change. For example, a system administrator may have a difficult time keeping track of resource usage by multiple emulated computer systems (e.g., virtual machines) deployed on multiple nodes or multiple clusters of nodes in a networked virtualization system, in which users of the emulated computer systems frequently come and go.
Furthermore, some metrics may not be available using the tools provided by operating systems. For example, metrics provided by such tools may not indicate disk latency or a distribution of input/output performed over multiple disks, both of which are especially relevant to workloads processed by relational database management systems. As an additional example, these tools may not provide metrics indicating GPU usage, which is especially relevant to workloads processed by gaming applications. In order to obtain these types of metrics, users may be required to install additional tools on their computer systems that are specific to each of the types of workloads being processed. However, requiring users to install and become familiar with such tools may only add to the complexity of keeping track of workload-related metrics.
Therefore, there is a need for an improved approach for presenting metrics that are relevant to a workload being processed by a computer system.