In a typical cloud-based computing environment (e.g., a data center), multiple compute nodes may execute workloads (e.g., applications or services) on behalf of customers. As the workloads are executed, each workload consumes resources, such as compute resources (e.g., processor cycles or processor cores), accelerator resources (e.g., field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), graphics processing units (GPUs), or other specialized hardware to accelerate processing), memory resources, and/or data storage resources. Typically, each customer expects a certain quality of service (QoS), indicative of a maximum latency or a priority (e.g., a precedence over other workloads) for workloads executed on behalf of the customer. A human administrator may attempt to satisfy a relatively high QoS by assigning a relatively small number of workloads to compute nodes that include extensive resources. As a result, those compute nodes may execute the workloads at the desired QoS while using only a small percentage of their available resources. Accordingly, the unused capacity of those compute nodes is wasted when it could otherwise have been used to execute other workloads. Conversely, a human administrator may attempt to reduce the amount of idle resources in the data center by aggressively assigning workloads to compute nodes, which may result in the compute nodes becoming overloaded and failing to satisfy a desired QoS. As such, in typical cloud-based computing environments, the compute nodes depend upon an external administrator to enforce any QoS goals, which can add inefficiency to the operation of the computing environment.