In a typical cloud-based computing environment (e.g., a data center), multiple compute nodes may execute workloads (e.g., processes, applications, services) on behalf of customers. During the execution of the workloads, the amounts and types of resources (e.g., memory, data storage, processor capacity, and/or specialized processors such as graphics processing units) utilized by the workloads vary over time, as the workloads pass through different phases of operation and as some workloads are completed and new workloads are assigned to the compute nodes. To guard against the possibility of having inadequate resources for the workloads, which would decrease the performance of the workloads, each compute node is typically equipped with enough of each resource to meet the peak amount that may occasionally be requested by the workloads. As such, given the variations in the resource utilization needs of the workloads as they are executed, a portion of the capacity of the local resources on each compute node may go unused for a significant percentage of the time, resulting in wasted resources in the data center.