The scale and performance of large distributed computing systems may be limited by power and thermal constraints. Such constraints may exist at both the site level and the component level. At the component level, components may throttle their performance to reduce temperature and power, and thereby avoid damage, when workloads cause them to exceed safe operational limits on temperature and power density. At the site level, future systems may run under a power bound to ensure that the site stays within site power limits, wherein the site power limits are derived from constraints on operational costs or from limitations of the cooling and power delivery infrastructure.
Concomitantly, manufacturing process variation may result in greater variance, from one component to another, in the supply voltage that a component requires for its circuits to function correctly at a given performance level. Unfortunately, the thermal and power limitations in large distributed computing systems may expose these differences in supply voltage requirements, leading to unexpected performance differences across like components. For example, different processors in the system may throttle to different frequencies because they exhaust their thermal and power density headroom at different points. This difference may occur even if the processors are selected from the same bin and/or product SKU, because parts from the same bin may still exhibit non-negligible variation in voltage requirements. As another example, when system power is limited to stay within site limits, a uniform partition of power among like components may result in different performance across those components.
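The effect of a uniform power partition may be illustrated with a minimal sketch. The sketch assumes the common first-order dynamic power model P = C·V²·f and, as a simplification, treats each part's required voltage as fixed rather than frequency-dependent. The function names, the capacitance value, the voltages, and the site budget are hypothetical values chosen only for illustration and are not taken from any measured system.

```python
def capped_frequency_ghz(power_cap_w, capacitance_nf, voltage_v):
    """Highest frequency (GHz) sustainable under a power cap, from P = C * V^2 * f.

    With C in nanofarads and f in gigahertz the scale factors cancel,
    so C * V^2 * f is directly in watts.
    """
    return power_cap_w / (capacitance_nf * voltage_v ** 2)


def uniform_partition(site_budget_w, voltages_v, capacitance_nf=20.0):
    """Split the site budget evenly and return each part's capped frequency."""
    per_part_cap_w = site_budget_w / len(voltages_v)
    return [
        capped_frequency_ghz(per_part_cap_w, capacitance_nf, v)
        for v in voltages_v
    ]


# Hypothetical like parts (same bin) whose required voltages differ slightly.
voltages = [0.95, 1.00, 1.05]
freqs = uniform_partition(site_budget_w=180.0, voltages_v=voltages)

for v, f in zip(voltages, freqs):
    print(f"required voltage {v:.2f} V -> capped frequency {f:.2f} GHz")
```

Under these assumptions, each part receives the same power cap, yet the part that requires the highest voltage sustains the lowest frequency, which is the kind of performance difference across like components described above.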