Field of the Disclosure
The present disclosure relates generally to processing devices and, more particularly, to scheduling processes in processing devices.
Description of the Related Art
A processing device, such as a system-on-a-chip (SOC), often incorporates multiple compute units onto a single substrate. A compute unit typically includes one or more processor cores that share resources such as a floating-point unit, one or more caches, branch predictors, a physical layer interface to external memory, and other front-end logic. For example, an accelerated processing unit (APU) may use a single substrate to support and interconnect multiple compute units such as central processing units (CPUs) or graphics processing units (GPUs). Some processing devices may also stack multiple substrates on top of each other and interconnect them using through-silicon vias (TSVs). For example, one or more substrates including memory elements such as dynamic random access memory (DRAM) may be stacked over a substrate including an APU, which can read instructions or data from the DRAM via the physical layer interface, perform operations using the instructions or data, and then write the results back into the DRAM via the physical layer interface.
Operation of the components of the SOC generates heat, which raises the temperature of the SOC. The temperature at a particular location on the SOC depends on the thermal density at the location and the thermal sensitivity of the location. The thermal density indicates the amount of power dissipated per unit area or the amount of heat dissipation per unit area at a location on the SOC. The thermal sensitivity indicates how sensitive the temperature at a particular location is to changes in the thermal density in a region proximate the location. For example, a region with a higher thermal sensitivity may rise to a higher temperature than a region with a lower thermal sensitivity when the two regions are exposed to the same thermal density. The thermal sensitivity is typically larger in portions of the SOC that include a larger density of circuits because changes in the power dissipated in higher density circuits can lead to more rapid changes in the local temperature. The thermal sensitivity is also typically larger at the center of a substrate because circuits in the center of the substrate are not as close to external heat sinks and therefore do not dissipate heat as efficiently as circuits near the edge of the substrate that are closer to the external heat sinks. Stacking multiple substrates in a 3-dimensional configuration may also affect the thermal density and thermal sensitivity because heat can be efficiently conducted between the stacked substrates.
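As an illustration only, the relationship between thermal density, thermal sensitivity, and temperature described above can be sketched with a simple linear model. The linear form, the function name `estimate_temperature_c`, the units, and all numeric values are assumptions made for this example and are not taken from the disclosure; actual thermal behavior is generally more complex.

```python
def estimate_temperature_c(ambient_c, thermal_density_w_per_mm2,
                           thermal_sensitivity_c_mm2_per_w):
    """Estimate a local temperature under an ASSUMED linear model.

    The temperature rise above ambient is taken to be the product of the
    thermal sensitivity at the location and the thermal density near it.
    """
    rise_c = thermal_sensitivity_c_mm2_per_w * thermal_density_w_per_mm2
    return ambient_c + rise_c


# Hypothetical values: the same thermal density applied to two regions.
SAME_DENSITY = 0.5  # W/mm^2

# A region at the center of the substrate (assumed higher sensitivity).
center_temp_c = estimate_temperature_c(40.0, SAME_DENSITY, 60.0)

# A region near the edge, closer to a heat sink (assumed lower sensitivity).
edge_temp_c = estimate_temperature_c(40.0, SAME_DENSITY, 40.0)

print(center_temp_c, edge_temp_c)
```

Under these assumed values, the higher-sensitivity region settles at a higher temperature than the lower-sensitivity region even though both are exposed to the same thermal density.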
Conventional power management algorithms attempt to maintain the operating temperature of the SOC within a predetermined range using temperatures measured by one or more temperature sensors at different locations around the substrate. The power management algorithms can adjust the operating frequency or operating voltage of the SOC so that the measured temperature does not exceed a maximum temperature at which heat dissipation may damage the SOC. For example, a power management algorithm may increase the operating frequency of the SOC until the temperature measured by one or more temperature sensors approaches the maximum temperature. The power management algorithm may then maintain or decrease the operating frequency of the SOC to prevent the temperature from exceeding the maximum temperature.
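A minimal sketch of one step of such a conventional power management algorithm is given below for illustration. The function name `next_frequency_mhz`, the guard margin, the step size, and the frequency limits are hypothetical values chosen for the example and are not specified by the disclosure; an actual implementation may also adjust the operating voltage.

```python
def next_frequency_mhz(freq_mhz, sensor_temps_c, max_temp_c,
                       margin_c=5.0, step_mhz=100.0,
                       min_freq_mhz=800.0, max_freq_mhz=4000.0):
    """Return the operating frequency for the next control interval.

    All default parameters are illustrative assumptions.
    """
    # Use the hottest reading among the temperature sensors on the substrate.
    hottest_c = max(sensor_temps_c)

    if hottest_c >= max_temp_c:
        # At or above the maximum temperature: decrease the frequency.
        candidate = freq_mhz - step_mhz
    elif hottest_c >= max_temp_c - margin_c:
        # Approaching the maximum temperature: maintain the frequency.
        candidate = freq_mhz
    else:
        # Thermal headroom remains: increase the frequency.
        candidate = freq_mhz + step_mhz

    # Keep the result within the supported frequency range.
    return min(max(candidate, min_freq_mhz), max_freq_mhz)


# Hypothetical readings from two sensors with a 95 C maximum temperature.
print(next_frequency_mhz(2000.0, [60.0, 70.0], 95.0))  # headroom: 2100.0
print(next_frequency_mhz(2000.0, [92.0, 70.0], 95.0))  # near limit: 2000.0
print(next_frequency_mhz(2000.0, [96.0, 70.0], 95.0))  # over limit: 1900.0
```

Keying the decision to the hottest sensor is one simple design choice: it keeps every measured location below the maximum temperature, at the cost of limiting the frequency of the whole SOC based on a single hot spot.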
The thermal density or the thermal sensitivity of a location on a substrate may depend on the workload or workloads being executed on the substrate. For example, the thermal densities of a pair of compute units may be relatively high if the compute units are independently processing two high-power workloads, because there is no resource contention between the workloads being processed on the different compute units and each compute unit is therefore able to retire instructions at a high rate. The temperatures of the compute units may therefore increase while processing the high-power workloads due to the relatively high heat dissipation, potentially leading to thermal throttling of the workloads, e.g., by reducing the operating frequency or operating voltage.