With the rise of server-based computing applications and paradigms such as e-commerce marketplaces, cloud-based computing, and online multiplayer gaming, it is not uncommon for large companies and enterprises to have data centers or server fleets with hundreds, if not thousands, of servers or other compute resources.
Monitoring the health of these compute resources and identifying failure points is critical for ensuring acceptable performance levels and for preventing outages. Common monitoring solutions often use complex hardware-based systems installed at data centers or in the rooms housing server fleets. These monitoring systems, however, can be expensive and often are not practical or cost-effective for smaller data centers or server fleets.
Many modern microprocessors and other semiconductor devices, such as central processing units (CPUs) and graphics processing units (GPUs), are capable of dynamic frequency scaling (or self-throttling), in which the device dynamically adjusts its operating frequency in response to hardware-measured failures or other marginal conditions relating to, for example, operating temperature or power consumption. These semiconductor devices thus have built-in hardware monitoring components. It would be desirable to leverage these built-in hardware monitoring components to monitor compute resources, data centers, or server fleets without the need for additional monitoring infrastructure.