At the system level, dynamic voltage and frequency scaling (DVFS) has been used to find an optimal tradeoff between performance and power. In traditional DVFS, highly efficient switching voltage regulators are placed on the board and shared among multiple chips to reduce the silicon cost of the electronic components. These traditional switching regulators, such as buck regulators or switched-capacitor regulators, normally operate at switching frequencies from several hundred kHz to a few MHz, which limits their response time to microseconds. As a result, previous DVFS schemes are controlled only at the system level with coarsely defined power states and cannot perform fine-grained DVFS at the program level. In recent years, the trend of integrating numerous on-chip regulators into multi-core processors has provided new flexibility for energy optimization. For example, the 12-core IBM POWER8 processor deploys 48 fast-response (sub-ns) regulators, two each for the logic and the cache of every core, to achieve fast DVFS. Meanwhile, efficient on-chip switching regulators have been demonstrated with high configurability and response times of 2 to 3 ns or even below 1 ns. Such fine-grained on-chip voltage scaling opens new opportunities for low-power electronic design. For example, a physical model and optimization methodology for on-chip switched-capacitor regulators was developed to optimize their deployment for higher energy efficiency. An ultra-dynamic scheme was proposed that changes the supply voltage in a multi-Vdd configuration using different power switches, allowing the supply to switch within a few nanoseconds and thus making DVFS more flexible. However, that scheme requires generating and routing multiple supply voltages to the digital logic, which incurs a large design overhead.
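To make the link between switching frequency and response time concrete, the short Python sketch below computes the switching period and a rough response-time estimate for board-level regulator frequencies. It assumes a common rule of thumb that is not taken from the text: the regulator's control-loop bandwidth is about one tenth of its switching frequency, and the loop needs roughly one inverse-bandwidth interval to respond. The names `LOOP_BW_FRACTION`, `switching_period_s`, and `approx_response_time_s` are hypothetical and are for illustration only.

```python
# Assumption (rule of thumb, not from the text): control-loop bandwidth is
# roughly one tenth of the switching frequency.
LOOP_BW_FRACTION = 0.1


def switching_period_s(f_sw_hz: float) -> float:
    """Duration of one switching cycle, in seconds."""
    return 1.0 / f_sw_hz


def approx_response_time_s(f_sw_hz: float,
                           bw_fraction: float = LOOP_BW_FRACTION) -> float:
    """Rough response time: the inverse of the assumed loop bandwidth."""
    return 1.0 / (f_sw_hz * bw_fraction)


if __name__ == "__main__":
    # Board-level switching frequencies: several hundred kHz to a few MHz.
    for f_sw in (300e3, 1e6, 3e6):
        period_ns = switching_period_s(f_sw) * 1e9
        response_us = approx_response_time_s(f_sw) * 1e6
        print(f"f_sw = {f_sw / 1e6:g} MHz: period = {period_ns:.0f} ns, "
              f"response ~ {response_us:.2f} us")
```

Under this assumption, even a 3 MHz regulator needs a few microseconds to settle. That is several orders of magnitude slower than the nanosecond-scale response needed to track program phases, which is why board-level DVFS remains coarse-grained.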
While most current energy optimization methodologies for power management remain at the system level, a few previous works have also explored architecture- and circuit-level co-optimization based on detailed insight into software programs. For example, a previous study showed that a significant amount of resonant noise can be removed if the occurrence of critical instructions in the pipeline can be predicted, yielding a 10% performance improvement. A Razor-based scheme was proposed to reduce the timing error rate based on instruction type, reducing the performance penalty of timing-error recovery in the Razor technique by 80%.