With the increasing integration of capabilities into mobile application processors, a host of imaging operations that were earlier performed in software are now implemented in hardware. Though imaging applications are inherently error resilient, the complexity of such designs has increased over time, and identifying logic that can be leveraged for energy-quality trade-offs has thus become difficult.
Continued technology scaling, as promised by Moore's law, along with the increasing power consumption of applications, has made conserving battery life a crucial concern for multimedia applications targeting mobile processors. The fact that this class of applications is inherently error resilient has paved the way for stochastic computing techniques, wherein the accuracy of the output is traded for significant power savings. An essential aspect of stochastic computing lies in accurately identifying functionally non-critical logic within an ASIC design that can be leveraged for significant energy-quality trade-offs. Present-day systems on chip (SoCs) employ a large variety of user-enabled and user-disabled features in order to cater to a wider audience. Identifying functional criticality for such designs, which harbor various modes of configuration, is not an easy task.
Several prior approaches have investigated techniques for identifying functionally significant or critical logic and for reducing power in digital signal processing (DSP) systems. One prior approach identifies the impact of each flip-flop on the output quality and categorizes the flip-flops into various criticality bins. This framework has a simulation penalty that is linear in the number of flip-flops in the design, making it unsuitable for larger ASICs, which harbor tens of thousands of flip-flops. Another technique in approximate computing prunes away the circuit elements that have the least probability of being active. This framework is demonstrated only on arithmetic circuits, and a significant shortcoming of the approach lies in the large number of gate-level simulations that need to be performed. The resulting design is also unable to operate in an exact mode of computation.
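The linear simulation penalty of the flip-flop binning approach can be illustrated with a minimal sketch. The toy design below (an 8-bit output register in which a fault inverts one bit), the function names, the error measure, and the bin thresholds are all assumptions introduced for illustration and are not taken from the prior approach itself. The point shown is only that one full simulation run is needed per flip-flop.

```python
from typing import Dict, List, Optional, Tuple


def simulate(pixels: List[int], faulty_bit: Optional[int] = None) -> List[int]:
    """Toy stand-in for a design simulation (hypothetical).

    The 'design' is an 8-bit output register. A fault on flip-flop
    `faulty_bit` inverts that bit of every output pixel.
    """
    if faulty_bit is None:
        return list(pixels)
    return [p ^ (1 << faulty_bit) for p in pixels]


def mean_abs_error(ref: List[int], out: List[int]) -> float:
    """Output-quality measure assumed here: mean absolute pixel error."""
    return sum(abs(a - b) for a, b in zip(ref, out)) / len(ref)


def bin_flip_flops(
    pixels: List[int],
    num_flops: int = 8,
    thresholds: Tuple[float, float] = (2.0, 32.0),  # illustrative values
) -> Tuple[Dict[str, List[int]], int]:
    """Inject a fault into each flip-flop in turn and bin it by quality impact."""
    golden = simulate(pixels)
    bins: Dict[str, List[int]] = {"low": [], "medium": [], "high": []}
    runs = 0
    for ff in range(num_flops):
        out = simulate(pixels, faulty_bit=ff)  # one full run per flip-flop
        runs += 1
        err = mean_abs_error(golden, out)
        if err < thresholds[0]:
            bins["low"].append(ff)
        elif err < thresholds[1]:
            bins["medium"].append(ff)
        else:
            bins["high"].append(ff)
    return bins, runs


if __name__ == "__main__":
    bins, runs = bin_flip_flops([10, 200, 77])
    print(bins, runs)
```

In this sketch the number of runs equals the number of flip-flops, so a design with tens of thousands of flip-flops would require tens of thousands of simulations.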
Another approach proposes a slack redistribution methodology to allow a graceful degradation in quality when the design is subjected to voltage scaling. To avoid the penalty of gate-level simulations, this approach uses a toggle-rate-based error metric that treats all errors with equal weight. The resulting design suffers from an area overhead due to the cell swaps used to redistribute slack. In yet another technique, a significance-driven computation strategy is implemented through cross-layer optimizations at the algorithmic and hardware levels. The hardware design is manually restructured to ensure faster significant computations and slower non-significant computations. The optimizations in this technique are highly specific to the design under test and cannot be generalized.
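The consequence of weighting all errors equally can be seen in a small sketch. The two metrics below are simplified stand-ins chosen for illustration (a bit-mismatch fraction and a mean absolute error); they are assumptions and not the actual toggle-rate metric of the prior approach. They show that an error in the least significant bit and an error in the most significant bit look identical under an equal-weight measure, although their effect on output quality differs greatly.

```python
from typing import List


def equal_weight_error(ref: List[int], out: List[int], width: int = 8) -> float:
    """Fraction of output bits that differ; every bit error counts the same."""
    mask = (1 << width) - 1
    mismatches = sum(bin((a ^ b) & mask).count("1") for a, b in zip(ref, out))
    return mismatches / (width * len(ref))


def magnitude_error(ref: List[int], out: List[int]) -> float:
    """Mean absolute error; errors in significant bits count for more."""
    return sum(abs(a - b) for a, b in zip(ref, out)) / len(ref)


if __name__ == "__main__":
    ref = [100, 100]
    lsb_fault = [101, 100]  # bit 0 of the first sample is wrong
    msb_fault = [228, 100]  # bit 7 of the first sample is wrong

    # Same score under the equal-weight metric
    print(equal_weight_error(ref, lsb_fault), equal_weight_error(ref, msb_fault))
    # Very different scores under the magnitude-aware metric
    print(magnitude_error(ref, lsb_fault), magnitude_error(ref, msb_fault))
```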
Voltage scaling has been employed as a major technique for power reduction in the aforementioned prior approaches. A serious impediment to these techniques is the overhead of routing three or more voltage planes and the need for added level shifters at power-domain crossings. These techniques also require recovery circuits to recover from errors caused by voltage scaling, and the added recovery circuits introduce further area and power overhead.