The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for safe and efficient microprocessor management using guarded, multi-metric resource control.
In the field of microprocessor architectures, predicting certain key events ahead of their actual occurrence is a central problem. Such predictive algorithms are usually applied for the purpose of enhancing net performance through dynamic resource management. Dynamic resource management enhances the efficiency and/or robustness of microprocessor and related data processing system operation. The metrics of interest in quantifying the objective function in such dynamic resource management may be performance, power consumption, temperature, reliability, or the like. One known problem in specifying the architecture of such a workload-driven, dynamic resource manager is that, no matter how carefully the dynamic resource management has been designed, there are occasions when it malfunctions in the sense that the intended benefit is not derived and, in fact, the net effect may turn out to be contrary to the original objective. For example, a feature intended to boost performance might on occasion degrade performance, a feature intended to save power may end up costing more power, or the like. Thus, in some cases, such a dynamic resource manager may actually be the cause of a safety issue, in that even an occasional violation of intended specifications may cause the system to experience an unplanned outage or even be permanently damaged. A given microprocessor system is also prone to malfunction and to fail to meet intended system specifications in the event of a malicious security attack. Since resource management algorithms are not foolproof, a hacker may be able to deliberately create unsafe workload conditions in order to cause damage to, or a service outage of, these computing systems. Power viruses that test a given microprocessor's thermal limits and deliberately cause the microprocessor to overheat are already known to exist.
An existing solution approach is to try to devise a “water-tight” resource management algorithm that never fails to yield the intended benefit, with the design team relying on simulation-based validation or formal verification of the robustness of the devised algorithm. In the case of simulation-based validation, if there is an isolated workload for which there is a “negative” benefit, the design team may still approve the decision to include the feature in the design. However, this may be both unsatisfactory and unsafe, since the space of all possible workloads may not be determined during the design of the microprocessor system. Thus, when the algorithm is deployed in the field, there may be numerous (not infrequent) unanticipated workload patterns across the many processor cores inside the chip or system that cause the designed algorithm to “malfunction” in the sense described above. In the case of formal verification, the analysis complexity (especially across today's multi/many-core processors) is often too steep, and the model abstraction required to address that issue may fail to guarantee safe and efficient operation.
Another method used is a “bang-bang” control system, in which a drastic action to counter a dangerous or unacceptable trend is executed in order to maintain safe functionality. Such control systems usually result in severe degradation of one or more critical figures of merit when the safeguarding mechanism is engaged. For example, reacting to a thermal emergency in response to a monitored thermal trigger by stopping the processor clock or severely throttling the instruction fetch mechanism usually results in significant loss of performance. A management algorithm may be designed to minimize performance loss, but saving power while safeguarding performance using such methods may be difficult, and preventing some corner-case workload from being severely affected in terms of delivered performance may be virtually impossible.
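By way of illustration only, the behavior of such a “bang-bang” thermal safeguard may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular product: the function names, the trigger temperature, and the first-order heating and cooling model are all hypothetical and are chosen solely to show how stopping the clock on a thermal trigger keeps temperature bounded at the cost of delivered performance.

```python
def bang_bang_clock(temp_c, trigger_c=85.0):
    """Hypothetical bang-bang policy: stop the clock (0.0) at or above
    the thermal trigger, otherwise run at full speed (1.0)."""
    return 0.0 if temp_c >= trigger_c else 1.0


def simulate(steps=200, start_c=80.0, ambient_c=40.0,
             heat_per_step=6.0, cool_rate=0.1, trigger_c=85.0):
    """Toy first-order thermal model (an assumption for illustration).

    Each step, the core heats in proportion to the clock fraction and
    cools in proportion to its excess over ambient temperature.
    Returns (fraction of full performance delivered, peak temperature).
    """
    temp = start_c
    peak = temp
    work_done = 0.0
    for _ in range(steps):
        clock = bang_bang_clock(temp, trigger_c)
        work_done += clock
        temp += heat_per_step * clock - cool_rate * (temp - ambient_c)
        peak = max(peak, temp)
    return work_done / steps, peak


if __name__ == "__main__":
    delivered, peak = simulate()
    print(f"delivered performance fraction: {delivered:.2f}")
    print(f"peak temperature (C): {peak:.1f}")
```

In this toy model the safeguard holds the peak temperature close to the trigger, but every step spent with the clock stopped is performance lost outright, which is the degradation of a critical figure of merit described above.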