The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also correspond to embodiments of the claimed subject matter.
In computing nomenclature, the term “RAS” stands for Reliability, Availability, and Serviceability. Some computing platforms are intentionally designed with a high level of RAS features, which may require a balance of hardware and software capabilities to implement the desired RAS features and functionality. Supporting and implementing RAS, however, can be complex and may present difficult design considerations.
For example, as RAS capabilities and features increase, the mechanisms for implementing them have not always been standardized. This lack of standardization leads to a lack of agreement among the computing components on a given platform regarding what information, events, and errors should be detected, the manner and mechanisms for reporting such information, events, and errors, and what action or behavior, if any, to take upon the discovery of such information, events, and errors.
The lack of standardization additionally leads to confusion and difficulty for firmware designers, who must interface their respective functionality with hardware components that may each have different schemes for reporting and handling RAS-related information, events, and errors. For example, where logging is unique to each given hardware component, a unique implementation must be derived to handle each component's specific logging scheme. Further still, error discovery may lack comprehensive support and coverage because, across many different implementation schemes, it may not be known precisely what to look for as an indication that an error has occurred. Conventional mechanisms have therefore involved querying for errors, which introduces additional overhead and wastes computational resources. Distinct queries to each of several different computing components may also be necessary in an attempt to discover any potential errors among those components.
There is a need to balance the implementation of RAS capability with efficient processing of workload. Although such RAS features may very well be desirable, they nevertheless represent computing overhead and thus displace computational resources that could otherwise be directed toward handling a primary workload computational task. The need to query multiple different computing components diverts a limited set of computational resources away from the actual workload, and may further slow error recovery.
The present state of the art may therefore benefit from systems and methods for implementing an error framework for a microprocessor and for a system having such a microprocessor as described herein.