1. Field of the Invention
The embodiments of the invention generally relate to computer systems, and, more particularly, to problem detection and determination in a system.
2. Description of the Related Art
The objective of system management is to ensure that the behavior of the system converges towards certain user-defined goals. These goals are generally expressed in terms of performance, availability, and security. In order to realize this objective, the management framework has two key components: (1) a Problem Determination (PD) component that is responsible for analyzing the system and shortlisting possible reasons for goal violations; and (2) a Corrective Actions Engine (CAE) that determines the action(s) to be invoked in response to the goal violation. The term “actions” could refer to either changing the value of tunable parameters (e.g., prefetch-size, update interval, number of threads) or resource reallocation such as migration, replication, and splitting.
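The division of labor between the two components can be illustrated with a minimal sketch. It is not drawn from any particular management framework: the metric names, the cause table, and the action table are hypothetical and serve only to show the PD component shortlisting possible reasons for a goal violation and the CAE mapping those reasons to actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    metric: str   # name of the monitored metric (hypothetical)
    limit: float  # the goal is violated when the observed value exceeds this


# Hypothetical knowledge tables, assumed for illustration only.
SUSPECTED_CAUSES = {
    "latency_ms": ["small prefetch-size", "overloaded volume"],
}
ACTIONS_FOR_CAUSE = {
    "small prefetch-size": "tune: increase prefetch-size",
    "overloaded volume": "reallocate: migrate data",
}


def problem_determination(goals, observed):
    """PD: shortlist possible reasons for each violated goal."""
    shortlist = []
    for goal in goals:
        value = observed.get(goal.metric)
        if value is not None and value > goal.limit:
            shortlist.extend(SUSPECTED_CAUSES.get(goal.metric, []))
    return shortlist


def corrective_actions_engine(causes):
    """CAE: choose the action(s) to invoke for the shortlisted causes."""
    return [ACTIONS_FOR_CAUSE[c] for c in causes if c in ACTIONS_FOR_CAUSE]


goals = [Goal("latency_ms", 20.0), Goal("throughput_deficit", 0.0)]
observed = {"latency_ms": 35.0, "throughput_deficit": 0.0}

causes = problem_determination(goals, observed)
actions = corrective_actions_engine(causes)
print(causes)
print(actions)
```

In this sketch only the latency goal is violated, so the PD step returns the two causes associated with that metric and the CAE step returns one tuning action and one reallocation action.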
An analogy to the management framework is that of a medical doctor: the doctor needs to first diagnose the disease (PD) and then select the “optimal” medicine(s) (CAE) among all the ones that the doctor knows about. Of the two components of a management framework, the development of the CAE has been an active area of research, with existing approaches classified as either rule-based or model-based. System management is not limited to taking actions after the goals are violated; it also involves being proactive in interpreting abnormal system behavior and taking the necessary corrective actions before any violations occur.
Problem detection and determination is considered to be complex and domain-specific. That is, it is possible to collect a plethora of information about the system activity using generic, domain-independent techniques, but it is highly non-trivial to make sense of the collected information and determine what led to the violation of goals. The conventional solutions provide an interface between the CAE and the PD that is event-based, where the events signify the nature of the problem within the system. However, in real-world system management scenarios, problem determination is not a definitive prescription of how the system should be adapted, but rather a collection of “strategies” that the CAE can analyze for feasibility and optimization. An analogy is that of Preference Engineering in databases, where the user expresses a desire to “find a hotel that is less than one mile from the beach and under 300 dollars in cost.” Such a result may not really exist, but it expresses the intuition of what the user is looking for.
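The hotel analogy can be made concrete with a small sketch that treats the two stated limits as soft preferences rather than hard filters, so that the closest candidate is still returned when nothing satisfies both. The scoring rule (the sum of relative overshoots) and the sample data are assumptions chosen for illustration; they are not a method described in the related art.

```python
def violation(hotel, max_miles=1.0, max_dollars=300.0):
    """Return how far a hotel overshoots the stated preferences (0.0 = none)."""
    over_distance = max(0.0, hotel["miles_to_beach"] - max_miles) / max_miles
    over_price = max(0.0, hotel["price"] - max_dollars) / max_dollars
    return over_distance + over_price


# Hypothetical candidates; none meets both preferences exactly.
hotels = [
    {"name": "A", "miles_to_beach": 0.5, "price": 450.0},
    {"name": "B", "miles_to_beach": 1.2, "price": 280.0},
    {"name": "C", "miles_to_beach": 3.0, "price": 150.0},
]

# Rank by smallest total overshoot instead of discarding every candidate.
ranked = sorted(hotels, key=violation)
print([h["name"] for h in ranked])
```

A hard filter over the same data would return an empty result, whereas the ranking still conveys which candidate best matches the intent of the request. A CAE could weigh a collection of strategies in a comparable way.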
Conventional problem determination frameworks in computer storage systems generally can be divided into two parts: (1) frameworks to collect information about the system activity, which, depending on how the information is acquired, are classified as intrusive, semi-intrusive, or non-intrusive (also called black-box); and (2) information models to analyze the gathered information, where the existing paradigms for specifying these models are generally based on rules, cookbooks, data mining, etc. The semantics of information models are domain-specific. Much of the existing research in the industry is based on web services, Java frameworks, etc. However, there generally does not exist a simplistic problem determination framework in the domain of computer storage systems. Accordingly, there remains a need for such a framework.