The Common Information Model (CIM) is an object model, maintained by the Distributed Management Task Force (DMTF), that represents managed systems with a common set of objects and relationships. CIM agents that manage interactions with storage elements, and with the data stored within those elements, exist in the art. For example, the role of the IBM® DS8000/DS6000 storage system CIM agent is to maintain a global space (or a set of spaces) of CIM data representing the configuration, capabilities, and services of its storage devices. This space of CIM data is then presented to CIM clients. In the DS8000/DS6000 CIM agent implementation, the CIM agent is designed to support many requests from many CIM clients against many devices. During normal operation, the CIM agent services client requests while also maintaining worker threads for the devices it manages.
In a typical storage system CIM agent implementation, all operations related to a device (worker thread activity, client requests, performance statistics polling, and the like) are contained in a single service. These per-device operations are hidden from CIM clients, because the CIM agent presents a single global service that contains all services for all devices. The CIM agent takes this approach because it does not know what portion of the total dataset a CIM client will request; a request may cover a subset of the data or all of it. The CIM data presented to CIM clients is therefore represented as a single global service containing all data and capabilities of all managed devices. For example, FIG. 1 illustrates an example total dataset of CIM data 100 in a storage system existing in the prior art.
Internally, the CIM agent must maintain a logical translation between each piece of CIM data and the device from which that data is populated. FIG. 2 illustrates how different pieces of CIM data may come from different devices 211, 212, 213, and the logical mappings 221, 222, 223 that the CIM agent must maintain, within an example storage system existing in the prior art. As mentioned above, the CIM agent does not know what portion of the data a client may request, because the CIM protocol allows a client to ask for many different subsets of the dataset. For example, a client may request all volume data. In the example depicted in FIG. 2, a request for all volume data would require the CIM agent to access all three devices 211, 212, 213.
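To make the logical mapping concrete, the following Python sketch models it as a table from CIM instances to the devices that populate them, and shows why a request for all volume data must reach every device. It is a hypothetical illustration, not the DS8000/DS6000 implementation: the names `CimInstance` and `CimAgentMapping`, the device identifiers, and the instance IDs are assumptions, and the standard CIM class names `CIM_StorageVolume` and `CIM_StoragePool` are used only as labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CimInstance:
    """One piece of CIM data, e.g. a single volume instance (hypothetical model)."""
    cim_class: str    # e.g. "CIM_StorageVolume"
    instance_id: str  # hypothetical identifier


@dataclass
class CimAgentMapping:
    """Hypothetical logical mapping from CIM data to its source device."""
    source_device: dict[CimInstance, str] = field(default_factory=dict)

    def register(self, instance: CimInstance, device: str) -> None:
        # Record which device the instance's data is populated from.
        self.source_device[instance] = device

    def devices_for_class(self, cim_class: str) -> set[str]:
        # Every device that must be contacted to enumerate all instances of a class.
        return {dev for inst, dev in self.source_device.items()
                if inst.cim_class == cim_class}


mapping = CimAgentMapping()
for dev in ("device_211", "device_212", "device_213"):
    mapping.register(CimInstance("CIM_StorageVolume", f"{dev}-vol0"), dev)
mapping.register(CimInstance("CIM_StoragePool", "device_211-pool0"), "device_211")

# A request for all volume data touches all three devices.
print(sorted(mapping.devices_for_class("CIM_StorageVolume")))
# ['device_211', 'device_212', 'device_213']

# A request for pool data touches only one device.
print(sorted(mapping.devices_for_class("CIM_StoragePool")))
# ['device_211']
```

Because the agent cannot predict which subset a client will enumerate, it has to keep such a mapping for all of its data, and any single request may fan out to several devices.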
The problem arises when a device fails, either totally or partially. In that case, a subset of the CIM data becomes inaccessible or unusable. FIG. 3 illustrates how the total failure of a single device 213 would affect the CIM data in a system existing in the prior art. If a CIM client requested volume data from the CIM agent in this state, the request would either fail or take longer than expected, simply because the requested subset of CIM data included data from the defective device.
FIG. 4 similarly illustrates the failure of a single component within a storage device (an array 401) and how this failure would affect the CIM data in a system existing in the prior art. As depicted in FIG. 4, any CIM client request for a CIM dataset that contains the defective CIM data will fail.
Because of the nature of storage devices, failures may involve long latencies: timeout conditions can take up to 15 or even 20 minutes to elapse. Once a device is known to have failed, it is undesirable to keep directing requests to it. Nevertheless, the CIM agent continues to attempt to collect data from the defective device, so CIM client requests continue to suffer these latencies.
In the volume example depicted in FIG. 3, every client request for volume data would either go unserviced or take longer than the client is willing to wait. During normal operation, multiple CIM client requests access various subsets of the CIM data. If a device fails, every CIM client request that happens to include some part of the affected CIM data is adversely affected. To a CIM client, requests to the CIM agent would appear, seemingly at random, to take longer than expected or to fail unexpectedly. Beyond the impact on CIM clients, the CIM agent also wastes cycles attempting to perform operations on the defective device (worker threads, servicing CIM client requests, and the like), which slows down all operations in the CIM agent. The problem is exacerbated because the set of CIM data is typically large and complex, and a single device often accounts for a significant portion of this data.
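The latency effect described above can be illustrated with a small deterministic simulation of the prior-art behavior. This is a hypothetical sketch: the `Device` class, the device names, the 0.5-second normal response time, and the assumption that the agent queries the backing devices one after another are all illustrative; only the 15-minute timeout figure comes from the text. Time is accounted for numerically rather than by actually sleeping.

```python
from dataclasses import dataclass

TIMEOUT_S = 15 * 60  # the text cites timeouts of 15 or even 20 minutes


@dataclass
class Device:
    name: str
    healthy: bool = True
    response_s: float = 0.5  # hypothetical normal response time

    def collect(self) -> tuple[bool, float]:
        """Return (success, elapsed seconds) for one data-collection attempt."""
        if self.healthy:
            return True, self.response_s
        # A failed device is only detected after the full timeout elapses.
        return False, float(TIMEOUT_S)


def service_request(devices: list[Device]) -> tuple[bool, float]:
    """Prior-art behavior as described: query every device backing the request,
    sequentially (an assumption), with no memory of earlier failures."""
    ok, elapsed = True, 0.0
    for dev in devices:
        success, t = dev.collect()
        ok = ok and success
        elapsed += t
    return ok, elapsed


d211, d212 = Device("device_211"), Device("device_212")
d213 = Device("device_213", healthy=False)

volume_request = [d211, d212, d213]  # all volume data spans all three devices
pool_request = [d211]                # data held only on a healthy device

print(service_request(volume_request))  # (False, 901.0)
print(service_request(pool_request))    # (True, 0.5)

# Nothing records that device_213 failed, so every repeat pays the timeout again.
total = sum(service_request(volume_request)[1] for _ in range(3))
print(total)  # 2703.0
```

Requests that avoid the failed device are unaffected, while every request that touches its data pays the full timeout each time, which matches the seemingly random slowdowns and failures a CIM client would observe.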
There is no known mechanism for automatically managing the set of CIM data affected by a defective device. The only option is to manually de-configure the defective device from the CIM agent. Nor is there a way for a CIM client to determine which portion of the CIM data is causing the failures so that it can avoid requesting that portion. Further, after the device is repaired, manual intervention is required to re-configure the device into the CIM agent.