As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Groups of information handling systems are often arranged in cluster configurations. In some clusters (e.g., an ORACLE cluster), a group of nodes may be connected to a shared storage system such that the nodes may store data in, and retrieve data from, the storage system using I/O (input/output) commands. Such a configuration may be referred to as a shared storage configuration. There are two basic types of I/O commands that a node can send to the storage system: (1) read commands to retrieve data from the storage system and (2) write commands to write data to the storage system.
In some configurations, if a node sends one or more I/O commands to the storage system and does not receive a notification of completion within some specified time, the associated operating system (OS) will send a reset instruction (e.g., a bus reset, a target reset, or a LUN reset) to the storage system to reset the storage system (or at least a portion of the storage system) and/or to retrieve the timed-out I/O command(s) from the storage system.
However, in a shared storage configuration (e.g., a shared storage cluster or a storage consolidation solution), the reset initiated by one node may cause I/O commands sent by other nodes and queued in the storage system to be aborted or erased from their respective queues. In some environments (e.g., in a Fibre Channel or Serial Attached SCSI (SAS) environment) and in some situations (e.g., during heavy loading situations where nodes have many outstanding I/O commands), the other nodes may not be aware that the storage system has been reset and their I/O commands have been aborted. As a result, the nodes may time out those aborted I/O commands and may send their own reset instructions to the storage system, which may negatively affect the cluster or configuration.
For example, if node A initiates a first reset at the storage system, the storage system may abort all queued I/O commands, including one or more I/O commands sent from node B. Node B may be unaware of the storage system reset, and may continue waiting for a response to a particular I/O command that was queued in the storage system and aborted during the reset. After a timer expires and no response has been received at node B from the storage system, node B may send its own reset to the storage system. In this manner, a series of resets may be triggered, which may bring down or destabilize the entire cluster or configuration (or a portion thereof). Such a series of resets may be inefficient and expensive, and may lead to other system problems.
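The series of resets described above can be illustrated with a small discrete-time simulation. The following is a minimal sketch under stated assumptions, not an implementation of any particular storage system or operating system: the names SharedStorage, Node, and simulate, the tick-based timing, the five-tick timeout, and the four-tick service time are all hypothetical and chosen only for illustration. The sketch assumes that a reset aborts every queued command and that the storage system does not notify the affected nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Toy shared storage: one queue for all nodes (hypothetical model)."""
    queue: list = field(default_factory=list)      # (node_name, command_id, ready_tick)
    reset_log: list = field(default_factory=list)  # (tick, node_name) for each reset

    def submit(self, node_name, command_id, ready_tick):
        self.queue.append((node_name, command_id, ready_tick))

    def pop_completed(self, now):
        # Return and remove every command whose service has finished by `now`.
        done = [entry for entry in self.queue if entry[2] <= now]
        self.queue = [entry for entry in self.queue if entry[2] > now]
        return done

    def reset(self, node_name, now):
        # Assumption: a reset aborts ALL queued commands, including those sent
        # by other nodes, and the other nodes are not told about it.
        self.reset_log.append((now, node_name))
        self.queue.clear()


@dataclass
class Node:
    name: str
    timeout: int
    outstanding: dict = field(default_factory=dict)  # command_id -> deadline tick

    def send(self, storage, command_id, now, service_time):
        storage.submit(self.name, command_id, now + service_time)
        self.outstanding[command_id] = now + self.timeout

    def on_complete(self, command_id):
        self.outstanding.pop(command_id, None)

    def check_timeouts(self, storage, now):
        # Any command still outstanding at its deadline triggers a reset.
        expired = [c for c, deadline in self.outstanding.items() if deadline <= now]
        if expired:
            for command_id in expired:
                del self.outstanding[command_id]
            storage.reset(self.name, now)


def simulate(first_service_time, ticks=15):
    """Run nodes A, B, and C against one storage system; return the reset log."""
    storage = SharedStorage()
    nodes = {name: Node(name, timeout=5) for name in "ABC"}
    # tick -> (node, command, service time); only node A's service time varies.
    schedule = {0: ("A", "a1", first_service_time),
                2: ("B", "b1", 4),
                6: ("C", "c1", 4)}
    for now in range(ticks):
        if now in schedule:
            name, command_id, service_time = schedule[now]
            nodes[name].send(storage, command_id, now, service_time)
        for name, command_id, _ in storage.pop_completed(now):
            nodes[name].on_complete(command_id)
        for node in nodes.values():
            node.check_timeouts(storage, now)
    return storage.reset_log


if __name__ == "__main__":
    print(simulate(first_service_time=100))  # node A's command is stuck
    print(simulate(first_service_time=4))    # every command completes in time
```

In this sketch, when node A's command is stuck, node A's reset aborts node B's queued command, which would otherwise have completed before node B's deadline. Node B's later reset then aborts node C's command in the same way. When node A's command completes normally, no node issues a reset.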