Data centers include multiple components that operate together to provide computing services to clients on request. For example, data centers may include hosts (e.g., server computer systems), which implement software applications that receive requests sent from client computer systems. While processing the client requests, the applications may generate input/output (I/O) transactions, including transactions to read data needed to generate responses to the requests. I/O transactions can be transmitted to other components within the data center, such as disk arrays, via networking equipment that may include switches, routers, bridges, etc.
The performance of applications is an important aspect of data centers. One metric of performance is the time it takes applications to respond to client requests. Short response times are desired. The performance of applications depends upon the performance of supporting components within the data center such as switches, disk arrays, etc. Abnormal behavior in the supporting components, or in the application itself, will likely degrade the ability of the application to respond quickly to requests received from client computer systems.
Abnormal behavior may be the result of any one of many different types of hardware or software problems. A bad disk in RAID storage may lead to slower response times. A switch or a host bus adapter (HBA) port may fail or be slowed by some type of hardware or software failure to the point where little, if any, data can be transmitted by the switch or HBA port. A cable used to connect a switch to a storage array or server may have deteriorated.
Software or hardware problems with components, or with connections between components, in a data center can lead to an increase in the time it takes for an application to respond to a client request, or to an outright failure to fulfill the client request. If abnormal behavior within a component such as a switch is detected early, data center administrators can take proactive action to remedy the problem quickly and avoid performance degradation or functional downtime when addressing the problem and replacing faulty components at the cost of system and business