The growth of data produced by enterprises and individual consumers, together with the ever-increasing ways in which such data is interacted with, transformed and retained (driven by new social media platforms, end user devices, analytics applications, regulatory mandates and the like), results in corresponding increases in the complexity of the Information Technology (IT) infrastructure used to store such data and of the applications that serve, produce, transform and/or consume such data.
The increasing complexity of IT infrastructure has also been accompanied by an increase in the quantity of devices and resources that comprise the infrastructure. For example, a web site may have a front end load balancing layer of web servers including tens or hundreds of computers, connected together by tens of Ethernet switches. Each of the servers may run web server software and other software, including monitoring software, intrusion detection software, etc. Each computer may have multiple Fibre Channel Host Bus Adapters (HBAs), each of which can be connected to one or more edge switches in a Storage Area Network (SAN) fabric. The edge switches connect to multiple core switches, which in turn connect (possibly through other edge switches) to one or more storage subsystems. Each storage subsystem can have multiple HBAs, servers, memory caches, internal interconnects, non-volatile memory banks, RAID (redundant array of independent disks) engines, device adapters and, finally, an assortment of solid state and magnetic disks. Accordingly, the task of managing such numerous and heterogeneous resources is generally highly labor-intensive and requires specialized skills and tools.
One cause of application or service unavailability is infrastructure downtime due to incompatible firmware. Avoiding problems in this area generally requires monitoring the deployed system components to detect incompatible software, firmware or hardware versions, and implementing a governance process for resolving such incompatibilities by bringing the affected software, firmware or hardware up to date. Carrying out these tasks efficiently and effectively in large systems is challenging, and outages may be difficult or impossible to prevent in prior art systems and methods.
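By way of illustration only, the kind of version monitoring described above might be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular prior art tool: the `Component` record, the `MIN_FIRMWARE` rule table, the component kinds and every version number are hypothetical, and a real compatibility matrix would come from vendor interoperability data.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn a dotted version string such as "2.1.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Component:
    name: str        # unique identifier of the device (hypothetical)
    kind: str        # device class, e.g. "hba" or "edge_switch" (hypothetical)
    firmware: tuple  # installed firmware version as a tuple of ints


# Hypothetical rule table: the minimum firmware a component of the first
# kind must run when it is connected to a peer of the second kind.
MIN_FIRMWARE = {
    ("hba", "edge_switch"): parse_version("2.1.0"),
    ("edge_switch", "hba"): parse_version("7.0.3"),
}


def find_incompatibilities(components, links):
    """Return (component, peer, required_version) for every rule violation."""
    by_name = {c.name: c for c in components}
    findings = []
    for a_name, b_name in links:
        a, b = by_name[a_name], by_name[b_name]
        # Check the rule in both directions of the connection.
        for comp, peer in ((a, b), (b, a)):
            required = MIN_FIRMWARE.get((comp.kind, peer.kind))
            if required is not None and comp.firmware < required:
                findings.append((comp.name, peer.name, required))
    return findings


if __name__ == "__main__":
    inventory = [
        Component("hba1", "hba", parse_version("2.0.9")),
        Component("hba2", "hba", parse_version("2.1.0")),
        Component("sw1", "edge_switch", parse_version("7.0.3")),
    ]
    connections = [("hba1", "sw1"), ("hba2", "sw1")]
    for name, peer, required in find_incompatibilities(inventory, connections):
        print(name, "connected to", peer, "needs firmware >=",
              ".".join(str(n) for n in required))
```

Even in this simplified form, every connected pair of components has to be checked against the rule table, which suggests why the task becomes difficult as the number of devices and connections grows.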