Maintenance and support for systems such as data storage systems (e.g., storage array systems) often require human observation of the state of system resources such as central processing unit (CPU) usage, memory footprint, network traffic, system temperature, solid-state disk (SSD) wear, hard disk drive (HDD) wear, and other system components and conditions. Resolution of anomalous conditions requires human intervention, and this intervention effort can range from fairly simple steps to very involved and complicated processes.
Even for processes that involve only simple steps, mistakes in carrying them out can lead to expensive downtime for the system and, in the worst cases, to customer data loss. This intervention effort starts with awareness that there is an anomalous condition with the storage array that adversely affects its ability to accomplish its primary functions. The current state of the storage array's ability to accomplish its primary functions is referred to as its “system health.” Existing techniques for monitoring system health, particularly in the case of storage array systems, pose many challenges.