Networks and distributed storage allow data to be shared between devices located anywhere a connection is available. Improvements in capacity and network speeds have enabled a move away from locally attached storage devices and towards centralized storage repositories such as cloud-based data storage. These storage systems may be scalable and may range from a single shared folder to a cluster of file servers attached to and controlling racks of disk arrays. These centralized offerings are delivering the promised advantages of security, worldwide accessibility, and data redundancy. To provide these services, storage systems may incorporate Network Attached Storage (NAS) devices, Storage Area Network (SAN) devices, and other configurations of storage elements and controllers in order to provide data and manage its flow. Improvements in distributed storage have given rise to a cycle where applications demand increasing amounts of data delivered with reduced latency, greater reliability, and greater throughput. Building out a storage architecture to meet these expectations enables the next generation of applications, which is expected to bring even greater demand. NetApp storage systems offer NAS and SAN capabilities and support a wide range of standards.
To detect bottlenecks and overcapacity, conventional storage systems monitor and report performance data obtained from their various components. This performance data may be used for performance monitoring, optimization, planning, and troubleshooting. In one example system that includes a logical storage volume that stores data to a multitude of underlying physical storage devices, the performance of each storage device affects the overall performance of the logical storage volume. Accordingly, a storage controller coupled to the storage devices has a performance reporting function that samples performance data of the logical volume and archives (saves) the performance data and/or transmits it to one or more analytical programs. Archiving the data allows a posteriori diagnosis of customer performance issues by support teams without having to artificially reproduce the event. When archiving performance data, the storage controller samples the performance data of the logical volume (including, among other things, performance data of the underlying physical storage drives) at preconfigured intervals triggered by a system clock, e.g., from once every second to once a week in the current embodiment.
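The conventional clock-driven sampling described above may be sketched as follows. This is a minimal illustration only, not the implementation of any particular storage controller. The names PerformanceReporter, Sample, read_volume, and read_drives are hypothetical and do not appear in the original text, and the archive is modeled as an in-memory list purely for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One archived snapshot of performance data (hypothetical layout)."""
    timestamp: float
    volume_stats: Dict[str, float]
    drive_stats: Dict[str, Dict[str, float]]


@dataclass
class PerformanceReporter:
    """Samples a logical volume and its drives at a fixed interval."""
    read_volume: Callable[[], Dict[str, float]]             # assumed data source
    read_drives: Callable[[], Dict[str, Dict[str, float]]]  # assumed data source
    interval_s: float = 1.0                                 # preconfigured interval
    archive: List[Sample] = field(default_factory=list)

    def sample_once(self, now: float) -> Sample:
        # Each sample includes the volume and its underlying drives,
        # so the work per sample grows with the number of tracked items.
        sample = Sample(now, self.read_volume(), self.read_drives())
        self.archive.append(sample)
        return sample

    def run(self, ticks: int, clock=time.monotonic, sleep=time.sleep) -> None:
        # The clock and sleep functions are injectable so the loop can be
        # exercised without waiting in real time.
        next_due = clock()
        for _ in range(ticks):
            delay = next_due - clock()
            if delay > 0:
                sleep(delay)
            self.sample_once(clock())
            next_due += self.interval_s
```

In this sketch every tick reads every tracked item whether or not its values have changed, which is the source of the overhead discussed in the following paragraph.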
The processing resources used to obtain and report the sampled performance data are considered overhead because they are temporarily unavailable to perform the primary functions of serving storage operations. As the number of tracked objects in a storage system increases, the amount of data sampled and transferred also increases, thereby increasing the amount of overhead. This means that as a storage system grows and more objects (including both tracked hardware components and tracked software components) are included in the system, the overhead of performance data collection grows, which may adversely impact the actual performance of the system. In an exemplary embodiment, a storage system tracks over 300 objects, each of which may in turn track up to hundreds of thousands of instances. Over the longer term, object growth is expected to continue, making the current approach untenable.
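The way this overhead scales may be illustrated with a rough estimate. The following is a back-of-the-envelope sketch under stated assumptions: the function names and the parameters counters_per_instance and bytes_per_counter are hypothetical, and the model assumes that every counter of every instance is collected at every interval.

```python
SECONDS_PER_DAY = 86_400


def counters_per_sample(objects: int, instances_per_object: int,
                        counters_per_instance: int) -> int:
    """Number of counter values read in one sampling pass."""
    return objects * instances_per_object * counters_per_instance


def bytes_per_day(objects: int, instances_per_object: int,
                  counters_per_instance: int, bytes_per_counter: int,
                  interval_s: int) -> int:
    """Approximate volume of performance data collected per day."""
    samples_per_day = SECONDS_PER_DAY // interval_s
    per_sample = counters_per_sample(
        objects, instances_per_object, counters_per_instance)
    return per_sample * bytes_per_counter * samples_per_day
```

Because the estimate is a simple product, doubling the number of objects, the number of instances per object, or the sampling frequency doubles the volume of data collected, which is consistent with the growth in overhead described above.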
Improvements that reduce system overhead, including the overhead of performance monitoring, free up resources to handle storage operations, thereby allowing the storage system to wring more performance from the same hardware. For these reasons and others, improved systems and techniques for performance monitoring are important to the next generation of information storage systems.