1. Field of the Invention
The present invention relates generally to methods and systems for monitoring data storage networks, and more particularly, to a computer-based method and system that utilizes histogram techniques for collecting performance metrics for components of a data network, such as switches, for compressing the collected data to enable maintenance of historical data while substantially retaining measured peaks and valleys in the data (or highs and lows), and for displaying the performance metrics on a single screen or interface that enables network administrators to efficiently monitor network performance in an ongoing and historical manner.
2. Relevant Background
People familiar with the data storage industry realize that data storage networks, including storage area networks (SANs), hold the promise of increasing the availability of data and increasing data access efficiencies and effectiveness while also reducing information technology costs. Generally, a data storage network is a network of interconnected computers, data storage devices, and the interconnection infrastructure that allows data transfer, e.g., optical fibers and wires that allow data to be transmitted and received from a network device along with switches, routers, hubs, and the like for directing data in the network. For example, a typical SAN may utilize an interconnect infrastructure based on Fibre Channel standards that includes connecting cables each with a pair of 1 or 2 gigabit per second capacity optical fibers for transmitting and for receiving data and switches with multiple ports connected to the fibers and processors and applications for managing operation of the switch. SANs also include servers, such as servers running client applications including data base managers and the like, and storage devices that are linked by the interconnect infrastructure. SANs allow data storage and data paths to be shared with all of the data being available to all of the servers and other networked components.
Despite the significant improvements in data storage provided by data storage networks, performance can become degraded in a number of ways. For example, performance may suffer when a network is deployed with few data paths to a storage device relative to the amount of data traffic. Also, performance may be degraded when a data path includes devices, such as switches, connecting cable or fiber, and the like, that are mismatched in terms of throughput capabilities, as performance is reduced to that of the lowest performing device. Further, even if the data paths and devices were originally planned to optimize the bandwidth of each critical data path and of device capabilities within the data paths, changes in usage patterns, such as archiving of data and deployment of new applications, and in network devices may significantly alter performance of the network.
While many performance metrics are measured in a network, an exemplary measurement of performance is utilization, which is typically determined by comparing the throughput capacity of a port of a network device or a data path with the actual or measured throughput at a particular time, e.g., 1.5 gigabits per second measured throughput in a 2 gigabit per second fiber is 75 percent utilization. Hence, an ongoing and challenging task facing network administrators is managing a network so as to avoid underutilization (i.e., wasted throughput capacity) and also to avoid overutilization (i.e., saturation of the capacity of a data path or network device). To properly manage and tune network performance including utilization, monitoring tools are needed for providing performance information for an entire network to a network administrator in a timely and useful manner.
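The utilization figure described above can be illustrated with a short calculation. The following is a minimal sketch only; the function name `utilization_percent` and its parameters are hypothetical and are not taken from any particular monitoring tool.

```python
def utilization_percent(measured_gbps: float, capacity_gbps: float) -> float:
    """Return measured throughput as a percentage of throughput capacity.

    Both arguments are assumed to be in the same unit (gigabits per second).
    """
    if capacity_gbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * measured_gbps / capacity_gbps


# The example from the text: 1.5 Gb/s measured on a 2 Gb/s fiber.
print(utilization_percent(1.5, 2.0))  # 75.0
```

A value near 100 corresponds to the saturation case discussed above, and a value near 0 corresponds to underutilization.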
With present monitoring tools, metric information such as utilization of a switch or traffic on a data path is collected and stored. A user interface may then be used to display real time data as it is collected. A graph may show a metric relative to time as the data is being gathered. However, because the administrator cannot view the screen continuously, it is likely that the administrator will not be able to identify problems within the network, such as saturation or underutilization of a portion of the network. Some monitoring tools allow thresholds to be set to provide alarm messages when the monitored network parameter or metric falls below a minimum value or exceeds a maximum value. However, this only provides information on discrete peaks and/or valleys of performance information and does not provide useful trending or historical information.
Network administrators generally demand that monitoring tools provide data collection and reporting that supplies historical information that can then be used to identify ongoing or periodic performance trends. For example, an administrator may wish to know that a system or portion of a system was being overutilized repeatedly at a certain time of day, which may indicate that data backup or some other repeated activity was overloading the system's equipment. Historical data is also useful for trending and tuning a system and for planning for equipment upgrades, as trends can be identified, such as one portion of a data storage system or network being used more and more with time, which indicates that an upgrade or tuning may soon be necessary to control saturation problems.
A number of problems are associated with collecting, storing, and accessing historical data. One problem involves the amount of memory that is required for storing collected performance information for a data network. Assuming a single port is being monitored on a 30 second polling schedule, 120 data points would be collected every hour and, if each data point required about 80 bytes of memory, 9.6 Kbytes would be needed per hour for each port. The problem quickly multiplies as data is collected over days, weeks, and months for hundreds or thousands of ports in a network. Hence, there is a need for reducing the memory capacity required to store historical data on network performance. Some existing tools use averaging of collected data, but this often results in important information being hidden from the administrator. Specifically, a data network becomes inefficient if it operates at high overutilization or saturation and/or operates with little utilization or underutilization. However, if a high metric value is averaged with a low metric value, the result is very misleading. For example, a utilization rate of 95 percent averaged with a utilization of 5 percent would indicate utilization of 50 percent. While 50 percent utilization may be acceptable to an administrator, it is doubtful that periods of saturated operation would be acceptable, as this would result in reduced efficiency. Other reporting tools simply provide large spreadsheets or reports of historical data, which is also often not useful to an administrator because the data is not correlated and/or is so overwhelming in size that important information is difficult to identify and understand.
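The storage arithmetic and the averaging problem described above can be reproduced with a few lines of code. This is an illustrative sketch only: the constants restate the figures given in the text, while the function names `bytes_per_port` and `summarize` are hypothetical and the summary shown is not presented as the method of any existing tool.

```python
POLL_INTERVAL_S = 30   # polling schedule from the text, in seconds
BYTES_PER_POINT = 80   # approximate size of one data point, from the text


def bytes_per_port(hours: int) -> int:
    """Memory needed to keep every raw data point for one port."""
    points_per_hour = 3600 // POLL_INTERVAL_S  # 120 points per hour
    return points_per_hour * BYTES_PER_POINT * hours


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize utilization samples by mean, minimum, and maximum."""
    return {
        "mean": sum(samples) / len(samples),
        "min": min(samples),
        "max": max(samples),
    }


print(bytes_per_port(1))               # 9600 bytes, i.e. 9.6 Kbytes per hour
print(bytes_per_port(24 * 30) * 1000)  # 30 days of raw data for 1000 ports
print(summarize([95.0, 5.0]))          # the mean alone hides both extremes
```

For the two samples in the text, the mean is 50 percent even though one sample is near saturation and the other is near idle; only the minimum and maximum reveal this.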
Hence, there remains a need for methods and systems for collecting, storing, and reporting real time and historical performance information for data storage networks to network administrators. Such a system preferably would be useful for viewing information on a standard monitor screen, such as in a graphical user interface, and would be relatively easy to use and understand, i.e., would not require significant administrator training. Additionally, such a method and system would preferably retain historical data without losing or hiding high and low values, as can occur with value averaging, and would require less memory to store historical information.