1. The Field of the Invention
This invention relates generally to the field of high speed communications systems and networks. More particularly, embodiments of the present invention relate to systems and methods for the analysis and processing of network traffic attributes, such as storage I/O transaction specific attributes in a Storage Area Network (SAN).
2. The Relevant Technology
Computer and data communications networks continue to proliferate due to declining costs, increasing performance of computer and networking equipment, and increasing demand for communication bandwidth. Communications networks—including wide area networks (“WANs”) and local area networks (“LANs”)—allow increased productivity and utilization of distributed computers or stations through the sharing of resources, the transfer of voice and data, and the processing of voice, data and related information at the most efficient locations. Moreover, as organizations have recognized the economic benefits of using communications networks, network applications such as electronic mail, remote host access, distributed databases and digital voice communications are increasingly used as a means to increase user productivity. This increased demand, together with the growing number of distributed computing resources, has resulted in a rapid expansion of the number of installed networks.
As the demand for networks has grown, network technology has grown to include many different communication protocols and physical configurations. Examples include Ethernet, Token Ring, Fiber Distributed Data Interface (“FDDI”), Fibre Channel, and InfiniBand networks. These and the many other types of networks that have been developed typically utilize different cabling systems, different communication protocols, provide different bandwidths, and typically transmit data at different speeds.
The recent availability of low cost, high speed network technologies has also given rise to new uses of networks in specific application areas. One such example is referred to as a Storage Area Network (SAN). A SAN is a network whose primary purpose is the transfer of data between computer systems and data storage devices, and among data storage devices, based on protocols developed specifically for data storage operations. SANs have expanded from simple replacements for directly connected storage I/O buses that connect small numbers of storage devices across short distances to a single computer, to complex switched networks that connect hundreds or thousands of computing and data storage end-devices. Moreover, SANs are now being connected to even larger scale metropolitan area networks (MANs) and wide area networks (WANs) that extend the distances of SAN networks into thousands of miles.
High speed networks can be very efficient at utilizing the network bandwidth to transfer large amounts of data. However, as such networks grow in size, and become more complex and diverse, they become more difficult to manage, monitor and maintain at optimum operating levels. Therefore, as networks continue to grow in size and speed, it is becoming increasingly important to provide network administrators and integrators with improved monitoring and analysis capabilities. For example, the growing complexity and spanned distances of SAN networks can adversely impact the performance experienced by the individual storage I/O operations that transit the network. Consequently, there is an increased need to be able to monitor the communication performance of the specific storage I/O transactions between the various end-devices to ensure that adequate performance levels are maintained. Similarly, there is a need for the ability to monitor for device errors that may adversely impact overall network efficiency and forewarn of individual device failures.
Unfortunately, existing network monitoring methods and techniques, especially those for a SAN, have not been entirely satisfactory in providing a sufficient level of monitoring, analysis and troubleshooting capabilities. Consequently, network administrators and managers often do not have a sufficient level of information about the operating parameters of a network, and therefore are not able to fully monitor and optimize the network's operation.
For example, conventional methods typically only monitor network traffic at the aggregate flow level for a particular network link. In other words, only the communication throughput is measured at a particular location within the network. Throughput is a measurement of how much data is being sent between two adjacent nodes in the network. Unfortunately, the throughput parameter provides only limited information about the communication performance experienced by individual end-device transactions within the network. For example, in a SAN environment a high throughput does not necessarily mean that each storage I/O request is being transferred in the most expedient manner possible. In a SAN, storage I/O operations are composed of request/response/completion-status sequences of network traffic primitives to form an atomic “Storage I/O Transaction.” Client application performance can be highly impacted by attributes of these individual storage I/O transactions. For example, attributes such as transaction latency and number of retries can be highly relevant in analyzing SAN performance. It has been shown that in a shared storage I/O channel or network environment, I/O transaction latency commonly deteriorates exponentially as aggregate throughput increases. Therefore, a very high traffic throughput is not necessarily an indicator of good network performance because this will often indicate that an undesirable amount of I/O transaction latency is occurring in the system. Since conventional networking monitoring systems do not monitor the individual attributes of storage I/O transactions, they do not provide the requisite level of information for understanding a given network's operational status.
Another drawback of existing methods for monitoring high speed networks—such as a SAN—is the inability to provide a global upgrade to all monitoring devices used by the monitoring system. Network monitoring systems typically use some form of monitoring device (sometimes referred to as a “probe”) that is directly or indirectly inserted into one of the data channels located within the network. This monitor can then obtain information about the data that is being transferred at that particular network location, which is then provided to a single host device. To do so, each monitoring device must be specifically configured to operate in the environment of the network to which it is connected. Thus, when the communication protocol of the high speed network is upgraded, then all of the monitoring devices must also be replaced or the hardware must be upgraded in a like manner. Such a scenario is not uncommon. For example, many network storage devices in a SAN use the SCSI protocol to transfer data. In the future, updated versions of SCSI or even new transfer protocols likely will be adopted, which will require updates to any monitoring devices. The requirement for such a “global” upgrade is time consuming and expensive, and is difficult to provide with conventional monitoring systems.
Existing monitoring systems also suffer from another related problem. In particular, conventional monitoring systems do not support monitoring environments having multiple probes that are monitoring different network protocols. Instead, each of the probes must be configured for a single particular protocol type. This greatly reduces the ability to accurately monitor the operating conditions of a typical network implementation. For example, today's SANs are increasingly implemented as inter-connections of network segments that are using different protocols. As such, it is important to be able to compare traffic activity on a segment using one protocol, with traffic activity on another segment that is using a different protocol (e.g., Fibre Channel on one segment and Ethernet on another segment).
Conventional systems suffer from a closely related problem as well. In particular, conventional monitoring systems require that each monitoring device be configured in the same way as all other monitoring devices in the system so that the information being obtained is provided in a uniform and consistent manner to the monitoring system. Thus, when the software version of one device is upgraded, then all of the monitoring devices must also be upgraded in a like manner. For example, when a new network monitoring device is introduced that contains a different software version, the other monitoring devices must also be globally upgraded so as to insure proper operation. Again, such global upgrades of this sort are demanding of time and resources and are expensive.
The use of network monitoring devices gives rise to other problems as well. For example, as networks—such as SANs—increase in size, it is becoming more important to record network information at multiple locations within the network to accurately monitor the network operating status. To do so, multiple monitoring devices must be connected throughout the network at various locations. To fully utilize this multiple probe system, the data provided by each probe should be comparable to the data provided from the other probes. This introduces a new obstacle of synchronizing the multiple data sources in order to accurately compare and contrast the efficiency of different parts of the network at a given point in time. Many new networks, such as optical networks, operate at a very high rates of speed. Thus, a network monitoring synchronization scheme must have the capacity to keep up with each network monitor to prevent individual monitors from drifting further and further out of synchronization from the others. In addition, the network monitoring synchronization scheme must continuously ensure that all of the network monitoring data from one monitoring device is properly synchronized with the network monitoring data from the other monitoring devices.
High network speeds create other problems as well. In particular, as networking speeds continue to advance, it becomes more and more difficult for networking monitoring devices to obtain relevant network information in “real time.” In particular, existing monitoring devices are not able to monitor Gigabit-rate network traffic at full line rate speeds. Consequently, network “conditions” can often be missed, and such systems are thus unable to monitor all network characteristics accurately.
Therefore, there is a need for an improved network monitoring system. Preferably, the monitoring system should be capable of obtaining the attributes of individual network transactions occurring within the network, such as attributes of I/O transactions occurring within a SAN continuously and in real time at full line rate speeds. In this way, the system would be able to monitor network operating parameters other than just raw throughput, and instead could identify and monitor the individual device transactions within the data communications network so that errors and latencies can be accurately monitored and attributed to the individual devices involved. In addition, an improved monitoring system should be capable of monitoring transaction level communications from multiple locations within the network, and in a manner so that the data obtained from the multiple locations is synchronized in time. Further, it would be an advancement in the art to provide a network monitoring system that is capable of receiving and processing data received from multiple heterogeneous monitoring devices—e.g., different types of monitoring devices, that are monitoring different protocols or running different software versions, etc. The system should optimize the amount of computational resources used to obtain attributes of network transactions, and in a manner that does not sacrifice the available number of network characteristics that can be viewed by a user, such as a network manager. Further, it would be an improvement if the network monitoring system could operate in the context of very high speed networks—including gigabit operating speeds.