The present invention is in the field of computer networks. In particular, but not by way of limitation, the present invention relates to a method and apparatus for reducing latency associated with monitoring the performance of nodes organized in a network that allows multicasting. By way of example, the present invention is directed to a System Area Network (SAN) that is compliant with Intelligent Input/Output (I2O) architectures.
With the proliferation of high performance computer workstations in virtually every workplace and the increased demand for interconnectivity, computer networks have experienced corresponding growth. Computer networks are a driving force in increasing workplace productivity by allowing resources to be shared efficiently among multiple users and allowing alternate or backup resources to be used when other resources fail or become congested with traffic. Networks further facilitate the efficient transfer of large amounts of data between network nodes depending on dynamic traffic conditions and node health. As networks become more complex and greater numbers of elements are added and serviced by individual network servers, the factors which impact the efficiency of data transfer increase in number. Moreover, networks of networks are becoming a more common part of the networking environment, leading to ever increasing degrees of complexity for individual network servers to manage.
Along with data transfer efficiency, critical network management functions such as performance monitoring may be compromised by increasing demand for bandwidth and a shift to more data-driven computing. Driven by factors including increases in processor speeds, increasing demand for open architecture designs, and I/O bottlenecks created by bus bandwidth limitations and non-standard interfaces between device drivers and operating systems, a standardized I/O architecture specification (called Intelligent Input/Output architecture) has been developed by an industry group known as the I2O Special Interest Group (SIG). The I2O specification includes, among other things, models for creating device-independent and operating-system-independent network communications.
Because the teachings of the present invention may be better exemplified in relation to the I2O architecture, a brief overview thereof is provided hereinbelow. Essentially, the I2O architecture uses a "split driver" model wherein a messaging layer is inserted for dividing a single device driver into two separate modules: an Operating System Service Module (OSM) and a Downloadable Driver Module (DDM). The OSM comprises the portion of the device driver that is specific to the operating system. The OSM interfaces with the operating system of the computer system, which may also be referred to in the art as the "host operating system", and is executed by the host CPU or processor. Typically, a single OSM may be used to service a specific class of peripherals or adapters. For example, one OSM would be used to service all block storage devices, such as hard disk drives and CD-ROM drives. As described, in the split driver model, the DDM provides an interface between the specific device and the OSM. The DDM includes the peripheral-specific portion of the device driver that understands how to interface to the particular peripheral hardware, while providing support for standard calls to the devices of a device class by the operating system by way of the OSM. To execute the DDM, an I2O Input/Output Processor (IOP) is added to the computer system. A single IOP may be associated with multiple peripherals, each controlled by a particular DDM, and contains its own operating system such as, for example, the I2O Real-Time Operating System (iRTOS). The DDM directly controls the peripheral, and is executed by the IOP under the management of the iRTOS.
A DDM may typically include a Hardware Device Module (HDM) that directly interfaces with the peripheral and is responsible for general device control and for managing data transfer to and from the device. A DDM may also include an Intermediate Service Module (ISM) which is an additional software interface to the HDM. Thus the ISM may typically form a custom layer between the OSM and HDM that generally resides on the IOP. In the I2O specification, the ISM is called out to allow for any special purpose processing that is desired which falls outside of standard OSM to DDM messaging.
A system which is compliant with the I2O specification uses a message passing model in general operation. When the CPU seeks to read or write to an adapter or peripheral in an I2O system, the host operating system makes what is known as a "request". The OSM translates the request by the host operating system and, in turn, generates a message. The OSM sends the message across the messaging layer to the DDM associated with the peripheral, which processes it appropriately and responds according to the contents of the message. If a special purpose ISM is present, the ISM may process the message prior to the message being passed to the DDM. Upon completion of whatever action the received message specifies, the DDM responds to the OSM by sending an appropriate response message through the messaging layer. Actions may include, but are not limited to, performing a read or write operation, performing a data transfer, or reporting device status. The response may include an acknowledgment that the action was performed, the status of the action underway, an error message and the like. By executing the DDM, and the ISM if included, on the IOP, time-consuming information transfers to and from the peripheral hardware are off-loaded from the CPU of the server to the IOP. By off-loading I/O processing to the IOP, the server CPU is no longer diverted for inordinate amounts of time during an I/O transaction. Moreover, because the IOP is dedicated to processing I/O transactions, data transfers are carried out more efficiently and faster.
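By way of illustration only, the request, message, and response flow described above may be sketched as follows. This is a minimal sketch and not an implementation of the I2O specification: the class names `Message`, `Reply`, and `MessagingLayer`, the action strings, and the in-memory "device" are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    target: str          # hypothetical device identifier
    action: str          # "read", "write" or "status" in this sketch
    payload: bytes = b""


@dataclass
class Reply:
    target: str
    status: str          # "ok" or "error"
    detail: str = ""


class DDM:
    """Peripheral-specific half of the split driver (would run on the IOP)."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.stored = b""   # stands in for the peripheral's storage

    def handle(self, msg: Message) -> Reply:
        if msg.action == "write":
            self.stored = msg.payload
            return Reply(self.device_id, "ok", f"wrote {len(msg.payload)} bytes")
        if msg.action == "read":
            return Reply(self.device_id, "ok", self.stored.decode())
        if msg.action == "status":
            return Reply(self.device_id, "ok", "idle")
        return Reply(self.device_id, "error", f"unsupported action {msg.action!r}")


class MessagingLayer:
    """Carries messages from the OSM to the DDM, through an optional ISM."""

    def __init__(self, ddm: DDM,
                 ism: Optional[Callable[[Message], Message]] = None) -> None:
        self.ddm = ddm
        self.ism = ism

    def send(self, msg: Message) -> Reply:
        if self.ism is not None:
            msg = self.ism(msg)      # special purpose processing before the DDM
        return self.ddm.handle(msg)  # the DDM acts and responds


class OSM:
    """Operating-system-specific half: translates a request into a message."""

    def __init__(self, layer: MessagingLayer, device_id: str) -> None:
        self.layer = layer
        self.device_id = device_id

    def request(self, action: str, data: bytes = b"") -> Reply:
        return self.layer.send(Message(self.device_id, action, data))


disk = OSM(MessagingLayer(DDM("disk0")), "disk0")
write_reply = disk.request("write", b"abc")
read_reply = disk.request("read")
```

In this sketch the OSM never touches the device directly; it sees only the reply returned through the messaging layer, which mirrors the division of labor between the host CPU and the IOP.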
In current implementations of the I2O specification, once an I/O device is configured, the device typically receives only a small subset of message types, which involve relatively simple data move operations. While the I2O specification guides the compatibility of systems and devices in a diverse product market, it is important to note that systems may be I2O compatible yet provide features which better accomplish the goals set forth as the motivation behind I2O, that is, greater I/O independence and data transfer capacity and processor unburdening. Moreover, it is possible to achieve the goals of greater independent I/O data transfer capacity in a system which is not strictly I2O compliant.
Another solution for relieving network bottlenecks and achieving scalability is to provide a clustered network environment wherein a variety of components like servers, disk drives, tape drives, etc., are integrated into a system-wide architecture such as a System Area Network (SAN). SAN architectures, for example, a fabric network, provide a low latency interconnect between servers and devices and can be configured for I2O compliance. SAN architecture is based on message passing between servers and devices. SAN technology employs the server processor to process data transfer requests between network elements and then allow data transfers to occur under control of dedicated hardware thus reducing server processor overhead to a minimum. In a SAN architecture, a network transport layer may be implemented on a dedicated hardware platform, typically an I/O processor (IOP), which allows a processor to be connected to a scalable switching fabric. A SAN server can then be expanded to add data paths which effectively increase the overall bandwidth of the switching fabric by increasing the number of point-to-point datapaths which can be used to carry data between nodes. Thus, large numbers of nodes which may be clients, other servers, or other network devices such as disk towers, and the like may be controlled by a server. Further, to off-load the processing of data transfers from the server processor, peer-to-peer communications may be set up between devices and the transfers may proceed without further server intervention.
In order to properly manage the SAN and set up peer-to-peer transfers between devices, a server must be aware of the status of the devices within its area or cluster by monitoring the status of network devices. Performance monitoring involves sending periodic status request messages to individual devices and then receiving status request response messages from the devices. Then, as requests are made to the server to set up data transfers between, for example, a healthy device and a device known by the server to be unhealthy, an appropriate error message may be issued or alternative action may be taken by the server. As the number of network elements grows, however, the need to conduct network performance monitoring increases correspondingly.
Further, the use of more components in a given SAN cluster not only increases the need to monitor the health of individual components that constitute the SAN but also the health of the cluster (e.g., a fabric) itself to ensure optimum performance. For example, link availability and network latency may require monitoring to select the best data path through the network; throughput also may affect route selection. Performance monitoring may also be used to determine availability, throughput and job backlog of print queues. Event logging and alarm generation allowing for analysis of network problems may also be performed by a monitoring server. A problem arises, however, when a SAN server is used as a performance monitoring node. Since a key advantage of SAN technology is the reduction of processing latency involving data transfers by limiting messages originated in the server processor to I/O transaction setups only, performance monitoring using point-to-point messaging to each network element in a large SAN cluster would overload the server both during the issuance of outbound message packets and during the period when responses from devices are received. As can be readily appreciated, such overloads give rise to unacceptable latencies.
In addition, traffic overloads during performance monitoring may be particularly acute for servers in SAN clusters where the number of nodes is large and the number of devices served by each node is large. In a point-to-point messaging scheme, a status request message must be generated for each device in the SAN cluster. In some cases, the number of devices could be in the thousands. As the thousands of status request messages are issued for a single periodic status check of the SAN cluster, the performance monitoring server and the associated SAN fabric quickly become overloaded with a flood of outbound status request message traffic. Status request messages issued concurrently in a point-to-point scheme may arrive at their destinations at virtually the same time. Devices prepare status response messages and issue them correspondingly close in time resulting in a simultaneous flood of inbound message traffic to the monitoring server and the SAN fabric. Since the issuance of the outbound request messages and the inbound response messages occur close in time, the server may be overloaded for an inordinate amount of time. Such an overload on a critical server leads to processing delays for important tasks or may result in missing the processing of, for example, an important real time event.
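The scale of the problem may be illustrated with simple arithmetic. The sketch below assumes that, in a point-to-point scheme, the monitoring server issues one status request per device and receives one response per device; the function name and the example cluster size (50 nodes of 40 devices each) are hypothetical and chosen only for illustration.

```python
def point_to_point_load(nodes: int, devices_per_node: int) -> dict:
    """Messages the monitoring server handles in one periodic status check."""
    devices = nodes * devices_per_node
    return {
        "outbound_requests": devices,    # one status request per device
        "inbound_responses": devices,    # one status response per device
        "total_at_server": 2 * devices,  # all of it passes through the server
    }


# Hypothetical cluster: 50 nodes, each serving 40 devices.
load = point_to_point_load(nodes=50, devices_per_node=40)
```

Under these assumptions a single status check places 4,000 messages on the monitoring server, and the load grows linearly with the number of devices.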
It would be advantageous, therefore, to devise a method and apparatus for performance monitoring which avoids compromising the low latency afforded by SAN technology. Such a method and apparatus would avoid traffic overloads so described and allow a SAN server to be available for low latency processing at virtually all times.
The present invention therefore uses a multicast ISM to receive a status request message from a performance monitoring OSM and to issue multicast status request messages for the entire SAN cluster. A first governor IOP at a first node receives a status request message from a performance monitoring OSM at the first node. A first multicast ISM disposed within the first governor IOP generates status request messages for devices and IOPs local to the first node and propagates the status request message to a second governor IOP at a second adjacent node. The second governor IOP receives the status request message from the first governor IOP. A second multicast ISM disposed within the second governor IOP generates status request messages for devices and IOPs local to the second node.
If additional nodes are present, each governor IOP, in addition to generating status request messages for local devices and IOPs, propagates a status request message to an adjacent governor IOP. The additional adjacent governor IOP generates status request messages for devices and IOPs local to its node and further propagates the status request message to the governor IOP of an additional adjacent node, if present. Because status request messages are issued in this multicast fashion, propagation continues in the same way throughout the SAN. In one aspect, devices associated with nodes near the performance monitoring OSM can begin responding to local governor IOPs while status request messages are still being propagated. Accordingly, outbound and inbound response processing bottlenecks are minimized.
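The node-to-node propagation described above may be sketched, under stated assumptions, as a chain of governor IOPs. The `GovernorIOP` class, the `build_chain` helper, the linear chain topology, and the device naming are hypothetical and serve only to illustrate the flow. The sketch is sequential, so the order in which local fan-out and forwarding occur here is an artifact of the example and not a statement about timing in an actual SAN.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GovernorIOP:
    name: str
    local_targets: List[str]                     # devices and IOPs local to the node
    downstream: Optional["GovernorIOP"] = None   # adjacent governor IOP, if any

    def receive_status_request(self, trace: List[Tuple[str, str]]) -> None:
        # Generate one status request for each local device or IOP.
        for target in self.local_targets:
            trace.append((self.name, target))
        # Propagate the single status request to the adjacent node, if present.
        if self.downstream is not None:
            self.downstream.receive_status_request(trace)


def build_chain(node_count: int, devices_per_node: int) -> GovernorIOP:
    """Build a linear chain of governors and return the first one."""
    if node_count < 1:
        raise ValueError("at least one node is required")
    head: Optional[GovernorIOP] = None
    for i in reversed(range(node_count)):
        targets = [f"node{i}-dev{j}" for j in range(devices_per_node)]
        head = GovernorIOP(f"node{i}", targets, downstream=head)
    assert head is not None
    return head


# The performance monitoring OSM hands one message to its local governor IOP.
trace: List[Tuple[str, str]] = []
first_governor = build_chain(node_count=3, devices_per_node=2)
first_governor.receive_status_request(trace)
```

After the call, `trace` records which governor issued each local status request. The monitoring side issued only one message, and each governor handled only its own local devices plus one forward.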
As status request messages are received by devices at a node, a status request response message is generated and sent by each device to the governor IOP for the node. Each governor IOP responds to the adjacent upstream IOP and ultimately the governor IOP for the performance monitoring node reports the SAN status to the performance monitoring OSM in one of two modes: "healthy" and "unhealthy." A "healthy" response from a governor IOP indicates that all devices local to the governor IOP are in their specified, preferably error-free operating condition. An "unhealthy" response indicates that one or more devices are malfunctioning. If the SAN is healthy, the response message includes an "all fine" indication along with the TID of the governor IOP local to the performance monitoring OSM. If one or more devices are ailing, a response message for each unhealthy device, including a TID for such device, may be sent to the performance monitoring OSM. In one embodiment of the present invention, the performance monitoring OSM may then establish point-to-point communication with each unhealthy device to request detailed status information. Upon receipt by the ailing device of the detailed status request message, a detailed response from each unhealthy device, containing additional information about the device status, may be sent directly to the performance monitoring OSM.
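The two reporting modes may be sketched as follows. This is an illustrative sketch only: the function names, the use of integers for TIDs, the dictionary message format, and the example TID values are assumptions and are not taken from the I2O specification.

```python
from typing import Dict, List


def collect_unhealthy(local_status: Dict[int, bool],
                      downstream_unhealthy: List[int]) -> List[int]:
    """At one governor IOP: merge unhealthy local TIDs with the downstream report."""
    local = [tid for tid, healthy in local_status.items() if not healthy]
    return local + downstream_unhealthy


def report_to_osm(governor_tid: int, unhealthy: List[int]) -> List[dict]:
    """At the governor IOP local to the performance monitoring OSM."""
    if not unhealthy:
        # Healthy mode: a single "all fine" message carrying the governor's TID.
        return [{"status": "all fine", "tid": governor_tid}]
    # Unhealthy mode: one message per ailing device, carrying that device's TID.
    return [{"status": "unhealthy", "tid": tid} for tid in unhealthy]


# Hypothetical two-node SAN; device 0x22 on the second node is ailing.
from_node2 = collect_unhealthy({0x21: True, 0x22: False}, [])
from_node1 = collect_unhealthy({0x11: True, 0x12: True}, from_node2)
report = report_to_osm(0x10, from_node1)
```

In this example the monitoring OSM would receive a single "unhealthy" message naming TID 0x22, and could then open point-to-point communication with that device alone to request detailed status.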