The rapid expansion of information service and data processing industries has resulted in a need for computer systems to manage and store large amounts of data. As an example, financial service industry businesses such as banks, mutual fund companies or the like often operate large and complex data processing systems that require access to many hundreds of gigabytes or even terabytes of data. Data storage system developers have responded to these types of data storage requirements by integrating large-capacity data storage systems, data communications devices and computer systems into networks called “storage networks” or “storage area networks” (SANs). A storage area network is a collection of data storage systems that are networked with a number of host computer systems that operate as servers to access data stored in the data storage systems. Elements of a typical conventional storage area network implementation include one or more connectivity devices such as high-speed data switches or routers that interconnect the various data storage systems to each other and to one or more host or server computer systems (servers) that require access to (e.g., read and/or write) the data in the data storage systems on behalf of client software applications and/or client computer systems.
A developer or administrator of such a storage area network environment may install one or more distributed storage area network management software applications within the storage area network to manage or administer the various elements (e.g., devices, computer systems and storage systems) that operate within the storage area network. A network manager (i.e., a person) responsible for management of the storage area network operates the network management software application to perform management tasks such as performance monitoring, network analysis and remote configuration and administration of the various components operating within the storage area network.
A typical conventional storage area network management software application may have several different software components that execute independently of each other on different computer systems but that interoperate to perform network management. As an example, conventional designs of storage area network management applications can include console, server, agent and storage software components.
Generally, the server component operates as a central control process within the storage area network management application and coordinates communication between the console, storage and agent components. The console component often executes within a dedicated storage area network management workstation to allow the network administrator to visualize and remotely control and manage the various elements within the storage area network that are graphically represented within the console. Agent components execute on host computer systems such as servers within the storage area network to manage storage area network elements. As an example, there may be different respective agents specifically designed (e.g., coded) to remotely manage and control data storage systems, database applications, switches, and so forth. Agent components receive remote management commands from the server component and apply functionality associated with those management commands to the managed elements within the storage area network that those agents are designated to manage. Agents are also responsible for periodically collecting configuration or management data concerning the storage area network elements that those agents are responsible for managing. Agents can transmit this collected management data back to a storage component. The storage component receives the collected management data from the agents and processes and stores this information into a storage area network management database for access by the server component. The console component can interact with the server component to obtain current information such as performance, capacity, load or other data concerning managed elements within the storage area network by accessing the element configuration data in the network management database.
Different components of the storage area network may be located remotely from each other. For example, the agents may be located remotely from the server and storage components, and communicate with them over a network. A firewall may be used on the network between the agent and other parts of the system. The firewall is used to provide a measure of security such that the server and storage components are not available (such as by having their IP addresses published) to other people or systems. A firewall may use Network Address Translation (NAT) wherein multiple private IP addresses are converted to a single IP address for communications outside of the local network, onto a public network. The impetus towards increasing use of NAT comes from a number of factors including a shortage of IP addresses, security needs and ease and flexibility of network administration.
A firewall placed between the local devices and the network is used to verify all traffic before allowing it to pass through. This means, for example, that no unauthorized user would be allowed to access the management server or storage processes. The firewall only allows connections that are originated on the inside network. This means, for example, that an internal client can connect to an outside server, but an outside client will not be able to connect to an internal server because it would have to originate the connection, and the firewall will not allow that. It is still possible to make some internal servers available to the outside world via inbound mapping, which maps certain well-known TCP ports (e.g., 21 for FTP) to specific internal addresses, thus making services such as FTP or Web available in a controlled manner.
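As an illustration only, inbound mapping can be pictured as a small static table consulted when a connection is originated from the outside. The following Python sketch makes that concrete; the table contents, the internal addresses and the function name `route_new_inbound_connection` are hypothetical and are not taken from any particular firewall product.

```python
from typing import Dict, Optional, Tuple

# Hypothetical static inbound map: public TCP port -> (internal address, internal port).
# Only services listed here are reachable by connections originated outside.
INBOUND_MAP: Dict[int, Tuple[str, int]] = {
    21: ("192.168.0.10", 21),    # FTP server on the private network (assumed address)
    80: ("192.168.0.20", 8080),  # Web server on the private network (assumed address)
}


def route_new_inbound_connection(dest_port: int) -> Optional[Tuple[str, int]]:
    """Return the internal endpoint for an outside-originated connection.

    Returns None when no inbound mapping exists, in which case the
    firewall would refuse the connection.
    """
    return INBOUND_MAP.get(dest_port)
```

Under these assumptions, a connection to public port 21 would be forwarded to the internal FTP server, while a connection to an unmapped port such as 22 would be refused. Moving a service to a different internal computer then only requires changing its entry in the table.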
NAT can help network administration in several ways. NAT can be used to divide a large network into several smaller ones. The smaller parts expose only one IP address to the outside, which means that computers can be added or removed, or their addresses changed, without impacting external networks. With inbound mapping, it is even possible to move services (such as Web servers) to a different computer without having to make any changes to external clients.
The basic purpose of NAT is to multiplex traffic from the internal network and present it to an external network (e.g., the Internet) as if it were coming from a single computer having only one IP address. To multiplex several connections to a single destination, client computers label all packets with unique “port numbers”. Each TCP/IP packet starts with header information containing the source and destination addresses and port numbers. This combination of numbers defines a single TCP/IP connection. The addresses specify the two machines at each end, and the two port numbers ensure that each connection between this pair of machines can be uniquely identified.
Each separate connection is originated from a unique source port number in the client, and all reply packets from the remote server for this connection contain the same number as their destination port, so that the client can relate them back to the correct connection.
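The way the four numbers identify a connection can be sketched in Python as follows. This is a minimal illustration, not a description of any real TCP/IP stack; the names `ConnectionId`, `connection_for_reply` and `open_connections`, and all addresses and ports, are assumptions made for the example.

```python
from typing import Dict, NamedTuple


class ConnectionId(NamedTuple):
    """The four values that together identify one TCP/IP connection on the client."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


def connection_for_reply(src_addr: str, src_port: int,
                         dst_addr: str, dst_port: int) -> ConnectionId:
    """Map a reply packet's header fields back to the client's connection.

    A reply's destination is the client's own address and source port,
    and its source is the remote server, so the fields are swapped.
    """
    return ConnectionId(dst_addr, dst_port, src_addr, src_port)


# Two connections from the same client to the same server and port:
# they differ only in the client's source port.
open_connections: Dict[ConnectionId, str] = {
    ConnectionId("10.0.0.5", 40001, "203.0.113.7", 80): "first request",
    ConnectionId("10.0.0.5", 40002, "203.0.113.7", 80): "second request",
}
```

In this sketch, a reply arriving from 203.0.113.7 port 80 with destination port 40002 is matched to the second connection, even though both connections share the same pair of machines.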
A NAT gateway or firewall changes the source address on every outgoing packet to be its single public address. It also renumbers the source ports to be unique, so that it can keep track of each client connection. The NAT gateway uses a port mapping table to remember how it renumbered the ports for each client's outgoing packets. The port mapping table relates the client's real local IP address and source port plus its translated source port number to a destination address and port. The NAT gateway can therefore reverse the process for returning packets and route them back to the correct clients.
When any remote server responds to a NAT client, incoming packets arriving at the NAT gateway will all have the same destination address, but the destination port number will be the unique source port number that was assigned by the NAT. The NAT gateway looks in its port mapping table to determine which “real” client address and port number a packet is destined for, and replaces these numbers before passing the packet on to the local client.
This process is completely dynamic. When a packet is received from an internal client, NAT looks for the matching source address and port in the port mapping table. If the entry is not found, a new one is created, and a new mapping port allocated to the client. Because the port mapping table relates complete connection information—source and destination address and port numbers—it is possible to validate any or all of this information before passing incoming packets back to the client. This checking helps to provide effective firewall protection against Internet-launched attacks on the private LAN.
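The translation steps described above can be summarized in a short Python sketch. It is a simplified model under stated assumptions: the class name `NatGateway`, the starting port 50000 and the choice to key each table entry on both the client endpoint and the destination endpoint are illustrative, and real gateways also handle entry expiry, port exhaustion and protocol details that are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]            # (IP address, port)
Flow = Tuple[Endpoint, Endpoint]      # (client endpoint, remote endpoint)


@dataclass
class NatGateway:
    public_addr: str
    next_port: int = 50000            # assumed first translated port
    by_flow: Dict[Flow, int] = field(default_factory=dict)   # flow -> translated port
    by_port: Dict[int, Flow] = field(default_factory=dict)   # translated port -> flow

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite an outgoing packet's source; create a table entry if none exists."""
        flow = (src, dst)
        port = self.by_flow.get(flow)
        if port is None:
            port = self.next_port
            self.next_port += 1
            self.by_flow[flow] = port
            self.by_port[port] = flow
        return (self.public_addr, port), dst

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Tuple[Endpoint, Endpoint]]:
        """Reverse the translation for a returning packet, or return None to drop it."""
        if dst[0] != self.public_addr:
            return None
        flow = self.by_port.get(dst[1])
        if flow is None:
            return None               # no client originated this connection
        client, remote = flow
        if remote != src:
            return None               # sender does not match the recorded destination
        return src, client
```

For example, with `gw = NatGateway("198.51.100.1")`, calling `gw.outbound(("10.0.0.5", 40001), ("203.0.113.7", 80))` yields a packet sourced from `("198.51.100.1", 50000)`, and a reply from that server to port 50000 is routed back to `("10.0.0.5", 40001)`. A packet arriving on port 50000 from any other sender is dropped, which is the validation step that gives the firewall effect.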
Enterprise Storage Networks are large and complex environments that include various inter-networked elements such as storage arrays, switches, hosts and databases. Such environments can contain several hundred of these elements. These elements in turn may consist of several hundred thousand manageable elements such as storage devices, storage and switch ports, database instances, host devices and file systems, and the like. Management of such environments is a daunting task and typically requires Storage Resource Management solutions such as EMC's ControlCenter (ECC) family of products, available from EMC Corporation of Hopkinton, Mass. ECC includes agents that are deployed on storage elements for the purpose of gathering data about these elements, components that process and persist data, and applications that use persisted information to enable the management of these environments. ECC identifies the entire process, from retrieving the collected data from agents through data persistence, as a transaction. The sizes of collected data range from a few kilobytes to several hundred megabytes. Processing this data from as many agents as are deployed can be complex, time-consuming and failure-prone. Executing these transactions reliably and efficiently is vital to the correct functioning of ECC. Multiple distributed components simultaneously participate in parallel processing of transactions. Transaction processing may fail for various reasons such as inconsistent data, malfunctioning components, network problems, and the like. A transaction processing failure prevents data from being persisted, which impedes storage management. Hence it is important to detect and identify the nature of these problems and to determine whether recovery from them is possible. ECC employs a multi-tier state-based mechanism to track and manage each transaction through its lifecycle. Using this approach, transaction processing is achieved in a distributed and reliable manner.
This enables improved supportability in ECC and automated recoverability of failed transactions where possible.
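The general idea of tracking a transaction by state can be illustrated with a small Python sketch. The text above does not name the states or tiers that ECC uses, so every state name, transition and class below is a hypothetical stand-in chosen for illustration, and the sketch models a single tier only.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    # Hypothetical lifecycle states; not taken from ECC.
    COLLECTED = "collected"     # agent has gathered the data
    PROCESSING = "processing"   # a component is processing the data
    PERSISTED = "persisted"     # data stored; transaction complete
    FAILED = "failed"           # processing failed; recovery may be possible
    ABANDONED = "abandoned"     # failure judged unrecoverable


# Allowed transitions between the hypothetical states.
ALLOWED: Dict[State, Set[State]] = {
    State.COLLECTED: {State.PROCESSING},
    State.PROCESSING: {State.PERSISTED, State.FAILED},
    State.FAILED: {State.PROCESSING, State.ABANDONED},
    State.PERSISTED: set(),
    State.ABANDONED: set(),
}


class Transaction:
    def __init__(self, txn_id: str) -> None:
        self.txn_id = txn_id
        self.state = State.COLLECTED
        self.history: List[State] = [State.COLLECTED]

    def advance(self, new_state: State) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.txn_id}: {self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def fail(self, recoverable: bool) -> None:
        """Record a failure; a recoverable one stays FAILED so it can be retried."""
        self.advance(State.FAILED)
        if not recoverable:
            self.advance(State.ABANDONED)
```

Because each transaction records its state history, a failed transaction in this sketch can be inspected to see where it stopped and, when the failure is recoverable, moved back to processing for a retry.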