1. Field of the Invention
This invention generally relates to communication networks and, more particularly, to a system and method for accurately determining logical distances between network-connected passive host devices.
2. Description of the Related Art
In computer networking, the Link Layer (Layer 2 or L2) is a group of methods or protocols that only operate on a host's link. The link is the physical and logical network components used to interconnect hosts or nodes in the network, and a link protocol is a suite of methods and standards that operate only between adjacent network nodes.
Every host is a physical network node (i.e., a network device), but not every physical network node is a host. Network nodes such as modems and network switches are not assigned host addresses and are not considered hosts. As used herein, a passive host is a system connected to the network that operates unaware of its proximity to its peers on the same Layer 2 (L2) network, and a controller or active host is a system connected to the network that is aware of its proximity to its peers on the same L2 network. That is, a controller host is a computer system with related components capable of manipulating proximity measurements, calculations, or data. A system, as defined herein, is a device capable of executing software instructions to process, transport, and/or store data.
Despite the different semantics of layering in TCP/IP and OSI, the Link Layer is often described as a combination of the Data Link Layer (Layer 2) and the Physical Layer (Layer 1) in the Open Systems Interconnection (OSI) protocol stack. However, TCP/IP's layers are descriptions of operating scopes (application, host-to-host, network, and link) and not detailed prescriptions of operating procedures, data semantics, or networking technologies.
RFC 1122 cites local area network protocols such as Ethernet and IEEE 802, and framing protocols such as Point-to-Point Protocol (PPP), as belonging to the Link Layer. Some examples of Link Layer protocols include: Address Resolution Protocol (ARP), Reverse ARP (RARP), Neighbor Discovery Protocol (NDP), Open Shortest Path First (OSPF), Tunnels (L2TP), PPP, and Media Access Control (Ethernet, MPLS, DSL, ISDN, FDDI).
LAN standards such as Ethernet and IEEE 802 specifications use terminology from the seven-layer OSI model rather than the TCP/IP reference model. The TCP/IP model in general does not consider physical specifications; rather, it assumes a working network infrastructure that can deliver media level frames on the link. Therefore, RFC 1122 and RFC 1123, which define the TCP/IP model, do not discuss hardware issues and physical data transmission and set no standards for those aspects, other than broadly including them as Link Layer components. It is assumed herein that physical data transmission aspects are part of the Link Layer, and that a hardware layer or physical layer exists below the Link Layer.
The Link Layer includes logical link-local networking methods such as the encapsulation of IP packets into frames, frame synchronization, packet error detection, node flow control, media access control (MAC) sublayer collision avoidance, physical addressing (MAC addressing), and LAN switching (packet switching) including MAC filtering and spanning tree protocol, to name a few.
The Link Layer Discovery Protocol (LLDP) is a vendor-neutral Layer 2 protocol used by network devices for advertising of their identity and capabilities on the local network. The protocol was formally ratified as IEEE standard 802.1AB-2005 in May 2005. Information gathered with LLDP is stored in the device and can be queried with the Simple Network Management Protocol (SNMP). The topology of an LLDP-enabled network can be discovered by “crawling” the hosts and querying this database. Information that may be retrieved includes: system name and description, port name and description, IP management address, system capabilities (switching, routing, etc.), and MAC/PHY information.
The Spanning Tree Protocol (STP) is a link layer network protocol that ensures a loop-free topology for any bridged LAN. Spanning tree allows a network design to include spare (redundant) links to provide automatic backup paths if an active link fails, without the danger of bridge loops, or the need for manual enabling/disabling of these backup links. Bridge loops must be avoided because they result in flooding the network. STP is defined in the IEEE Standard 802.1D. As the name suggests, it creates a spanning tree within a mesh network of connected layer-2 bridges (typically Ethernet switches), and disables those links that are not part of the tree, leaving a single active path between any two network nodes.
FIG. 1 is a schematic diagram depicting an exemplary network (prior art). The collection of bridges in a LAN can be considered a graph whose nodes are the bridges, and whose edges are the cables connecting the bridges. To break loops in the LAN while maintaining access to all LAN segments, the bridges collectively compute a spanning tree. The spanning tree that the bridges compute using the Spanning Tree Protocol can be determined using rules.
For example, the numbered boxes represent bridges (the number represents the bridge ID), and the lettered clouds represent network segments. Assuming that the bridge with the smallest bridge ID is the root bridge, the root bridge is 3. Assuming that the cost of traversing any network segment is 1, the least cost path from bridge 4 to the root bridge goes through network segment c. Therefore, the root port for bridge 4 is the one on network segment c. The least cost path to the root from network segment e goes through bridge 92. Therefore, the designated port for network segment e is the port that connects bridge 92 to network segment e. Any active port that is not a root port or a designated port is a blocked port. Different technologies have different default costs for network segments. An administrator can configure the cost of traversing a particular network segment. By establishing a unique data path between hosts, spanning tree guarantees that the data transfer latency will remain constant over time for as long as the topology is unchanged. The effective latency is therefore the sum of the propagation time over the cabling infrastructure and the network traffic handling latency.
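The rules above can be illustrated with a short Python sketch. The topology below is hypothetical (it is not a transcription of FIG. 1) and was chosen only so that the outcomes described in the text hold: bridge 3 is the root, bridge 4 reaches the root through segment c, and bridge 92 is the designated bridge for segment e. The function name compute_port_roles and the tie-breaking by lowest neighbor bridge ID are simplifying assumptions; the sketch uses global knowledge of the network, which real bridges do not have.

```python
# Hypothetical topology (NOT the one in FIG. 1): segment -> attached bridge IDs.
SEGMENTS = {
    "a": {3, 24},
    "b": {3, 92},
    "c": {24, 4},
    "d": {4, 12},
    "e": {92, 12},
}


def compute_port_roles(segments, segment_cost=1):
    """Apply the spanning tree rules with full knowledge of the network."""
    bridges = set().union(*segments.values())
    root = min(bridges)  # smallest bridge ID becomes the root bridge

    inf = float("inf")
    cost = {b: inf for b in bridges}  # root path cost of each bridge
    via = {b: inf for b in bridges}   # neighbor on the least cost path
    root_port = {}                    # bridge -> segment holding its root port
    cost[root], via[root] = 0, root

    # Relax repeatedly until no bridge can find a cheaper path to the root.
    changed = True
    while changed:
        changed = False
        for seg in sorted(segments):
            for bridge in segments[seg]:
                for neighbor in segments[seg] - {bridge}:
                    if cost[neighbor] == inf:
                        continue  # neighbor has no known path to the root yet
                    candidate = (cost[neighbor] + segment_cost, neighbor)
                    # Lower cost wins; equal cost falls back to lower neighbor ID.
                    if candidate < (cost[bridge], via[bridge]):
                        cost[bridge], via[bridge] = candidate
                        root_port[bridge] = seg
                        changed = True

    # Designated bridge per segment: lowest root path cost, then lowest ID.
    designated = {
        seg: min(members, key=lambda b: (cost[b], b))
        for seg, members in segments.items()
    }

    # Any port that is neither a root port nor a designated port is blocked.
    blocked = sorted(
        (bridge, seg)
        for seg, members in segments.items()
        for bridge in members
        if bridge != designated[seg] and root_port.get(bridge) != seg
    )
    return {"root": root, "cost": cost, "root_port": root_port,
            "designated": designated, "blocked": blocked}


roles = compute_port_roles(SEGMENTS)
print(roles["root"], roles["root_port"][4], roles["designated"]["e"], roles["blocked"])
```

In this hypothetical topology the only blocked port is the one connecting bridge 12 to segment d, which is what breaks the single loop in the graph.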
The above-described rules are one way of determining what spanning tree will be computed by the algorithm, but the rules as written imply knowledge of the entire network. The bridges have to determine the root bridge and compute the port roles (root, designated, or blocked) with only the information that they have. To ensure that each bridge has enough information, the bridges use special data frames called Bridge Protocol Data Units (BPDUs) to exchange information about bridge IDs and root path costs. A bridge sends a BPDU frame using the unique MAC address of the port itself as a source address, and a destination address of the STP multicast address 01:80:C2:00:00:00.
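As an illustration of the addressing described above, the following Python sketch assembles the bytes of a Configuration BPDU frame. The field layout (LLC header 0x42/0x42/0x03 followed by a 35-byte BPDU with timers in units of 1/256 second) reflects a common reading of IEEE 802.1D and should be checked against the standard before being relied upon. The function names, MAC addresses, and priority values are hypothetical, and the sketch omits Ethernet padding and the frame check sequence.

```python
import struct

# Destination address used by STP, as given in the text.
STP_MULTICAST = bytes.fromhex("0180c2000000")


def bridge_id(priority, mac):
    """Bridge ID: 2-byte priority followed by the 6-byte bridge MAC address."""
    return struct.pack("!H", priority) + mac


def build_config_bpdu_frame(src_mac, root_id, root_path_cost, sender_id,
                            port_id, message_age=0, max_age=20,
                            hello_time=2, forward_delay=15, flags=0):
    """Return the frame bytes for a Configuration BPDU (no padding, no FCS)."""
    bpdu = struct.pack(
        "!HBBB8sI8sHHHHH",
        0x0000,               # protocol identifier
        0x00,                 # protocol version (STP)
        0x00,                 # BPDU type: Configuration
        flags,                # topology change flags
        root_id,              # bridge ID of the root as seen by the sender
        root_path_cost,       # sender's cost to reach the root
        sender_id,            # bridge ID of the sender
        port_id,              # sending port identifier
        message_age * 256,    # timers are carried in 1/256 s units
        max_age * 256,
        hello_time * 256,
        forward_delay * 256,
    )
    llc = bytes([0x42, 0x42, 0x03])              # LLC header for STP
    length = struct.pack("!H", len(llc) + len(bpdu))  # 802.3 length field
    return STP_MULTICAST + src_mac + length + llc + bpdu


# Hypothetical example: bridge 4 advertising bridge 3 as root at cost 2.
port_mac = bytes.fromhex("020000000004")
frame = build_config_bpdu_frame(
    src_mac=port_mac,
    root_id=bridge_id(32768, bytes.fromhex("020000000003")),
    root_path_cost=2,
    sender_id=bridge_id(32768, port_mac),
    port_id=0x8001,
)
print(frame.hex())
```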
There are three types of BPDUs: Configuration BPDU (CBPDU), used for spanning tree computation; Topology Change Notification (TCN) BPDU, used to announce changes in the network topology; and Topology Change Notification Acknowledgment (TCA). BPDUs are exchanged regularly (every 2 seconds by default) and enable switches to keep track of network changes and to start and stop forwarding at ports as required.
When a device is first attached to a switch port, the port will not immediately start to forward data. It will instead go through a number of states while it processes BPDUs and determines the topology of the network. When a host such as a computer, printer, or server is attached, the port will always go into the forwarding state, albeit after a delay of about 30 seconds while it goes through the listening and learning states. The time spent in each of the listening and learning states is determined by a value known as the forward delay (default 15 seconds and set by the root bridge). However, if another switch is connected instead, the port may remain in blocking mode if it is determined that it would cause a loop in the network. Topology Change Notification (TCN) BPDUs are used to inform other switches of port changes. TCNs are injected into the network by a non-root switch and propagated to the root. Upon receipt of the TCN, the root switch will set a Topology Change flag in its normal BPDUs. This flag is propagated to all other switches to instruct them to rapidly age out their forwarding table entries. Rapid Spanning Tree Protocol (RSTP) provides for faster spanning tree convergence after a topology change (IEEE 802.1D-2004). While STP can take 30 to 50 seconds to respond to a topology change, RSTP is typically able to respond to changes within a second.
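The 30 and 50 second figures above follow from the STP timers, as the short Python sketch below shows. The forward delay (15 seconds) comes from the text; the maximum age of 20 seconds is an assumption based on the usual IEEE 802.1D default and is not stated in the text. The function names are illustrative only.

```python
FORWARD_DELAY_S = 15  # default forward delay, from the text
MAX_AGE_S = 20        # ASSUMPTION: usual 802.1D default, not stated in the text


def time_to_forwarding(forward_delay=FORWARD_DELAY_S):
    """Listening plus learning: one forward delay spent in each state."""
    return 2 * forward_delay


def worst_case_convergence(max_age=MAX_AGE_S, forward_delay=FORWARD_DELAY_S):
    """Stored BPDU information must first age out, then the port listens and learns."""
    return max_age + time_to_forwarding(forward_delay)


print(time_to_forwarding())      # delay before a new port forwards
print(worst_case_convergence())  # upper end of the quoted STP range
```

Under these assumptions, the lower bound of the quoted range corresponds to a port that only needs to pass through listening and learning, and the upper bound adds the time for stale BPDU information to expire.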
While there is currently a service on the Internet that provides a rough geo-localization function using Internet Control Message Protocol (ICMP) services, there are no link layer algorithms that provide a means of determining host proximity, as measured by time or cable length.
For example, two customers are co-located on a storage filer: Customer A and Customer B. The storage system heads use FCoE (Fibre Channel over Ethernet) to communicate with the disk shelves. The overall unit is not co-located in the same racks because of typical power constraints. Instead, the shelves are spread between multiple MDFs (main distribution frames) within the data center. Conventionally, the service is assigned by the storage management device to the system requiring the service. In this case, Customer A is physically located 50 ft and 200 ft from two data center centralized storage units, and Customer B is physically located 100 ft and 50 ft from the units.
Ethernet network packet routing is dynamic, as different events on the network can redirect the traffic to a different path. For example, if Customer A is initially 50 ns from the filer, it may over time be rerouted to a different, less efficient path. This change may be caused by less efficient cable routing, or by the convergence of a redundancy protocol (such as spanning tree).
In practice, operational teams do not look at the exact logical location of a device; they typically assume that if devices are physically close, they must be logically close (connected via the shortest or fastest possible route). However, as noted above, this assumption is not always correct. Conventionally, the logical distance between hosts can only be calculated by installing an agent on each host, which reports on network latencies and conditions. Such an agent would have to be compatible with the specific OS and hardware running on each system.
Conventional storage implementations rely on either IP or Fibre Channel for the connectivity between the hosts and shelves. Applications using the filer can be classified in two categories:
Performance: requiring the storage volume to replicate the behavior of a local drive with low latency and high throughput, where Fibre Channel is widely used.
Access: typically implemented over IP, such as iSCSI and NFS.
With the introduction of SSD technology, drive access time has been reduced from 4-8 milliseconds (ms) to less than 1 ms. Going forward, the access time is likely to eventually drop to the 10-20 nanosecond (ns) range. Network latency in a LAN is caused (in an undersubscribed network) by the propagation of light and general switching/routing speed. Light travels at roughly 1 foot per nanosecond through cable. With data center cabling averaging 200 feet, the network-induced latency will become a major part of the total access time latency. A number of companies have also implemented L2MPLS between facilities; in this condition, the L2 Ethernet environment within a sub-network is extended geographically, with latency similar to that seen on WAN networks (10 ms to 350 ms and more).
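The arithmetic behind this observation can be sketched in Python as follows. The 1 foot per nanosecond figure, the 200 foot cable run, and the 10-20 ns access time are taken from the text; the function names are illustrative. The sketch counts one-way propagation only and ignores switching delay. Because signals in real copper or fiber travel somewhat slower than 1 foot per nanosecond, the result should be read as an optimistic lower bound on the network share.

```python
NS_PER_FOOT = 1.0  # rough propagation delay per foot of cable, from the text


def propagation_delay_ns(cable_feet, ns_per_foot=NS_PER_FOOT):
    """One-way propagation delay over a cable run, in nanoseconds."""
    return cable_feet * ns_per_foot


def network_share(cable_feet, access_time_ns):
    """Fraction of total latency (propagation + drive access) due to the network."""
    network_ns = propagation_delay_ns(cable_feet)
    return network_ns / (network_ns + access_time_ns)


# 200 ft of cabling against a projected 20 ns drive access time.
delay = propagation_delay_ns(200)
share = network_share(200, 20)
print(f"propagation: {delay:.0f} ns, network share: {share:.0%}")
```

With these inputs, propagation alone is ten times the projected access time, which is why cable path length, and not only switching load, matters for service placement.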
It would, therefore, be advantageous to put in place a mechanism to optimize service allocation based on the quality of service requested and the proximity of the storage system to the requestor.
It would be advantageous if there were a method to accurately determine the network distance (in time) and latency between network-connected passive hosts in a Layer 2 Ethernet environment.