A network is a communications facility that permits a number of workstations, computers or other equipment (hereinafter collectively "computer(s)") to communicate with each other. Portions of a network involve hardware and software, for example, the computers or stations (which individually may comprise one or more central processing units, random access and persistent memory), the interface components, the cable or fiber optics used to connect them, as well as software that governs the access to and flow of information over the network. In networks in which data flow is 100 Mbits/sec. ("Mbps") or higher, the transmission medium is often fiber optics. In networks in which a slower data rate is acceptable, e.g., 10 Mbps, the transmission medium may be coaxial cable or, as is often the case for an Ethernet network, twisted wires.
In a network, network architecture defines protocols, message formats and other standards to which the computers and other equipment, and software must adhere. Most network architectures have adopted a model comprising functional layers in which a given layer is responsible for performing a specific set of functions, and for providing a specific set of services. Thus, the services provided by each layer and the interlayer interfaces can define a network architecture. Protocols define the services covered across a layer interface and the rules followed in the processing performed as a part of that service.
Several organizations have proposed models and standards that have been accepted within the networking community. The International Standards Organization ("ISO"), for example, has proposed a seven layer reference model for computer networking that is called the open systems interconnect ("OSI") architecture. Another set of standards has been promulgated by the Institute of Electrical and Electronics Engineers ("IEEE"), namely a set of proposed local area network ("LAN") standards known as IEEE Project 802. This model conforms to the seven-layer OSI model, but is directed solely to the lowest two OSI layers, namely the physical layer and the data link layer.
FIG. 1A depicts a network according to the IEEE Project 802 modification to the ISO seven layer model, in which several computers 10, 10', 10" can communicate data to each other over a physical link medium 20, e.g., cable, typically via a data terminating equipment unit ("DTE") 120, 120', 120". Any or all of the DTEs may be a switch, a router, another computer system, etc. In practice, a network may include hundreds or thousands of computers. In FIG. 1A, it is understood that associated with computer 10" (or equivalent) is a similar seven layer ISO model.
The bottommost, first or "L1" layer 30 in both the ISO and Project 802 model is a physical layer that is concerned with connections between two machines (e.g., computers 10, 10') to allow transmission of bit streams over a physical transmission medium (e.g., cable 20). Thus, physical layer 30 is concerned with types of cabling, cable plugs, port connectors, and the like. Often a server will have up to fifty physical port connectors, although more modern units such as the Sun Microsystems, Inc. Enterprise model 10,000 server can accept 100 physical links, each having a 100 Mbit/sec. flowrate.
Many Ethernet networks adhere to a carrier sense multiple access with collision detection ("CSMA/CD") standard. In the 802 model for CSMA/CD, a reconciliation sublayer 40, defined by a Media Independent Interface ("MII") standard, provides the interface between physical layer 30 and a media access control ("MAC") sublayer 50B.
Under MII, data and delimiters are synchronous to the corresponding clock, and two asynchronous media status signals are provided, namely carrier sense ("CRS"), and collision ("COL"). MII provides a two-wire serial management interface for control and status gathering, namely management data clock ("MDC"), and management data input/output ("MDIO").
In the OSI seven-layer model, the layer above the physical layer is a data link layer that is responsible for error-free transmission of data frames between network nodes. A data link control protocol describes operation and interfaces of this layer, which must also shield higher layers in the model from concerns about the physical transmission medium.
In the 802 model shown in FIG. 1A, the data link layer is subdivided into MAC sublayer 50B and an overlying logical link control ("LLC") sublayer 50A, collectively a second or L2 layer. MAC sublayer 50B is concerned with access control methods that determine how use of the physical transmission medium is controlled. The LLC sublayer 50A is responsible for medium-independent data link functions and allows the network (or internet protocol, "IP") layer 60 above to access LAN services independently of how the network is implemented. The network or IP layer 60 is often referred to as layer 3, or L3. According to the 802 architecture, LLC sublayer 50A provides services to network layer 60 in the same fashion as would a conventional data link protocol in a wide area network.
The MAC sublayer 50B provides services to the overlying LLC sublayer 50A, and manages sharing of the transmission medium among the different stations on the network. A media access management function receives a frame from the data encapsulation function after the necessary control information has been added. Thereafter, media access management is responsible for ensuring physical transmission of the data. The data frame in an Ethernet full-duplex environment has a maximum size of 1,518 bytes.
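As a worked check of the 1,518-byte figure, the following sketch sums the fields of a basic, untagged Ethernet MAC frame. The field names are chosen here for illustration only; the preamble and start-of-frame delimiter are not counted toward the frame size.

```python
# Field sizes, in bytes, of a basic (untagged) Ethernet MAC frame.
# The dictionary keys are illustrative names, not terms from the text above.
# The preamble and start-of-frame delimiter are deliberately excluded.
ETHERNET_FIELDS = {
    "destination_address": 6,
    "source_address": 6,
    "length_or_type": 2,
    "data_max": 1500,
    "frame_check_sequence": 4,
}

MIN_DATA = 46  # shorter payloads are padded up to this many bytes


def max_frame_size(fields=ETHERNET_FIELDS):
    """Largest frame: all header/trailer fields plus the maximum payload."""
    return sum(fields.values())


def min_frame_size(fields=ETHERNET_FIELDS):
    """Smallest frame: header/trailer fields plus the padded minimum payload."""
    overhead = sum(size for name, size in fields.items() if name != "data_max")
    return overhead + MIN_DATA


print(max_frame_size())  # 1518
print(min_frame_size())  # 64
```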
Several 802 standards exist for MAC sublayer 50B, including the so-called carrier sense multiple access with collision detection ("CSMA/CD") standard, and the 802.3 MAC standard provides flow control mechanisms in a half-duplex environment. In such environments, CSMA/CD defines data encapsulation/decapsulation and media access management functions performed by MAC sublayer 50B itself, the data encoding/decoding function being performed by underlying physical layer 30. However, as described later herein, the present invention assumes full duplex operation at the physical and media access control layers 30, 50B for each segment of a trunk.
Physical transmission of the data may be ensured using carrier sensing to defer transmission until the network is clear. In brief, a transmitting station (e.g., computer or user 10) listens or monitors the transmission medium (e.g., cable 20) before transmitting to determine whether another station (e.g., computer or user 10') is currently transmitting a message, i.e., to learn whether the medium is free. Using the services of the L1 physical layer 30, the media access management determines whether the transmission medium (or carrier) is presently being used. If the medium is not being used, media access management passes the data frame to L1 physical layer 30 for transmission. Even after frame transmission has begun, media access management continues to monitor the carrier. If the carrier is busy, media access management continues monitoring until no other stations are transmitting. Media access management then waits a specified random time to allow the network to clear and thereafter begins transmission.
But other station(s) having messages to send may all listen simultaneously, discern that the transmission medium appears quiet, and begin to transmit messages simultaneously. The result is a collision and garbled messages. If signal collision is detected, receiving stations ignore the garbled transmission, while transmitting stations stop transmitting messages immediately and transmit a jamming signal over the medium. Following collision, each transmitting station will attempt to re-transmit after waiting for a random backoff-delay time period for the carrier to clear. Thus, a station transmitting must listen sufficiently long to ensure that collision has not occurred.
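The listen, collide, and back-off cycle described above can be sketched as a small slotted simulation. This is a toy model under stated assumptions, not a definitive implementation: the text above specifies only a random backoff delay, and the sketch substitutes the truncated binary exponential backoff used by IEEE 802.3 (exponent capped at 10, at most 16 attempts). Each frame is assumed to occupy exactly one slot, and the function names are hypothetical.

```python
import random

EXPONENT_CAP = 10    # assumption: truncated binary exponential backoff
ATTEMPT_LIMIT = 16   # assumption: a frame is abandoned after 16 collisions


def backoff_slots(collisions, rng):
    """Random wait, in slot times, after the given number of collisions."""
    exponent = min(collisions, EXPONENT_CAP)
    return rng.randrange(2 ** exponent)


def simulate(num_stations, seed=0):
    """Slotted toy model in which every station has one frame to send.

    Returns (slot, station) pairs in the order frames were sent successfully.
    """
    rng = random.Random(seed)
    next_try = {s: 0 for s in range(num_stations)}    # slot of next attempt
    collisions = {s: 0 for s in range(num_stations)}
    sent = []
    slot = 0
    while next_try:
        ready = [s for s, t in next_try.items() if t <= slot]
        if len(ready) == 1:
            # Only one station found the medium quiet and transmitted.
            station = ready[0]
            sent.append((slot, station))
            del next_try[station]
        elif len(ready) > 1:
            # Collision: every sender stops, then waits a random time.
            for s in ready:
                collisions[s] += 1
                if collisions[s] >= ATTEMPT_LIMIT:
                    del next_try[s]   # give up on this frame
                else:
                    next_try[s] = slot + 1 + backoff_slots(collisions[s], rng)
        slot += 1
    return sent


if __name__ == "__main__":
    for slot, station in simulate(4, seed=1):
        print(f"slot {slot}: station {station} transmits without collision")
```

Because all stations start in slot 0, the first slot always ends in a collision whenever more than one station is present; the random delays then spread the retransmissions apart.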
In FIG. 1A, network or IP layer 60 concerns the routing of data from one network node to another. Transport layer 70 provides data transfer between two stations at an agreed upon level of quality once a connection is established between the stations. Transport layer 70 selects the particular class of service to be used, monitors transmission to ensure maintained service quality, and advises the stations (or users) if quality cannot be maintained.
Session layer 80 provides services that organize and synchronize a dialogue occurring between stations, and manages data exchange between stations. As such, session layer 80 controls when stations can send and receive data, based upon whether they can send and receive concurrently or alternately.
Presentation layer 90 ensures that information is presented to network users meaningfully, and may provide character code translation services, data conversion, data compression and expansion services.
Application layer 100 provides a mechanism for application processes to access system interconnection facilities for information exchange. The application layer provides services used to establish and terminate inter-user connections, and to monitor and manage the interconnected systems and the resources they employ.
Although the network shown in FIG. 1A may be half-duplex (or shared), or full-duplex, with respect to the present invention full-duplex only will be assumed. In full-duplex, there are transmit and receive communications paths, and one or more stations may transmit and receive simultaneously. The dual communications channel or path may in fact be multiple wires or cables, or a single wire or cable that simultaneously carries transmit and receive signals in both directions, perhaps using frequency division. Full-duplex networks can provide a higher data rate than half-duplex networks, often 100 Mbps, and are often preferred because of the more rapid communication rate.
Historically, network communication link speeds have increased: 10BASE-T links have given way to 100BASE-T links, which are now being scaled to transmission rates of up to 1 gigabit/second (1000 Mbps) with 1000BASE-T links. But in practice the choice of link speeds (e.g., 10 Mbps, 100 Mbps, 1000 Mbps) may not well match the amount of sustained data throughput that a particular network device can support. For example, modern multi-processor servers can sustain greater than 100 Mbps aggregate network transfer rates. Further, when switches and high performance routers are used to interconnect multiple links of a given transmission speed, it is necessary and desired that the inter-switch or inter-router link be able to support at least some aggregation of the links. Enhancing a network link speed from say 100 Mbps to 1000 Mbps is hardly cost effective unless utilization of the higher speed link is enhanced substantially, e.g., perhaps 40% to 50%, or more. Increasing network link speed also requires new hardware.
In FIG. 1A, cable media 20 is an Ethernet-compatible cable and will include a plurality of separate wires. As noted, a host, e.g., computer 10, typically will have a plurality of output port connectors (fifty, perhaps), that each accommodate a separate physical cable whose other end will connect to a switch 120 or other DTE. Collectively, the separate wires connecting host 10 to DTE 120 will define a single link of Ethernet-compatible cable, even though multiple wires are present within the cable.
FIG. 1B depicts server 10 with a plurality of output connector ports, here denoted by their network ID numbers, nid0, nid1, nid2, . . . nid4. Of course there may be fifty or more of such ports and associated network IDs. The link associated with nid0 is shown physically connected with other equipment 95, the precise equipment and terminating destination being unimportant. A plurality of physical links within cable 20 are shown as terminating at a DTE 120, here a switch. Other ports on switch 120 are shown as having physical links connecting to, for example, yet another system 105, as well as to a client 10', and another DTE 120'.
The operating system associated with server 10 in FIG. 1B must treat each of the server connector ports (or the physical cable links connected thereto) on an individual basis. Assume for example that equipment 95 is to be physically relocated from the port position associated with nid0 to some other output port on server 10. To accomplish this, the system administrator must assign a new IP-level address for relocated equipment 95, because cable link A will no longer be used, and a new cable link will be used instead, terminating at a different output port on server 10. Similarly, if cable link B coupling the server output port associated with nid1 to a first input port of switch 120 were down, e.g., broken or defective, no data would be presented to that input port of switch 120 until cable B could be repaired or replaced. There is no automatic re-direction of data to preserve system throughput in the prior art configuration shown.
In FIG. 1B, for server 10 to transmit data to client 10', the system administrator would have first had to configure the system such that client 10' is on one of the plurality of sublinks associated with switch 120. In transmitting packet information, server 10 would specify an IP-level sub-link address (associated with IP layer 60) corresponding to one of the links to which client 10' was assigned. As is common, the IP level sub-link address would have two parts: a network ID and a host ID. However each IP-level sub-link address has a different network IP level address associated with it (see also FIG. 4B). Understandably, if too many clients seek to use the same IP address, then excess data traffic would attempt to flow through a single link, potentially congesting the network system.
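The division of an IP-level address into a network ID and a host ID can be illustrated with Python's standard ipaddress module. The addresses and the 24-bit prefix length below are hypothetical values taken from documentation address ranges; the text above does not give concrete addresses.

```python
import ipaddress


def split_address(address, prefix_len):
    """Return (network_id, host_id) as integers for an IPv4 address."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    network_id = int(iface.network.network_address)
    # The host mask has ones exactly where the host ID bits are.
    host_id = int(iface.ip) & int(iface.network.hostmask)
    return network_id, host_id


if __name__ == "__main__":
    # Two hypothetical sub-link addresses, each on a different network.
    for addr in ("192.0.2.37", "198.51.100.37"):
        net, host = split_address(addr, 24)
        print(addr, "-> network", ipaddress.ip_address(net), "host", host)
```

In this sketch both addresses carry the same host ID (37) but different network IDs, mirroring the statement that each sub-link address has a different network-level address associated with it.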
The network administrator will have used a common network level ID for the IP-level address associated with one of the group of links, to which client 10' is dedicated, or bound. One undesired result is that because client 10' DTE and its physical link are bound, the system administrator cannot readily or flexibly group together any arbitrary number of physical links. The network (IP) level three addresses were denoted nid0, nid1, . . . nidN (see also FIG. 4B). In the prior art, link grouping is such that sub-links may be denoted at level two with addresses nid1.1, nid1.2, nid1.3, and so forth. To communicate with a piece of equipment, e.g., client 10', that is associated with a group of sub-links including nid1, it would also be necessary to specify which link is involved, e.g., linkage 3, denoted nid1.3.
Thus, software wishing to send information from server 10 to client 10' in FIG. 1B must specify the unique destination address as nid1.3. This is because the software application must communicate with level two via level three (see FIG. 1A and FIG. 4B). In essence, level three is software-commanded to send the data packet to address nid1.3.
Layer three information must be communicated to layer two so as to be understandable to layer two, which understands only its own addressing protocol. Thus, layer three will include a server-maintained mapping function that translates nid1.3 into a physical (e.g., MAC/LLC) address that is meaningful at the network level two layer. In practice, the server causes its layer three to maintain a mapping table (using address resolution protocol, or "ARP", information) for each network level ID that is a local link corresponding to an Ethernet ID. In essence, a vertical link level three-to-MAC address level two mapping results for the network ID in question. A separately maintained protocol table will contain an entry for nid1.3 (among other entries) as well as its associated MAC/LLC address. The map-reported particular MAC/LLC address will be and indeed must be used to send the packet in question to client 10'. Unfortunately in a switched network such as has been described, the packets in question have no alternative route to client 10' except this particular MAC/LLC address. For this reason, the addressing system is described as being static.
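The static character of this mapping can be shown with a minimal sketch. The table contents, the MAC address values, and the names STATIC_MAP, LinkDown, and next_hop_mac are all hypothetical and chosen for illustration; only the sub-link labels nid1.1 through nid1.3 come from the description above.

```python
# Hypothetical static table: level-three sub-link address -> level-two MAC.
# The MAC values are made-up, locally administered addresses.
STATIC_MAP = {
    "nid1.1": "02:00:00:00:01:01",
    "nid1.2": "02:00:00:00:01:02",
    "nid1.3": "02:00:00:00:01:03",
}


class LinkDown(Exception):
    """Raised when the only link toward a destination is unavailable."""


def next_hop_mac(sub_link_address, failed=frozenset(), table=STATIC_MAP):
    """Translate a level-three sub-link address to its single MAC address.

    There is exactly one table entry per address, so a failed link cannot
    be bypassed: the lookup has no alternative to fall back on.
    """
    mac = table[sub_link_address]
    if sub_link_address in failed:
        raise LinkDown(f"{sub_link_address} ({mac}) is down; no other route")
    return mac


if __name__ == "__main__":
    print(next_hop_mac("nid1.3"))
    try:
        next_hop_mac("nid1.3", failed={"nid1.3"})
    except LinkDown as err:
        print("undeliverable:", err)
```

Note that the other table entries, although healthy, are never consulted for traffic addressed to nid1.3; this is the sense in which the addressing is static.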
FIGS. 2A-2C depict various network interconnect configurations. In FIG. 2A, three separate wire link segments A, B, and C are included within media 20, and are used to couple a DTE, here server 10, and a switch 120 (or other mechanism). Of course more or fewer than three wire link segments could be used, depending upon the amount of bandwidth required for the system. Switch 120 is also coupled to several client segments, denoted 1, 2, 3, 4 and 5. In this configuration, server 10 may be replaced with other types of equipment, e.g., a router, a high performance workstation, a printer, etc.
In the embodiment of FIG. 2B, three wire link segments A, B, and C connect switch 120' to switch 120, although again a different number of segments could instead be used. In FIG. 2C, three segments (A, B, C) couple server 10' to server 10, although one or both servers could be replaced with other equipment, a router, for example.
In the prior art systems, accommodating the configurations of FIGS. 2A, 2B or 2C typically requires layer 60 to assign a separate IP address to each of the three wires A, B, C within link 20. (If multiple clients were to use the same IP address, all of their data traffic would pass on a single link, congesting the system.) Thus, administratively, three internet addresses would be required to handle the three (exemplary) wires within link 20. Network software would then treat these as three different links. In practice, there may be fifty or more separate wires (e.g., A, B, C, . . . ) within a physical link 20, and administratively having to assign and handle separate internet addresses for each is burdensome.
For example, in a system having multiple Ethernet links from a server/host system that are connected to an Ethernet-compatible network using an Ethernet switch, the system administrator must statically assign a different ID address (e.g., assign different IP host names) to each link, and also must configure each link with a different MAC address. This is time consuming, and the network administrator must in addition group other host/client systems into different IP sub-nets so that the multiple links can share network data traffic to the server. This grouping of clients into an associated logic group is required because certain clients would have to use one IP address to access the server since one cable is used.
Unfortunately, if a client or DTE had to be relocated within the system, it would now use a new cable coming from the server, and its relocation would require the system administrator to assign a new IP address. The various clients (or other DTE) are distributed at the receiving end of the system, and if the IP address reassignment were not carefully done, too many clients might end up using the same IP address. This could readily result in excessive traffic attempting to pass through a single physical link, congesting the system and degrading system bandwidth. Further, if a physical link were damaged or disconnected, a client whose dedicated IP address involved that link would lose access to the server, there being no other access path. Data intended to go to or from such a client would be lost.
Thus, in prior art network configuration protocols, for each physical link the administrator must statically assign a separate level 3 IP layer address and a level two LLC/MAC layer 50A/B address. Because DTE units are statically associated with a link in a group of links, e.g., in a sub-link, an IP layer sub-link address is required.
To summarize, in an Ethernet-compatible network there is a need for a flexible and dynamic method to allow multiple physical links to be logically grouped into a single logical or virtual link. In such a system, the system administrator should be able to assign a single LLC/MAC layer 2 address, and a single IP layer 3 address to the resultant virtual link. Such a system should permit grouping together into a single trunk any arbitrary number of physical links. The resultant system should provide flexibility to the system administrator in terms of rapidly reconfiguring the network system.
Such reconfiguration ability should improve load balancing and the maintenance of network system throughput, and should permit flexible reassignment of data flow traffic to a different physical link in the event there is disruption or failure of data flow through a used physical link. Further, such a system should logically preserve the flow sequence of data packets carried by the network between two communication points. Further, when a previously failed physical link again becomes available, the system should then use such link to help share data flow load over the network trunk.
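The requirements listed above (one logical link over several physical links, preserved packet order per flow, failover, and reuse of a restored link) can be made concrete with a generic sketch. This is only one conventional way such behavior could be obtained, shown here for illustration; it is not taken from, and should not be read as, the methodology of the present invention. The class name Trunk, its methods, and the use of a CRC-32 hash over the endpoint names are all assumptions.

```python
import zlib


class Trunk:
    """Toy logical link that spreads flows over several physical links."""

    def __init__(self, links):
        self.links = list(links)      # configured physical links, in order
        self.up = set(self.links)     # links currently able to carry data

    def fail(self, link):
        """Mark a physical link as unavailable."""
        self.up.discard(link)

    def restore(self, link):
        """Return a previously failed link to service."""
        if link in self.links:
            self.up.add(link)

    def select(self, source, destination):
        """Pick the physical link for a flow between two endpoints.

        The same endpoint pair always hashes to the same link while trunk
        membership is unchanged, so packets of one flow stay in sequence.
        """
        active = [link for link in self.links if link in self.up]
        if not active:
            raise RuntimeError("no physical link available in trunk")
        key = f"{source}->{destination}".encode()
        return active[zlib.crc32(key) % len(active)]


if __name__ == "__main__":
    trunk = Trunk(["A", "B", "C"])
    first = trunk.select("server 10", "client 10'")
    trunk.fail(first)
    print("after failure:", trunk.select("server 10", "client 10'"))
    trunk.restore(first)
    print("after restore:", trunk.select("server 10", "client 10'"))
```

A simple modulo hash such as this one may move some unaffected flows when membership changes; it is used here only because it keeps the sketch short.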
The present invention provides such a methodology for use in an Ethernet-compatible network system.