§ 1.1 Field of the Invention
The present disclosure concerns communications networks. More specifically, the present disclosure concerns multihomed access to a transport network in the context of a virtual private network (VPN), such as an Ethernet VPN (EVPN) for example.
§ 1.2 Background Information
The description of art in this section is not, and should not be interpreted to be, an admission that such art is prior art to the present invention.
A computer network is a collection of interconnected computing devices that can exchange data and share resources. Example network devices include layer two devices that operate within the second layer (i.e., L2, or the data link layer) of the Open Systems Interconnection (OSI) reference model, and layer three devices that operate within the third layer (i.e., L3, or the network layer) of the OSI reference model. Network devices (such as routers, switches, etc., generally referred to as “nodes”) within computer networks are interconnected through one or more communications links, thereby defining a network topology. Such network nodes often include at least one control unit that provides so-called “control plane” functionality and at least one forwarding unit for routing and/or switching data units, such as packets for example.
§ 1.2.1 Known Private Networking Technologies
For many entities (such as small businesses, universities, etc.), local area networks (or “LANs”) suffice for intra-entity communications. Indeed, LANs are quite popular since they are relatively inexpensive to deploy, operate, and manage, and are based on mature, well-developed technology (e.g., Ethernet). However, most entities also need to communicate (e.g., video, voice, and/or data) with their own facilities, or with others, beyond their immediate location. Thus, wide area networks (or “WANs”) are needed. Very often, entities want at least some privacy or security attached to their communications.
Presently, private long-haul communications can take place over networks that can be generally classified into two types—(1) dedicated WANs that facilitate communications among multiple sites, and (2) public transport networks that allow one or more sites of a private network to communicate. Both of these types of networks are introduced below.
§ 1.2.1.1 Dedicated WANs
Dedicated wide area networks (“WANs”) are typically implemented using leased lines or dedicated circuits to connect multiple sites. Customer premise routers or switches at these sites connect these leased lines or dedicated circuits together to facilitate connectivity between each site of the network. Most private networks with a relatively large number of sites will not have “fully meshed” network topologies (i.e., direct connections between each of the sites) due to the cost of leased lines or dedicated circuits and due to the complexity of configuring and managing customer premises equipment. Rather, some form of hierarchical network topology is typically employed in such instances. Unfortunately, dedicated WANs are relatively expensive and typically require the customer to have some networking expertise.
§ 1.2.1.2 Virtual Private Networks (VPNs)
Public transport networks are often used to allow remote users to connect to an enterprise network using some type of transport network technology. (Note that the word “public” in the phrase “public transport network” conveys the fact that more than one entity may use it, even though it may be privately owned and managed, and not available to the general public.) Given the expense of WANs, as well as the expertise needed to manage them, virtual private networks (VPNs) using public transport networks have become increasingly popular. Multi-Protocol Label Switching (MPLS) technology is often used in public transport networks.
Ethernet VPNs (EVPNs), such as Border Gateway Protocol (BGP) Multi-Protocol Label Switching (MPLS)-based EVPNs, are now introduced.
§ 1.2.1.2.1 EVPNs (RFC 7209)
Virtual Private LAN Service (VPLS) (e.g., as defined in Request for Comments (RFC) 4664, RFC 4761 and RFC 4762 from the Internet Engineering Task Force (IETF), each of which is incorporated herein by reference) is a proven and widely deployed technology. Unfortunately, VPLS has some limitations with respect to multihoming (i.e., where a customer premise edge device (CE) is connected with more than one service provider edge device (PE) of a transport network, so that a backup/standby link can be used if a primary link fails). RFC 7209 (incorporated herein by reference) specifies requirements for an EVPN to address various issues considered by some to be inadequately addressed by VPLS.
An EVPN may be used to extend two or more remote layer two (L2) customer networks through an intermediate layer three (L3) network (usually referred to as a “service provider transport network,” or simply a “transport network”) as if the intermediate L3 network did not exist from the perspective of the customer(s) (i.e., in a “transparent” manner). In particular, the EVPN transports L2 communications, such as Ethernet packets or “frames,” between customer networks via the transport network. For example, L2 communications may be transported over traffic engineered label switched paths (LSPs) through the transport network (e.g., in accordance with MPLS). In a typical configuration, service provider edge devices (PEs) coupled to the customer edge network devices (CEs) of the customer networks define LSPs within the transport network to carry encapsulated L2 communications as if these customer networks were directly attached to the same local area network (LAN). In some configurations, the PEs may also be connected by an IP infrastructure, in which case IP/GRE tunneling or other IP tunneling can be used between the network devices.
In an EVPN, L2 address learning (also referred to as “MAC learning”) in a PE device may occur in the control plane, using a routing protocol, rather than in the data plane (as happens with traditional bridging). For example, as described in § 1.2.1.2.2 below, in EVPNs, a PE may use the Border Gateway Protocol (BGP) (which is an L3 routing protocol) to advertise to other PEs media access control (MAC) address(es) learned from the local CEs to which the PE is connected. Specifically, a PE may use BGP route advertisement messages to announce reachability information for the EVPN. These BGP route advertisements may specify one or more MAC addresses learned by the PE device (instead of L3 routing information that is traditionally advertised in BGP route advertisements).
§ 1.2.1.2.2 BGP MPLS-Based EVPNs (RFC 7432)
RFC 7432 (incorporated herein by reference) describes BGP MPLS-based EVPNs. An EVPN “instance” comprises CEs that are connected to PEs that form the edge of the (e.g., MPLS) transport network. A CE may be a host, a router, or a switch. As noted above, the PEs provide virtual Layer 2 bridged connectivity between the CEs. There may be multiple EVPN instances in the service provider transport network.
As further noted above, the PEs may be connected by an MPLS LSP infrastructure, which provides the benefits of MPLS technology, such as fast reroute, resiliency, etc. The PEs may also be connected by an IP infrastructure, in which case IP/GRE (Generic Routing Encapsulation) tunneling or other IP tunneling can be used between the PEs. RFC 7432 concerns procedures only for MPLS LSPs as the tunneling technology. However, such procedures are designed to be extensible to IP tunneling as the Packet Switched Network (PSN) tunneling technology.
As already noted above, in an EVPN, MAC learning between PEs occurs not in the data plane (as happens with traditional bridging in VPLS) but in the control plane. Control-plane learning offers greater control over the MAC learning process, such as restricting who learns what, and the ability to apply policies. Furthermore, the control plane chosen for advertising MAC reachability information is multi-protocol (MP) BGP (similar to IP VPNs described in RFC 4364). This provides flexibility and the ability to preserve the “virtualization” or isolation of groups of interacting agents (hosts, servers, virtual machines) from each other. In EVPN, PEs advertise the MAC addresses learned from the CEs that are connected to them, along with an MPLS label, to other PEs in the control plane using Multiprotocol BGP (MP-BGP). Control-plane learning enables load balancing of traffic to and from CEs that are multihomed to multiple PEs. This is in addition to load balancing across the MPLS core via multiple LSPs between the same pair of PEs. In other words, it allows CEs to connect to the transport network via multiple active points of attachment. It also improves convergence times in the event of certain network failures.
However, learning between PEs and CEs may be done by the method best suited to the CE (such as data-plane learning, IEEE 802.1x, the Link Layer Discovery Protocol (LLDP), IEEE 802.1aq, Address Resolution Protocol (ARP), management plane, or other protocols).
It is a local decision as to whether the Layer 2 forwarding table on a PE is populated with all the MAC destination addresses known to the control plane, or whether the PE implements a cache-based scheme. For instance, the MAC forwarding table might be populated only with the MAC destinations of the active flows transiting a specific PE.
The policy attributes of EVPN are very similar to those of IP-VPN. An EVPN “instance” may have a Route Distinguisher (RD) that is unique per MAC-VRF and one or more globally unique Route Targets (RTs). A CE may attach to a MAC-VRF on a PE, on an Ethernet interface that may be configured for one or more Ethernet tags, e.g., VLAN IDs. Some deployment scenarios guarantee uniqueness of VLAN IDs across EVPN instances: all points of attachment for a given EVPN instance use the same VLAN ID, and no other EVPN instance uses this VLAN ID (referred to as a “Unique VLAN EVPN”).
In network communications systems, protocols are used by devices, such as routers for example, to exchange network information. Routers generally calculate routes (also referred to as “paths”) used to forward data packets towards a destination. BGP allows routers (e.g., in different autonomous systems (“ASes”)) to exchange reachability information. BGP is summarized below.
The following refers to the version of BGP described in RFC 4271 (incorporated herein by reference). The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of Autonomous Systems (ASes) that reachability information traverses. This information is sufficient for constructing a graph of AS connectivity, from which routing loops may be pruned, and, at the AS level, some policy decisions may be enforced.
BGP uses the transmission control protocol (“TCP”) as its transport protocol. When a TCP connection is formed between two systems, they exchange messages to open and confirm the connection parameters. The initial data flow is the portion of the BGP routing table that is allowed by the export policy, called the “Adj-Ribs-Out.”
Incremental updates are sent as the routing tables change. BGP does not require a periodic refresh of the routing table.
In RFC 4271, a “route” is defined as a unit of information that pairs an address prefix with the set of path attributes. The address prefix can be carried in the Network Layer Reachability Information (“NLRI”) field of an UPDATE message or MP_REACH_NLRI attribute, and the set of path attributes is reported in the path attributes field of the same UPDATE message.
Routes are advertised between BGP speakers in UPDATE messages. Multiple destinations that share the same set of path attributes can be advertised in a single UPDATE message by including multiple prefixes in the NLRI field of the UPDATE message, or in the MP_REACH_NLRI path attribute of the UPDATE message.
BGP provides mechanisms by which a BGP speaker can inform its peers that a previously advertised route is no longer available for use. There are three methods by which a given BGP speaker can indicate that a route has been withdrawn from service. First, the IP prefix that expresses the destination for a previously advertised route can be advertised in the WITHDRAWN ROUTES field in the UPDATE message, or reported in the MP_UNREACH_NLRI path attribute, thus marking the associated route as being no longer available for use. Second, a replacement route with the same NLRI can be advertised. Third, the BGP speaker connection can be closed, which implicitly removes from service all routes the pair of speakers had advertised to each other. Changing the attribute(s) of a route may be accomplished by advertising a replacement route. The replacement route carries new (changed) attributes and has the same address prefix as the original route.
In BGP, UPDATE messages are used to transfer routing information between BGP peers. The information in the UPDATE message can be used to construct a graph that describes the relationships of the various Autonomous Systems. More specifically, an UPDATE message is used to advertise feasible routes that share a common set of path attribute value(s) to a peer (or to withdraw multiple unfeasible routes from service). An UPDATE message may simultaneously advertise a feasible route and withdraw multiple unfeasible routes from service.
The UPDATE message 100 includes a fixed-size BGP header, and also includes the other fields, as shown in FIG. 1A. (Note that some of the shown fields may not be present in every UPDATE message.) Referring to FIG. 1A, the “Withdrawn Routes Length” field 110 is a 2-octet unsigned integer that indicates the total length of the Withdrawn Routes field 120 in octets. Its value allows the length of the Network Layer Reachability Information field 150 to be determined, as specified below. A value of 0 indicates that no routes are being withdrawn from service, and that the WITHDRAWN ROUTES field 120 is not present in this UPDATE message 100.
The “Withdrawn Routes” field 120 is a variable-length field that contains a list of IP address prefixes for the routes that are being withdrawn from service. Each IP address prefix is encoded as a 2-tuple 120′ of the form <length, prefix>. The “Length” field 122 indicates the length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with prefix, itself, of zero octets). The “Prefix” field 124 contains an IP address prefix, followed by the minimum number of trailing bits needed to make the end of the field fall on an octet boundary. Note that the value of trailing bits is irrelevant.
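The <length, prefix> encoding can be sketched in Python as follows. This is a minimal illustration that assumes IPv4 prefixes only; the function names `encode_prefix` and `decode_prefix` are hypothetical and are not taken from any particular BGP implementation.

```python
import ipaddress
from typing import Tuple


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as a <length, prefix> 2-tuple."""
    net = ipaddress.IPv4Network(prefix)  # rejects prefixes with host bits set
    # Send only the minimum number of octets that can hold `prefixlen` bits.
    num_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:num_octets]


def decode_prefix(data: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one <length, prefix> 2-tuple; return (prefix, offset of the next tuple)."""
    length_bits = data[offset]
    num_octets = (length_bits + 7) // 8
    raw = data[offset + 1:offset + 1 + num_octets].ljust(4, b"\x00")
    # The value of the trailing (padding) bits is irrelevant, so mask them off.
    mask = (0xFFFFFFFF << (32 - length_bits)) & 0xFFFFFFFF
    addr = int.from_bytes(raw, "big") & mask
    return str(ipaddress.IPv4Network((addr, length_bits))), offset + 1 + num_octets
```

For example, `encode_prefix("10.1.0.0/16")` yields the three octets `0x10 0x0a 0x01`, and `encode_prefix("0.0.0.0/0")` yields the single octet `0x00`, i.e., a zero-length prefix that matches all addresses.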
Still referring to FIG. 1A, the “Total Path Attribute Length” field 130 is a 2-octet unsigned integer that indicates the total length of the Path Attributes field 140 in octets. Its value allows the length of the Network Layer Reachability Information field 150 to be determined. A value of 0 indicates that neither the Network Layer Reachability Information field 150, nor the Path Attribute field 140, is present in this UPDATE message.
The “Path Attributes” field 140 is a variable-length sequence of path attributes that is present in every UPDATE message, except for an UPDATE message that carries only the withdrawn routes. Each path attribute is a triple <attribute type, attribute length, attribute value> of variable length. The “Attribute Type” is a two-octet field that consists of the Attribute Flags octet, followed by the Attribute Type Code octet.
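A sketch of splitting the Path Attributes field 140 into its triples is shown below. It relies on one detail of RFC 4271 not spelled out above: when the Extended Length bit (0x10) of the Attribute Flags octet is set, the Attribute Length is two octets; otherwise it is one octet. The function name `parse_path_attributes` is hypothetical.

```python
from typing import List, Tuple

EXTENDED_LENGTH = 0x10  # Attribute Flags bit indicating a 2-octet Attribute Length


def parse_path_attributes(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Split a Path Attributes field into (flags, type_code, value) triples."""
    attrs = []
    offset = 0
    while offset < len(data):
        # Attribute Type = Attribute Flags octet + Attribute Type Code octet.
        flags, type_code = data[offset], data[offset + 1]
        offset += 2
        if flags & EXTENDED_LENGTH:
            length = int.from_bytes(data[offset:offset + 2], "big")
            offset += 2
        else:
            length = data[offset]
            offset += 1
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("attribute value truncated")
        attrs.append((flags, type_code, value))
        offset += length
    return attrs
```

For instance, an ORIGIN attribute (flags 0x40, type code 1, one-octet value) followed by a NEXT_HOP attribute (type code 3, four-octet IPv4 address) parses into two triples.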
Finally, the “Network Layer Reachability Information” field 150 is a variable-length field that contains a list of Internet Protocol (“IP”) address prefixes. The length, in octets, of the Network Layer Reachability Information is not encoded explicitly, but can be calculated as:
NLRI Length = UPDATE message Length - 23 - Total Path Attribute Length - Withdrawn Routes Length
where UPDATE message Length is the value encoded in the fixed-size BGP header, Total Path Attribute Length (Recall field 130.) and Withdrawn Routes Length (Recall field 110.) are the values encoded in the variable part of the UPDATE message, and 23 is the combined length of the fixed-size BGP header (19 octets), the Total Path Attribute Length field (2 octets), and the Withdrawn Routes Length field (2 octets).
Reachability information is encoded as one or more 2-tuples of the form <length, prefix> 150′, whose fields are shown in FIG. 1A and described here. The “Length” field 152 indicates the length in bits of the IP address prefix. A length of zero indicates a prefix that matches all IP addresses (with prefix, itself, of zero octets). The “Prefix” field 154 contains an IP address prefix, followed by enough trailing bits to make the end of the field fall on an octet boundary. Note that the value of the trailing bits is irrelevant.
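To tie the length fields together, the sketch below splits a complete UPDATE message into its withdrawn prefixes, raw path attributes, and NLRI prefixes, deriving the NLRI length from the formula given above. It is a minimal sketch assuming IPv4 prefixes and a single message in the buffer; `parse_update` and `_decode_prefixes` are hypothetical names.

```python
import ipaddress
from typing import List, Tuple

HEADER_LEN = 19  # 16-octet Marker + 2-octet Length + 1-octet Type


def _decode_prefixes(data: bytes) -> List[str]:
    """Decode a run of IPv4 <length, prefix> 2-tuples."""
    prefixes, offset = [], 0
    while offset < len(data):
        length_bits = data[offset]
        num_octets = (length_bits + 7) // 8
        raw = data[offset + 1:offset + 1 + num_octets].ljust(4, b"\x00")
        mask = (0xFFFFFFFF << (32 - length_bits)) & 0xFFFFFFFF  # drop trailing bits
        addr = int.from_bytes(raw, "big") & mask
        prefixes.append(f"{ipaddress.IPv4Address(addr)}/{length_bits}")
        offset += 1 + num_octets
    return prefixes


def parse_update(msg: bytes) -> Tuple[List[str], bytes, List[str]]:
    """Return (withdrawn prefixes, raw Path Attributes field, NLRI prefixes)."""
    msg_len = int.from_bytes(msg[16:18], "big")
    if msg[18] != 2:  # Type 2 = UPDATE
        raise ValueError("not an UPDATE message")
    pos = HEADER_LEN
    wr_len = int.from_bytes(msg[pos:pos + 2], "big")
    withdrawn = msg[pos + 2:pos + 2 + wr_len]
    pos += 2 + wr_len
    tpa_len = int.from_bytes(msg[pos:pos + 2], "big")
    attrs = msg[pos + 2:pos + 2 + tpa_len]
    pos += 2 + tpa_len
    # NLRI length is implicit: Length - 23 - Total Path Attribute Length - Withdrawn Routes Length.
    nlri_len = msg_len - 23 - tpa_len - wr_len
    if nlri_len < 0:
        raise ValueError("inconsistent UPDATE length fields")
    return _decode_prefixes(withdrawn), attrs, _decode_prefixes(msg[pos:pos + nlri_len])
```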
Referring to FIG. 1B, RFC 4760 (incorporated herein by reference) describes a way to use the path attribute(s) field 140 of a BGP update message 100 to carry routing information for multiple Network Layer protocols (such as, for example, IPv6, IPX, L3VPN, etc.). More specifically, RFC 4760 defines two new path attributes: (1) Multiprotocol Reachable NLRI (“MP_Reach_NLRI”) and (2) Multiprotocol Unreachable NLRI (“MP_Unreach_NLRI”). The first is used to carry the set of reachable destinations together with the next hop information to be used for forwarding to these destinations, while the second is used to carry a set of unreachable destinations. Only MP_Reach_NLRI is discussed below.
Referring to FIG. 1B, the MP_Reach_NLRI “path attribute” 140′ includes an address family identifier (“AFI”) (2 octet) field 141, a subsequent address family identifier (“SAFI”) (1 octet) field 142, a length of Next Hop Network Address (1 octet) field 143, a Network Address of Next Hop (variable) field 144, a Reserved (1 octet) field 145 and a Network Layer Reachability Information (variable) field 146. The AFI and SAFI fields 141 and 142, in combination, identify (1) a set of Network Layer protocols to which the address carried in the Next Hop field 144 must belong, (2) the way in which the address of the Next Hop is encoded, and (3) the semantics of the NLRI field 146. The Network Address of Next Hop field 144 contains the Network Address of the next router on the path to the destination system. The NLRI field 146 lists NLRI for feasible routes that are being advertised in the path attribute 140′. That is, the next hop information carried in the MP_Reach_NLRI 140′ path attribute defines the Network Layer address of the router that should be used as the next hop to the destination(s) listed in the MP_Reach_NLRI attribute in the BGP Update message.
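The field layout of the MP_Reach_NLRI attribute can be illustrated with the following sketch. It assumes the next hop is a single IPv4 or IPv6 address and treats the NLRI as already-encoded bytes; the function names are hypothetical. The attribute is wrapped with type code 14 and the Optional flag (0x80), as RFC 4760 specifies for this optional, non-transitive attribute.

```python
import ipaddress


def mp_reach_nlri_value(afi: int, safi: int, next_hop: str, nlri: bytes) -> bytes:
    """Build the value of an MP_REACH_NLRI path attribute."""
    nh = ipaddress.ip_address(next_hop).packed  # 4 octets (IPv4) or 16 octets (IPv6)
    return (afi.to_bytes(2, "big")       # Address Family Identifier (2 octets)
            + safi.to_bytes(1, "big")    # Subsequent AFI (1 octet)
            + len(nh).to_bytes(1, "big")  # Length of Next Hop Network Address
            + nh                         # Network Address of Next Hop
            + b"\x00"                    # Reserved (1 octet, sent as 0)
            + nlri)                      # Network Layer Reachability Information


def mp_reach_nlri_attribute(value: bytes) -> bytes:
    """Wrap the value as path attribute type code 14 (optional, non-transitive)."""
    flags = 0x80  # Optional
    if len(value) > 255:
        flags |= 0x10  # Extended Length: 2-octet Attribute Length
        return bytes([flags, 14]) + len(value).to_bytes(2, "big") + value
    return bytes([flags, 14, len(value)]) + value
```

For example, `mp_reach_nlri_value(25, 70, "192.0.2.1", nlri)` produces the L2VPN/EVPN encoding used later in this section, with a four-octet IPv4 next hop.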
Referring back to FIG. 1A, an UPDATE message 100 can advertise, at most, one set of path attributes (Recall field 140.), but multiple destinations, provided that the destinations share the same set of attribute value(s). All path attributes contained in a given UPDATE message apply to all destinations carried in the NLRI field 150 of the UPDATE message.
As should be apparent from the description of fields 110 and 120 above, an UPDATE message 100 can list multiple routes that are to be withdrawn from service. Each such route is identified by its destination (expressed as an IP prefix), which unambiguously identifies the route in the context of the BGP speaker-to-BGP speaker connection over which it was previously advertised.
An UPDATE message 100 might carry only routes that are being withdrawn from service, in which case the message 100 will not include path attributes 140 or Network Layer Reachability Information 150. Conversely, an UPDATE message 100 might advertise only a feasible route, in which case the WITHDRAWN ROUTES field 120 need not be present. An UPDATE message 100 should not include the same address prefix in both the WITHDRAWN ROUTES field 120 and either the Network Layer Reachability Information field 150 or the NLRI field 146 of the MP_REACH_NLRI path attribute.
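The sketch below assembles an UPDATE message from its parts and, on the sending side, refuses to place the same prefix in both the WITHDRAWN ROUTES field and the NLRI field. It is a minimal illustration assuming IPv4 prefixes and pre-encoded path attributes; `build_update` is a hypothetical name, and a real BGP speaker must still be able to process received messages that violate this rule.

```python
import ipaddress
from typing import List


def _encode_prefix(prefix: str) -> bytes:
    net = ipaddress.IPv4Network(prefix)
    return bytes([net.prefixlen]) + net.network_address.packed[:(net.prefixlen + 7) // 8]


def build_update(withdrawn: List[str], path_attrs: bytes, nlri: List[str]) -> bytes:
    """Build an UPDATE message (19-octet header + variable part)."""
    overlap = ({str(ipaddress.IPv4Network(p)) for p in withdrawn}
               & {str(ipaddress.IPv4Network(p)) for p in nlri})
    if overlap:
        raise ValueError(f"prefix in both WITHDRAWN ROUTES and NLRI: {sorted(overlap)}")
    if nlri and not path_attrs:
        raise ValueError("feasible routes require path attributes")
    wr = b"".join(_encode_prefix(p) for p in withdrawn)
    nl = b"".join(_encode_prefix(p) for p in nlri)
    # A withdraw-only message has Total Path Attribute Length = 0 and no NLRI.
    body = len(wr).to_bytes(2, "big") + wr + len(path_attrs).to_bytes(2, "big") + path_attrs + nl
    return b"\xff" * 16 + (19 + len(body)).to_bytes(2, "big") + b"\x02" + body
```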
§ 1.2.1.2.2.1 Ethernet Segments in a BGP MPLS-Based EVPN
Per RFC 7209, each Ethernet segment needs a unique identifier in an EVPN. This section describes how, under RFC 7432, such identifiers are assigned and how they are encoded for use in EVPN signaling.
When a customer site is connected to one or more PEs via a set of Ethernet links, this set of Ethernet links constitutes a so-called “Ethernet segment.” For a multihomed site, each Ethernet segment (ES) is identified by a unique non-zero identifier called an Ethernet Segment Identifier (ESI). Under RFC 7432, an ESI is encoded as a 10-octet integer in line format with the most significant octet sent first.
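A minimal sketch of the ESI encoding follows. It assumes, per RFC 7432, that the all-zero ESI (single-homed site) and the all-ones ESI (MAX-ESI) are reserved and therefore unusable for a multihomed Ethernet segment; `encode_esi` and `format_esi` are hypothetical helper names.

```python
MAX_ESI = 2 ** 80 - 1  # all-ones 10-octet value, reserved


def encode_esi(esi: int) -> bytes:
    """Encode a multihomed segment's ESI as 10 octets, most significant octet first."""
    if not 0 < esi < MAX_ESI:
        raise ValueError("a multihomed ES needs a non-zero, non-reserved ESI")
    return esi.to_bytes(10, "big")


def format_esi(esi: bytes) -> str:
    """Render an encoded ESI in colon-separated hexadecimal for display."""
    return ":".join(f"{b:02x}" for b in esi)
```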
In general, an Ethernet segment should have a non-reserved ESI that is unique network wide (i.e., across all EVPN instances on all the PEs). If the CE(s) constituting an Ethernet segment is (are) managed by the network operator, then ESI uniqueness should be guaranteed. If, however, the CE(s) is (are) not managed, then the operator must configure a network-wide unique ESI for that Ethernet segment if auto-discovery of Ethernet segments and Designated Forwarder (DF) election is to be enabled.
From the CE's perspective, the multiple PEs to which it is connected appear as a single switch. This allows the CE to aggregate links that are attached to different PEs into the same bundle.
§ 1.2.1.2.2.2 Ethernet Tag IDs in a BGP MPLS-Based EVPN
An Ethernet Tag ID is a 32-bit field containing either a 12-bit or 24-bit identifier that identifies a particular broadcast domain (e.g., a VLAN) in an EVPN instance. The 12-bit identifier is called the VLAN ID (VID). An EVPN instance consists of one or more broadcast domains (one or more VLANs). VLANs are assigned to a given EVPN instance by the provider of the EVPN service. A given VLAN can itself be represented by multiple VIDs. In such cases, the PEs participating in that VLAN for a given EVPN instance are responsible for performing VLAN ID translation to/from locally attached CE devices. If a VLAN is represented by a single VID across all PE devices participating in that VLAN for that EVPN instance, then there is no need for VID translation at the PEs. Furthermore, some deployment scenarios guarantee uniqueness of VIDs across all EVPN instances; all points of attachment for a given EVPN instance use the same VID, and no other EVPN instances use that VID. This allows the route targets (RTs) for each EVPN instance to be derived automatically from the corresponding VID, as described in Section 7.10.1 of RFC 7432.
§ 1.2.1.2.2.3 BGP EVPN Routes in a BGP MPLS-Based EVPN
RFC 7432 defines a new BGP Network Layer Reachability Information (NLRI) called the EVPN NLRI. As shown in FIG. 2, the EVPN NLRI 150″ includes a Route Type (1 octet) field 210, a length (1 octet) field 220 and a Route Type specific (variable length) field 230. The Route Type field 210 defines the encoding of the rest of the EVPN NLRI (Route Type specific EVPN NLRI). The Length field 220 indicates the length in octets of the Route Type specific field 230 of the EVPN NLRI 150″. Although RFC 7432 describes four route types, namely (1) Ethernet Auto-Discovery (A-D) route, (2) MAC/IP Advertisement route, (3) Inclusive Multicast Ethernet Tag route and (4) Ethernet Segment route, only Ethernet A-D routes are described here.
The EVPN NLRI may be carried in BGP (RFC 4271) using BGP Multiprotocol Extensions (RFC 4760, incorporated herein by reference), with an Address Family Identifier (AFI) of 25 (L2VPN) and a Subsequent Address Family Identifier (SAFI) of 70 (EVPN). The NLRI field in the MP_REACH_NLRI/MP_UNREACH_NLRI attribute contains the EVPN NLRI (encoded as specified above).
For two BGP speakers to exchange labeled EVPN NLRI, they must use BGP Capabilities Advertisements to ensure that they both are capable of properly processing such NLRI. This is done as specified in RFC 4760, by using capability code 1 (multiprotocol BGP) with an AFI of 25 (L2VPN) and a SAFI of 70 (EVPN).
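The capability exchange can be illustrated as follows. The sketch assumes the RFC 4760 layout of the Multiprotocol Extensions capability value (2-octet AFI, 1 reserved octet, 1-octet SAFI) and the RFC 5492 convention of carrying capabilities in an OPEN Optional Parameter of type 2; the function names are hypothetical.

```python
def mp_capability(afi: int = 25, safi: int = 70) -> bytes:
    """Multiprotocol Extensions capability (code 1); defaults to L2VPN/EVPN."""
    value = afi.to_bytes(2, "big") + b"\x00" + safi.to_bytes(1, "big")  # AFI, Reserved, SAFI
    return bytes([1, len(value)]) + value  # Capability Code, Capability Length, Value


def capabilities_optional_parameter(*capabilities: bytes) -> bytes:
    """OPEN Optional Parameter of type 2 (Capabilities) carrying the given TLVs."""
    payload = b"".join(capabilities)
    return bytes([2, len(payload)]) + payload
```

Each speaker includes this parameter in its OPEN message; EVPN NLRI is exchanged only if both sides advertise AFI 25 / SAFI 70.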
Still referring to FIG. 2, an Ethernet A-D route type specific EVPN NLRI 230′ includes a Route Distinguisher (RD) (8 octets) field 232, an Ethernet Segment Identifier (10 octets) field 234, an Ethernet Tag ID (4 octets) field 236, and an MPLS Label (3 octets) field 238.
For BGP route key processing, only the Ethernet Segment Identifier 234 and the Ethernet Tag ID 236 are considered to be part of the prefix 154′ in the NLRI 150′. The MPLS Label field 238 is to be treated as a route attribute as opposed to being part of the route.
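The Ethernet A-D route layout and its route key can be sketched as follows. Two assumptions are labeled here and in the code: the example Route Distinguisher is a Type 1 RD (RFC 4364: 2-octet type, 4-octet IPv4 address, 2-octet assigned number), and the 20-bit label value occupies the high-order bits of the 3-octet MPLS Label field. The function names are hypothetical.

```python
import ipaddress
from typing import Tuple


def rd_type1(ip: str, assigned: int) -> bytes:
    """Example Type 1 RD: type 1, IPv4 administrator, 2-octet assigned number."""
    return (1).to_bytes(2, "big") + ipaddress.IPv4Address(ip).packed + assigned.to_bytes(2, "big")


def ethernet_ad_nlri(rd: bytes, esi: bytes, tag_id: int, label: int) -> bytes:
    """EVPN NLRI with Route Type 1 (Ethernet A-D route)."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    if not 0 <= label < 2 ** 20:
        raise ValueError("an MPLS label is a 20-bit value")
    # Assumption: label value in the high-order 20 bits of the 3-octet field.
    label_field = (label << 4).to_bytes(3, "big")
    specific = rd + esi + tag_id.to_bytes(4, "big") + label_field  # 8 + 10 + 4 + 3 octets
    return bytes([1, len(specific)]) + specific  # Route Type, Length, Route Type specific


def route_key(nlri: bytes) -> Tuple[bytes, int]:
    """Route key: (ESI, Ethernet Tag ID); the MPLS Label is treated as an attribute."""
    body = nlri[2:]
    return body[8:18], int.from_bytes(body[18:22], "big")
```

Because the label is excluded from the key, re-advertising the same <ESI, Ethernet Tag ID> with a different label replaces the earlier route rather than creating a second one.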
§ 1.2.1.2.2.4 Multihoming Functions in a BGP MPLS-Based EVPN
RFC 7432 describes the functions, procedures, and associated BGP routes used to support multihoming in EVPN, and covers both multihomed device (MHD) and multihomed network (MHN) scenarios. Section 8.4 of RFC 7432, which concerns Aliasing and Backup Path, is of particular relevance to the present disclosure.
FIG. 3 illustrates an example EVPN environment 300 in which embodiments consistent with the present description may operate. As shown in the example environment 300, an EVPN may be used to extend two or more remote layer two (L2) customer networks (sites A and B) 310a and 310b through an intermediate layer three (L3) network (usually referred to as a service provider transport network, or simply a transport network) 320. As already discussed earlier, the EVPN connects the two remote customer networks 310a and 310b in a so-called “transparent” manner (that is, as if the intermediate L3 network 320 did not exist from the perspective of the two remote customer networks 310a and 310b).
As noted above, if the service provider transport network 320 employs MPLS forwarding, the EVPN transports L2 communications, such as Ethernet packets or “frames,” between customer networks 310a and 310b via traffic engineered label switched paths (LSPs) through the transport network 320 in accordance with one or more MPLS protocols. In some configurations, the PEs 330a, 330b, 330c may also be connected by an IP infrastructure, in which case IP/GRE tunneling or other IP tunneling can be used between the PEs.
In the example environment 300, the customer network-site A 310a is “multihomed” to the transport network 320 via CEa 315a and PE1 330a and PE2 330b. Multihoming may be used to increase network reliability (e.g., by having multiple links between the customer network-site A 310a and the transport network 320), and/or for load balancing (e.g., by dividing packet flows such that they go over different links, thereby avoiding the concentration of too much network traffic on a single link).
In some multihomed implementations, only one of the local PEs (330a and 330b) is active, while the other(s) remain in standby (also referred to as “single active” or “active-standby”). Such implementations are mainly used for network resiliency, but are not helpful for load balancing. For example, if customer device 312a1 (at customer network-site A 310a) is sending a flow of packets to customer device 312b1 (at customer network-site B 310b), and customer device 312aN is sending a flow of packets to customer device 312bR, both flows go through an active PE (e.g., PE1 330a). If the active PE fails (or if the link to the active PE, or an interface on that link, fails), the standby PE (e.g., PE2 330b) becomes active.
In other multihomed implementations, all of the local PEs (330a and 330b) are active simultaneously (referred to as “all active,” or “active-active”). Such implementations are useful for load balancing network traffic. For example, if customer device 312a1 (at customer network-site A 310a) is sending a flow of packets to customer device 312b1 (at customer network-site B 310b), and customer device 312aN is sending a flow of packets to customer device 312bR, one flow might go through active PE1 330a and the other flow might go through active PE2 330b. There are many known schemes for load balancing (e.g., hashing packet header data (for example, a source/destination address pair) to a particular path).
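One such hashing scheme can be sketched as follows. This is an illustrative example only: it assumes CRC-32 over the source/destination address pair as the hash (chosen because, unlike Python's salted `hash()`, it is stable across runs), and `select_pe` is a hypothetical name. Hashing per flow keeps all packets of a flow on one link, which preserves their order, while different flows may spread across the active PEs.

```python
import zlib
from typing import Sequence


def select_pe(src_addr: str, dst_addr: str, active_pes: Sequence[str]) -> str:
    """Pin a flow to one active PE by hashing its source/destination address pair."""
    if not active_pes:
        raise ValueError("no active PE available")
    key = f"{src_addr}|{dst_addr}".encode()
    # The same address pair always maps to the same index, hence the same PE.
    return active_pes[zlib.crc32(key) % len(active_pes)]
```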
In the case where a CE is multihomed to multiple PEs, e.g., using a Link Aggregation Group (LAG) (See, RFC 7424, incorporated herein by reference) with All-Active redundancy, it is possible that only a single PE learns a set of the MAC addresses associated with traffic transmitted by the CE. This leads to a situation in which one or more remote PEs receive MAC/IP Advertisement routes for these addresses from a single PE, even though multiple PEs are connected to the multihomed Ethernet segment. As a result, the remote PEs are not able to effectively load balance traffic among the PE nodes connected to the multihomed Ethernet segment. This could be the case, for example, when the PEs perform data-plane learning on the access, and the load-balancing function on the CE hashes traffic from a given source MAC address to a single PE.
Another scenario where this occurs is when the PEs rely on control-plane learning on the access (e.g., using ARP), since ARP traffic will be hashed to a single link in the LAG.
To address this issue, EVPN introduces the concept of “aliasing,” which is the ability of a PE to signal that it has reachability to an EVPN instance on a given ES even when it has learned no MAC addresses from that EVI/ES. The Ethernet A-D per EVI route is used for this purpose. A remote PE that receives a MAC/IP Advertisement route with a non-reserved ESI should consider the advertised MAC address to be reachable via all PEs that have advertised reachability to that MAC address's EVI/ES via the combination of an Ethernet A-D per EVI route for that EVI/ES (and Ethernet tag, if applicable) and Ethernet A-D per ES routes for that ES with the “Single-Active” bit in the flags of the ESI Label extended community set to 0. Note that the Ethernet A-D per EVI route may be received by a remote PE before it receives the set of Ethernet A-D per ES routes. Therefore, to handle corner cases and race conditions, the Ethernet A-D per EVI route is not to be used for traffic forwarding by a remote PE until it also receives the associated set of Ethernet A-D per ES routes.
The backup path is a closely related function, but it is used in Single-Active redundancy mode. In this case, a PE also advertises that it has reachability to a given EVI/ES using the same combination of Ethernet A-D per EVI route and Ethernet A-D per ES route as discussed above, but with the “Single-Active” bit in the flags of the ESI Label extended community set to 1. A remote PE that receives a MAC/IP Advertisement route with a non-reserved ESI should consider the advertised MAC address to be reachable via any PE that has advertised this combination of Ethernet A-D routes, and it should install a backup path for that MAC address.
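The aliasing and backup-path rules above can be sketched from the point of view of a remote PE (e.g., PE3). The data model (a dictionary of A-D per ES routes keyed by PE, and a set of (PE, EVI) pairs from A-D per EVI routes, both for one ES) and the function name are hypothetical simplifications. The sketch also assumes that an ES is treated as Single-Active if any eligible PE signals the Single-Active bit.

```python
from typing import Dict, Set, Tuple


def resolve_mac_next_hops(
    advertising_pe: str,
    evi: int,
    ad_per_es: Dict[str, bool],        # PE -> Single-Active bit of its A-D per ES route
    ad_per_evi: Set[Tuple[str, int]],  # (PE, EVI) pairs from A-D per EVI routes
) -> Tuple[Set[str], Set[str]]:
    """Return (active next hops, backup next hops) for a MAC with a non-reserved ESI."""
    # Race-condition guard: an A-D per EVI route is used only once the same
    # PE's A-D per ES route for this ES has also been received.
    eligible = {pe for pe, e in ad_per_evi if e == evi and pe in ad_per_es}
    all_active = {pe for pe in eligible if not ad_per_es[pe]}
    single_active = eligible - all_active
    if single_active:
        # Backup path (Single-Active bit = 1): advertiser is primary, others are backups.
        return {advertising_pe}, single_active - {advertising_pe}
    # Aliasing (Single-Active bit = 0): MAC reachable via every eligible PE.
    return {advertising_pe} | all_active, set()
```

For example, if PE1 advertises a MAC and both PE1 and PE2 have sent A-D per EVI and A-D per ES routes with the Single-Active bit set to 0, the remote PE may load balance toward both; if PE2's A-D per ES route has not yet arrived, only PE1 is used.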
An Ethernet A-D per EVPN instance (EVI) route (which is used for aliasing) may be constructed as follows. The Route Distinguisher (RD) is set per section 7.9 of RFC 7432. The Ethernet Segment Identifier is a 10-octet entity as described in section 5 of RFC 7432. The Ethernet A-D route is not needed when the Ethernet Segment Identifier is set to 0. The Ethernet Tag ID is the identifier of an Ethernet tag on the Ethernet segment. This value may be a 12-bit VLAN ID, in which case the low-order 12 bits are set to the VLAN ID and the high-order 20 bits are set to 0. Alternatively, it may be another Ethernet tag used by the EVPN. It may be set to the default Ethernet tag on the Ethernet segment or to the value 0. Note that the above allows the Ethernet A-D route to be advertised with one of the following granularities:
One Ethernet A-D route per <ESI, Ethernet Tag ID> tuple per MAC-VRF. This is applicable when the PE uses MPLS-based disposition with VID translation or may be applicable when the PE uses MAC-based disposition with VID translation.
One Ethernet A-D route for each <ESI> per MAC-VRF (where the Ethernet Tag ID is set to 0). This is applicable when the PE uses MAC-based disposition or MPLS-based disposition without VID translation.
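The two granularities can be illustrated with a short sketch. The `per_tag` switch stands in for the disposition/VID-translation choice described above and, like the function names, is hypothetical.

```python
from typing import Iterable, List


def ethernet_tag_from_vid(vid: int) -> int:
    """32-bit Ethernet Tag ID: VLAN ID in the low-order 12 bits, high-order 20 bits zero."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("a VLAN ID is a 12-bit value")
    return vid


def ad_per_evi_tags(vids: Iterable[int], per_tag: bool) -> List[int]:
    """Ethernet Tag IDs for the A-D per EVI routes of one <ESI> in one MAC-VRF.

    per_tag=True: one route per <ESI, Ethernet Tag ID> tuple.
    per_tag=False: one route per <ESI>, with the Ethernet Tag ID set to 0.
    """
    if per_tag:
        return sorted({ethernet_tag_from_vid(v) for v in vids})
    return [0]
```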
The usage of the MPLS label is described in section 14 of RFC 7432. The Next Hop field of the MP_REACH_NLRI attribute of the route is to be set to the IPv4 or IPv6 address of the advertising PE. The Ethernet A-D route is to carry one or more Route Target (RT) attributes, per section 7.10 of RFC 7432.
Referring once again to FIG. 3, consider the EVPN network 300 with three PEs—PE1 330a, PE2 330b, and PE3 330c. PE1 330a and PE2 330b are attached to the same multihomed customer network-site A 310a and serve as multihoming PEs for this customer network-site A 310a. PE3 330c is a remote PE from the PE1/PE2 perspective and is not attached to the same customer network-site A 310a, but rather, is attached to a second customer network-site B 310b of the same customer.
Suppose PE2 330b is initially offline. Therefore, PE1 330a initially learns all local MACs (e.g., of customer devices 312a1 . . . 312aN) for devices residing in the multihomed customer network-site A 310a. PE1 330a advertises the learned MACs, via the EVPN control plane (e.g., using BGP update messages), to remote PE3 330c to allow PE3 330c to send known unicast traffic to devices 312a1 . . . 312aN attached to the multihomed customer network-site A 310a via PE1 330a.
If and when PE2 330b is brought online and its interfaces come up, it can optionally advertise an auto-discovery per EVPN instance (“AD/EVI”) route to indicate to one or more remote PEs that it (PE2 330b) can forward known unicast traffic toward the multihomed customer network-site A 310a, even if PE2 330b has not explicitly advertised every individual MAC from the multihomed site. This “aliasing” behavior was described above. Unfortunately, however, this behavior has the potential to cause problems. More specifically, depending on its implementation, PE2 330b might be unable to forward known unicast traffic for a given MAC to the multihomed customer network-site A 310a until it has installed a MAC route in its local MAC table. Such a local MAC route would be learned by PE2 330b either (A) through local learning once the link to the multihomed customer network-site A 310a comes up, or (B) through control plane learning of MACs reachable via the multihomed customer network-site A 310a advertised by PE1 330a. Unfortunately, if PE2 330b advertises its AD/EVI route to PE3 330c before installing all MACs (e.g., from PE1 330a, or through local learning), aliased traffic sent by PE3 330c to PE2 330b for transmission onward to a device on the multihomed customer network-site A 310a might not be forwarded optimally. More specifically, depending on the implementation, PE2 330b might have to drop the traffic or flood it inefficiently.
Thus, there is a need to improve aliasing (e.g., under RFC 7432).