The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
For large organizations, there are often a number of separate and geographically dispersed sites that the organization wants to connect through a network. For example, a company may have manufacturing sites, development sites, distribution sites, and a number of sales locations that are located throughout a region, country, or the world. The company wants to interconnect the sites via a network so that the sites can share information and personnel spread out among the sites can communicate with each other. A typical solution is for the company to establish a network among the sites or to purchase network service from a service provider with multi-protocol label switching (MPLS) capability that interconnects the company's various sites together in a private network. Because the company typically wants each site to be able to communicate with any other site, such a network arrangement is described as an “any to any” solution that allows any site to send packets across the private network to any other site.
In such a network of geographically dispersed locations, the organization generally wants to add confidentiality to the communications between the sites, such as by using one or more cryptographic techniques to encrypt and decrypt the network traffic between the sites. For example, a group key management system, such as the group domain of interpretation (GDOI) protocol defined in RFC 3547, can be used to provide cryptographic keys and policy to a group of devices in the network. As a specific example, Internet Protocol Security (IPsec) defined in RFCs 2401, 2404, and 2406 can be used to provide security associations (SAs) that define the cryptographic keys and encryption methods to be used for communications between the sites. The communications between sites can be just between two particular sites or between any number of sites, such as in the form of secure multicasts among the virtual private network (VPN) gateways that interconnect each site to the network.
FIG. 1 is a block diagram that depicts a set of sites 110, 112, and 114 and a key server 120 that are interconnected through a network 100 and a group key management system. For example, the GDOI group key management protocol can be used. For communications involving multiple sites, a group can be formed to include the participating sites. For example, in FIG. 1, sites 110, 112, and 114 can participate in a secure multicast, and therefore, sites 110, 112, and 114 are referred to as the group members.
Key server 120 is responsible for generating group keys and group policy, such as by establishing SAs based on IPsec. Each of sites 110, 112, and 114 registers with key server 120 using the group key management protocol and by providing the required authentication information. Then sites 110, 112, and 114 receive the current security association, denoted in FIG. 1 as SA-1, with the current IPsec keys and policy from key server 120, as depicted by arrows 130, 132, and 134. As a result, sites 110, 112, and 114 can securely communicate with each other based on SA-1.
Because SAs are set to expire after a specified amount of time or need to be replaced if a member of the group leaves, key server 120 periodically pushes updates to the group policy in the form of new SAs, such as SA-2 and SA-3 depicted in FIG. 1. As a specific example, key server 120 sends rekey messages to sites 110, 112, and 114 that transmit the new SA to be used by sites 110, 112, and 114, as depicted by arrows 130, 132, and 134.
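The registration and rekey flow described above can be sketched in a few lines of Python. This is a toy model only, not the GDOI protocol: the class names `SecurityAssociation`, `GroupMember`, and `KeyServer` are hypothetical, member authentication is omitted, and an SA is reduced to an identifier plus random key material.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SecurityAssociation:
    """Hypothetical stand-in for an SA: an identifier plus key material."""
    sa_id: int
    key: bytes


@dataclass
class GroupMember:
    name: str
    current_sa: Optional[SecurityAssociation] = None

    def receive(self, sa: SecurityAssociation) -> None:
        # A real group member would install the SA in its IPsec stack.
        self.current_sa = sa


@dataclass
class KeyServer:
    members: List[GroupMember] = field(default_factory=list)
    current_sa: Optional[SecurityAssociation] = None
    next_id: int = 1

    def _new_sa(self) -> SecurityAssociation:
        sa = SecurityAssociation(self.next_id, secrets.token_bytes(32))
        self.next_id += 1
        return sa

    def register(self, member: GroupMember) -> None:
        # Authentication is omitted (assumption of this sketch).
        if self.current_sa is None:
            self.current_sa = self._new_sa()
        self.members.append(member)
        member.receive(self.current_sa)

    def rekey(self) -> None:
        # Generate a fresh SA and push it to every registered member.
        self.current_sa = self._new_sa()
        for member in self.members:
            member.receive(self.current_sa)


key_server = KeyServer()
sites = [GroupMember("site-110"), GroupMember("site-112"), GroupMember("site-114")]
for site in sites:
    key_server.register(site)   # every site now holds SA 1
key_server.rekey()              # every site now holds SA 2
```

In this sketch every member depends on the one `KeyServer` object for each new SA, so nothing can rekey the group if that object becomes unavailable.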
One problem with using a single key server, such as key server 120 in FIG. 1, is that the single key server represents a single point of failure for communications among the members of the group. For example, if key server 120 fails, sites 110, 112, and 114 will not receive new group keys when the current SAs expire or when a member leaves the group. A member's departure typically requires generation and distribution of a new SA to preclude the departed member from being able to read the communications for the group.
One approach for addressing the single point of failure problem when using a single key server is to use multiple independent key servers. However, if one group member registers with key server A and another group member registers with key server B, the two group members will receive different SAs. As a result, group members registering with different key servers cannot communicate with each other. Instead, only group members that register with the same key server can communicate using the SAs from that key server.
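The mismatch can be illustrated with a minimal sketch. It assumes each independent key server generates its own random group key, and it uses an HMAC-SHA-256 tag as a simplified stand-in for IPsec packet protection. `IndependentKeyServer`, `protect`, and `accept` are hypothetical names introduced only for this illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


class IndependentKeyServer:
    """Hypothetical key server that generates its group key in isolation."""

    def __init__(self) -> None:
        self.group_key = secrets.token_bytes(32)

    def register(self, member_name: str) -> bytes:
        # Authentication is omitted; every registrant gets this server's key.
        return self.group_key


def protect(key: bytes, payload: bytes) -> Tuple[bytes, bytes]:
    """Return the payload with an HMAC-SHA-256 tag computed under key."""
    return payload, hmac.new(key, payload, hashlib.sha256).digest()


def accept(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Accept the packet only if the tag verifies under the receiver's key."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


server_a, server_b = IndependentKeyServer(), IndependentKeyServer()
key_1 = server_a.register("member-1")
key_2 = server_b.register("member-2")
key_3 = server_a.register("member-3")

payload, tag = protect(key_1, b"hello")
same_server_ok = accept(key_3, payload, tag)    # both registered with server A
cross_server_ok = accept(key_2, payload, tag)   # registered with server B
```

Here the member that registered with the same server verifies the packet, while the member that registered with the other server rejects it because the two servers never coordinated their keys.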
Another approach for addressing the single point of failure problem is to employ multiple groups with each group having a single key server. In order for members of the different groups to communicate, the group members must register with the key server of each group to receive the SA for each group. By having the SA from each key server, any group member can communicate with any other group member using one of the SAs. However, as the number of groups increases, the number of SAs that must be obtained and maintained by each group member increases, which represents a significant scaling problem for a large number of sites that are served by many key servers. For example, in some implementations, there can be hundreds or even thousands of sites, and there can be dozens of key servers with which each group member must register to obtain the different SAs. Thereafter, even though each group member has all the different SAs, each group member must identify which SA is being used for each group communication.
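The scaling cost can be made concrete with a short calculation. The sketch below assumes that every site registers with every key server and holds one SA per key server. The function name `multi_group_load` and the figures of 1,000 sites and 24 key servers are illustrative assumptions, not values from any particular deployment.

```python
def multi_group_load(num_sites: int, num_key_servers: int) -> dict:
    """Count registrations and SAs when every site registers with every key server."""
    return {
        # Each site must hold one SA per key server.
        "sas_per_site": num_key_servers,
        # Each site performs one registration per key server.
        "total_registrations": num_sites * num_key_servers,
    }


load = multi_group_load(num_sites=1000, num_key_servers=24)
# load == {"sas_per_site": 24, "total_registrations": 24000}
```

Under these assumptions the total number of registrations grows with the product of the two counts, and each rekey by any one key server must still reach every site.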
Another problem with multiple groups having different key servers is that network partitions can occur, resulting in some group members being unable to communicate with some key servers. A network partition occurs when network interconnections are unavailable, resulting in the members of one group being unable to communicate with the key server for another group and possibly with some members of the other group.
For example, if a network partition occurs, members of group A are unable to communicate with the key server for group B, while members of group B cannot communicate with the key server for group A. Even if the individual members of groups A and B can still communicate with each other, as new SAs are generated by each group's key server, the members of the different groups will no longer share the same SA and therefore will be unable to communicate with each other.
Yet another approach for addressing the single point of failure problem is to employ a hierarchical arrangement of key servers, such as with the Kerberos authentication system that employs a number of key distribution centers (KDCs). With Kerberos, one KDC is specified to be the master server that maintains and modifies a database of key information. The remaining KDCs are the slave servers, each of which includes a read-only copy of the database from the master server.
Having multiple slave key servers with the hierarchical approach addresses the single point of failure problem if a slave server fails, since other slave servers can be used to obtain the keys from the database. However, the Kerberos approach is still susceptible to a single failure of the master key server, since the slave key servers are unable to create new objects or to modify current objects in their copies of the database from the master server.
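The asymmetry between the master and the slaves can be sketched as follows. This is a toy model of the master/read-only-copy arrangement, not the Kerberos API: `MasterKDC`, `SlaveKDC`, and the `available` flag are hypothetical, and replication is reduced to a one-time copy of a dictionary.

```python
class MasterKDC:
    """Hypothetical master: the only server allowed to modify the database."""

    def __init__(self) -> None:
        self.database = {}
        self.available = True

    def write(self, principal: str, key: bytes) -> None:
        if not self.available:
            raise ConnectionError("master KDC is down")
        self.database[principal] = key


class SlaveKDC:
    """Hypothetical slave: serves reads from a copy of the master's database."""

    def __init__(self, master: MasterKDC) -> None:
        self._copy = dict(master.database)  # snapshot taken at replication time

    def read(self, principal: str) -> bytes:
        return self._copy[principal]

    def write(self, principal: str, key: bytes) -> None:
        raise PermissionError("slave KDC holds a read-only copy")


master = MasterKDC()
master.write("site-110", b"key-1")
slave = SlaveKDC(master)

master.available = False                 # simulate failure of the master
existing_key = slave.read("site-110")    # reads of existing keys still succeed
```

After the simulated failure, existing keys remain readable from the slave, but a `write` on either server raises an exception, so no new or changed key can be issued until the master returns.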
Still another approach for addressing the single point of failure problem is to use a distributed database that allows the same copy of the database to be stored on multiple servers. Each database server acts as a master that can update the copy of the database stored on that database server. However, to ensure consistency across the multiple copies of the database, changes to each object in the database must be tracked so that the changes to each object can be applied to all copies of the database in a consistent manner.
A distributed database is not susceptible to a single point of failure since any copy of the database is considered to be a master copy. However, ensuring consistency among multiple changes made to the objects within the distributed database by the multiple masters requires significantly more complexity through the use of transaction identifiers to track multiple changes to the same object, in addition to other protocol complexities such as the use of locks and acknowledgement messages to prevent conflicting changes to an object by multiple masters.
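One way transaction identifiers can be used to detect conflicting changes is sketched below. This is an assumption-laden illustration of a simple version check, not the scheme of any particular distributed database: the `Update` and `Replica` names are hypothetical, and locks and acknowledgement messages are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    obj: str        # name of the object being changed
    base_txn: int   # transaction the writer last saw for this object
    new_txn: int    # transaction identifier assigned to this change
    value: str


class Replica:
    """Hypothetical database copy that tracks one transaction id per object."""

    def __init__(self) -> None:
        self.store = {}  # object name -> (transaction id, value)

    def apply(self, update: Update) -> bool:
        current_txn, _ = self.store.get(update.obj, (0, None))
        if current_txn != update.base_txn:
            # The object changed since the writer last saw it: a conflict.
            return False
        self.store[update.obj] = (update.new_txn, update.value)
        return True


replica = Replica()
# Two masters change the same object, each starting from transaction 0.
first_applied = replica.apply(Update("policy", base_txn=0, new_txn=1, value="from-A"))
second_applied = replica.apply(Update("policy", base_txn=0, new_txn=2, value="from-B"))
```

The second update is rejected because it was based on a stale transaction. Deciding what to do with the rejected change is the part that calls for the additional protocol machinery mentioned above.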
Based on the foregoing, there is a clear need for improved techniques for maintaining a unified state among a group of servers. In particular, there is a need for maintaining security associations among a group of servers that distributes group keys and policy to a group of clients serviced by the group of servers.