In the field of computer networking, many efforts have been made to develop the most efficient and reliable way of managing the millions of users served by large-scale Internet sites. In particular, authenticating and authorizing users has been a challenge given the number and density of users attempting to access certain sites. To manage users, large outward-facing sites employ a “directory service” to store user authentication and role information that must be read frequently. Large outward-facing sites include, for example, customer-oriented Web sites such as e-mail Web sites (e.g., Microsoft Hotmail), shopping Web sites (e.g., eBay) and banking/investing Web sites (e.g., Merrill Lynch). The directory service authenticates and authorizes users by validating supplied credentials such as a user ID and/or password. An example implementation of such a directory service is the MICROSOFT ACTIVE DIRECTORY service (a product of Microsoft Corp. of Redmond, Wash.). Directory services allow organizations to centrally manage and share information on network resources and users while acting as the central authority for network security.
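The authenticate-then-authorize role of a directory service described above can be illustrated with a minimal sketch. It assumes an in-memory store and salted PBKDF2 password hashes; the class and method names (DirectoryService, add_user, authenticate, authorize) are hypothetical and are not the interface of ACTIVE DIRECTORY or any other product.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field

ITERATIONS = 100_000  # illustrative PBKDF2 work factor (assumption)


@dataclass
class DirectoryEntry:
    """Hypothetical directory object: stored credential material plus role information."""
    salt: bytes
    password_hash: bytes
    roles: set = field(default_factory=set)


def hash_password(password, salt):
    # PBKDF2-HMAC-SHA256 over the supplied password and a per-user salt.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class DirectoryService:
    """Toy in-memory directory: validate a user ID/password, then check a role."""

    def __init__(self):
        self._entries = {}

    def add_user(self, user_id, password, roles=()):
        salt = os.urandom(16)
        self._entries[user_id] = DirectoryEntry(salt, hash_password(password, salt), set(roles))

    def authenticate(self, user_id, password):
        entry = self._entries.get(user_id)
        if entry is None:
            return False
        candidate = hash_password(password, entry.salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(candidate, entry.password_hash)

    def authorize(self, user_id, password, role):
        # Authorization only proceeds once the credentials have been validated.
        return self.authenticate(user_id, password) and role in self._entries[user_id].roles
```

Storing only a salted hash means the directory never needs to keep the plaintext password in order to validate it.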
A goal of directory services is to provide uninterrupted, continuous service to users attempting to access the outward-facing site. Another goal is scalability, that is, the ability to grow to meet user demand and business complexity. Outward-facing sites commonly change over time, starting small and growing incrementally to keep up with demand. To manage this growth, outward-facing sites increase the number of servers performing authentication services. A key architectural element of highly scalable outward-facing sites is “directory partitioning.” A directory partition is a set of directory objects that are managed as a group, such that they are backed up, restored and served together. Each directory object belongs to exactly one partition. Directory partitioning entails distributing directory objects across the various partitions in the outward-facing site. A single partition can start very small and grow to hold more than ten million directory objects. When a more complex organizational structure is required, multiple partitions are joined together for easy searching. Partitioning reduces the unit of failure: if one partition fails, the other partitions continue serving their directory objects. Partitioning also increases the performance of the outward-facing site: if one machine serves N requests per second, then a directory with M partitions, each served by its own machine, serves M*N requests per second without resorting to replication.
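The partitioning properties above (each object in exactly one partition, failure isolated to one partition, and M*N aggregate throughput) can be modeled in a few lines. This is a simplified sketch: the names Partition, PartitionedDirectory and aggregate_throughput are hypothetical, and the throughput figure assumes one machine per partition with load spread evenly.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A group of directory objects that are backed up, restored and served together."""
    name: str
    objects: dict = field(default_factory=dict)


class PartitionedDirectory:
    def __init__(self, partition_names):
        self.partitions = {name: Partition(name) for name in partition_names}
        self._owner = {}  # object key -> owning partition; enforces single membership

    def add(self, key, value, partition_name):
        if key in self._owner:
            raise ValueError(f"{key!r} already belongs to partition {self._owner[key]!r}")
        self.partitions[partition_name].objects[key] = value
        self._owner[key] = partition_name

    def lookup(self, key, failed=frozenset()):
        """Return the object, or None if its partition is down; other partitions are unaffected."""
        name = self._owner[key]
        if name in failed:
            return None
        return self.partitions[name].objects[key]


def aggregate_throughput(requests_per_second_per_machine, num_partitions):
    # Idealized M*N: one machine per partition, evenly spread load, no replication.
    return num_partitions * requests_per_second_per_machine
```

For example, with p1 marked as failed, a lookup of an object held on p2 still succeeds, while only objects on p1 become unavailable.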
When partitioning is used, there must be a mechanism by which a key for a directory object (such as a user ID submitted to the Web server) can be mapped to the partition holding that directory object. This process is called “partition location.” A popular method of partition location for outward-facing sites is referred to as “hashing.” As is known in the art, hashing refers to applying a hashing scheme or algorithm to distribute keys (e.g., user IDs) approximately evenly across partitions (also referred to as “hash buckets”). For purposes of partitioning user IDs, directory objects can be partitioned according to any rational hashing scheme. For example, a simplistic hashing scheme places all users whose user IDs begin with the letters A to C on partition 1, those beginning with D to G on partition 2, and so on. The proper partition can be located at runtime by building the hashing logic into the application code running on the front-end Web servers.
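Partition location logic of the kind that would be built into front-end Web server code can be sketched as follows. The first function follows the letter-range example in the text (A to C on partition 1, D to G on partition 2); the ranges after G are hypothetical fill-ins. The second function is a digest-modulo hash, one common choice swapped in here for illustration, not a scheme prescribed by the text.

```python
import hashlib

# Letter-range scheme from the text: A-C on partition 1, D-G on partition 2.
# The ranges after G are hypothetical fill-ins for illustration.
LETTER_RANGES = [("ABC", 1), ("DEFG", 2), ("HIJKLM", 3), ("NOPQRS", 4), ("TUVWXYZ", 5)]
PARTITION_BY_LETTER = {ch: p for letters, p in LETTER_RANGES for ch in letters}


def locate_by_first_letter(user_id):
    """Return the partition for user_id based on its first letter."""
    first = user_id[:1].upper()
    if first not in PARTITION_BY_LETTER:
        raise ValueError(f"no partition for user ID {user_id!r}")
    return PARTITION_BY_LETTER[first]


def locate_by_hash(user_id, num_partitions):
    """Map user_id to a bucket in [0, num_partitions) using a stable digest.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used to keep the mapping identical on every front-end server.
    """
    digest = hashlib.sha256(user_id.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because real user IDs do not begin with each letter equally often, a letter-range scheme tends to leave some partitions fuller than others; a digest-based hash spreads keys roughly evenly regardless of how the IDs are spelled.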
Once a hashing solution is deployed, the amount of data held in a given partition grows linearly with the total amount of data in the system. If an e-business stores user data in the partitions and the user base doubles, so does the size of each partition. In some cases the data grows beyond what the original partitions and servers can handle, and the data must be “re-partitioned.” Re-partitioning entails adding new servers to the outward-facing site and redistributing the groups of directory objects across the original and newly added servers so that the data load is balanced across the servers. One way to reduce the need to re-partition directory objects is simply to over-partition them from the outset. Over-partitioning requires additional hardware (i.e., back-end servers) to manage the small partitions. As the service and the partitions grow, more processors, memory, disks, etc. may be added to that hardware to increase the capacity of each partition. In some cases, the need to re-partition the data store can be avoided entirely.
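The cost of re-partitioning under a simple hashing scheme can be quantified with a short sketch. It assumes a digest-modulo hash (a hypothetical choice, as above): per-partition size is total/M, so it grows linearly with the total, and changing the partition count from M to M+1 relocates roughly M/(M+1) of all keys. Over-partitioning, read here as fixing the partition count up front and growing each partition's hardware instead, leaves every key where it is.

```python
import hashlib


def bucket(user_id, num_partitions):
    # Digest-modulo hashing (illustrative assumption, not prescribed by the text).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def per_partition_size(total_objects, num_partitions):
    # With an even hash, each partition holds about total/M objects,
    # so doubling the user base doubles the size of every partition.
    return total_objects / num_partitions


def fraction_moved(user_ids, old_count, new_count):
    """Fraction of keys whose partition changes when the partition count changes."""
    moved = sum(1 for uid in user_ids if bucket(uid, old_count) != bucket(uid, new_count))
    return moved / len(user_ids)
```

For instance, going from 4 to 5 partitions moves about 80% of user IDs to a different partition, which is why re-partitioning a live hashed directory is expensive.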
If ample hardware is not available, however, re-partitioning must be employed to adequately support increased user demand. One prior-art method for re-partitioning directory services requires that the outward-facing site be shut down temporarily while administrators re-partition the directory servers. Shutting down a site that serves large numbers of users is often not a viable option. Another method for re-partitioning directory services entails creating a read/write replica on a newly added server while the directory services remain operational. This scheme, referred to as a “loose consistency model,” entails reading replica information on the original server and propagating that information to the new partition. Because of the inherent latency of propagation, there is no guarantee that the information on the new server is consistent with the information on the original server.
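Why the loose consistency model gives no consistency guarantee can be shown with a small simulation. This is a simplified sketch under stated assumptions: only the original server accepts writes, and changes reach the newly added server through a queue drained later to stand in for propagation latency. The names Server and LooselyConsistentReplica are hypothetical.

```python
from collections import deque


class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}


class LooselyConsistentReplica:
    """Hypothetical model: writes land on the original server; the new server catches up later."""

    def __init__(self, original, new):
        self.original = original
        self.new = new
        self._pending = deque()  # changes accepted but not yet propagated

    def seed(self):
        # Initial copy of the original server's contents to the newly added server.
        self.new.data.update(self.original.data)

    def write(self, key, value):
        # The directory stays operational: writes are accepted immediately.
        self.original.data[key] = value
        self._pending.append((key, value))

    def propagate(self, max_changes=1):
        # Apply up to max_changes queued updates, simulating propagation latency.
        for _ in range(min(max_changes, len(self._pending))):
            key, value = self._pending.popleft()
            self.new.data[key] = value

    def is_consistent(self):
        return self.original.data == self.new.data
```

Any read served by the new server between a write and its propagation returns stale data, for example an old password hash, which is the inconsistency window the loose consistency model cannot rule out.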
In view of the foregoing, it can be seen that there is a need for a method for re-partitioning directories according to a model that ensures reliability of information without service interruption.