1. Field
The present invention relates to data storage, and more specifically, to systems, methods and computer products for dynamically adding partitions to distributed directories spread across multiple servers while reducing downtime.
2. Description of Related Art
Organizations are growing at a fast rate, leading to a need to store enormous amounts of data in the directory server. However, directory servers have a scalability limit that depends on the type of server and the frequency of operations performed. Once the scalability limit is reached, the directory server will no longer perform efficiently.
A distributed directory is a mechanism to get around this problem. In a distributed directory environment data is partitioned across multiple directory servers. A proxy server is deployed to sit in front of the partitioned directory servers. This proxy server works like a virtual directory, providing a single large-directory view to client applications. However, the data is actually stored in multiple directories. The proxy merely manages the operations and routing under the covers, hiding all internals from client applications. Proxy servers use hash algorithms to identify where a client request should be routed. Hashing is the transformation of a string of characters into a fixed-length value or key that represents the original string. Hashing may be used to index and retrieve items.
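The hash-based routing described above can be illustrated with a minimal sketch. It assumes an MD5 digest reduced modulo the number of backend servers; the function name `route`, the choice of MD5, and the modulo reduction are illustrative assumptions and are not taken from any particular proxy server.

```python
import hashlib

def route(key: str, num_servers: int) -> int:
    """Map a string key to a backend server index (hypothetical helper).

    The key is hashed to a fixed-length digest, which is then reduced
    modulo the number of servers. The same key always yields the same index.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # fixed-length 16-byte value
    return int.from_bytes(digest, "big") % num_servers

# Three backend servers, as in the one-proxy, three-directory arrangement.
servers = ["Server A", "Server B", "Server C"]
target = servers[route("cn=entry1", len(servers))]
print(target)
```

Because the digest depends only on the key, the proxy can recompute the target server for every request without keeping a per-entry lookup table.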
FIG. 1A depicts a typical configuration of one proxy with three distributed directory servers. In this figure “o=ibm,c=us” is the split distinguished name (DN). Data is split evenly across the directories by hashing on the Relative DN (RDN) just below the base of the split. For example, the entry “cn=entry1,o=ibm,c=us” may go to Server A, in which case all entries below this node will also go to Server A only. All backend servers (A, B, C) are required to have the split DN (i.e., o=ibm,c=us in this example). FIG. 1B can be used to explain this concept. As shown there, a branch of the Directory Information Tree (DIT) can go to one of the directory servers. This tends to work so long as customers can predict the limit of their directory in the near and/or far future. Unfortunately, that is oftentimes not the case. Business and directory scalability requirements are growing faster than anyone can predict. Therefore, it is not unusual to be in a condition where Server A has exceeded its limit and begins performing poorly because too many entries are present on Server A. Also, directory servers are supposed to be read-centric, and are therefore not optimized for high write frequency. They tend to perform badly if the environment is write-centric. Unfortunately, it is at this point that more partitions are needed for existing conventional setups, so that writes will be distributed across multiple servers.
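The following sketch illustrates why an entire branch lands on one server: only the RDN immediately below the split DN is hashed, so every descendant of that node shares the same hash input. The helper names (`rdn_below_split`, `server_for`), the MD5 hash, and the simple comma-based DN parsing (which ignores escaped commas) are assumptions made for illustration only.

```python
import hashlib

SPLIT_DN = "o=ibm,c=us"
SERVERS = ["Server A", "Server B", "Server C"]

def _components(dn: str) -> list:
    # Simplified parsing: assumes no escaped commas inside attribute values.
    return [part.strip().lower() for part in dn.split(",")]

def rdn_below_split(dn: str, split_dn: str = SPLIT_DN) -> str:
    """Return the RDN directly beneath the split DN (hypothetical helper)."""
    parts, base = _components(dn), _components(split_dn)
    if len(parts) <= len(base) or parts[-len(base):] != base:
        raise ValueError(f"{dn!r} is not below the split DN {split_dn!r}")
    return parts[-len(base) - 1]

def server_for(dn: str) -> str:
    """Pick a backend by hashing only the RDN just below the split DN."""
    rdn = rdn_below_split(dn)
    h = int.from_bytes(hashlib.md5(rdn.encode("utf-8")).digest(), "big")
    return SERVERS[h % len(SERVERS)]

parent = "cn=entry1,o=ibm,c=us"
child = "uid=jdoe,ou=people,cn=entry1,o=ibm,c=us"
print(server_for(parent) == server_for(child))  # both hash on "cn=entry1"
```

Since the split DN itself has no RDN below it, it is not routed by hash in this sketch, which is consistent with every backend server being required to hold the split DN.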
Once a given capacity limit has been reached, the only conventional solution is to shut down all the servers, dump the data, and redistribute it across a number of servers having a larger capacity. For example, the data may be loaded onto four servers having a larger overall capacity. Only then may conventional systems start the proxy with a new distributed directory setup of the four directories. Redistribution by shutting down the system is not an acceptable solution, since it often takes a week, or even longer, to bring the systems back up and get them running. Yet there is no conventional way around this drawback. There is a need to overcome these drawbacks of conventional systems.
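To give a sense of the scale of such a redistribution, the sketch below estimates how many entries would change servers when moving from three partitions to four. It assumes, purely for illustration, that the proxy assigns entries by a uniform hash reduced modulo the number of servers; the names `bucket` and `fraction_moved` and the synthetic entry names are hypothetical.

```python
import hashlib

def bucket(rdn: str, num_servers: int) -> int:
    """Server index for an RDN under an assumed hash-modulo scheme."""
    h = int.from_bytes(hashlib.md5(rdn.encode("utf-8")).digest(), "big")
    return h % num_servers

def fraction_moved(rdns, old_n: int, new_n: int) -> float:
    """Fraction of RDNs whose server index differs between the two setups."""
    moved = sum(1 for r in rdns if bucket(r, old_n) != bucket(r, new_n))
    return moved / len(rdns)

# Synthetic RDNs standing in for branches below the split DN.
entries = [f"cn=entry{i}" for i in range(1000)]
moved = fraction_moved(entries, old_n=3, new_n=4)
print(f"{moved:.0%} of branches change servers")
```

Under this assumption an entry keeps its server only when its hash gives the same remainder modulo 3 and modulo 4, which holds for about one hash value in four, so roughly three quarters of the branches would have to be relocated.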