Introduction
Directories are specialized databases, optimized for read access, with the ability to store information about heterogeneous real-world entities (such as people, resources, policies, services and applications) in a single instance, making them a critical component of an enterprise's security and identity management framework. Typically, directories consist of a Directory Information Tree (DIT) of entries. By way of example, the standard way of accessing directories over the internet is LDAP (Lightweight Directory Access Protocol (v3): Technical Specification, RFC 3377. http://www.ietf.org/rfc/rfc3377.txt), which allows distribution of a DIT over multiple servers. LDAP supports distributed processing of directory operations using a referral mechanism. Each server contains a subtree of entries, referred to as a naming context, and references to other servers containing related contexts. When a client request cannot be completely answered by a server, the server returns to the client a referral to another server. A disadvantage of this mechanism is that every referral adds a further exchange between the client and a server, which slows operation processing significantly.
LDAP Protocol and Support for Distributed Directories
LDAP is the standard means of accessing directories conforming to the X.500 information model over TCP/IP. LDAP v3, as defined by RFC 2251, 2252 (http://www.ietf.org/rfc/rfc2251.txt and http://www.ietf.org/rfc/rfc2252.txt), specifies the information, naming, and security model, in addition to the functional model used for accessing directories.
LDAP Information and Naming Model
LDAP assumes the existence of one or more directory servers jointly providing access to a DIT, which consists of entries. An entry is defined as a set of attribute-value pairs, with the required object class attribute determining its mandatory and optional attributes. Each entry has a distinguished name (DN) belonging to a hierarchical namespace. The root of the DIT has a “null” DN. FIG. 1 shows an example directory tree 10 with an inetOrgPerson [RFC 2798] entry. Each node (represented as a circle) is named with its relative DN (RDN). The DN of an entry is constructed by prefixing its RDN to its parent's DN.
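The DN construction rule can be illustrated with a minimal Python sketch. The function names and the sample RDNs are hypothetical (the RDNs are chosen for illustration and are not taken from FIG. 1), and the string handling is deliberately naive: it ignores escaped commas and multi-valued RDNs.

```python
def build_dn(rdn: str, parent_dn: str) -> str:
    """Prefix an RDN to its parent's DN. The root has the empty ("null") DN."""
    return rdn if parent_dn == "" else f"{rdn},{parent_dn}"


def dn_from_path(rdns_root_to_leaf):
    """Build a DN by walking RDNs from just below the root down to the entry."""
    dn = ""  # null DN of the DIT root
    for rdn in rdns_root_to_leaf:
        dn = build_dn(rdn, dn)
    return dn


# Illustrative path only; the names are assumptions, not read from FIG. 1.
print(dn_from_path(["o=xyz", "c=us", "ou=research", "cn=John Doe"]))
```

Each step down the tree adds one RDN on the left, so the most specific component of a DN always appears first.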
Functional Model
The functional model [RFC 2251] adopted by LDAP is one of clients performing protocol operations against servers. LDAP defines three types of operations: query operations, such as search and compare; update operations, such as add, modify, delete and modify DN (entry move); and connect/disconnect operations, such as bind, unbind and abandon. Add and delete operations are used to add individual entries to, or delete them from, the directory. A modify operation can be used to add, delete or replace one or more values for one or more attributes of an entry.
Since directories are optimized for read access, the most common LDAP operation is search, which provides a flexible means of accessing information from the directory. The LDAP search operation (also referred to as a query) consists of the following parameters, which represent the semantic information associated with a query: (i) base: a DN that defines the starting point of the search in the DIT; (ii) scope: one of {BASE, SINGLE LEVEL, SUBTREE}, specifying how deep within the DIT to search from the base; (iii) filter: a boolean combination of predicates using the standard operators AND (&), OR (|) and NOT (!), specifying the search criteria; (iv) attributes: the set of required attributes from entries matching the filter. The special value “*” corresponds to selecting all user attributes. Every entry in the directory belongs to at least one (object) class, thus the filter (objectclass=*) matches all entries in the directory. LDAP filters are represented using the parenthesized prefix notation of RFC 2254 [3], e.g.: (&(sn=Doe)(givenName=John)). Examples of predicates are: (sn=Doe), (age>=30), (sn=smith*) where “Doe”, “30” and “smith*” are assertion values representing equality, range and substring assertions, respectively.
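The four search parameters can be sketched against a toy in-memory DIT. This is only an illustration under stated assumptions: the entries are invented, filters are written as nested Python tuples rather than parsed from the RFC 2254 string form, DNs are compared as plain strings, and only equality, presence, trailing-wildcard substring and >= predicates are handled.

```python
# Toy DIT: DN -> attributes (names lowercased). All data here is hypothetical.
DIT = {
    "o=xyz": {"objectclass": ["organization"], "o": ["xyz"]},
    "c=us,o=xyz": {"objectclass": ["country"], "c": ["us"]},
    "cn=John Doe,c=us,o=xyz": {"objectclass": ["inetOrgPerson"],
                               "cn": ["John Doe"], "sn": ["Doe"], "givenname": ["John"]},
    "cn=Jane Smith,c=us,o=xyz": {"objectclass": ["inetOrgPerson"],
                                 "cn": ["Jane Smith"], "sn": ["Smith"], "givenname": ["Jane"]},
}


def parent_of(dn):
    return dn.split(",", 1)[1] if "," in dn else ""


def in_scope(dn, base, scope):
    if scope == "BASE":
        return dn == base
    if scope == "SINGLE_LEVEL":
        return dn != base and parent_of(dn) == base
    if scope == "SUBTREE":
        return base == "" or dn == base or dn.endswith("," + base)
    raise ValueError(f"unknown scope: {scope}")


def matches(entry, flt):
    """Evaluate a filter given in prefix form, e.g. ("&", ("=", "sn", "Doe"), ...)."""
    op = flt[0]
    if op == "&":
        return all(matches(entry, f) for f in flt[1:])
    if op == "|":
        return any(matches(entry, f) for f in flt[1:])
    if op == "!":
        return not matches(entry, flt[1])
    attr, value = flt[1].lower(), flt[2]
    values = entry.get(attr, [])
    if op == "=":
        if value == "*":                      # presence, as in (objectclass=*)
            return bool(values)
        if value.endswith("*"):               # substring, as in (sn=smith*)
            return any(v.lower().startswith(value[:-1].lower()) for v in values)
        return any(v.lower() == value.lower() for v in values)
    if op == ">=":                            # range assertion on integer values
        return any(int(v) >= int(value) for v in values)
    raise ValueError(f"unsupported operator: {op}")


def search(dit, base, scope, flt, attrs=("*",)):
    """Return {DN: selected attributes} for entries in scope that match the filter."""
    result = {}
    for dn, entry in dit.items():
        if in_scope(dn, base, scope) and matches(entry, flt):
            if "*" in attrs:
                result[dn] = dict(entry)
            else:
                result[dn] = {a.lower(): entry[a.lower()] for a in attrs if a.lower() in entry}
    return result


# Equivalent of the filter (&(sn=Doe)(givenName=John)), requesting only "cn".
print(search(DIT, "o=xyz", "SUBTREE",
             ("&", ("=", "sn", "Doe"), ("=", "givenName", "John")), ["cn"]))
```

Changing the scope to SINGLE_LEVEL with the same base would restrict the candidates to the immediate children of o=xyz, so the person entries two levels down would no longer be considered.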
Distributed Directory Model
LDAP supports partitioning a directory across multiple servers, with each server holding one or more naming contexts. A naming context is a subtree of the DIT rooted at an entry, known as its suffix, and terminated by leaf entries or special referral objects. Referral objects point to servers holding other subordinate naming contexts. In addition to these subordinate referrals, which point down the DIT, servers are also configured with a superior (or default) referral, which points upwards in the DIT to a server closer to the DIT root. When encountered, referral objects are used to generate referral messages for the client. A referral message contains LDAP URLs of the form ldap://<host>:<port>/<DN> where DN is the suffix of the naming context being pointed to and host is the fully qualified domain name or IP address of the server containing the naming context. The client progresses the operation by contacting the referred server(s).
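As a rough illustration of the referral URL form ldap://<host>:<port>/<DN>, the following Python sketch builds and parses such URLs with the standard urllib.parse module. The helper names and host names are hypothetical, the default of 389 is the standard LDAP port, and the fuller LDAP URL syntax (attributes, scope, filter and extensions after the DN) is not handled.

```python
from urllib.parse import quote, unquote, urlsplit


def make_referral(host: str, port: int, suffix: str) -> str:
    """Build a referral URL for a naming context suffix (percent-encoding spaces etc.)."""
    return f"ldap://{host}:{port}/{quote(suffix, safe='=,')}"


def parse_referral(url: str):
    """Split a referral URL into (host, port, suffix DN)."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError(f"not an LDAP URL: {url}")
    # An omitted port falls back to 389, the standard LDAP port.
    return parts.hostname, parts.port or 389, unquote(parts.path.lstrip("/"))


# Hypothetical host name for a server holding the ou=research,c=us,o=xyz context.
url = make_referral("hostb.example.com", 389, "ou=research,c=us,o=xyz")
print(url)
print(parse_referral(url))
```

A client that receives such a URL would open a connection to the parsed host and port and reissue the operation with the parsed DN as its base.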
Before any directory operation can be performed, its target object (e.g. the base of a search request) has to be located. In a distributed directory this is referred to as distributed name resolution. In general, distributed name resolution proceeds up the DIT via default referrals until either the root or a naming context whose suffix is an ancestor of the target object is encountered. After this, distributed name resolution proceeds down the DIT via subordinate referrals until the target object is encountered. Once the target has been resolved, the operation evaluation phase starts at the server where the target was found. A subtree-scoped search that encounters a subordinate referral during the evaluation phase sends a referral message to the client to progress the operation.
FIG. 2 shows a system 20 having three servers, the hostA 22, the hostB 24, and the hostC 26, collectively serving the o=xyz namespace. The hostA 22 contains a single naming context with suffix o=xyz, and contains all the entries under o=xyz except the entries under the c=in and ou=research, c=us subtrees, which are held by the subordinate servers hostC 26 and hostB 24, respectively. The client 28 requests a subtree search with base o=xyz from the hostB 24. Since the hostB 24 does not contain the target, it refers the client to the hostA 22 using its default referral. The client 28 contacts the hostA 22, which contains the target object. The hostA 22 performs the search against the partition it holds and returns three matching entries and referrals (to the hostB 24 and the hostC 26 for the subordinate naming contexts) to the client. Finally, the client 28 sends search requests (with modified bases) to the hostB 24 and the hostC 26, which return the remaining entries. Four round trips are required between the client and the servers to evaluate one request. In the worst case, the total number of rounds of messages exchanged between the client and the servers is nearly twice the number of servers. This example illustrates why the referral-based distributed query evaluation mechanism in LDAP is slow.
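The round-trip count in this example can be reproduced with a small Python simulation. It is a sketch under simplifying assumptions: the data structure and function names are hypothetical, servers are modelled only by their suffix and referrals (no entries are stored), and DNs are compared as plain strings.

```python
# Topology of the example: suffix held, superior (default) referral, subordinate referrals.
SERVERS = {
    "hostA": {"suffix": "o=xyz", "superior": None,
              "subordinates": {"c=in,o=xyz": "hostC", "ou=research,c=us,o=xyz": "hostB"}},
    "hostB": {"suffix": "ou=research,c=us,o=xyz", "superior": "hostA", "subordinates": {}},
    "hostC": {"suffix": "c=in,o=xyz", "superior": "hostA", "subordinates": {}},
}


def is_under(dn, ancestor):
    """True if dn equals ancestor or lies below it in the DIT."""
    return dn == ancestor or dn.endswith("," + ancestor)


def handle(server, base):
    """One server's reply to a subtree search: (evaluated_locally, referrals)."""
    cfg = SERVERS[server]
    if not is_under(base, cfg["suffix"]):
        # Target is not in this naming context: resolve upwards via the default referral.
        return False, [(cfg["superior"], base)] if cfg["superior"] else []
    for sub_suffix, host in cfg["subordinates"].items():
        if is_under(base, sub_suffix):
            # Target lies in a subordinate context: resolve downwards.
            return False, [(host, base)]
    # Target found here: evaluate locally, refer the client to subordinate contexts in scope.
    refs = [(host, suf) for suf, host in cfg["subordinates"].items() if is_under(suf, base)]
    return True, refs


def client_subtree_search(first_server, base):
    """Chase referrals as a client would; return (round_trips, servers that evaluated)."""
    pending, round_trips, evaluated = [(first_server, base)], 0, []
    while pending:
        server, sub_base = pending.pop(0)
        round_trips += 1                      # one request/response exchange
        answered, refs = handle(server, sub_base)
        if answered:
            evaluated.append(server)
        pending.extend(refs)
    return round_trips, evaluated


print(client_subtree_search("hostB", "o=xyz"))
```

Starting at hostB, the simulated client needs one exchange to be referred upwards, one with hostA, and one each with the two subordinate servers, which matches the four round trips described above. Starting directly at hostA would save only the first exchange.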
Other Solutions
One way to eliminate referrals is to use chaining of directory requests as described in X.500 ITU-T recommendation: Open Systems Interconnection—The Directory: Overview of concepts, models and services, X.518|9594, Part 4: Procedures for Distributed Operations. In this approach, instead of sending a referral back to the client, the server ‘chains’ the request to the referred server and forwards the results to the client. However, chaining in general can be inefficient if the client requests a target entry which is not contained in the server. In such cases resolution of the entry DN is required and the chain could involve a large number of servers, thus requiring the results to flow between several servers before reaching the client. This results in unnecessary server to server communication.
A better way of eliminating referrals is by having a directory proxy (hereinafter referred to as a ‘proxy’) at the front end of the distributed directory. The proxy is typically located between the client and the distributed directory servers. The proxy may function as a server intercepting client requests, or may function as a client to the distributed directory servers to evaluate the queries. The proxy has knowledge of how the naming contexts in the directory are mapped to servers in the distributed environment. A proxy configured with this directory topology information can then be used to distribute a client operation into sub-operations (if required), route the requests to the appropriate servers, and send the consolidated results to the client. Current proxy directory solutions (e.g. ‘proxy directory’ of the open-source OpenLDAP server http://www.openldap.org) are based on configuring the proxy with topology information.
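The routing step performed by such a proxy might be sketched as follows in Python. This is an illustration only, not the behaviour of any particular product: the topology table, function names and the send callback are hypothetical, and the suffix-to-server mapping reuses the three-server example with o=xyz on hostA.

```python
# Statically configured topology: naming context suffix -> server holding it.
TOPOLOGY = {
    "o=xyz": "hostA",
    "ou=research,c=us,o=xyz": "hostB",
    "c=in,o=xyz": "hostC",
}


def is_under(dn, ancestor):
    """True if dn equals ancestor or lies below it in the DIT."""
    return dn == ancestor or dn.endswith("," + ancestor)


def plan_subtree_search(base):
    """Split a subtree search into (server, base) sub-operations using the topology."""
    holders = [suffix for suffix in TOPOLOGY if is_under(base, suffix)]
    if not holders:
        return []                              # base is outside every known context
    # All holders are ancestors of the same DN, so the longest suffix is the deepest one.
    home = max(holders, key=len)
    plan = [(TOPOLOGY[home], base)]
    # Add every other naming context that lies inside the searched subtree.
    plan += [(TOPOLOGY[s], s) for s in sorted(TOPOLOGY) if s != home and is_under(s, base)]
    return plan


def proxy_search(base, send):
    """Run each sub-operation via send(server, base) and consolidate the results."""
    results = []
    for server, sub_base in plan_subtree_search(base):
        results.extend(send(server, sub_base))
    return results


print(plan_subtree_search("o=xyz"))
# Stub backend that just echoes which server was asked for which base.
print(proxy_search("c=us,o=xyz", lambda server, b: [f"{server} searched {b}"]))
```

Because name resolution happens inside the proxy's table lookup, the client issues a single request and receives a single consolidated response, while the proxy contacts only the servers whose naming contexts intersect the searched subtree.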
A disadvantage associated with using such static configuration of partition information at the proxy is that each server constituting a distributed directory has its own management interface, which allows naming contexts to be added or removed under it, so the topology can change without the proxy being aware of it. The directory protocols do not provide support for notification of updates to a client (or in this case, the proxy). A further disadvantage is that there is no central means to make changes to the distributed directory configuration. For example, a configuration change in the directory setup may require changes at multiple directory servers. A further disadvantage with the current solutions is that it becomes increasingly difficult to manage the distributed directory, leading to incorrect evaluation of directory queries due to out-of-date topology information at the proxy.