A distributed database is a database (or dataset) whose storage devices are not all attached to a common processing unit, such as a CPU, and which is controlled by a distributed database management system (the two together are sometimes called a distributed database system). Collections of data (i.e., in a distributed database or dataset) may be hosted on the same server, on multiple server computers located in the same physical location, or dispersed over a network of loosely coupled sites that share no physical components. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks.
A directory service may embody a distributed database. A directory is a map between names and values. In a telephone directory, the nodes are names and the data items are telephone numbers. In a domain name server, the nodes are domain names and the data items are IP addresses (and aliases, mail server names, etc.). A directory server is a computer server system that stores, organizes and provides access to information in a directory. A directory service is the software system implemented on one or more computers, including directory servers. A directory service typically provides an organized set of records, such as a corporate email directory. A directory service may have a hierarchical data structure. LDAP, or Lightweight Directory Access Protocol, is an application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Version 3 of the LDAP protocol (LDAPv3) was first published in 1997 and is in widespread use today.
An LDAP directory is often depicted as a tree, with the root node at the top. An entry is the basic unit of information in an LDAP directory. Each entry includes data for one or more attributes. Each entry has a unique name, the “distinguished name” or “DN.” Among the child nodes of a single parent node, each sibling is identified by a unique attribute value, referred to as the RDN, or relative distinguished name, and the DN is the combination of all RDNs in the path from the entry to the root of the directory tree. To illustrate, take the directory entry: cn=john smith, ou=users, dc=example, dc=com. The DN for the entry is cn=john smith, ou=users, dc=example, dc=com, and the RDN is cn=john smith. For this entry, john smith is the data value for the attribute cn (common name), users is the data value for the ou attribute (organizational unit) on the parent entry of cn=john smith, and example and com are the data values for the attribute dc (domain component) on the ancestor entries above that parent.
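The relationship between a DN and its RDNs can be sketched in a few lines of Python. This is a minimal illustration only: the helper names split_dn and parent_dn are hypothetical, and the parsing assumes single-valued RDNs with no escaped commas, both of which a full LDAP DN parser would have to handle.

```python
from typing import List, Tuple


def split_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN string into (attribute, value) pairs, leaf entry first.

    Simplifying assumption: no escaped commas and no multi-valued RDNs.
    """
    rdns = []
    for component in dn.split(","):
        attribute, _, value = component.strip().partition("=")
        rdns.append((attribute.strip(), value.strip()))
    return rdns


def parent_dn(dn: str) -> str:
    """Return the DN of the parent entry by dropping the leading RDN."""
    return ",".join(f"{attr}={value}" for attr, value in split_dn(dn)[1:])


dn = "cn=john smith, ou=users, dc=example, dc=com"
rdns = split_dn(dn)
rdn = "=".join(rdns[0])   # the entry's own RDN
parent = parent_dn(dn)    # the DN one level closer to the root
```

Here rdn holds the entry's own RDN (cn=john smith), and parent holds the DN formed by the remaining RDNs on the path to the root.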
A directory has a significant advantage over other database technologies in that it includes a flexible schema structure that is separate from the “access path” to the data. In other words, the directory information tree (DIT) structure of a directory is separate from the schema. This and other data model differences allow directories to optimize certain operations for speed (e.g., search operations) and outperform other database technologies, e.g., relational database management systems, for many kinds of problems.
Many distributed databases and directory servers support some form of replication whereby multiple servers contain identical copies of a dataset and changes to one are reflected in another. Replication involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
Replication offers several benefits. It provides higher levels of availability: where multiple servers each hold a copy of a dataset, there is no single point of failure. Replication may also offer increased search performance by allowing read-only traffic, such as search and authentication operations, to be scaled horizontally. Replication also reduces geographic latency, because replicas may be distributed geographically in multiple data centers. An application that needs access to the data may choose a replica in the data center closest to it, reducing the time it takes to access the data. For example, if a replica is located on one side of a slow network link, installing another replica on the other side will improve response time for users on that side of the link. If one directory server is in heavy use, or does not have enough CPU or memory capacity to handle all requests, some requests can be routed to a replica to reduce the load on the first server. Finally, replicas can be used for failover, meaning that if one server goes down, requests can be automatically rerouted to a replica to minimize disruptions in service.
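The closest-replica selection and failover behavior described above might be sketched as follows. This is an illustrative sketch under stated assumptions, not a description of any particular product: the Replica class, the choose_replica function, the data center names, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    latency_ms: float       # hypothetical round-trip time seen by the client
    available: bool = True  # False once the server is known to be down


def choose_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Return the available replica with the lowest latency, or None."""
    candidates = [r for r in replicas if r.available]
    return min(candidates, key=lambda r: r.latency_ms, default=None)


replicas = [
    Replica("dc-east", 5.0),
    Replica("dc-west", 70.0),
    Replica("dc-eu", 120.0),
]

first_choice = choose_replica(replicas)  # nearest replica serves the request
replicas[0].available = False            # simulate the nearest server going down
fallback = choose_replica(replicas)      # requests are rerouted to the next nearest
```

In this sketch the client initially selects dc-east and, once that server is marked unavailable, is rerouted to dc-west without any change to the calling code.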
In many directory service installations the directory contents may be stored on multiple systems. Indeed, a single directory may have multiple identical replicas, each of which can be independently modified. Replication in a directory service is a form of synchronization that is used to propagate changes from one directory server to all replicas of the same directory to ensure that each replica of a directory is, or will eventually be, identical.
However, because a directory service holds multiple copies of the data, there is a risk that the copies may become out of sync, and a client may receive inconsistent results if it queries multiple replicas, either directly or through a network load balancer or proxy. Likewise, if changes are replicated asynchronously (i.e., after a response is sent back to the client), the client could be notified of a successfully applied change even though that change has not yet been, or might never be, propagated to other servers in the topology.
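The window of inconsistency created by asynchronous replication can be illustrated with a small in-memory simulation. This is a sketch only: the DirectoryReplica class and its methods are hypothetical, and propagation is triggered by an explicit call rather than by a network transport.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class DirectoryReplica:
    """Toy replica that acknowledges writes before propagating them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.peers: List["DirectoryReplica"] = []
        self.outbox: Deque[Tuple[str, str]] = deque()  # changes not yet replicated

    def write(self, key: str, value: str) -> str:
        # Apply locally and queue for later replication, then respond at once.
        self.data[key] = value
        self.outbox.append((key, value))
        return "success"

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def replicate(self) -> None:
        # Push every queued change to all peers (the asynchronous step).
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value


a = DirectoryReplica("a")
b = DirectoryReplica("b")
a.peers.append(b)

result = a.write("mail", "jsmith@example.com")  # client is told the write succeeded
stale = b.read("mail")                          # read from b before propagation
a.replicate()                                   # propagation happens later
fresh = b.read("mail")                          # b now agrees with a
```

Between the write and the call to replicate, a client that reads from replica b does not see the change it was told had succeeded. If replica a failed before replicate ran, the change would never reach b.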