This invention relates to indexing information stored in a computer system, and more particularly to indexing data stored in different directory servers of a distributed directory service.
It is common for computer users (“clients”) interconnected by an institutional intranet or local area network to gain access to various remote (directory) server sites via an internetwork of computers, such as the well-known Internet communications network. It is also common in network applications to provide a so-called proxy server that links to the internetwork. A proxy server accesses frequently requested data from the remote servers and stores it locally to speed up access and reduce the download time of future requests for the data. In response to a request from an application executing on a client, the proxy server attempts to fulfill that request from its local storage; if it cannot, the proxy server forwards the request over the internetwork to a server that can satisfy the request. The server then responds by transferring a stream of data to the proxy server, which stores the data and forwards it to the client.
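The store-and-forward behavior of such a proxy server can be illustrated with a minimal sketch. The class name `CachingProxy`, the `fetch_from_origin` callback and the `origin_requests` counter are hypothetical names introduced only for illustration; they are not taken from any particular proxy product, and a real proxy would also handle expiry and invalidation of stored data.

```python
from typing import Callable, Dict


class CachingProxy:
    """Hypothetical proxy: answers from local storage, else forwards upstream."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        # fetch_from_origin stands in for a request sent over the internetwork.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        self.origin_requests = 0  # counts requests forwarded to the remote server

    def get(self, request: str) -> str:
        # First attempt to fulfill the request from local storage.
        if request in self._cache:
            return self._cache[request]
        # Otherwise forward the request, store the response, and return it.
        self.origin_requests += 1
        data = self._fetch(request)
        self._cache[request] = data
        return data


proxy = CachingProxy(lambda request: "data for " + request)
first = proxy.get("cn=printer-1")
second = proxy.get("cn=printer-1")  # served from local storage
```

In this sketch the second request for the same data never reaches the remote server, which is the effect the proxy arrangement is intended to achieve.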
The term “client” is also used to refer to a computer used by a person, the “user”. Accordingly, the user's computer is referred to as the “client computer”.
The requests issued from the client and proxy server to the server conform to a conventional protocol, such as the Lightweight Directory Access Protocol (LDAP). Specifically, LDAP provides a client-server communication arrangement to access a directory service over a Transmission Control Protocol/Internet Protocol (TCP/IP) network. Examples of a directory service include the NetWare Directory Services (NDS) from Novell, Inc. and the X.500 directory service. Novell's Directory Access Protocol (NDAP) is a gateway on NDS that conforms to LDAP. NDS, X.500 and LDAP are well known and are described in the following documents: Novell Directory Services Internals Overview; Technical Overview of Directory Services Using the X.500 Protocol, RFC 1309; X.500 Lightweight Directory Access Protocol, RFC 1487; Lightweight Directory Access Protocol (v3), RFC 2251.
A directory differs from a database in an essential characteristic: a directory is designed for ease of changing the data stored therein on a dynamic basis. In ordinary database design, the data is stored in fields of tables, and is read and written using a designated protocol. Changing data in a database requires both deleting the data presently there and writing in the desired new data. Both the deleting and the writing are accomplished by using the command structure of the protocol.
In contrast, a directory is architected so that the access protocol permits easy access to changing data stored in the directory. Protocols for dynamically changing data stored in a directory are designed so that dynamic changes to the data are easy and can be accomplished with a minimum of steps executed by the user or the user's client computer. An example of directory operation and protocol is given in the Lightweight Directory Access Protocol (LDAP).
The LDAP protocol is described in many books, in particular in the following two books: the first book, by Timothy A. Howes, Mark C. Smith, and Gordon S. Good, entitled Understanding and Deploying LDAP Directory Services, published by Macmillan Technical Publishing, Copyright date 1999; and the second book, by Timothy A. Howes and Mark C. Smith, entitled LDAP, Programming Directory Enabled Applications with Lightweight Directory Access Protocol, published by Macmillan Technical Publishing, Copyright date 1997. All disclosures of both books are incorporated herein by reference.
A difference between a directory and a database can be expressed by the statement that a directory can include a database, but a database ordinarily cannot include a directory. A reason is that data may be stored in a directory much as it is stored in a database, but the access to a directory for dynamic changes in the stored data is better than access to a database. In the following discussion attention will be primarily directed to directories. However, as is clear from this discussion, a database could also be used in the discussion, with the exception that to use a database would make access for dynamic changes in the data more cumbersome.
In this document, the conventional protocol used to issue requests from a client is the Lightweight Directory Access Protocol (LDAP) and the source server used to store data is an LDAP or NDAP/NDS server. The predicate proxy server stores (“caches”) data retrieved from the server and further builds dynamic indexes for searching the cached data stored on the proxy cache. Notably, searching and storage of data on the proxy server is based on the predicate generated by the predicate logic core of the proxy server.
Any database management system may be used in the following description and in the practice of the following invention. However, for ease of reference, any database system, and any directory service, will be referred to as an “LDAP” directory service, whether or not it uses the LDAP protocol. That is, the present discussion is not limited to the specific protocol defined by the standard Lightweight Directory Access Protocol, even though the terminology “LDAP server” is used to refer to any electronically stored database.
The number of variants (types) of data stored in the LDAP (and NDAP and any directories using any other protocol) directories is typically small, to make it easier for applications to directly access the data with a fully-qualified distinguished name; a distinguished name is a naming scheme (similar to the Domain Name System) for uniquely identifying data within a directory store. However, as the number of data types stored in an LDAP/NDAP directory increases, it becomes increasingly difficult for an application and associated programs to access all the data and know about all their respective types. The directory may, for instance, contain different types (categories) of data such as printer identifiers (IDs), electronic mail (e-mail) addresses and Internet Protocol (IP) addresses.
Companies typically configure their directory servers such that each server stores a subset of data types and, notably, the subsets (data types) do not overlap. For instance, a company may have two LDAP servers (Server A and Server B). All corporate human resource related information (employee IDs, e-mail and residential addresses, emergency contacts, salaries, etc.) is stored on LDAP Server A, whereas all corporate research and development work, including the various projects under development along with interactions between development groups (both external and internal to the company), is stored on LDAP Server B. Having a database use a plurality of database servers is referred to as a “distributed database”, and a system using a distributed database is referred to as a “distributed database system”.
The subsets of data stored on the LDAP servers are thus reduced and non-overlapping, primarily to avoid overloading each server. LDAP is a database which operates on a schema, i.e., a format of data that the database stores and understands. A directory server (such as LDAP or NDAP) that is configured to increase the number of data types it stores (e.g., all possible data formats used in an organization) has a complex schema, and processing (including searching) of any request is time-consuming and inefficient. Attempts by an organization to develop a searching algorithm for such a schema involve the use of hash-based index searching; however, such searching is also quite complex, resulting in overloading of the server and degradation of its performance.
Hash-based indexing is a way of formulating hints that result in faster look-ups; yet indexes generally consume substantial overhead (such as memory and processor cycles) when developing keys for searching the database. Moreover, updates to a hash-based index searching service may adversely affect processing performance of the server because the updates are directed to the indexes as well as to the database itself. Thus, such an approach results in substantial resource commitments that nevertheless degrade performance of the server.
An improved method of indexing a plurality of directory servers connected as a distributed directory is needed.
The invention relates to a distributed directory service that is constructed based on a predicate, i.e., a query from a client. Broadly stated, the predicate is formed by the query (request) issued by the client. The predicate is used for retrieving data from a plurality of directory servers, and retrieving the data has the following steps. Each directory server, of the plurality of directory servers, is designated by a designation predicate giving a range of data values stored in the directory server. A client predicate is generated to indicate desired information. The client predicate is compared with at least one of the designation predicates to determine which directory servers may contain information requested by the client predicate, and in the event that a match between the client predicate and a particular designation predicate indicates that at least a portion of the desired information may be stored on a particular directory server pointed to by the particular designation predicate, an inquiry is sent to the particular directory server for the desired information. The desired information is retrieved from the particular directory server in the event that the client predicate designates information stored on the particular directory server. The retrieved information is transferred to the client computer.
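The routing steps summarized above may be illustrated with a minimal sketch. It assumes, purely for illustration, that both the designation predicates and the client predicate are simple inclusive ranges over a single attribute; the names `RangePredicate`, `DirectoryServer` and `retrieve`, the in-memory entries, and the example addresses are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RangePredicate:
    """Hypothetical predicate: attribute values between low and high, inclusive."""
    attribute: str
    low: str
    high: str

    def overlaps(self, other: "RangePredicate") -> bool:
        # Two ranges on the same attribute match if they share any value.
        return (self.attribute == other.attribute
                and self.low <= other.high
                and other.low <= self.high)

    def matches(self, entry: Dict[str, str]) -> bool:
        value = entry.get(self.attribute)
        return value is not None and self.low <= value <= self.high


class DirectoryServer:
    """Stand-in for a real directory server; holds entries in memory."""

    def __init__(self, name: str, designation: RangePredicate,
                 entries: List[Dict[str, str]]) -> None:
        self.name = name
        self.designation = designation  # range of data values stored here
        self.entries = entries
        self.inquiries = 0  # counts inquiries actually sent to this server

    def search(self, predicate: RangePredicate) -> List[Dict[str, str]]:
        self.inquiries += 1
        return [e for e in self.entries if predicate.matches(e)]


def retrieve(client_predicate: RangePredicate,
             servers: List[DirectoryServer]) -> List[Dict[str, str]]:
    results: List[Dict[str, str]] = []
    for server in servers:
        # Compare the client predicate with each designation predicate and
        # send an inquiry only to servers that may hold the desired data.
        if server.designation.overlaps(client_predicate):
            results.extend(server.search(client_predicate))
    return results


server_a = DirectoryServer("A", RangePredicate("sn", "A", "Mzzz"),
                           [{"sn": "Jones", "mail": "jones@example.com"}])
server_b = DirectoryServer("B", RangePredicate("sn", "N", "Zzzz"),
                           [{"sn": "Smith", "mail": "smith@example.com"}])
found = retrieve(RangePredicate("sn", "Smith", "Smith"), [server_a, server_b])
```

In this sketch only Server B receives an inquiry, because the client predicate falls outside the range designated for Server A.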
The predicate is used to form an index by sorting the predicate into a normal form. Steps in sorting the predicate into the normal form include the following. Each symbol of the predicate is represented by a numerical representation, for example the ASCII value used to represent the symbol in ordinary text files. The predicate is expressed as a plurality of primitive predicates, and individual predicates of the plurality of primitive predicates are joined by logical connectors. The logical connectors and each term in the primitive predicates are represented by numbers, and the numbers are chosen so that each different logical connector and each different term in the plurality of predicates is represented by a unique number. The logical connectors and the predicates are sorted in numerical order of the unique numbers to form the normal form of the predicate. The normal form of the predicate permits the predicate to serve as an index. The directory may be chosen to be a database.
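The effect of sorting a predicate into a normal form may be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the description above: the predicate is flat, with a single logical connector joining its primitive predicates; the unique number for each term is taken to be its sequence of ASCII codes; and the function names `to_numbers` and `normal_form` and the LDAP-style output string are hypothetical.

```python
from typing import Dict, List, Tuple


def to_numbers(term: str) -> Tuple[int, ...]:
    # Represent each symbol of the term by its ASCII value.
    return tuple(term.encode("ascii"))


def normal_form(connector: str, primitives: List[str]) -> str:
    # Sort the primitive predicates in numerical order of their ASCII codes,
    # so that equivalent predicates written in different orders coincide.
    ordered = sorted(primitives, key=to_numbers)
    return connector + "".join("(" + p + ")" for p in ordered)


# The normal form can then serve as a key into an index of cached results.
index: Dict[str, List[str]] = {}
key_1 = normal_form("&", ["sn=Smith", "cn=Bob"])
key_2 = normal_form("&", ["cn=Bob", "sn=Smith"])
index[key_1] = ["cached entry for Bob Smith"]
hit = index.get(key_2)
```

Because both orderings reduce to the same normal form, a later query written in a different order finds the entry stored under the earlier query. Nested predicates with several connectors would require applying the same ordering recursively, which this sketch does not attempt.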
Other and further aspects of the present invention will become apparent during the course of the following description and by reference to the accompanying drawings.