The present invention relates generally to computer software, and more particularly, to an improved method and system for clustering directory objects into groups based on their similar access patterns to a directory system.
A directory system (or “directory” for short) maintains static relationships between various objects in a computer data system. For example, the directory system may be represented as a tree with multiple levels, which defines a fixed structural relationship between any two objects in the directory system. The objects may represent users, files, or any other entities created by or associated with the directory system. In addition to these structural relationships, there are implicit relationships among objects based on their interactions, which are dynamic in nature. In one of the simplest situations, for example, a particular user object may access a certain set of objects more frequently than other objects. In another situation, a particular object may be accessed only by certain user objects. In the present art, there is no method for determining such associations among objects based on their dynamic activities in the directory system.
In the directory system, one problem, known as the “Sparse Replica Configuration” problem, is closely tied to the dynamic activities of the objects in the directory. A “sparse replica” is a server within a replica ring of a computer network system that holds specific objects and their selected attributes. The configuration of a sparse replica is specified by a set of object classes and attribute types. Typically, configuring the sparse replica has to be performed manually by a directory administrator. The sparse replica is a useful arrangement from the perspective of data storage or synchronization if the overall partition of data is very large and the specific object classes and attribute types required at the server are well known in advance.
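As an illustration only, a sparse replica configuration of the kind described above can be sketched as a pair of sets, one of object classes and one of attribute types, that filters what a server holds. The names used here (`SparseReplicaConfig`, `project`, and the dictionary layout of an entry) are hypothetical and are not taken from any particular directory product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparseReplicaConfig:
    """Hypothetical sparse replica filter: which classes and attributes are held."""
    object_classes: frozenset
    attribute_types: frozenset

    def project(self, entry):
        """Return the portion of `entry` this replica would hold, or None.

        `entry` is assumed to be a dict with an "objectClass" key plus
        arbitrary attribute keys (an illustrative layout, not a real API).
        """
        if entry.get("objectClass") not in self.object_classes:
            return None  # objects of this class are not held at all
        return {
            name: value
            for name, value in entry.items()
            if name == "objectClass" or name in self.attribute_types
        }


# Example: a replica that holds only address-book data for User objects.
address_book = SparseReplicaConfig(
    object_classes=frozenset({"User"}),
    attribute_types=frozenset({"cn", "mail", "telephoneNumber"}),
)
user_entry = {
    "objectClass": "User",
    "cn": "jsmith",
    "mail": "jsmith@example.com",
    "telephoneNumber": "555-0100",
    "homeDirectory": "/home/jsmith",
}
printer_entry = {"objectClass": "Printer", "cn": "lobby-printer"}

held_user = address_book.project(user_entry)
held_printer = address_book.project(printer_entry)
```

In this sketch the `homeDirectory` attribute is dropped from the user entry and the printer object is not held at all, which mirrors the idea that a sparse replica stores only selected attributes of selected object classes.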
In a practical example, assume that a new sales office of a company is to be established in New York and that, from the perspective of computer network support, all the users need is a functional address book. A Directory System Agent (DSA) is therefore installed at the office into a “Sales” partition of the directory of the company, and the DSA and the relevant replica servers serving the New York office are configured to hold only the information necessary for the address book (e.g., usernames, email IDs and corresponding telephone numbers), which is incorporated as attributes in the directory tree.
Later on, when the users in the office install new applications that need more than just email and telephone number attributes, the administrator has to add additional attributes to the replica configuration of all remote replica servers. If more applications are added and additional attributes are needed, the administrator is called in again. Each time, the administrator needs to determine how many users are using these attributes and decide whether it is worth having these attributes located on the main DSA or having the users' application clients fetch them from a remote/sparse replica server. Based on that decision, the configuration of the sparse replica servers must change accordingly. A large amount of administrative effort is thus required to configure the sparse replica servers and to keep the configuration synchronized with the actual needs for optimal resource usage. Moreover, determining the access pattern of each attribute and object is a formidable task.
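To make the administrator's decision concrete, the following minimal sketch counts how many distinct users access each attribute and suggests holding an attribute locally once that count reaches a threshold. It is an illustration under stated assumptions only: the log format (pairs of user and attribute), the function name `recommend_attributes`, and the threshold rule are all hypothetical and are not presented as the method of the invention.

```python
from collections import defaultdict


def recommend_attributes(access_log, min_users):
    """Return attributes accessed by at least `min_users` distinct users.

    `access_log` is assumed to be an iterable of (user, attribute) pairs;
    this format is hypothetical and chosen only for illustration.
    """
    users_by_attribute = defaultdict(set)
    for user, attribute in access_log:
        users_by_attribute[attribute].add(user)
    return sorted(
        attribute
        for attribute, users in users_by_attribute.items()
        if len(users) >= min_users
    )


# Example log: three users read "mail", two read "title", one reads "badgeId".
access_log = [
    ("alice", "mail"),
    ("bob", "mail"),
    ("carol", "mail"),
    ("alice", "title"),
    ("bob", "title"),
    ("alice", "title"),   # repeated access by the same user counts once
    ("carol", "badgeId"),
]

recommended = recommend_attributes(access_log, min_users=2)
```

Even this simple count requires a complete access log per attribute and per user, which suggests why performing the equivalent analysis by hand across an entire directory is impractical.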
Assume that the New York (NY) office and another office (e.g., Los Angeles (LA)) access a common set of attributes (which may change from time to time) that is available from one sparse replica server physically located somewhere in California. Since there is not enough demand for these attributes at either of the two locations (NY, LA) to justify a separate server for each office, it may be useful to have a sparse replica server installed physically along the common network route to both offices, wherein the sparse replica server is as close to both of them as possible. A sparse replica server thus needs to be placed in a strategic “location” based on the activities of the objects accessed.
Needless to say, configuration of a sparse replica is a continuous activity driven by the needs of the users of the directory. This inevitably leads to administrative activities that are, by their very nature, expensive because of the manual involvement of the administrators. In addition, the administrators are often very busy due to the tremendous task of maintaining the entire directory. Therefore, there is no guarantee that all the requests for configuring the sparse replica will be taken care of in a timely fashion. For example, it is likely that requests from an “uninfluential” section of users, or requests for temporary, though important, changes in the configuration, may go unheeded. In many cases, the users may see a difference in the response time between directory operations depending on whether the attributes involved exist in the configuration of the local sparse replica, because directory operations involving replicated attributes are faster than those involving attributes which are not replicated.
In order to address this sparse replica configuration problem, a method is needed that would collect and analyze directory access patterns and automatically recommend both the configuration and the location of a sparse replica to improve system performance.