A digital data processing system includes three basic elements, namely, a processor, a memory, and an input/output (I/O) system. The memory stores information in addressable storage locations. This information includes data and instructions for processing the data. The processor fetches information from the memory, interprets the information as either an instruction or data, processes the data in accordance with the instructions, and returns the processed data to the memory for storage therein. The I/O system, under control of the processor, also communicates with the memory to transfer information, including instructions and data to be processed, to the memory, and to obtain processed data from the memory. Typically, the I/O system includes a number of diverse types of units, including video display terminals, printers, interfaces to public telecommunications networks, and secondary storage devices, including disk and tape storage devices.
In a digital data processing system as described above which supports a number of users, each user has one or more accounts (also known as user directories) which he uses to store files relating to work on various projects. Some files might be very small, for example, a one page memo in a data file, while others might be very large, for example, a list of several thousand clients in a database application file. The number of files in a large system, one which, for example, supports several thousand users and/or accounts, can be quite large. For example, each account might contain between ten and 100 files. Thus a secondary storage device for the system might easily contain 50,000 to 100,000 separate files which must be managed efficiently to keep response times small and to optimize the use of the storage device. A system for managing such an environment is typically organized in the following way, as shown in FIG. 5.
A volume on a secondary storage device, that is, the recording medium for the storage device, for example, a disk pack on a disk drive, contains a number of files of different types including directory files 502, i.e., files which identify other files, and non-directory files 504, for example, data or application files. Typically, these files are organized according to a structure known as a directory tree 500. In addition, structures known as file headers, one for each file, are typically listed in an index file. A description of the directory tree and index file follows.
Generally, a tree refers to a data structure having a number of nodes, each of which contains data and one of which is designated the root node 506. Associated with the root node, e.g., linked via pointer fields, are a number of child nodes of which the root node is the parent. The child nodes of the root can in turn be parents of a further number of child nodes. This parent/child association is referred to as branching. Each branch of the tree can be of a different length and continues these parent/child associations through a number of levels of the tree until reaching a level of the tree in which the nodes have no children, herein referred to as leaf nodes 504.
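The general tree structure just described can be sketched in a few lines of Python. This is a minimal illustration only, not a structure taken from any particular system; the names `Node`, `is_leaf`, `depth`, and `leaves` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a tree: some data plus links to its child nodes."""
    data: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf node is a node that has no children.
        return not self.children


def depth(node: Node) -> int:
    """Number of levels on the longest branch starting at this node."""
    if node.is_leaf():
        return 1
    return 1 + max(depth(child) for child in node.children)


def leaves(node: Node) -> List[str]:
    """Collect the data of every leaf node, left to right."""
    if node.is_leaf():
        return [node.data]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Branches may have different lengths: "a" is a leaf on the second level,
# while "c" is a leaf on the third level.
root = Node("root", [Node("a"), Node("b", [Node("c")])])
```

Here `depth(root)` counts the levels along the longest branch, and `leaves(root)` returns the nodes at which the parent/child associations end.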
In the case of a directory tree, the root 506 of the tree represents a master directory file that identifies where on the disk the root directory and any subdirectories begin. The root of the tree is associated with a number of nodes, each of which represents either a directory file 502, for example, a user directory for an account (also referred to as a subdirectory), or a non-directory file 504, for example, a data or application file. Directory paths, that is, branches of the tree, lead from the root directory through user directories and subdirectories to a non-directory file.
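As a rough sketch of how directory paths arise from such a tree, the following Python fragment builds a small directory tree and lists the path from the root to each non-directory file. The class `DirEntry`, the function `directory_paths`, and all file and directory names are hypothetical; the in-memory representation is an assumption made for illustration and does not reflect any on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DirEntry:
    """A node of the directory tree: a directory file or a non-directory file."""
    name: str
    is_directory: bool
    children: Dict[str, "DirEntry"] = field(default_factory=dict)

    def add(self, child: "DirEntry") -> "DirEntry":
        # Only directory files identify other files.
        if not self.is_directory:
            raise ValueError(f"{self.name} is not a directory")
        self.children[child.name] = child
        return child


def directory_paths(entry: DirEntry,
                    prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the path from the root to every non-directory file."""
    path = prefix + (entry.name,)
    if not entry.is_directory:
        return [path]
    paths: List[Tuple[str, ...]] = []
    for name in sorted(entry.children):
        paths.extend(directory_paths(entry.children[name], path))
    return paths


# Hypothetical volume: a master directory, two user directories,
# one subdirectory, and two non-directory files.
master = DirEntry("MASTER", True)
smith = master.add(DirEntry("SMITH", True))
smith.add(DirEntry("MEMO.TXT", False))
project = smith.add(DirEntry("PROJECT", True))
project.add(DirEntry("CLIENTS.DAT", False))
master.add(DirEntry("JONES", True))  # an empty user directory

paths = directory_paths(master)
```

Each tuple in `paths` is one branch of the tree, running from the root through user directories and subdirectories to a non-directory file; the empty directory contributes no path.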
Generally, an index file 510 is stored at a known location on the disk and comprises a number of file headers 512. Each file header 512 is a block of information relating to a specific file. The information depends on the particular computer system, but might include, for example, the following: file name, type, location, size, access control data, creation date, and access activity (number of read operations performed on the file, for example). Typically, both the directory tree and the index file are used together to manage directory files and their relationships to one another as described below.
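An index file of this kind can be pictured as a sequence of header records. The Python sketch below is an assumption-laden illustration: the field names follow the example list above, but the types, the sample values, and the `IndexFile` class with its `add` and `find` methods are hypothetical and not drawn from any specific system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileHeader:
    """Block of information about one file (fields are illustrative)."""
    name: str
    file_type: str        # e.g. "directory" or "data"
    location: int         # hypothetical starting block on the disk
    size: int             # hypothetical size in blocks
    access_control: str   # hypothetical protection string
    creation_date: str
    read_count: int = 0   # access activity: number of read operations


class IndexFile:
    """A list of file headers, one for each file on the volume."""

    def __init__(self) -> None:
        self.headers: List[FileHeader] = []

    def add(self, header: FileHeader) -> int:
        """Append a header and return its position in the index file."""
        self.headers.append(header)
        return len(self.headers) - 1

    def find(self, name: str) -> Optional[FileHeader]:
        """Search the index file for the header of the named file."""
        for header in self.headers:
            if header.name == name:
                return header
        return None


index = IndexFile()
index.add(FileHeader("SMITH", "directory", 120, 1, "owner:rw", "1990-01-15"))
index.add(FileHeader("MEMO.TXT", "data", 121, 2, "owner:rw", "1990-02-01"))
memo = index.find("MEMO.TXT")
```

The linear scan in `find` is a deliberate simplification; it only shows that a header is located by searching the index file rather than by reading the directory file itself.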
It is known to use both the directory files and index file described above to extract relationships between directories. This is done by traversing each node of the directory tree and opening and examining each directory file represented by a node of the tree. Each time a directory file is opened, the index file is searched to locate the file header for the next directory file in the tree, for example, as is done in the VERIFY, DIRECTORY, and COPY commands of the VMS operating system of Digital Equipment Corporation. This approach, however, can require an excessive number of I/O operations, can cause the disk head to travel to distant portions of the disk to read a directory file and then back to another portion to read the index file, and can cause excessive paging operations as portions of the index file are brought in and out of memory for each tree node visited.
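The cost of this traversal can be illustrated with a small simulation. The Python sketch below is not the VMS implementation; it is a simplified model under stated assumptions: directory files are represented as in-memory lists of names, the index file is searched once per directory entry to decide whether the entry is itself a directory, and every name (`IoCounter`, `find_header`, `extract_relationships`, and the sample files) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical volume: each directory file maps to the names it lists.
directory_contents: Dict[str, List[str]] = {
    "MASTER": ["SMITH", "JONES"],
    "SMITH": ["PROJECT", "MEMO.TXT"],
    "PROJECT": ["CLIENTS.DAT"],
    "JONES": [],
}

# Hypothetical index file: one header per file.
index_file: List[Dict[str, str]] = [
    {"name": "MASTER", "type": "directory"},
    {"name": "SMITH", "type": "directory"},
    {"name": "JONES", "type": "directory"},
    {"name": "PROJECT", "type": "directory"},
    {"name": "MEMO.TXT", "type": "data"},
    {"name": "CLIENTS.DAT", "type": "data"},
]


@dataclass
class IoCounter:
    directory_reads: int = 0   # times a directory file was opened
    index_searches: int = 0    # times the index file was searched


def find_header(name: str, counter: IoCounter) -> Dict[str, str]:
    """Search the index file for a header, counting the search."""
    counter.index_searches += 1
    for header in index_file:
        if header["name"] == name:
            return header
    raise KeyError(name)


def extract_relationships(root: str,
                          counter: IoCounter) -> List[Tuple[str, str]]:
    """Return (parent, child) directory pairs by walking the whole tree."""
    relationships: List[Tuple[str, str]] = []
    pending = [root]
    while pending:
        parent = pending.pop()
        counter.directory_reads += 1  # open and examine the directory file
        for name in directory_contents[parent]:
            header = find_header(name, counter)
            if header["type"] == "directory":
                relationships.append((parent, name))
                pending.append(name)
    return relationships


counter = IoCounter()
pairs = extract_relationships("MASTER", counter)
```

Even on this six-file volume the walk opens every directory file and searches the index file once per entry, alternating between the two; on a volume with tens of thousands of files the same pattern is what drives the I/O, head movement, and paging costs described above.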
Therefore, it is desirable to provide a system and method of extracting the relationships between directories that does not rely on a file by file traversal of the directory tree each time information regarding the relationship between directories is desired.