Some data are naturally organized as hierarchies, which are well-known mathematical constructs. In general, a hierarchy is composed of nodes at multiple levels. Each node at a level is linked to one or more nodes at a different level. Each node at a level below the top level is a child node of one or more parent nodes at the level above. In a tree hierarchy, each child node has only one parent node, but a parent node may have multiple child nodes. A node that has no parent node linked to it is the root node, and a node that has no child nodes linked to it is a leaf node. A tree hierarchy typically has a single root node.
For example, a flexible file system used by a computer operating system to store contents on a computer readable medium is often organized into a hierarchy of “folders” or “directories.” Each folder can contain any number of files that store data on a computer readable medium and any number of other folders. The folder that contains the files and other folders is the parent node of those files and folders. The files and other folders are the child nodes of that folder. The system typically has one root folder.
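As an illustration only, the parent and child relationships of a tree hierarchy can be sketched in a few lines of Python. The class name `Node`, its methods, and the folder and file names are hypothetical and are chosen for this sketch; they are not taken from any particular file system.

```python
class Node:
    """One node in a tree hierarchy: at most one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Linking the child to its parent keeps both directions consistent.
        if parent is not None:
            parent.children.append(self)

    def is_root(self):
        # A root node has no parent node linked to it.
        return self.parent is None

    def is_leaf(self):
        # A leaf node has no child nodes linked to it.
        return not self.children

    def path(self):
        # Walk up through the parents, then reverse to read from the root down.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical folder hierarchy: one root folder, one subfolder, one file.
root = Node("root")
docs = Node("docs", parent=root)
report = Node("report.txt", parent=docs)
```

In this sketch, `report.path()` yields `root/docs/report.txt`, `root.is_root()` is true, and `report.is_leaf()` is true.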
Data elements in the Extensible Markup Language (XML) are also arranged into a tree hierarchy. XML is widely used to store data and to exchange data between independent applications. Each data element in XML may contain zero or more child elements. Each element also has an element name and zero or more element attributes. An XML document has a single root element.
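The tree structure of an XML document can be seen by walking it with a standard parser. The following is a minimal sketch that uses Python's `xml.etree.ElementTree` module; the sample document and the helper name `describe` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one root element and nested children.
DOC = """<library name="main">
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

root = ET.fromstring(DOC)


def describe(element, depth=0):
    """Return (depth, element name, attributes) for an element and its descendants."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:  # iterating an element yields its direct children
        rows.extend(describe(child, depth + 1))
    return rows


rows = describe(root)
```

Here `root.tag` is `library`, the root has two `book` children, and each `book` has one `title` child with no attributes.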
While hierarchies are convenient for many purposes, operations on hierarchically organized data, such as data in file systems and XML documents, can be difficult to express. Operations may include, for example, creating, retrieving data from, writing data to, copying, moving, and deleting the nodes of the hierarchies, such as files or XML elements. The expression of the nodes and operations may vary from one hierarchical data system to another. It would be convenient to operate on data spread among multiple hierarchies with a single integrated interface that uses a single set of expressions for the nodes and operations.
In one approach, nodes from multiple hierarchies are assembled into one system with an established and convenient interface that functions on a user's equipment (called hereinafter the user's “native system”). For example, in one native system, nodes in a hierarchy are stored in a node table in a relational database, and the parent-child relationships are stored in a hierarchical index. Such an index may list, for example, every parent node, and for each parent node, all of the child nodes that are immediately below the parent node in the hierarchy. In such a system, SQL commands can be used to list the nodes that satisfy certain criteria. Operations on the nodes can be performed by one or more stored procedures.
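A node table and a hierarchical index of this kind can be sketched with an in-memory SQLite database. The table names `nodes` and `hier_index`, the column names, the sample rows, and the helper `children_of` are all assumptions made for illustration; an actual native system may organize its index differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Node table: one row per node in the hierarchy.
    CREATE TABLE nodes (
        node_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Hierarchical index: one row per immediate parent-child link.
    CREATE TABLE hier_index (
        parent_id INTEGER NOT NULL REFERENCES nodes(node_id),
        child_id  INTEGER NOT NULL REFERENCES nodes(node_id)
    );
""")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(1, "/"), (2, "home"), (3, "etc"), (4, "notes.txt")],
)
conn.executemany(
    "INSERT INTO hier_index VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4)],
)


def children_of(parent_name):
    """List the names of the nodes immediately below the named parent."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM hier_index h
        JOIN nodes p ON p.node_id = h.parent_id
        JOIN nodes c ON c.node_id = h.child_id
        WHERE p.name = ?
        ORDER BY c.name
        """,
        (parent_name,),
    )
    return [row[0] for row in rows]
```

With these sample rows, `children_of("/")` returns `["etc", "home"]`, and `children_of("etc")` returns an empty list because that node is a leaf.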
Maintaining a node table and hierarchical index in this manner enables one to use an SQL query on a file system to find the path from a root folder to a particular folder or file that satisfies certain criteria on the folder or file attributes. For example, one can get the file names and the paths from the root folder for all files that are owned by user Scott and were created between Jan. 1, 2001 and Jan. 10, 2001, assuming “owner” and “creation date” are attributes of the files in the node table. Then, one can copy those files to a new folder or otherwise operate on those files.
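The query described above might look like the following sketch, which again uses an in-memory SQLite database. The schema, the sample rows, the ISO-formatted date strings, and the use of a recursive common table expression to build each path are assumptions made for illustration, not a description of how any particular native system resolves paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        node_id INTEGER PRIMARY KEY,
        name    TEXT,
        kind    TEXT,   -- 'folder' or 'file'
        owner   TEXT,
        created TEXT    -- ISO date string, so text comparison follows date order
    );
    CREATE TABLE hier_index (parent_id INTEGER, child_id INTEGER);
""")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
    [
        (1, "root", "folder", "admin", "2000-12-01"),
        (2, "home", "folder", "admin", "2000-12-01"),
        (3, "scott", "folder", "Scott", "2000-12-15"),
        (4, "plan.txt", "file", "Scott", "2001-01-05"),
        (5, "old.txt", "file", "Scott", "2000-12-20"),
        (6, "memo.txt", "file", "Ann", "2001-01-03"),
    ],
)
conn.executemany(
    "INSERT INTO hier_index VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (3, 5), (2, 6)],
)

QUERY = """
WITH RECURSIVE paths(node_id, path) AS (
    -- Start at the root folder: the node that is nobody's child.
    SELECT node_id, ''
    FROM nodes
    WHERE node_id NOT IN (SELECT child_id FROM hier_index)
    UNION ALL
    -- Extend each known path by one level using the hierarchical index.
    SELECT h.child_id, p.path || '/' || n.name
    FROM paths p
    JOIN hier_index h ON h.parent_id = p.node_id
    JOIN nodes n ON n.node_id = h.child_id
)
SELECT n.name, p.path
FROM nodes n
JOIN paths p ON p.node_id = n.node_id
WHERE n.kind = 'file'
  AND n.owner = ?
  AND n.created BETWEEN ? AND ?
ORDER BY p.path
"""

matches = conn.execute(QUERY, ("Scott", "2001-01-01", "2001-01-10")).fetchall()
```

With these sample rows, `matches` holds the single pair `("plan.txt", "/home/scott/plan.txt")`. The file `old.txt` is excluded by its creation date and `memo.txt` by its owner.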
While this approach works well for many kinds of data organized in hierarchies, the approach has some shortcomings. For example, in many cases, the non-native (i.e., foreign) hierarchical data systems provide resources for storing and retrieving the data. To import that data into the native system causes the native system to devote its own resources to store data that are already stored elsewhere. This can greatly increase the expense of maintaining the native system.
Furthermore, the number of nodes in the foreign systems may be large, yet the users of the native system may wish to operate on those nodes infrequently. Importing all those nodes into the native system may bloat the hierarchical index of the native system. A bloated index can lead to increased response time and overall degraded performance by the native system.
In addition, incorporating a new foreign system consumes resources on the user's systems that increase with the amount of data in the new foreign system. The data contents of the new system have to be copied from the new system to the native system, and the native indexes have to be updated. Similarly, detaching a foreign system also consumes resources that increase with the amount of the data in the foreign system. The data contents may have to be deleted from the native system and the native indexes have to be updated. If the contents of the foreign system are changed, the native system may have to both detach the old version of the foreign system and incorporate the new version. Consuming so many resources to attach and detach foreign systems can lead to overall degraded performance by the native system.
Furthermore, there may be aspects of data security that preclude importing the foreign data into the native system. For example, the foreign system may control access to data in the foreign hierarchy based on an unusual or proprietary security model that might be difficult or impermissible to express or enforce in the native system.
Based on the foregoing, there is a clear need for techniques that manage hierarchical data in multiple hierarchies with a single interface and that do not suffer from the above deficiencies. In particular, there is a need for techniques to manage data distributed among multiple hierarchies with a single interface without importing all the data into a single hierarchical data system.
The past approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.