The approaches described in the BACKGROUND section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Humans tend to organize information in categories. The categories in which information is organized are themselves typically organized relative to each other in some form of hierarchy. For example, an individual animal belongs to a species, the species belongs to a genus, the genus belongs to a family, the family belongs to an order, and the order belongs to a class.
With the advent of computer systems, techniques for storing electronic information have been developed that largely reflect this human desire for hierarchical organization. Conventional computer file systems, for example, are typically implemented using hierarchy-based organization principles: a typical file system has directories arranged in a hierarchy, and documents stored in the directories.
Information arranged in a hierarchy is referred to herein as an information hierarchy. An information hierarchy can be represented as a hierarchy of nodes. FIG. 1 is a directed graph that illustrates an information hierarchy 100. Information hierarchy 100 includes eight nodes. The highest node in the hierarchy is referred to as the “root” node. The nodes at the end of each branch in the hierarchy are “leaf” nodes. The nodes between the root node and the leaf nodes are “intermediate” nodes. In the illustrated hierarchy, nodes 1, 2, 3 and 6 are intermediate nodes, and nodes 4, 5 and 7 are leaf nodes.
In an information hierarchy, the nodes correspond to information. Typically, the piece of information associated with each node has some form of name and some type of content. Node 1 has the name a, node 2 has the name b, node 3 has the name c, and so forth.
In an information hierarchy that corresponds to a hierarchical file system, the nodes typically correspond to files or directories. Each file has a name and some form of content. Each directory has a name and content in the form of zero or more files.
A node is said to have a parent-child relationship with any node that is an immediate descendant of the node in the information hierarchy. The parent-child relationship between a particular parent node and child node is also referred to herein as a parent-child link or just simply link. In FIG. 1, directed edges between nodes represent parent-child links; edges go from the parent node to the child node. Thus, the root node has a link LINKR1 to node 1. Node 1 has a link LINK12 to node 2 and a link LINK13 to node 3. Node 2 has links LINK24 and LINK25 to nodes 4 and 5, respectively. Node 3 has links LINK36 and LINK37 to nodes 6 and 7, respectively.
In some information hierarchies, a child node may have multiple parents. In FIG. 1, node 2 is a child of both node 1 and node 6, and therefore has not only link LINK12 from parent node 1 but also a link, LINK6, from parent node 6.
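The links described above can be sketched as a small adjacency structure. This is only an illustration of the hierarchy in FIG. 1 as the text describes it; the names `LINKS`, `children`, `parents`, and `leaf_nodes` are hypothetical and do not come from the source.

```python
# Parent-child links of information hierarchy 100, keyed by parent node.
# "root" stands for the root node; the integers are the node numbers in the text.
LINKS = {
    "root": [1],
    1: [2, 3],
    2: [4, 5],
    3: [6, 7],
    6: [2],  # node 2 has a second parent, node 6
}


def children(node):
    """Return the immediate descendants of a node (empty for leaf nodes)."""
    return LINKS.get(node, [])


def parents(node):
    """Return every node that has a parent-child link to the given node."""
    return [parent for parent, kids in LINKS.items() if node in kids]


def leaf_nodes():
    """Return the nodes that sit at the end of a branch (no children)."""
    all_nodes = set(LINKS) | {c for kids in LINKS.values() for c in kids}
    return sorted(n for n in all_nodes if n != "root" and not children(n))


print(parents(2))    # [1, 6]
print(leaf_nodes())  # [4, 5, 7]
```

Storing links by parent makes downward traversal a direct lookup, whereas finding the parents of a node requires scanning every entry.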
Paths, Path Levels, and Pathnames
Each node in an information hierarchy may be located by following a “path” through the hierarchy to the node. The path to a “target” node begins at the root node and proceeds down the hierarchy of nodes to eventually arrive at the target node. For example, the path to node 6 consists of nodes root, 1, 3 and 6.
Each node in the path corresponds to a path level. The number of levels in a path is the number of nodes in the path. Thus, the path to node 6 has four levels. The root corresponds to the first level, node 1 to the second level, node 3 to the third level and node 6 to the fourth level.
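As an illustration of paths and path levels, the following sketch enumerates the paths from the root to a target node and counts their levels. It assumes the hierarchy of FIG. 1 is acyclic; the names `LINKS` and `paths_to` are hypothetical.

```python
# Parent-child links of FIG. 1, keyed by parent node ("root" is the root node).
LINKS = {"root": [1], 1: [2, 3], 2: [4, 5], 3: [6, 7], 6: [2]}


def paths_to(target, node="root", prefix=()):
    """Yield every path, as a tuple of nodes, from the root down to target.

    Assumes the hierarchy contains no cycles, so the recursion terminates.
    """
    path = prefix + (node,)
    if node == target:
        yield path
    for child in LINKS.get(node, []):
        yield from paths_to(target, child, path)


path = next(paths_to(6))
print(path)       # ('root', 1, 3, 6)
print(len(path))  # 4, the number of levels in the path

# A node with more than one parent is reachable by more than one path.
print(list(paths_to(2)))  # [('root', 1, 2), ('root', 1, 3, 6, 2)]
```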
As mentioned before, a node may have more than one parent. Further, in some types of information hierarchies, nodes may have the same name. To unambiguously identify a given node, more than just the name of the node may be required. A convenient way to identify a specific node and/or its location within an information hierarchy is through the use of a “pathname”. A pathname is a concise way of uniquely identifying a node based on the path through the hierarchy to the node. A pathname may be composed of a sequence of names of nodes in the path. One way to represent a pathname is to separate the node names by a delimiter. Often, a ‘/’ is used to delimit the names of nodes in the path. A leading ‘/’ refers to the root. The pathname that identifies node 6 is ‘/a/c/f’. Standards for paths and pathnames are described in, for example, XQuery 1.0 and XPath 2.0 Data Model, W3C Working Draft, 29 Oct. 2004, which is incorporated herein by reference.
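The composition of a pathname from a path can be sketched as follows. The names a, b, c and f are taken from the text; the names d, e and g for nodes 4, 5 and 7 are an assumption based on "and so forth", and `NAMES` and `to_pathname` are hypothetical helpers.

```python
# Node names: a, b, c, f come from the text; d, e, g are assumed.
NAMES = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g"}


def to_pathname(path, delimiter="/"):
    """Build a pathname from a path given as a sequence starting at the root.

    The leading delimiter stands for the root; the names of the remaining
    nodes are joined with the same delimiter.
    """
    return delimiter + delimiter.join(NAMES[node] for node in path[1:])


print(to_pathname(("root", 1, 3, 6)))     # /a/c/f
print(to_pathname(("root", 1, 2)))        # /a/b
print(to_pathname(("root", 1, 3, 6, 2)))  # /a/c/f/b
```

The last two lines show why a name alone is not enough: node 2 is named b in both cases, but each pathname identifies a different location of that node in the hierarchy.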
Relational Database Versus a Hierarchical System
In contrast to a hierarchical file system, a relational database stores information in tables composed of rows and columns. Each row is identified by a unique row id. Each column represents an attribute or field of a record, and each row represents a particular record. Data is retrieved from the database by submitting queries to a database server that manages the database. The queries must conform to the database language supported by the database server. Structured Query Language (SQL) is an example of a database language supported by many existing database management systems.
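A minimal sketch of this model, using Python's built-in `sqlite3` module with an in-memory database, is shown below. The `document` table, its columns, and the sample rows are hypothetical and serve only to illustrate rows, columns, row ids, and an SQL query.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column is an attribute of a record; each row is one record.
conn.execute("CREATE TABLE document (name TEXT, created TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO document (name, created, content) VALUES (?, ?, ?)",
    [
        ("report.txt", "2004-10-29", "draft"),
        ("notes.txt", "2004-10-29", "todo"),
        ("report.txt", "2004-11-02", "final"),
    ],
)

# SQLite assigns each row of an ordinary table a unique integer rowid.
rows = conn.execute(
    "SELECT rowid, name, created FROM document WHERE created = ? AND name = ?",
    ("2004-10-29", "report.txt"),
).fetchall()
print(rows)  # [(1, 'report.txt', '2004-10-29')]
```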
Each type of storage approach has advantages and limitations. A hierarchical file system is simple, intuitive, and easy to implement, and is a standard model used by many application programs. Unfortunately, the simplicity of the hierarchical organization does not provide the support required for complex data retrieval operations. For example, the contents of every directory may have to be inspected to retrieve all documents created on a particular day that have a particular filename. Since all directories must be searched, the hierarchical organization does nothing to facilitate the retrieval process.
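The retrieval cost described above can be illustrated with a toy in-memory directory tree. Everything in this sketch (`File`, `TREE`, `find`, the file names and dates) is hypothetical; it shows only that a search by name and creation date has to inspect every directory.

```python
from dataclasses import dataclass


@dataclass
class File:
    created: str  # creation date as an ISO string


# A directory is a dict mapping names to subdirectories (dict) or files (File).
TREE = {
    "a": {
        "b": {"report.txt": File("2004-10-29"), "notes.txt": File("2004-10-29")},
        "c": {
            "f": {"report.txt": File("2004-11-02")},
            "report.txt": File("2004-10-29"),
        },
    }
}


def find(directory, name, created, prefix=""):
    """Return (matching pathnames, number of directories inspected)."""
    matches, inspected = [], 1  # count the directory being searched now
    for entry, value in directory.items():
        pathname = f"{prefix}/{entry}"
        if isinstance(value, dict):
            sub_matches, sub_inspected = find(value, name, created, pathname)
            matches += sub_matches
            inspected += sub_inspected
        elif entry == name and value.created == created:
            matches.append(pathname)
    return matches, inspected


matches, inspected = find(TREE, "report.txt", "2004-10-29")
print(matches)    # ['/a/b/report.txt', '/a/c/report.txt']
print(inspected)  # 5: the root directory plus a, b, c and f
```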
A relational database system is well suited for storing large amounts of information and for accessing data in a very flexible manner. Relative to hierarchically organized systems, data that matches even complex search criteria may be easily and efficiently retrieved from a relational database system. However, the process of formulating and submitting queries to a database server is less intuitive than merely traversing a hierarchy of directories, and is beyond the technical comfort level of many computer users.
Hierarchically organized systems and relationally organized systems have been implemented in different ways that were not compatible. However, a relational database system can be enhanced to incorporate features that allow it to emulate a hierarchically organized system. Such database systems are referred to herein as hierarchically enhanced database systems. Hierarchically enhanced database systems may store many kinds of information hierarchies, including a file hierarchy, a hierarchy of XML documents, or even a hierarchy of objects. A node within an information hierarchy stored in a hierarchically enhanced database system is referred to herein as a resource. A resource may be, for example and without limitation, a file, an XML document, or a directory that holds either files or XML documents.
Database queries issued to a hierarchically enhanced database system may request data based on a path. Computing such queries may require two important kinds of path resolution operations. A path-to-resource resolution entails determining, given a path and/or path name, what resource or resources are located within a path or identified by the pathname. A resource-to-path resolution entails determining, given a resource, what path or paths a resource is located within and/or whether a resource is within a given path.
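The two kinds of resolution can be sketched over a single table of parent-child links. This is not a description of how any actual hierarchically enhanced database system stores its data: the `link` table, the functions `path_to_resource` and `resource_to_paths`, the use of id 0 for the root, and the names d, e and g are all assumptions made for illustration. The hierarchy is that of FIG. 1 and is assumed to be acyclic.

```python
import sqlite3

ROOT = 0  # assumed id for the root resource

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (parent_id INTEGER, child_id INTEGER, child_name TEXT)"
)
conn.executemany(
    "INSERT INTO link VALUES (?, ?, ?)",
    [
        (0, 1, "a"), (1, 2, "b"), (1, 3, "c"), (2, 4, "d"),
        (2, 5, "e"), (3, 6, "f"), (3, 7, "g"), (6, 2, "b"),
    ],
)


def path_to_resource(pathname):
    """Path-to-resource: follow one link per path level, starting at the root."""
    node = ROOT
    for name in filter(None, pathname.split("/")):
        row = conn.execute(
            "SELECT child_id FROM link WHERE parent_id = ? AND child_name = ?",
            (node, name),
        ).fetchone()
        if row is None:
            return None  # no resource is identified by this pathname
        node = row[0]
    return node


def resource_to_paths(resource):
    """Resource-to-path: climb every parent link and collect all pathnames."""
    if resource == ROOT:
        return [""]
    rows = conn.execute(
        "SELECT parent_id, child_name FROM link WHERE child_id = ?",
        (resource,),
    ).fetchall()
    paths = []
    for parent, name in rows:
        for prefix in resource_to_paths(parent):
            paths.append(prefix + "/" + name)
    return sorted(paths)


print(path_to_resource("/a/c/f"))  # 6
print(resource_to_paths(2))        # ['/a/b', '/a/c/f/b']
```

In this sketch, path-to-resource resolution follows a single chain of lookups, while resource-to-path resolution must explore every parent of every ancestor, which is where a resource with multiple parents multiplies the work.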
Hierarchically enhanced database systems have been optimized to perform path-to-resource resolution but not resource-to-path resolution. Therefore, there is a need for a mechanism to perform resource-to-path resolution more efficiently.