In a database management system (DBMS), data is stored in one or more data containers. The term container is used to refer to any set of data that is processed as a set of one or more records, each record being organized into one or more fields. In relational database systems, the containers are called “relations” or “tables,” the records are referred to as “rows,” and the fields are referred to as “columns”; each table has a fixed number of columns. In an object-relational database, a column can be associated with a complex type that is made up of several attributes. The values that occupy each row of such columns are called objects, which are instances of the complex type associated with the column. Each attribute can itself be associated with a fundamental type, such as a string of one or more characters, an integer of a given size, or a floating-point number of a given precision, or can be associated with another complex type.
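As a rough illustration of a column holding a complex type, the following Python sketch models a complex type whose attributes are either fundamental types or another complex type. The names `Address`, `Customer`, and the two-column row are hypothetical and stand in for no particular database product.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    # Complex type whose attributes are fundamental types.
    street: str
    zip_code: int


@dataclass
class Customer:
    # Complex type with one fundamental attribute and one attribute
    # that is itself a complex type.
    name: str
    address: Address


# One row of a hypothetical two-column table (id, customer). The value in
# the second column is an object, that is, an instance of Customer.
row = (1, Customer("Ann", Address("1 Main St", 12345)))

# The attributes that make up the complex type of the second column.
attribute_names = [f.name for f in fields(Customer)]
```

The row still has a fixed number of columns; the nesting lives inside the object stored in the second column.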
The relational and object-relational models for data are very powerful. In particular, queries can be made for retrieving and storing data in a relational database using a structured query language (SQL). An SQL statement is a command that explicitly describes what data is to be retrieved from or stored in the relational database system as a result of the statement, but leaves up to each system the mechanisms and sequence of operations for producing the desired result. Several database management systems that accept SQL statements are commercially available at the time of this writing.
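The declarative character of an SQL statement can be sketched with Python's built-in `sqlite3` module. The `employees` table and its columns are assumptions made only for this illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "sales", 50), ("Bob", "sales", 70), ("Cy", "ops", 60)],
)

# The statement describes WHAT rows are wanted and in what order. How the
# rows are scanned, filtered, and sorted is left to the database engine.
sales_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employees WHERE dept = ? ORDER BY salary DESC",
        ("sales",),
    )
]
```

Nothing in the query names an index, a scan order, or a sort algorithm; a different engine could produce the same result by a different sequence of operations.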
Some data are naturally organized as hierarchies rather than as relational tables of rows with a fixed number of columns in each table. Hierarchies are well-known mathematical constructs. In general, a hierarchy is composed of nodes at multiple levels. The nodes at each level are each linked to one or more nodes at a different level. Each node at a level below the top level is a child node of one or more of the parent nodes at a level above. In a tree hierarchy, each child node has only one parent node, but a parent node may have multiple child nodes. In a tree hierarchy, a node that has no parent node linked to it is the root node, and a node that has no child nodes linked to it is a leaf node. A tree hierarchy typically has a single root node. Hierarchies are not naturally stored in relational databases, because, for example, hierarchies do not have a fixed number of children for each node, while a particular table does have a fixed number of columns.
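The tree terminology above can be made concrete with a small sketch. The node names and the `parent_of` mapping are hypothetical; because each child maps to exactly one parent, the structure is a tree hierarchy.

```python
# child -> parent; each child has exactly one parent, so this is a tree.
parent_of = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "B"}

nodes = set(parent_of) | set(parent_of.values())

# Invert the mapping to list the children of every node.
children_of = {n: [] for n in nodes}
for child, parent in parent_of.items():
    children_of[parent].append(child)

# A root has no parent; a leaf has no children.
roots = sorted(n for n in nodes if n not in parent_of)
leaves = sorted(n for n in nodes if not children_of[n])

# The number of children differs from node to node, which is why a table
# with a fixed number of columns is an awkward fit.
fan_out = {n: len(children_of[n]) for n in sorted(nodes)}
```

In this example node A has two children, node B has three, and the remaining nodes have none, so no single column count suits every node.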
For example, a flexible file system on a computer readable medium is often organized into a hierarchy of “folders,” also called “directories.” Each folder can contain any number of files that store data on a computer readable medium and any number of other folders. The folder that contains the files and other folders is the parent node of those files and folders. The files and other folders are the child nodes of that folder.
Also, data elements in the extensible markup language (XML) are arranged into a tree hierarchy. XML is widely used to store data and exchange data between independent applications. Each data element in XML may be composed of zero or more child elements. Each element also has an element name and zero or more additional element attributes.
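The tree structure of an XML document can be seen by walking it with Python's standard `xml.etree.ElementTree` module. The document content below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document.
doc = """<library name="main">
  <book id="1"><title>First</title><author>Ann</author></book>
  <book id="2"><title>Second</title></book>
</library>"""

root = ET.fromstring(doc)


def outline(elem, depth=0):
    """Return (depth, element name, attributes) for elem and its descendants."""
    rows = [(depth, elem.tag, dict(elem.attrib))]
    for child in elem:  # iterating an element yields its child elements
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(root)
```

Each element has a name, zero or more attributes, and zero or more child elements; the two `book` elements here have different numbers of children.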
While hierarchies are convenient for many purposes, searches on hierarchically organized data, such as file systems and XML documents, are sometimes difficult to express. The expression of the search criteria and the manner of specifying the form and order of the results may vary from one data system to another. It would be convenient to be able to use SQL statements to find the contents of interest that meet search criteria on the data elements in the hierarchy or their attributes.
In one approach, described in Agarwal, nodes in a hierarchy are stored in a node table in a relational database, and the parent-child relationships are stored in a hierarchical index that lists the child nodes from a given parent node. In systems that maintain a node table and a hierarchical index, SQL commands can be used to list the nodes that satisfy certain criteria. The nodes can be searched, or results can be presented, or both, in an order based on the relationships in the hierarchical index. The approach enables one to use an SQL query on a file system to find a particular folder or file that satisfies certain criteria on the folder or file attributes. For example, one can get the file names for all files that are owned by user Scott and were created between Jan. 1, 2001 and Jan. 10, 2001, assuming owner and creation dates are attributes of the files in the node table. In some circumstances, a path name for the selected file or folder can be constructed by searching the hierarchical index for the parent that lists the found node, or its ancestor, as a child.
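A minimal sketch of the general idea, not the specific schema of the cited approach, is shown below using `sqlite3`. The table names `nodes` and `hier_index`, their columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT,
                    owner TEXT, created TEXT);
-- Hypothetical hierarchical index: one row per parent-child relationship.
CREATE TABLE hier_index (parent_id INTEGER, child_id INTEGER);
INSERT INTO nodes VALUES
  (1, '',           'folder', 'root',  '2000-12-01'),
  (2, 'home',       'folder', 'root',  '2000-12-01'),
  (3, 'scott',      'folder', 'Scott', '2000-12-15'),
  (4, 'report.txt', 'file',   'Scott', '2001-01-05'),
  (5, 'old.txt',    'file',   'Scott', '2000-12-20'),
  (6, 'notes.txt',  'file',   'Ann',   '2001-01-03');
INSERT INTO hier_index VALUES (1, 2), (2, 3), (3, 4), (3, 5), (2, 6);
""")

# Files owned by Scott and created between Jan. 1 and Jan. 10, 2001.
matches = conn.execute(
    "SELECT id, name FROM nodes WHERE kind = 'file' AND owner = ? "
    "AND created BETWEEN ? AND ? ORDER BY name",
    ("Scott", "2001-01-01", "2001-01-10"),
).fetchall()


def path_of(node_id):
    """Build a path name by following the index from child to parent.

    Assumes every node has at most one parent and there are no cycles.
    """
    parts = []
    while node_id is not None:
        (name,) = conn.execute(
            "SELECT name FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        parts.append(name)
        row = conn.execute(
            "SELECT parent_id FROM hier_index WHERE child_id = ?", (node_id,)
        ).fetchone()
        node_id = row[0] if row else None
    return "/".join(reversed(parts))


found_path = path_of(matches[0][0])
```

Note that `path_of` follows a single parent at each step, so it returns exactly one path per node.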
While this approach works well for many kinds of data organized in tree hierarchies, the approach has some shortcomings. The above approach assumes one path per file. Therefore, path names cannot be readily determined in hierarchies with cycles, or in hierarchies that allow a child node to have multiple parent nodes. In cases where a child node may have more than one parent, there may be more than one path for the same file.
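The two problem cases can be illustrated with a short sketch. The graphs and the helper `all_paths` are hypothetical; the root node is named with an empty string so that joined paths begin with a slash.

```python
def all_paths(graph, node, seen=()):
    """Return every root-to-node path in graph (a child -> parents mapping).

    Any walk that would revisit a node is abandoned, so a cycle with no
    route to a root contributes no paths.
    """
    if node in seen:
        return []  # cycle detected on this walk
    parents = graph.get(node, [])
    if not parents:
        return [[node]]  # reached a root
    paths = []
    for parent in parents:
        for prefix in all_paths(graph, parent, seen + (node,)):
            paths.append(prefix + [node])
    return paths


# A file linked under two different folders (child with multiple parents).
multi_parent = {
    "home": [""],
    "shared": [""],
    "scott": ["home"],
    "report.txt": ["scott", "shared"],
}
paths = sorted("/".join(p) for p in all_paths(multi_parent, "report.txt"))

# Two nodes that are each other's parent (a cycle with no root).
cyclic = {"a": ["b"], "b": ["a"]}
cycle_paths = all_paths(cyclic, "a")
```

With multiple parents the same file has two valid path names, and in the cyclic case walking toward a root never terminates on its own, so a scheme that assumes one path per file cannot report either result.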
Also, the approach does not work well for links that have attributes that are independent of the nodes that the links connect. For example, the approach does not allow a user to form search criteria based on the values of the link attributes. The link attributes are not attributes of any of the nodes and are therefore not contained in any of the tables used to support SQL queries in the above approach.
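To make the notion of a link attribute concrete, the sketch below shows what a query on link attributes could look like if the links were stored in a table of their own. This layout is purely an assumption for illustration and is not described in the source; the `links` table and its `link_type` and `added_by` columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical link table: link_type and added_by describe the link
-- itself, not the parent node or the child node.
CREATE TABLE links (parent_id INTEGER, child_id INTEGER,
                    link_type TEXT, added_by TEXT);
INSERT INTO nodes VALUES (1, 'projects'), (2, 'archive'), (3, 'plan.doc');
INSERT INTO links VALUES (1, 3, 'hard', 'Scott'), (2, 3, 'shortcut', 'Ann');
""")

# Search criterion on a link attribute: find parent/child pairs that are
# connected by a link of type 'shortcut'.
shortcut_rows = conn.execute(
    "SELECT p.name, c.name FROM links l "
    "JOIN nodes p ON p.id = l.parent_id "
    "JOIN nodes c ON c.id = l.child_id "
    "WHERE l.link_type = ?",
    ("shortcut",),
).fetchall()
```

The same file `plan.doc` is reachable through two links with different attribute values, which a table holding only node attributes cannot distinguish.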
Based on the foregoing, there is a clear need for techniques to manage hierarchical data that do not suffer from the above deficiencies. In particular, there is a need for techniques to manage hierarchical data that contains cycles. There is also a need for techniques to support relational queries that involve search criteria on link attributes that are independent of the attributes of the nodes that the links connect.
The past approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.