Humans tend to organize information in categories. The categories in which information is organized are themselves typically organized relative to each other in some form of hierarchy. For example, an individual animal belongs to a species, the species belongs to a genus, the genus belongs to a family, the family belongs to an order, and the order belongs to a class.
With the advent of computer systems, techniques for storing electronic information have been developed that largely reflect this human desire for hierarchical organization. Conventional computer file systems, for example, are typically implemented using hierarchy-based organization principles. Specifically, a typical file system has directories arranged in a hierarchy, and documents stored in the directories. Ideally, the hierarchical relationships between the directories reflect some intuitive relationship between the meanings that have been assigned to the directories. Similarly, each document is ideally stored in a directory based on some intuitive relationship between the contents of the document and the meaning assigned to the directory in which the document is stored.
FIG. 1 shows an example of a typical file system. The illustrated file system includes numerous directories arranged in a hierarchy. Two documents 118 and 122 are stored in the directories. Specifically, documents 118 and 122, both of which are entitled “Example.doc”, are respectively stored in directories 116 and 124, which are respectively entitled “Word” and “App4”.
In the directory hierarchy, directory 116 is a child of directory 114 entitled “Windows”, and directory 114 is a child of directory 110. Similarly, directory 124 is a child of directory 126 entitled “VMS”, and directory 126 is a child of directory 110. Directory 110 is referred to as the “root” directory because it is the directory from which all other directories descend. In many systems, the symbol “/” is used to refer to the root directory.
When electronic information is organized in a hierarchy, each item of information may be located by following a “path” through the hierarchy to the entity that contains the item. Within a hierarchical file system, the path to an item begins at the root directory and proceeds down the hierarchy of directories to eventually arrive at the directory that contains the item of interest. For example, the path to file 118 consists of directories 110, 114 and 116, in that order.
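The notion of a path can be illustrated with a minimal sketch. The reference numerals below are those of FIG. 1; the parent-pointer dictionary and the function name `path_to` are assumptions made only for illustration and are not part of any particular file system.

```python
# Hypothetical model of the FIG. 1 hierarchy: each item maps to its parent.
# The numerals come from the text; the dictionary layout is an assumption.
PARENT = {
    110: None,  # root directory "/"
    114: 110,   # directory "Windows"
    116: 114,   # directory "Word"
    118: 116,   # document "Example.doc"
    126: 110,   # directory "VMS"
    124: 126,   # directory "App4"
    122: 124,   # document "Example.doc"
}


def path_to(item):
    """Return the directories traversed from the root down to the
    directory that contains the given item."""
    path = []
    current = PARENT[item]
    while current is not None:
        path.append(current)
        current = PARENT[current]
    path.reverse()  # collected bottom-up, reported top-down
    return path


print(path_to(118))  # [110, 114, 116]
print(path_to(122))  # [110, 126, 124]
```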
Hierarchical storage systems often allow different items to have the same name. For example, in the file system shown in FIG. 1, both of the documents 118 and 122 are entitled “Example.doc”. Consequently, to unambiguously identify a given document, more than just the name of the document is required.
A convenient way to identify and locate a specific item of information stored in a hierarchical storage system is through the use of a “pathname”. A pathname is a concise way of uniquely identifying an item based on the path through the hierarchy to the item. A pathname is composed of a sequence of names, referred to as path elements. In the context of a file system, each name in the sequence of names is a “filename”. The term “filename” refers to both the names of directories and the names of documents, since both directories and documents are considered to be “files”.
Within a file system, the sequence of filenames in a given pathname begins with the name of the root directory, includes the names of all directories along the path from the root directory to the item of interest, and terminates in the name of the item of interest. Typically, the filenames along the path are concatenated, with a separator character (e.g., ‘/’, ‘\’, or ‘;’) between them, to make a pathname. Thus, the pathname for document 118 is /Windows/Word/Example.doc, while the pathname for document 122 is /VMS/App4/Example.doc.
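Forming and resolving a pathname might be sketched as follows. The nested-dictionary model of FIG. 1 and the names `make_pathname` and `resolve` are hypothetical; the sketch assumes ‘/’ as the separator and treats the root directory as the leading separator.

```python
# Hypothetical nested-dictionary model of FIG. 1: a dict is a directory,
# and a string stands in for the contents of a document.
ROOT = {
    "Windows": {"Word": {"Example.doc": "document 118"}},
    "VMS": {"App4": {"Example.doc": "document 122"}},
}


def make_pathname(elements, sep="/"):
    """Concatenate the path elements below the root into a pathname."""
    return sep + sep.join(elements)


def resolve(pathname, root=ROOT, sep="/"):
    """Follow the pathname one path element at a time, starting at the
    root. Raises KeyError if an element does not exist."""
    node = root
    for element in filter(None, pathname.split(sep)):
        node = node[element]
    return node


pathname = make_pathname(["Windows", "Word", "Example.doc"])
print(pathname)                                # /Windows/Word/Example.doc
print(resolve(pathname))                       # document 118
print(resolve("/VMS/App4/Example.doc"))        # document 122
```

Although both documents share the filename “Example.doc”, their pathnames differ, so each pathname resolves to exactly one document.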
The relationship between directories (files) and their contained content varies significantly between different types of hierarchically organized systems. One model, employed by implementations such as Windows and DOS file systems, requires each file to have exactly one parent, so that the hierarchy forms a tree. In a more complicated model, the hierarchy takes the form of a directed graph, in which files can have multiple parents, as in the UNIX file system in which hard links are used.
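The difference between the two models can be sketched as follows. The node identifiers and the extra entry “Linked.doc” are hypothetical additions to the FIG. 1 hierarchy, introduced only to give one document a second parent in the manner of a hard link.

```python
# Hypothetical directed graph: each directory maps child names to node ids.
# "Linked.doc" is an illustrative second name for document f118; it is not
# part of FIG. 1.
CHILDREN = {
    "root": {"Windows": "d114", "VMS": "d126"},
    "d114": {"Word": "d116"},
    "d116": {"Example.doc": "f118"},
    "d126": {"App4": "d124"},
    "d124": {"Example.doc": "f122", "Linked.doc": "f118"},
}


def parent_count(target):
    """Count the directories that contain an entry for the target."""
    return sum(target in entries.values() for entries in CHILDREN.values())


def pathnames_of(target, node="root", prefix=""):
    """Return every pathname that leads to the target (depth-first)."""
    found = []
    for name, child in CHILDREN.get(node, {}).items():
        pathname = prefix + "/" + name
        if child == target:
            found.append(pathname)
        found.extend(pathnames_of(target, child, pathname))
    return found


print(parent_count("f122"), pathnames_of("f122"))
print(parent_count("f118"), pathnames_of("f118"))
```

In this sketch, document f122 has one parent and therefore one pathname, as in a tree, while document f118 has two parents and is reachable by two distinct pathnames.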
In contrast to hierarchical approaches to organizing electronic information, a relational database stores information in tables composed of rows and columns. Each row represents a particular record and is identified by a unique RowID. Each column represents an attribute or field of a record. Data is retrieved from the database by submitting queries to a database server that manages the database. The queries must conform to the database language supported by the database server. Structured Query Language (SQL) is an example of a database language supported by many existing database management systems.
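By way of illustration only, the relational model can be sketched using the SQLite engine bundled with Python. The table name, column names, and data are hypothetical, and SQLite's built-in `rowid` is assumed here as a stand-in for the RowID described above.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT, author TEXT, size_kb INTEGER)")
conn.executemany(
    "INSERT INTO documents (name, author, size_kb) VALUES (?, ?, ?)",
    [
        ("Example.doc", "alice", 12),
        ("Notes.txt", "bob", 3),
        ("Example.doc", "carol", 40),
    ],
)

# Each row is a record, each column is an attribute, and SQLite assigns
# every row a unique rowid. The query is expressed in SQL.
rows = conn.execute(
    "SELECT rowid, name, author FROM documents WHERE size_kb > ? ORDER BY rowid",
    (10,),
).fetchall()
print(rows)  # [(1, 'Example.doc', 'alice'), (3, 'Example.doc', 'carol')]
```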
Each type of storage system has advantages and limitations. A hierarchically organized storage system is simple, intuitive, and easy to implement, and is a standard model used by most application programs. Unfortunately, the simplicity of the hierarchical organization does not provide the support required for complex data retrieval operations. For example, the contents of every directory may have to be inspected to retrieve all documents created on a particular day that have a particular filename. Since all directories must be searched, the hierarchical organization does nothing to facilitate the retrieval process.
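The cost of such a search can be sketched as follows. The in-memory hierarchy, the creation dates, and the function name `find` are hypothetical; the sketch counts how many directories must be inspected to answer the query.

```python
from datetime import date

# Hypothetical file system: a dict is a directory, and a date stands in
# for a document by recording the day on which it was created.
TREE = {
    "Windows": {
        "Word": {
            "Example.doc": date(2001, 2, 5),
            "Memo.doc": date(2001, 2, 5),
        },
    },
    "VMS": {
        "App4": {"Example.doc": date(2001, 2, 5)},
        "App5": {"Example.doc": date(2001, 3, 1)},
    },
}


def find(tree, filename, created_on, prefix=""):
    """Return (matching pathnames, number of directories inspected)."""
    matches, inspected = [], 1  # count the directory being scanned
    for name, entry in tree.items():
        pathname = prefix + "/" + name
        if isinstance(entry, dict):
            sub_matches, sub_inspected = find(entry, filename, created_on, pathname)
            matches += sub_matches
            inspected += sub_inspected
        elif name == filename and entry == created_on:
            matches.append(pathname)
    return matches, inspected


matches, inspected = find(TREE, "Example.doc", date(2001, 2, 5))
print(matches)    # ['/Windows/Word/Example.doc', '/VMS/App4/Example.doc']
print(inspected)  # 6
```

All six directories, including the root, are inspected even though only two of them contain a matching document.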
A relational database system is well suited for storing large amounts of information and for accessing data in a very flexible manner. Relative to hierarchically organized systems, data that matches even complex search criteria may be easily and efficiently retrieved from a relational database system. However, the process of formulating and submitting queries to a database server is less intuitive than merely traversing a hierarchy of directories, and is beyond the technical comfort level of many computer users.
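For comparison, a search by filename and creation date might be expressed against a relational table as sketched below. The schema, the index, and the data are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Illustrative schema: one row per document, with its attributes as columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (pathname TEXT, name TEXT, created_on TEXT)")
# An index on the searched columns lets the server locate matching rows
# without scanning every row.
conn.execute("CREATE INDEX files_name_date ON files (name, created_on)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/Windows/Word/Example.doc", "Example.doc", "2001-02-05"),
        ("/Windows/Word/Memo.doc", "Memo.doc", "2001-02-05"),
        ("/VMS/App4/Example.doc", "Example.doc", "2001-02-05"),
        ("/VMS/App5/Example.doc", "Example.doc", "2001-03-01"),
    ],
)

# A single declarative query states both criteria at once.
cursor = conn.execute(
    "SELECT pathname FROM files "
    "WHERE name = ? AND created_on = ? ORDER BY pathname",
    ("Example.doc", "2001-02-05"),
)
matches = [row[0] for row in cursor]
print(matches)  # ['/VMS/App4/Example.doc', '/Windows/Word/Example.doc']
```

The query is compact, but writing it requires knowledge of the schema and of SQL, which is the usability cost noted above.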
In the past, hierarchically organized systems and relationally organized systems have been implemented in different ways that were not compatible. However, some relationally organized systems incorporate features that allow the systems to emulate a hierarchically organized system. This type of emulation is especially desirable when the storage capability and flexibility of a relational system are needed, but the intuitiveness and ubiquity of the hierarchical system are desired.
One such feature is based on the connect-by clause provided by some implementations of SQL. The connect-by clause allows a user to issue queries that request data based on a hierarchical organization. The data is returned by a relational database system in a way that reflects the hierarchical organization. The connect-by clause is used to specify the condition that defines the hierarchical relationship upon which the hierarchical organization is based.
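A hierarchical query of this kind can be sketched as follows. SQLite, which is used here because it ships with Python, does not support the connect-by clause, so the runnable query substitutes a recursive common table expression; the connect-by form appears only in a comment, in the style of Oracle SQL. The table `files` and its columns are hypothetical, with the numerals of FIG. 1 used as identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        (110, None, "/"),
        (114, 110, "Windows"),
        (116, 114, "Word"),
        (118, 116, "Example.doc"),
        (126, 110, "VMS"),
        (124, 126, "App4"),
        (122, 124, "Example.doc"),
    ],
)

# In a dialect that supports connect-by, the query might read roughly:
#   SELECT name, LEVEL FROM files
#   START WITH parent_id IS NULL
#   CONNECT BY PRIOR id = parent_id;
# The recursive query below is a substitute, not the connect-by clause itself.
rows = conn.execute(
    """
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 1 FROM files WHERE parent_id IS NULL
        UNION ALL
        SELECT f.id, f.name, t.depth + 1
        FROM files AS f JOIN tree AS t ON f.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, id
    """
).fetchall()
for name, depth in rows:
    print(depth, name)
```

In both forms, the condition relating a row's `parent_id` to the `id` of its parent is what defines the hierarchical relationship.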
However, using the connect-by clause to formulate queries has disadvantages. First, computing such queries can entail computing multiple join operations, a process that can be very expensive for the database server processing the queries. Second, use of the connect-by clause is burdensome to users, because incorporating a connect-by clause into queries further complicates the already complex task of formulating queries.
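The join cost can be illustrated with a sketch that resolves a pathname against a table of parent-child rows using one self-join per path element. This is an assumption-laden illustration of why hierarchical lookups over such a table involve multiple joins; it does not describe how any particular database server evaluates a connect-by clause. The schema and the function name `resolve` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        (110, None, "/"),
        (114, 110, "Windows"),
        (116, 114, "Word"),
        (118, 116, "Example.doc"),
        (126, 110, "VMS"),
        (124, 126, "App4"),
        (122, 124, "Example.doc"),
    ],
)


def resolve(conn, pathname):
    """Return (id of the item named by pathname or None, joins used).
    The query contains one self-join for every path element."""
    elements = [e for e in pathname.split("/") if e]
    sql = "SELECT f{}.id FROM files AS f0".format(len(elements))
    for i in range(1, len(elements) + 1):
        sql += (
            " JOIN files AS f{0} ON f{0}.parent_id = f{1}.id"
            " AND f{0}.name = ?".format(i, i - 1)
        )
    sql += " WHERE f0.parent_id IS NULL"  # f0 is the root directory
    row = conn.execute(sql, elements).fetchone()
    return (row[0] if row else None), len(elements)


print(resolve(conn, "/Windows/Word/Example.doc"))  # (118, 3)
print(resolve(conn, "/VMS/App4/Example.doc"))      # (122, 3)
```

The number of joins grows with the depth of the item in the hierarchy, and the query text itself becomes correspondingly harder to write by hand.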
Consequently, it is desirable to provide a mechanism that allows relational database systems to emulate hierarchically organized systems in ways that are more efficient than conventional mechanisms for this type of emulation. It is further desirable that this type of emulation be provided in a way that mitigates the complexity of formulating queries that request and return hierarchically organized data.