With the advent of computer systems, techniques have been developed to reflect a human desire to categorize information according to a hierarchical organization. The categories into which information is typically organized may themselves be organized relative to each other in some form of hierarchy, which defines the hierarchical organization.
As an example, computer file systems are typically implemented using hierarchy-based organization principles. A typical computer file system has directories arranged in a hierarchy and documents stored in the directories. Ideally, hierarchical relationships between the directories reflect some intuitive relationship between the meanings that have been assigned to the directories. Similarly, it might be desirable for each document to be stored in a directory based on some intuitive relationship between the contents of the document and the meaning assigned to the directory in which the document is stored.
An example of a typical file system 300 is provided at FIG. 3. The illustrated file system 300 includes a root directory 302 entitled “/”. The root directory 302 defines a beginning of a path to a directory or to a document stored in a directory. The root directory 302 is the parent of a folder 304 entitled “folder A” and of a folder 310 entitled “folder D”. The folder 304 is the parent of a folder 306 entitled “folder B” and of a file 308 entitled “file C”. The folder 310 is the parent of a file 312 entitled “file E”. In the example illustrated at FIG. 3, the folders 304, 306 and 310 are directory files (also referred to as “folder files”) whereas the files 308 and 312 are document files.
When electronic information is organized in a hierarchy, each item (e.g., a directory file or a document file) may be identified by the path through the hierarchy to the item. Within a hierarchical file system, the path to an item begins at a root directory (e.g., the root directory 302) and proceeds down the hierarchy of directories to arrive at the directory that contains the item of interest. For example, the path to the file 308 consists of the root directory 302 followed by the folder 304.
Hierarchical storage systems may allow different items to have the same name. As an example, the files 308 and 312 of FIG. 3 may have the same name. Consequently, more than just the name of a document is required to identify that document unambiguously. One way to identify and locate a specific item of information stored in a hierarchical storage system is through the use of a “pathname”. A pathname is composed of a sequence of names, referred to as path elements. In the context of a file system, each name in the sequence of names is a “filename”. The term “filename” refers to both the names of directories and the names of documents, since both directories and documents are considered to be “files”. Within a file system, the sequence of filenames in a given pathname begins with the name of the root directory and includes the names of all directories along the path from the root directory to the item of interest, followed by the name of the item itself. These names are typically concatenated with separator punctuation (e.g., “/”, “\”, or “;”) to form a pathname. Thus, a pathname for the file 308 may be “/folder A/file C” and a pathname for the file 312 may be “/folder D/file E”.
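To make the pathname construction concrete, the following is a minimal sketch that models the file system 300 of FIG. 3 as nested dictionaries and derives a pathname for every item by concatenating path elements with a “/” separator. The names `FS_300`, `FileTree` and `pathnames` are hypothetical and chosen only for illustration; a directory is assumed to be a dictionary and a document is assumed to be represented by `None`.

```python
from typing import Dict, List, Optional

# Hypothetical model: a directory maps child names to another dict
# (a sub-directory) or to None (a document).
FileTree = Dict[str, Optional[dict]]

# File system 300 of FIG. 3; the root directory 302 is the outer dict.
FS_300: FileTree = {
    "folder A": {"folder B": {}, "file C": None},
    "folder D": {"file E": None},
}

def pathnames(tree: FileTree, prefix: str = "", sep: str = "/") -> List[str]:
    """Return the pathname of every item below `tree`, depth first.

    Each pathname starts at the root and concatenates the directory
    names along the path, then the item name, using `sep`.
    """
    result: List[str] = []
    for name, child in tree.items():
        path = prefix + sep + name
        result.append(path)
        if child is not None:  # directory: recurse into its contents
            result.extend(pathnames(child, path, sep))
    return result

for p in pathnames(FS_300):
    print(p)

# Two documents sharing the name "report" still get distinct pathnames.
same_name: FileTree = {"folder A": {"report": None}, "folder D": {"report": None}}
print(pathnames(same_name))
```

Because the directories along the path are part of each pathname, two documents with the same filename in different directories remain distinguishable.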
The relationship between directories and their contained content may vary between different types of hierarchically organized systems. As a first example, Microsoft Windows™ and DOS file systems require each file to have exactly one parent, thereby forming a tree model. As a second example, UNIX file systems may allow files to have multiple parents, thereby forming a graph model.
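The distinction between the tree model and the graph model can be expressed as a simple check on parent links. The sketch below is an illustration under stated assumptions: items are keyed by the reference numerals of FIG. 3, each item maps to the set of directories containing it, and the second structure hypothetically lists the file 308 in both folders 304 and 310 (as a UNIX hard link could). The function `is_tree` is a hypothetical helper that checks only the one-parent condition, not cycles or reachability.

```python
from typing import Dict, Set

def is_tree(parents: Dict[int, Set[int]], root: int) -> bool:
    """Return True if the root has no parent and every other item has exactly one."""
    if parents.get(root):
        return False  # the root must not have a parent
    return all(len(ps) == 1 for item, ps in parents.items() if item != root)

# Parent sets for FIG. 3: each item appears in exactly one directory.
fig3: Dict[int, Set[int]] = {
    302: set(),
    304: {302}, 310: {302},
    306: {304}, 308: {304},
    312: {310},
}

# Hypothetical variant: file 308 is also reachable from folder 310.
linked = {**fig3, 308: {304, 310}}

print(is_tree(fig3, 302))    # True  (tree model)
print(is_tree(linked, 302))  # False (graph model)
```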
In contrast to hierarchical approaches to organizing electronic information, a database (e.g., a relational database) stores information in tables comprising rows and columns. Each row may represent a particular record and may be identified by a unique ID. Each column may represent an attribute or a field of the record. Data may then be retrieved from the database by submitting queries to a database server that manages the database.
Hierarchical file systems and relational databases each have advantages and limitations.
A hierarchically organized storage system may be simple, intuitive and easy to implement, and may be the standard model used by most application programs. Unfortunately, the simplicity of the hierarchical organization does not provide the support required for complex data retrieval operations. For example, retrieving all documents that were created on a particular date and that have a particular filename may require inspecting the contents of every directory. Because every directory may have to be searched, retrieval in a hierarchical organization may be slow.
A relational database system may be well suited for storing large amounts of information and for accessing data more flexibly than a hierarchically organized system: data that matches even complex search criteria may be easily and efficiently retrieved from a relational database system. However, formulating and submitting queries to a database server may be less intuitive than merely traversing a hierarchy of directories.
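The retrieval example above can be sketched as a traversal that must open every directory, because nothing in the hierarchy indicates where matching documents live. In this minimal sketch, which is an assumption for illustration only, a directory is a dictionary and a document is represented solely by its creation date; `find_documents` and the sample tree `fs` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple, Union

# Hypothetical model: a directory maps names to sub-directories (dicts)
# or to documents, represented here only by their creation date.
Directory = Dict[str, Union[dict, date]]

def find_documents(root: Directory, filename: str, created: date) -> Tuple[List[str], int]:
    """Return (matching pathnames, number of directories inspected)."""
    matches: List[str] = []
    inspected = 0
    stack: List[Tuple[str, Directory]] = [("", root)]
    while stack:
        prefix, directory = stack.pop()
        inspected += 1  # every directory has to be opened
        for name, entry in directory.items():
            path = f"{prefix}/{name}"
            if isinstance(entry, dict):  # sub-directory: search it too
                stack.append((path, entry))
            elif name == filename and entry == created:
                matches.append(path)
    return matches, inspected

fs: Directory = {
    "folder A": {"folder B": {"notes": date(2020, 1, 2)}, "notes": date(2020, 1, 1)},
    "folder D": {"notes": date(2020, 1, 1)},
}
matches, inspected = find_documents(fs, "notes", date(2020, 1, 1))
print(sorted(matches), inspected)  # ['/folder A/notes', '/folder D/notes'] 4
```

The number of directories inspected grows with the size of the whole hierarchy, not with the number of matching documents.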
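For comparison, the same search can be stated declaratively against a relational table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `documents` table, its columns and the index name are hypothetical. Each row is a record identified by a unique ID, and each column is an attribute of the record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"  # unique ID identifying each record
    " filename TEXT NOT NULL,"
    " created TEXT NOT NULL)"   # ISO-8601 date string
)
# An index lets the database locate matches without scanning every row.
conn.execute("CREATE INDEX idx_name_date ON documents (filename, created)")
conn.executemany(
    "INSERT INTO documents (filename, created) VALUES (?, ?)",
    [("notes", "2020-01-01"), ("notes", "2020-01-02"), ("report", "2020-01-01")],
)
rows = conn.execute(
    "SELECT id FROM documents WHERE filename = ? AND created = ?",
    ("notes", "2020-01-01"),
).fetchall()
print(rows)  # [(1,)]
```

The query states only what is wanted; deciding how to find it (here, potentially via the index) is left to the database engine, which is what makes complex criteria easy to express, at the cost of having to write the query.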
To alleviate the limitations of hierarchical file systems and relational databases, some attempts have been made to develop relational systems that emulate a hierarchically organized system. This type of emulation may be particularly desirable when the storage capability and flexibility of a relational system are needed but the intuitiveness and ubiquity of a hierarchical system are also desired.
As a first example of an attempt to alleviate the limitations set forth above, relational databases compatible with the Structured Query Language (SQL) may rely on a connect-by clause to allow a user to issue queries that request data based on a hierarchical organization. The connect-by clause may be used to specify one or more conditions that define a hierarchical relationship upon which a hierarchical organization is based. However, using connect-by clauses to formulate queries may present disadvantages, including (i) the computing resources the database server needs to process such queries; and (ii) the added complexity of incorporating connect-by clauses into queries that are usually already complex to formulate.
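SQLite, which ships with Python's `sqlite3` module, does not support a connect-by clause, so the sketch below substitutes a standard recursive common table expression (`WITH RECURSIVE`) to express the same kind of hierarchical query. The `nodes` table is a hypothetical encoding of FIG. 3 in which each row stores a link to its parent; the query returns the folder 304 and everything beneath it. The comment shows a roughly equivalent Oracle connect-by query for reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(302, None, "/"), (304, 302, "folder A"), (306, 304, "folder B"),
     (308, 304, "file C"), (310, 302, "folder D"), (312, 310, "file E")],
)

# Roughly equivalent in intent to the Oracle hierarchical query:
#   SELECT name FROM nodes START WITH id = 304 CONNECT BY PRIOR id = parent_id
descendants = conn.execute(
    """
    WITH RECURSIVE subtree(id, name) AS (
        SELECT id, name FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.name FROM nodes AS n JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT name FROM subtree ORDER BY id
    """,
    (304,),
).fetchall()
print(descendants)  # [('folder A',), ('folder B',), ('file C',)]
```

Even for this small hierarchy, the recursive clause adds a second query block to what would otherwise be a one-line selection, which illustrates disadvantage (ii) above.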
As a second example, U.S. Pat. No. 7,366,708 teaches a method of and a system for storing hierarchical data in a relational database. Under this approach, information about all children of a given element is stored in the record of that element. In addition, independent IDs are relied upon to uniquely identify both parent and child elements. Even though this approach provides a fast way to identify the children of a given element, it may still present disadvantages, for example, when elements are migrated from one node of the hierarchical organization to another node of the hierarchical organization.
In addition to the first and second examples described above, other approaches to storing hierarchical data in a relational database have been developed. These other approaches may be divided into two families, the so-called “hierarchical way” and the “ID way”.
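The general idea of keeping each element's children inside its own record can be sketched as follows. This is a loose illustration of the idea as summarized above, not the method of U.S. Pat. No. 7,366,708; the store layout, the absence of a parent link and the helpers `children_of` and `move` are all assumptions. Listing children is a single record read, whereas migrating an element rewrites the records of both the old and the new parent.

```python
from typing import Dict, List

# Hypothetical store: record ID -> child IDs kept inside that record.
Store = Dict[int, List[int]]

def fig3_store() -> Store:
    # IDs reuse the reference numerals of FIG. 3.
    return {302: [304, 310], 304: [306, 308], 306: [], 308: [], 310: [312], 312: []}

def children_of(store: Store, node: int) -> List[int]:
    """All children come from a single record read."""
    return store[node]

def move(store: Store, node: int, new_parent: int) -> List[int]:
    """Re-parent `node`; return the IDs of the records that were rewritten."""
    # In this sketch records hold no parent link, so the old parent is found by a scan.
    old_parent = next(p for p, kids in store.items() if node in kids)
    store[old_parent].remove(node)
    store[new_parent].append(node)
    return [old_parent, new_parent]

store = fig3_store()
print(children_of(store, 302))  # [304, 310]
print(move(store, 308, 310))    # [304, 310]
```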
Under the hierarchical way approach, each element of a hierarchical structure has its own ID, which never changes, and a link to a parent element. Operations on elements of the hierarchical structure have a computational complexity of O(M), where M is the level (i.e., the depth) of the element on which the one or more operations are to be conducted. The cost of an operation therefore grows with the depth of the hierarchical structure, which may make operations expensive from a computing resources standpoint, for example when elements are migrated from one node of the hierarchical organization to another node of the hierarchical organization.
Under the ID way approach, each element of a hierarchical structure has an ID that depends on its path in the hierarchical structure (e.g., from the root of the structure to the element) and a link to a parent element. Operations on elements of the hierarchical structure have a computational complexity of O(1), but at the expense of limited and/or complex movements of elements within the hierarchical structure, because the IDs of all child elements of an element being moved have to be modified. As a result, for large and complex hierarchical structures, such movements may also be expensive from a computing resources standpoint.
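A minimal sketch of the hierarchical way follows, assuming an in-memory adjacency list keyed by permanent IDs (the reference numerals of FIG. 3). Resolving an element's pathname follows one parent link per level, so it takes M steps for an element at level M. Migrating an element rewrites only one parent link, but a safe move still walks the ancestors of the new parent to reject cycles, which is again O(M). The names `fig3`, `pathname` and `move` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency-list store: permanent ID -> (name, parent ID).
Nodes = Dict[int, Tuple[str, Optional[int]]]

def fig3() -> Nodes:
    return {
        302: ("", None),  # root directory, no parent
        304: ("folder A", 302), 306: ("folder B", 304), 308: ("file C", 304),
        310: ("folder D", 302), 312: ("file E", 310),
    }

def pathname(nodes: Nodes, node_id: int) -> Tuple[str, int]:
    """Return (pathname, parent links followed); the walk costs O(M)."""
    names: List[str] = []
    steps = 0
    current = node_id
    while True:
        name, parent = nodes[current]
        if parent is None:  # reached the root
            break
        names.append(name)
        current = parent
        steps += 1  # one lookup per level of the hierarchy
    return "/" + "/".join(reversed(names)), steps

def move(nodes: Nodes, node_id: int, new_parent: int) -> None:
    """Re-parent node_id. IDs never change, so only one link is rewritten,
    but rejecting cycles walks new_parent's ancestors: O(M)."""
    ancestor: Optional[int] = new_parent
    while ancestor is not None:
        if ancestor == node_id:
            raise ValueError("cannot move an element under its own descendant")
        ancestor = nodes[ancestor][1]
    name, _ = nodes[node_id]
    nodes[node_id] = (name, new_parent)

nodes = fig3()
print(pathname(nodes, 308))  # ('/folder A/file C', 2)
move(nodes, 308, 310)
print(pathname(nodes, 308))  # ('/folder D/file C', 2)
```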
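The ID way can be sketched with path-dependent IDs used directly as dictionary keys, so an element is found by its ID in a single (average-case O(1)) lookup without walking parent links. Moving an element, however, changes its path and therefore the IDs of every descendant. In this minimal sketch the store layout and the helpers `build` and `move` are hypothetical; for simplicity it scans all keys to find the subtree (a real system would more likely use an index range scan on the ID prefix) and performs no name-conflict or cycle checks.

```python
from typing import Dict, Optional

# Hypothetical store: path-dependent ID -> parent ID.
Store = Dict[str, Optional[str]]

def build() -> Store:
    return {
        "/": None,
        "/folder A": "/", "/folder A/folder B": "/folder A", "/folder A/file C": "/folder A",
        "/folder D": "/", "/folder D/file E": "/folder D",
    }

def move(store: Store, node_id: str, new_parent_id: str) -> int:
    """Move node_id and its subtree under new_parent_id.

    Returns the number of IDs that had to be rewritten.
    """
    name = node_id.rsplit("/", 1)[1]
    new_id = new_parent_id.rstrip("/") + "/" + name
    # Every element whose ID starts with the moved element's path is affected.
    affected = [k for k in store if k == node_id or k.startswith(node_id + "/")]
    for old in affected:
        parent = store.pop(old)
        new = new_id + old[len(node_id):]
        # The moved element gets its new parent; each descendant's parent ID
        # is rewritten with the same prefix substitution as its own ID.
        store[new] = new_parent_id if old == node_id else new_id + parent[len(node_id):]
    return len(affected)

store = build()
print(store["/folder A/file C"])             # '/folder A' (direct lookup by ID)
print(move(store, "/folder A", "/folder D"))  # 3 IDs rewritten
```

Moving a single folder rewrote three IDs here; in a large structure the count equals the size of the moved subtree, which is the cost described above.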