In a content management system, items are typically stored in a flat, largely unstructured format. These items may have attributes associated with them (for example, a “name” and an “author”), they may have links to other items, and they may have content (for example, document text or an image). Efficient search mechanisms are provided for locating one or more items based on the values of their attributes.
The information held in such a system is nevertheless typically organized into some form of hierarchy. For example, a set of documents is often managed by structuring the documents into folders, with some folders containing nested folders, building up a tree or arbitrary graph structure. A user may want to manipulate and search for items based on this hierarchical structuring, for example: “find all documents in the folder X whose author is A”; or “move all documents in folder Y whose author is B to folder Z”.
An example of a hierarchical data structure is shown in FIG. 1A. This is a simple hierarchy 100 with a folder F0 101 at the root node of the tree, with child nodes in the form of a document D1 102, and two folders F1 103 and F2 104. Folder F2 has two child nodes in the form of documents D2 105 and D3 106.
In a typical implementation of a content manager system, items are used to represent the folders and documents, each item having an attribute defining its name. The folder hierarchy is maintained using links between the parent folder item and the child folder or document items.
An example of a hierarchical data structure of this type is shown in FIG. 1B, which is the same structure 100 as that of FIG. 1A, showing each node with links to its parent and child nodes. The folder F0 101 has a reference 111 indicating that it is the root node and that its child nodes are D1, F1 and F2. Document D1 102 and folder F1 103 have references 112, 113 indicating that their parent is folder F0. Folder F2 104 has a reference 114 indicating that its parent is folder F0 and that its child nodes are documents D2 and D3. Documents D2 105 and D3 106 have references 115, 116 indicating that their parent node is F2. The full path name from the root 101 of the hierarchy 100 to a leaf item can only be found by traversing the hierarchy 100 and constructing the resulting path.
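The traversal described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary standing in for the item store of FIG. 1B, with item names used as keys, and a hypothetical fetch function that counts one simulated database call per item read. The sketch follows parent links from the leaf up to the root; the names items, fetch and full_path are illustrative only and do not come from any particular content manager.

```python
# Hypothetical in-memory stand-in for the item store of FIG. 1B.
# Each item holds only links to its parent and its children.
items = {
    "F0": {"parent": None, "children": ["D1", "F1", "F2"]},
    "D1": {"parent": "F0", "children": []},
    "F1": {"parent": "F0", "children": []},
    "F2": {"parent": "F0", "children": ["D2", "D3"]},
    "D2": {"parent": "F2", "children": []},
    "D3": {"parent": "F2", "children": []},
}

lookups = 0  # number of simulated database calls


def fetch(name):
    """Simulate one call to the underlying database for a single item."""
    global lookups
    lookups += 1
    return items[name]


def full_path(name):
    """Build the full path name by following parent links up to the root."""
    parts = []
    current = name
    while current is not None:
        parts.append(current)
        current = fetch(current)["parent"]
    return "/" + "/".join(reversed(parts))


path_d2 = full_path("D2")
print(path_d2, lookups)
```

In this sketch one item is read per level of the hierarchy, so the number of simulated database calls grows with the depth of the item.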
This form of implementation has the following advantages. It is simple to construct: a new item can be added to the hierarchy by simply giving it a name and forming the links between parent and child. It is simple to change: an item (including a whole sub-tree of the hierarchy) can be moved simply by breaking and remaking two links. Since a content manager system is often a multi-user system, locks on items and transactions are required during these operations, and with this implementation only a small number of locks and short-lived transactions are required.
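As a hedged illustration of why a move is inexpensive in this implementation, the following sketch relinks folder F2 under folder F1. The dictionary layout and the move function are assumptions made for illustration; a real system would perform the same steps under locks and within a transaction.

```python
# Hypothetical in-memory stand-in for the linked hierarchy of FIG. 1B.
items = {
    "F0": {"parent": None, "children": ["D1", "F1", "F2"]},
    "D1": {"parent": "F0", "children": []},
    "F1": {"parent": "F0", "children": []},
    "F2": {"parent": "F0", "children": ["D2", "D3"]},
    "D2": {"parent": "F2", "children": []},
    "D3": {"parent": "F2", "children": []},
}


def move(name, new_parent):
    """Move an item, and implicitly its whole sub-tree, to a new parent."""
    old_parent = items[name]["parent"]
    # Break the two links between the item and its old parent.
    items[old_parent]["children"].remove(name)
    # Remake the two links against the new parent.
    items[new_parent]["children"].append(name)
    items[name]["parent"] = new_parent
    # Only these items were modified, so only these would need locking.
    return {old_parent, new_parent, name}


touched = move("F2", "F1")
print(sorted(touched))
```

Documents D2 and D3 are not modified at all, because they refer only to their immediate parent F2.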
However, this implementation of a content manager system has problems in providing some commonly required functions. Finding the full path name for a folder or document requires traversing the hierarchy structure from the root to the leaf, and this can be expensive in terms of calls to an underlying database. It is also difficult to perform certain types of queries efficiently without resorting to iterative or recursive methods, which again can be expensive in terms of database accesses.
A solution to these problems can be provided by storing the full folder path name as an attribute of each item as well as (or perhaps instead of) that item's terminal name. This allows simple and rapid retrieval of that full path name and enables efficient search and retrieval of sets of items from a database based on patterns matching complete or partial folder paths. Only one database call may be needed to retrieve many items.
An example of a hierarchical data structure of this type is shown in FIG. 1C, which is the same structure 100 as that of FIG. 1A, showing each node having its path name as an attribute. Folder F0 101 has an attribute 121 showing the path name “/”, indicating that it is the root node. Document D1 102, folder F1 103, and folder F2 104 have attributes 122, 123, 124 showing the path name “/F0”. Document D2 105 and document D3 106 have attributes 125, 126 showing the path name “/F0/F2”.
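The retrieval benefit of a stored path attribute can be sketched as follows, using an in-memory SQLite database from the Python standard library as a stand-in for the underlying database. The table name items and its columns name and path are assumptions made for illustration; the path values are those of FIG. 1C.

```python
import sqlite3

# Hypothetical table holding one row per item, with the path name of
# FIG. 1C stored as an ordinary attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("F0", "/"),
        ("D1", "/F0"),
        ("F1", "/F0"),
        ("F2", "/F0"),
        ("D2", "/F0/F2"),
        ("D3", "/F0/F2"),
    ],
)

# One call returns every item directly inside folder F2.
in_f2 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM items WHERE path = ? ORDER BY name", ("/F0/F2",)
    )
]

# One call returns every item anywhere below folder F0, using a
# pattern match on a partial folder path.
under_f0 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM items WHERE path = ? OR path LIKE ? ORDER BY name",
        ("/F0", "/F0/%"),
    )
]

print(in_f2, under_f0)
```

Neither query needs to traverse links between items; each is answered from the stored attribute in a single database call.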
The cost of this form of implementation is that it is no longer simple to manipulate the hierarchy itself. Renaming a folder or moving a folder sub-tree from one place to another becomes very expensive as there may be many items whose “full path” attributes need to be changed. Not only does this require a lot of database access but it requires many locks to be obtained and potentially quite long running transactions to be established. There is also the problem that other users may already have locks on some of the items that require “full path” updates during these operations.
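The update cost described above can be sketched under the same assumptions as FIG. 1C. The mapping of item names to path attributes, the rename_folder function, and the new name “F9” are all hypothetical and are used only to show which items must be rewritten when a folder is renamed.

```python
# Hypothetical mapping of item name to its stored path attribute (FIG. 1C).
# Item names are assumed to be unique so they can serve as keys.
paths = {
    "F0": "/",
    "D1": "/F0",
    "F1": "/F0",
    "F2": "/F0",
    "D2": "/F0/F2",
    "D3": "/F0/F2",
}


def rename_folder(paths, old_name, new_name):
    """Rename a folder and rewrite the path attribute of every descendant.

    Returns the names of the descendants whose path had to be updated;
    each of these would also need to be locked in a multi-user system.
    """
    parent_path = paths.pop(old_name)
    paths[new_name] = parent_path
    old_prefix = parent_path.rstrip("/") + "/" + old_name
    new_prefix = parent_path.rstrip("/") + "/" + new_name
    touched = []
    for item, path in list(paths.items()):
        if path == old_prefix or path.startswith(old_prefix + "/"):
            paths[item] = new_prefix + path[len(old_prefix):]
            touched.append(item)
    return touched


touched = rename_folder(paths, "F2", "F9")
print(touched, paths["D2"])
```

In this small example only two descendants are rewritten, but the number of updates grows with the size of the renamed sub-tree, whereas the link-based implementation changes only the renamed item.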