In typical relational database systems, users store, update, and retrieve information by interacting with user applications (“clients”). The clients respond to user interactions by submitting commands to a database application that is responsible for maintaining the database (a “database server”). The database server responds to the commands by performing the specified operations on the database.
Relational database systems, referred to herein simply as “database systems”, generally excel at handling structured content that maps to rows and columns. These traditional relational databases also offer operational features that allow clients to deploy databases of structured content efficiently. Examples of operational features include partitioning, replication, and the export and import of data. Data for which a database system can implement operational features is known as operationally complete data.
Data partitioning allows a client to manage a particular data partition independently of other data partitions. Import and export features allow a client to move data that is organized according to a particular logical model from one database system to another without losing the organization of the data. Data that is organized according to a particular logical model may include logical nodes of data arranged in a particular hierarchy, where the relationships between the logical nodes constitute the logical model.
Traditional relational database systems have been extended to manage hierarchically organized data, which, for the purposes of relational database systems, is also known as “unstructured” data. Examples of unstructured data include file system data and XML data. One such extension is the Oracle XML DB Repository, a component of Oracle Database that is optimized for handling XML data. The Oracle XML DB Repository is described in more detail in the Oracle XML DB Developer's Guide, 10g Release 2 (10.2) Part Number B14259-02, Chapter 1, accessed on Jul. 31, 2009.
Thus, a database system may include a hierarchical repository, referred to herein simply as a “repository”, which may include one or more hierarchically organized resources. Such resources may include any kind of data that can be identified by a path, such as files, folders, and XML nodes. In one embodiment of the invention, resources do not include relationally structured data, known as tuples.
A repository may be conceptually viewed as a table that stores, in one or more relational columns, the content of the resources in the repository and metadata describing features of those resources. The metadata describing features of a particular resource may include, for example, one or more paths to the resource within the repository, its creation date, its last modified time, its content size, and an owner identifier.
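The tabular view of a repository can be made concrete with a small relational table. The following is a minimal sketch using Python's built-in `sqlite3` module, not the schema of any actual product; the table name `repository`, its column names, and the helper functions are hypothetical, and for simplicity each resource has a single path that serves as the key.

```python
import sqlite3

# Hypothetical schema (illustration only): one row per resource, with the
# resource content and its metadata held in ordinary relational columns.
SCHEMA = """
CREATE TABLE repository (
    path          TEXT PRIMARY KEY,  -- path to the resource within the repository
    content       BLOB,              -- resource content (file bytes, XML text, ...)
    created       TEXT,              -- creation date
    last_modified TEXT,              -- last modified time
    content_size  INTEGER,           -- content size in bytes
    owner_id      INTEGER            -- owner identifier
)
"""

def make_repository() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical repository table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def add_resource(conn, path, content, created, modified, owner_id):
    """Store a resource's content together with its metadata columns."""
    conn.execute(
        "INSERT INTO repository VALUES (?, ?, ?, ?, ?, ?)",
        (path, content, created, modified, len(content), owner_id),
    )

conn = make_repository()
add_resource(conn, "/home/po/po1.xml", b"<po id='1'/>", "2009-07-01", "2009-07-02", 42)
row = conn.execute("SELECT path, content_size, owner_id FROM repository").fetchone()
print(row)  # ('/home/po/po1.xml', 12, 42)
```

Note that `owner_id` is just a number here; its meaning depends on a user table that lives elsewhere in the database, which is the source of the difficulties discussed next.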
Traditionally, it has been difficult to provide operational features for repositories with hierarchical content. Specifically, implementing an operational feature on a hierarchical repository has traditionally required scanning each resource in the repository, which makes operational features expensive to implement for hierarchical repositories.
To illustrate, repository metadata traditionally refers to information that is not stored within the context of the repository's session, which can make it difficult to implement import and export features for the repository. For example, suppose a particular repository includes an owner identifier for each resource in the repository. The owner identifiers map to user names in a user table that is not stored within the client session of the repository. To export the repository, the database system must scan each resource in the repository and look up, in the user table, the user name that corresponds to the resource's owner identifier.
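The per-resource lookup that export requires can be sketched in a few lines of Python. This is an illustrative sketch only: the `Resource` class, the `USER_TABLE` contents, and `export_repository` are hypothetical names, and a dictionary stands in for the external user table.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    path: str
    owner_id: int  # only meaningful together with the external user table

# Hypothetical user table stored outside the repository: owner id -> user name.
USER_TABLE = {42: "scott", 7: "alice"}

def export_repository(resources, user_table):
    """Produce a portable export by visiting every resource.

    Owner ids have no meaning in the target database, so each one is looked
    up in the external user table and replaced by the corresponding user name.
    """
    exported = []
    for res in resources:  # full scan: one lookup per resource
        exported.append({"path": res.path, "owner": user_table[res.owner_id]})
    return exported

repo = [Resource("/po/po1.xml", 42), Resource("/emp/e1.xml", 7)]
print(export_repository(repo, USER_TABLE))
# [{'path': '/po/po1.xml', 'owner': 'scott'}, {'path': '/emp/e1.xml', 'owner': 'alice'}]
```

The cost grows with the number of resources, because nothing in the repository records the owner names in a self-contained form.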
Moreover, sharing data between a repository and other database entities can make the repository difficult to replicate. For example, the repository of the previous example shares the user table with various database security tables. Thus, to replicate the repository, the database system must also replicate the pertinent entries in the shared user table. However, it may not be appropriate to replicate every entry in the shared user table, e.g., if some of the entries that are used only by the database security tables contain sensitive information. Consequently, to replicate the repository, the database system scans each resource in the repository to determine which rows of the shared user table should be replicated.
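Selecting which shared rows to replicate can be sketched as follows. This is a hypothetical Python illustration, not a real replication mechanism: `SHARED_USER_TABLE`, its contents, and `rows_to_replicate` are invented names, and row 99 plays the part of a security-only entry that must not be copied.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    path: str
    owner_id: int

# Hypothetical shared user table: rows 42 and 7 are referenced by repository
# resources; row 99 is used only by database security tables and is sensitive.
SHARED_USER_TABLE = {
    42: {"name": "scott"},
    7: {"name": "alice"},
    99: {"name": "sec_admin", "role": "security-only"},
}

def rows_to_replicate(resources, user_table):
    """Return only the user rows that repository resources actually reference.

    Nothing in the user table records which rows belong to the repository,
    so the owner of every resource has to be collected first.
    """
    referenced = {res.owner_id for res in resources}  # full scan of the repository
    return {uid: user_table[uid] for uid in sorted(referenced)}

repo = [Resource("/po/po1.xml", 42), Resource("/emp/e1.xml", 7), Resource("/po/po2.xml", 42)]
print(rows_to_replicate(repo, SHARED_USER_TABLE))
# {7: {'name': 'alice'}, 42: {'name': 'scott'}}
```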
Traditionally, a single repository in a database system may include various types of resources. This diversity makes it difficult to create a partition for one or more of the resource types in the repository. For example, suppose a particular repository includes both purchase order resources and employee information resources. To partition the purchase order resources from the employee information resources, every table corresponding to the repository is visited, and each resource is scanned to determine its type.
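The scan needed to partition by resource type can be sketched as below. This Python sketch is illustrative only; it assumes, hypothetically, that each resource is an XML document whose root element name identifies its type, and it represents the tables backing the repository as plain lists of `(path, content)` pairs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def partition_by_type(tables):
    """Group resource paths by type, visiting every table and every resource.

    Assumption for illustration: a resource's type is the tag of its XML
    root element, so each resource must be parsed to learn its type.
    """
    partitions = defaultdict(list)
    for table in tables:             # visit every table backing the repository
        for path, content in table:  # and scan every resource in it
            resource_type = ET.fromstring(content).tag
            partitions[resource_type].append(path)
    return dict(partitions)

tables = [
    [("/a/po1.xml", "<purchaseOrder id='1'/>"), ("/a/e1.xml", "<employee id='7'/>")],
    [("/b/po2.xml", "<purchaseOrder id='2'/>")],
]
print(partition_by_type(tables))
# {'purchaseOrder': ['/a/po1.xml', '/b/po2.xml'], 'employee': ['/a/e1.xml']}
```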
Furthermore, the resource metadata in a repository may include references to the physical locations of data. These physical row identifiers, or “row identifiers”, are an optimized means of referring to data within the database system. However, a physical location in one database system is meaningless in a different database system, so row identifiers lose their meaning when the data is imported into another system. Therefore, to export a repository whose resource metadata includes physical row identifiers, each resource is scanned to resolve the row identifiers in its metadata.
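Resolving row identifiers during export can be sketched as follows. This is a hypothetical Python illustration: the row identifier strings are made up, a dictionary stands in for the local physical storage, and `export_resources` simply replaces each row identifier with the logical value it points to.

```python
# Made-up row identifiers: opaque physical addresses that only mean something
# inside the database that produced them.
LOCAL_STORAGE = {
    "AAAR3qAAEAAAACHAAA": {"name": "scott"},
    "AAAR3qAAEAAAACHAAB": {"name": "alice"},
}

def export_resources(resources, storage):
    """Replace each physical row identifier with the logical value it addresses."""
    exported = []
    for res in resources:  # every resource must be visited
        row = storage[res["owner_rowid"]]  # dereference the physical address locally
        # Write a logical value; the row identifier is useless to the target system.
        exported.append({"path": res["path"], "owner": row["name"]})
    return exported

repo = [
    {"path": "/po/po1.xml", "owner_rowid": "AAAR3qAAEAAAACHAAA"},
    {"path": "/emp/e1.xml", "owner_rowid": "AAAR3qAAEAAAACHAAB"},
]
print(export_resources(repo, LOCAL_STORAGE))
```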
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.