The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Current approaches that address the problem of querying a repository of resources (e.g., XML documents) rely on indexes (typically, B-tree indexes) that are either column-based or function-based. When an XML document is stored in a database, the XML document may be “shredded” and stored in multiple columns. For example, the last modifier property of the XML document (indicating an identifier of the user that last modified the XML document) is stored in a “last modifier” column. If a user wants to query on the last modifier property of a resource, then a B-tree index has to be created on the last modifier column. Any resource-level query then has to be rewritten into an appropriate SQL-level query that is able to recognize the index on the last modifier column.
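By way of illustration only, the column-based approach described above might be sketched as follows. The table name resource_shredded, the column names, the index name, and the user identifier 'jsmith' are hypothetical and are not taken from any particular implementation; the sketch merely shows a shredded property column, a B-tree index on that column, and the SQL-level form into which a resource-level query on the last modifier property would have to be rewritten.

```sql
-- Hypothetical shredded storage: one column per resource property.
CREATE TABLE resource_shredded (
    resource_id   INTEGER PRIMARY KEY,
    last_modifier VARCHAR(64),   -- identifier of the user that last modified the document
    content_ref   VARCHAR(256)   -- assumed reference to content stored elsewhere
);

-- Column-based index on the shredded property (a B-tree index in most engines).
CREATE INDEX resource_last_modifier_idx
    ON resource_shredded (last_modifier);

-- SQL-level form of a resource-level query on the last modifier property.
-- The predicate is written directly against the indexed column so that
-- the relational engine can use the index.
SELECT resource_id
FROM resource_shredded
WHERE last_modifier = 'jsmith';
```

In such a sketch, each additional property that users may query on would require its own column and its own index.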
If these indexes are to be used to answer a query, then complex XML-specific query rewrites must occur so that the underlying relational engine is able to use the right indexes. This rewrite cannot be done for many queries that search within the content (or a user-defined property) of a resource, since prior knowledge of the XML schema to which the content (or user-defined property) conforms is necessary to perform the rewrite. To see why, suppose the following query is submitted:
select res
from resource_table
where existsnode(res, '/Resource/content//PurchaseOrder')=1;
FIG. 1 illustrates an exemplary resource table 102 that comprises resources with varying and out-of-line content. At least two resources in resource table 102 are purchase order documents, or in other words, XML documents that may conform to a Purchase Order schema, and are stored in purchase order table 104. At least two other resources in resource table 102 are auction documents, or in other words, XML documents that may conform to an Auction schema, and are stored in auction table 106. Further suppose that an index exists on column 122 of purchase order table 104 and on column 124 of auction table 106. Based on the above query, it is difficult to determine whether to restrict the query to just purchase order table 104 or to also consider other tables, such as auction table 106. If a user indicated in a query the schema to which the targeted resources conformed, then the query could be (relatively easily) rewritten. However, users typically do not provide such schema information when issuing queries.
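For illustration, if the schema of the targeted resources were known to be the Purchase Order schema, a rewritten query might take a form similar to the following sketch. The table name purchase_order_table and the join columns content_id and po_id are assumptions made only for this example; the actual layout of resource table 102 and purchase order table 104, and the actual form of any rewrite, may differ.

```sql
-- Hypothetical rewrite that is possible only when the schema is known
-- in advance: the query is restricted to the purchase order table, and
-- the out-of-line content is reached through a join.
SELECT r.res
FROM resource_table r
JOIN purchase_order_table po
  ON po.po_id = r.content_id;   -- assumed link from a resource to its out-of-line content
```

Without the schema information, a rewrite of this kind would have to account for every table in which resource content could be stored, including auction table 106.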
Thus, in typical situations when query rewrite must occur, the compiler must make a complex set of inferences from the query and narrow the set of candidate tables and associated indexes as much as possible. Although a compiler has access to metadata of tables and statistics about previous queries, the compiler does not have access to rows of tables, and thus is unable to determine to which schema the content of a resource conforms.
Queries that access columns on which no indexes exist, or queries that cannot be easily rewritten, are evaluated functionally (i.e., indexes are not used) and are therefore executed relatively slowly. Even when rewrite does occur, several complex joins are needed if the indexed column is in an out-of-line table.
One approach that addresses the problem of querying a repository of resources uses the XML Index framework provided by Oracle™. However, the XML Index framework does not support fragment extraction within some resources, relies on a join with an additional table for some queries, and does not support indexing of virtual content. Thus, there is a need to provide a more efficient mechanism to process queries.