1. Field of the Invention
The invention generally relates to arrangements for processing top-k queries. More particularly, the invention relates to arrangements for adaptively processing top-k queries on XML-type documents—that is, documents having nested-structure, arbitrary (document-specific) markup.
2. Related Art
The ability to compute top-k answers to extensible markup language (XML) queries is gaining importance due to the increasing number of large XML repositories {Ref 1}. Top-k query evaluation on exact answers is appropriate when the answer set is large and users are only interested in the highest-quality matches. Top-k queries on approximate answers are appropriate on structurally heterogeneous data (e.g., querying books from different online sellers). In both cases, an XPath query may have a large number of answers, and returning all answers to the user may not be desirable. A prominent querying approach in this case is the top-k approach, which limits the cardinality of the answer set by returning only the k answers with the highest scores.
The efficiency of top-k query evaluation relies on using intermediate answer scores in order to prune irrelevant matches as early as possible in the evaluation process. In this context, evaluating the same execution plan for all matches leads to lockstep-style processing, which may be too rigid for efficient query processing. At any time in the evaluation, all answers have gone through exactly the same number and sequence of operations, which limits how fast the scores of the best answers can grow. Therefore, adaptive query processing that permits different partial matches to go through different plans is more appropriate.
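By way of illustration only, the score-based pruning described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the arrangement of any cited reference: the function name `prune_partial_matches`, the tuple layout, and the sample scores are hypothetical, and the sketch assumes that a partial match's score can only grow as it is processed further, so that its current score is a lower bound and its maximum possible score is an upper bound on its final score.

```python
import heapq


def prune_partial_matches(partial_matches, k):
    """Split partial matches into (kept, pruned) lists of match ids.

    partial_matches: list of (match_id, current_score, max_possible_score).
    Assumption (hypothetical scoring model): scores only grow as a match is
    extended, so current_score <= final score <= max_possible_score.
    """
    # With fewer than k candidates, nothing can be safely discarded.
    if len(partial_matches) < k:
        return [match_id for match_id, _, _ in partial_matches], []

    # The k-th largest current score: at least k matches are guaranteed
    # to finish with a score no lower than this value.
    threshold = heapq.nlargest(k, (cur for _, cur, _ in partial_matches))[-1]

    kept, pruned = [], []
    for match_id, _current, upper in partial_matches:
        # A match whose best possible score is below the threshold
        # cannot reach the top k, so it is dropped early.
        if upper < threshold:
            pruned.append(match_id)
        else:
            kept.append(match_id)
    return kept, pruned


# Hypothetical sample data: (id, current score, maximum possible score).
matches = [("a", 0.9, 1.0), ("b", 0.7, 0.95), ("c", 0.2, 0.5), ("d", 0.4, 0.8)]
kept, pruned = prune_partial_matches(matches, k=2)
print(kept, pruned)  # ['a', 'b', 'd'] ['c']
```

In this sketch, match "d" is retained even though its current score is low, because its maximum possible score still exceeds the threshold. The faster the scores of the best matches grow, the higher the threshold rises and the more matches can be pruned, which is why the order in which individual partial matches are processed matters.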
Adaptivity in query processing has been utilized before {Refs 1, 4, 12, 25} in order to cope with the unavailability of data sources and varying data arrival rates, by reordering joins in a query plan. However, there is a need to use adaptive techniques for efficiently computing exact and approximate answers to top-k queries in XML.
U.S. Patent Application Publication No. 2002/0156772 (Chau et al.) discloses several methods for retrieving XML documents, many of which relate to storing documents in columns in a table.
U.S. Patent Application Publication No. 2003/0101169 (Bhatt et al.) discloses a method for extracting, transforming, and persistently storing data that is in Extensible Markup Language (“XML”) format.
U.S. Patent Application Publication No. 2003/0208484 (Chang et al.) discloses a method of dynamic optimization of queries using methods that perform on-the-fly optimizations based on cost predictions to reduce overall response time.
U.S. Patent Application Publication No. 2004/0098384 (Min et al.) discloses a method of processing a query for XML data having an irregular structure using an Adaptive Path indEX for XML data (APEX), which is said to improve query processing performance by extracting frequently used paths from path expressions that have been used as queries for XML data, and updating the APEX using the frequently used paths.
U.S. Patent Application Publication No. 2004/0205082 (Fontoura et al.) discloses querying a stream of XML data in a single pass using standard XQuery/XPath expressions.
U.S. Pat. No. 6,654,734 (Mani et al.) discloses retrieving XML documents using schemas (Document Type Definitions) for query processing and optimization.
U.S. Pat. No. 6,766,330 (Chen et al.) discloses methods to query and access XML documents while guaranteeing that the query outputs conform to the document type definition (DTD) designated by the user.
Thus, there is still a need to use adaptive techniques for efficiently computing exact and approximate answers to top-k queries in XML.