1. Field of the Invention
The present invention relates to XML data and more specifically to a system and method of providing structure and content scoring for XML.
2. Introduction
XML data is now available in different forms, ranging from persistent repositories such as the INEX and US Library of Congress collections to streaming data such as stock quotes and news. Such data is often queried on both structure and content. Due to the structural heterogeneity of XML data, queries are usually interpreted approximately, and the top-k answers are returned ranked by their relevance to the query. The term frequency (tf) and inverse document frequency (idf) measures, proposed in Information Retrieval (IR), are widely used to score keyword queries, i.e., queries on content. Those of skill in the art will understand the principles associated with IR. However, although some recent proposals for scoring methods account for structure when ranking answers to XML queries, none of them fully captures the information available for computing answer scores. Accordingly, what is needed in the art is an improved method for computing answer scores.
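By way of illustration only, the conventional content-only tf and idf scoring referred to above may be sketched as follows. This is a minimal sketch of one common textbook formulation (raw term count multiplied by the logarithm of the inverse document fraction), not a scoring method of the present invention, and it takes no account of XML structure. The function names `term_frequency`, `inverse_document_frequency`, `tf_idf_score`, and `top_k`, and the representation of a document as a list of tokens, are assumptions made for the example.

```python
import math
from collections import Counter


def term_frequency(term, document_tokens):
    """Raw count of `term` in one tokenized document (hypothetical helper)."""
    return Counter(document_tokens)[term]


def inverse_document_frequency(term, corpus):
    """log(N / n_t): N is the number of documents, n_t the number containing `term`.

    Returns 0.0 when no document contains the term, an assumption made here
    to avoid division by zero.
    """
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf_score(query_terms, document_tokens, corpus):
    """Sum of tf * idf over the query terms for a single document."""
    return sum(
        term_frequency(t, document_tokens) * inverse_document_frequency(t, corpus)
        for t in query_terms
    )


def top_k(query_terms, corpus, k):
    """Indices of the k highest-scoring documents; ties broken by lower index."""
    scored = [
        (tf_idf_score(query_terms, doc, corpus), i) for i, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]
```

For example, `top_k(["stock"], corpus, 1)` over a small tokenized corpus returns the index of the document in which "stock" is both frequent and rare across the other documents. A score of this kind reflects content only, which is the limitation the preceding paragraph identifies.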