The eXtensible Markup Language (XML) is a World Wide Web Consortium (W3C) endorsed standard for document and data representation that provides a generic syntax to mark up data with human-readable tags. XML does not have a fixed set of tags and thus allows users to define their own tags as long as they conform to the XML standard. Data may be stored in XML documents as strings of text that are surrounded by text markup. The W3C has codified XML's abstract data model in a specification called the XML information set (XML Infoset). XML Schemas may also be used to impose a structure on the format and content of XML documents. An XML Schema may define a diagram, plan, or framework for the XML data in a document. Although XML is a well-known format that may easily describe the contents of a document, it may be desirable to store non-XML formatted data in the same database.
Search engines on relational databases are well-known. A typical standard is the Structured Query Language (SQL) relational database language. Both XML-coded and SQL data may be placed in a single database to indicate some data relationship. However, searching that database may become difficult because the XML values stored in the rows of the SQL database may appear as large objects expressed in text or binary form. Although searching the relational data in the SQL database may be fast and efficient, searching the XML-coded large objects in the same database may be inefficient. Typically, the inefficiency results from the time and computing resources consumed in opening and examining the XML-coded large objects every time XML data is accessed in the SQL database.
Solutions to this problem include generating an XML index from a “shredded” representation of the XML column from the SQL database. A separate row in the XML index is created for each node (e.g., element or tag) in an XML object. Each row in the XML index contains, among other columns, the primary key of the primary table row associated with the XML object, a node identifier, and the contents of the node itself. The primary key of the XML index is composed of the primary key of the primary table and the node identifier, for example
Because of the one-to-many relationship between rows in the primary table and rows in the XML index, propagating changes from the base table to the XML index (i.e., maintaining the index) can be very costly. Updating a single XML column in a single primary table row can result in massive changes to the contents of the XML index.
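The one-to-many relationship described above can be illustrated with a minimal sketch, which is not any particular database's implementation. It assumes Python's standard xml.etree.ElementTree parser, a hypothetical shred helper, and a hypothetical dotted-ordinal node identifier ("1", "1.1", "1.2", ...) chosen only for illustration. Each element of one XML value becomes one index row of the form (primary key, node identifier, tag, text).

```python
import xml.etree.ElementTree as ET


def shred(primary_key, xml_text):
    """Return one index row per element: (primary_key, node_id, tag, text).

    The dotted node_id scheme is an assumption made for this sketch.
    """
    rows = []

    def visit(element, node_id):
        text = (element.text or "").strip()
        rows.append((primary_key, node_id, element.tag, text))
        # Children are numbered by their position under the parent.
        for position, child in enumerate(element, start=1):
            visit(child, f"{node_id}.{position}")

    visit(ET.fromstring(xml_text), "1")
    return rows


# A single primary table row (key 42) holding one small XML object.
doc = "<order><id>7</id><item><sku>A1</sku><qty>2</qty></item></order>"
index_rows = shred(42, doc)
for row in index_rows:
    print(row)
```

In this sketch, one primary table row yields five index rows, each uniquely identified by the pair (primary key, node identifier). A larger XML object would yield proportionally more rows.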
When a user changes an XML object in the primary table, current implementations first delete all the corresponding rows in the XML index, then compute the shredded representation of the new value, and finally insert the result of that computation into the XML index. Every corresponding row in the XML index is thus deleted and then re-inserted, regardless of whether that particular row actually changed. A typical user modifies only a small number of nodes inside a larger XML object. As a result, many rows are needlessly updated, which wastes processing power.
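The cost of this delete-then-insert approach can be made concrete with a small sketch under stated assumptions. The index is modeled as an in-memory Python dictionary keyed by (primary key, node identifier), and shred and naive_update are hypothetical helpers written for this illustration rather than functions of any real database. The sketch counts how many index rows are deleted and inserted when a single node value changes, and compares that with the number of rows whose contents actually differ.

```python
import xml.etree.ElementTree as ET


def shred(primary_key, xml_text):
    """Map (primary_key, node_id) -> (tag, text) for every element."""
    rows = {}

    def visit(element, node_id):
        rows[(primary_key, node_id)] = (element.tag, (element.text or "").strip())
        for position, child in enumerate(element, start=1):
            visit(child, f"{node_id}.{position}")

    visit(ET.fromstring(xml_text), "1")
    return rows


def naive_update(index, primary_key, new_xml):
    """Delete every index row for primary_key, then insert the re-shredded rows."""
    stale_keys = [key for key in index if key[0] == primary_key]
    for key in stale_keys:
        del index[key]
    fresh = shred(primary_key, new_xml)
    index.update(fresh)
    return len(stale_keys), len(fresh)


old_xml = "<order><id>7</id><item><sku>A1</sku><qty>2</qty></item></order>"
new_xml = "<order><id>7</id><item><sku>A1</sku><qty>3</qty></item></order>"

index = shred(42, old_xml)
before = dict(index)  # snapshot of the index prior to the update
deleted, inserted = naive_update(index, 42, new_xml)

# Rows whose contents differ from the snapshot (only the qty node here).
changed = sum(1 for key in index if before.get(key) != index[key])
print(deleted, inserted, changed)  # prints "5 5 1"
```

Here five rows are deleted and five are inserted even though only one row differs, which is the needless work described above.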
Therefore, what are needed are systems and methods for identifying which rows in an XML index require updating as a result of an update to a primary table involving an XML object.