There are two types of data: structured and unstructured. On the one hand, decades of effort have been devoted to making database management systems (“DBMSs”) more and more powerful at managing structured data; on the other hand, most of the data in business as well as science is unstructured or semi-structured. The biggest challenge in managing semi-structured data is schema variability across the data. Several strategies for managing data with schema variability using relational DBMSs have been proposed. These include the binary schema and the vertical schema.
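By way of illustration only, the two strategies named above can be sketched as follows. The sketch assumes the usual meaning of the terms: a vertical schema stores every object in a single three-column table of (object identifier, attribute, value) rows, while a binary schema keeps one two-column table per attribute. The product records, the table names, and the column names `oid`, `attr`, and `val` are hypothetical and do not come from the text above.

```python
import sqlite3

# Hypothetical records whose attribute sets differ (schema variability).
records = {
    1: {"name": "laptop", "ram_gb": "16"},
    2: {"name": "shirt", "size": "M", "color": "blue"},
}

conn = sqlite3.connect(":memory:")

# Vertical schema: one table holds every (object, attribute, value) triple.
conn.execute("CREATE TABLE vertical (oid INTEGER, attr TEXT, val TEXT)")
for oid, attrs in records.items():
    for attr, val in attrs.items():
        conn.execute("INSERT INTO vertical VALUES (?, ?, ?)", (oid, attr, val))

# Binary schema: one two-column table per distinct attribute.
all_attrs = sorted({attr for attrs in records.values() for attr in attrs})
for attr in all_attrs:
    conn.execute(f'CREATE TABLE "attr_{attr}" (oid INTEGER, val TEXT)')
for oid, attrs in records.items():
    for attr, val in attrs.items():
        conn.execute(f'INSERT INTO "attr_{attr}" VALUES (?, ?)', (oid, val))

# The same lookup (which objects have a color?) under each layout.
vertical_colors = conn.execute(
    "SELECT oid, val FROM vertical WHERE attr = 'color'"
).fetchall()
binary_colors = conn.execute('SELECT oid, val FROM "attr_color"').fetchall()

# Number of triples stored in the single vertical table.
vertical_row_count = conn.execute("SELECT COUNT(*) FROM vertical").fetchone()[0]
```

In both layouts, a new attribute can be accommodated without altering existing tables: the vertical schema simply receives additional rows, and the binary schema receives an additional table. Reassembling a complete object, however, requires gathering rows across the vertical table or joining several binary tables.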
In recent years, there has been a constant push from the application domain to make it easier for users to move between the two data types. For many applications, such as e-commerce, that depend heavily on semi-structured data such as extensible markup language (“XML”) data, the relational model, with its rigid schema requirements, remains ill-suited for storing and processing highly flexible semi-structured data efficiently. The relational model therefore fails to support such applications effectively.
The flexibility of the XML data model, on the other hand, appears to be a good match for the required schema flexibility. However, the flexibility of XML in modeling semi-structured data usually comes at a significant cost in storage and query processing overhead, which to a large extent has impeded the deployment of pure XML databases to handle such data. Pure relational and pure XML approaches thus represent two extremes, and neither can fully support applications that deal with real-world data.
Therefore, a need exists to overcome the problems with the prior art as discussed above.