Data warehousing, in which large numbers of business reports are collected, is carried out to increase the transparency of corporate activity over long periods. Such data warehouses generally have the following characteristics. Although the data is accessed infrequently, large amounts of it are retained for long periods. Because the data is stored over long periods, reports and other information are kept in a general-purpose format, such as XML (eXtensible Markup Language), that can cope with application changes driven by changes in business practices. Furthermore, costs are reduced by treating a relational database (RDB) as a long-term asset.
For this reason, XML documents are typically stored directly in RDB columns, as the design assumes, and the resulting increases in data search costs (performance problems, for example) and in disk investment costs for large-volume data storage are becoming significant. Data layouts that conform to Information Lifecycle Management (ILM) considerations are also becoming typical.
To address the performance problems mentioned above, there exists technology that narrows down the records to be accessed in a database management system by adding an index that reflects the XML document structure (for example, an index that stores XML document paths and their values). As a separate approach, there also exists technology that still searches all records but uses parallel processing to speed up overall performance.
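The path-and-value index described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular database management system; the function names `iter_path_values`, `build_path_index`, and `lookup`, and the sample report documents, are hypothetical and chosen only for illustration. The sketch maps each (path, value) pair to the set of record IDs that contain it, so a query touches only the matching records.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def iter_path_values(xml_text):
    """Yield (path, text) for every element that has non-blank text."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        text = (elem.text or "").strip()
        if text:
            yield path, text
        for child in elem:
            stack.append((child, path + "/" + child.tag))


def build_path_index(records):
    """Map each (path, value) pair to the set of record IDs containing it.

    `records` is a dict of record ID -> XML string (hypothetical layout).
    """
    index = defaultdict(set)
    for record_id, xml_text in records.items():
        for path, value in iter_path_values(xml_text):
            index[(path, value)].add(record_id)
    return index


def lookup(index, path, value):
    """Return the sorted record IDs whose document has `value` at `path`."""
    return sorted(index.get((path, value), set()))


# Hypothetical sample data: three small report documents.
records = {
    1: "<report><dept>sales</dept><year>2009</year></report>",
    2: "<report><dept>audit</dept><year>2009</year></report>",
    3: "<report><dept>sales</dept><year>2010</year></report>",
}
index = build_path_index(records)
sales_ids = lookup(index, "/report/dept", "sales")
```

Because the index keys embed the document paths, renaming or moving an element in the report format changes the keys, and the index has to be rebuilt for the new structure.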
The above-described technique of identifying target records by adding an index that reflects the XML document structure is an adaptation of existing ideas about RDB index structures to XML. However, with long-term data storage, the XML document structure (report format) may have to change to keep pace with changes in business practices, and the resulting design changes increase costs.
In addition, the technique that performs parallel searches gains its performance advantage by splitting information across disks, which consumes extra resources. Moreover, partitioning the data to obtain parallel performance gains requires designing how to balance access across disks over the long term, which is burdensome from a cost perspective.
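As a rough illustration of the parallel-search approach, the following Python sketch scans every record but spreads the work over partitions. It rests on several assumptions: each in-memory list stands in for the data held on one disk, worker threads stand in for the parallel scan processes, and the names `scan_partition` and `parallel_search` are hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, path, value):
    """Full scan of one partition: parse every document and test the path."""
    hits = []
    for record_id, xml_text in partition:
        root = ET.fromstring(xml_text)
        # `path` is evaluated relative to the document's root element.
        if any((e.text or "").strip() == value for e in root.findall(path)):
            hits.append(record_id)
    return hits


def parallel_search(partitions, path, value):
    """Scan all partitions concurrently and merge the matching record IDs."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: scan_partition(p, path, value), partitions)
        return sorted(rid for hits in results for rid in hits)


# Hypothetical sample data: two partitions, each standing in for one disk.
partitions = [
    [(1, "<report><dept>sales</dept></report>"),
     (2, "<report><dept>audit</dept></report>")],
    [(3, "<report><dept>sales</dept></report>")],
]
matches = parallel_search(partitions, "dept", "sales")
```

Every document is still parsed on every query, so the gain depends on how evenly the records are spread across partitions, which is the access-balancing design burden noted above.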