1. Field of the Invention
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for processing data. Still more particularly, the present invention relates to a computer implemented method, apparatus, and computer useable program code for moving data from an extensible markup language format to a normalized format to generate reports.
2. Description of the Related Art
A database is a collection of information. This information is typically stored as records, and the records are organized according to a schema, which is a structural description of the type of information in the database. One type of database is an extensible markup language (XML) database. An extensible markup language database may provide a logical model for grouping documents into groups called collections. These collections may be created and managed one at a time. In some implementations, collections may be organized in a hierarchical fashion in much the same way as an operating system directory. An extensible markup language database may be queried using a language, such as extensible markup language path language (XPath). This language is an expression language for addressing portions of an extensible markup language document or for computing values based on the content of an extensible markup language document. The data may be received in extensible markup language format or converted to an extensible markup language format for storage. The data may describe events, which may be, for example, orders or financial transactions.
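As a minimal illustration of addressing portions of an extensible markup language document with a path expression, the following sketch uses the Python standard library, which supports only a limited subset of XPath. The event document, its element names, and its attribute names are hypothetical and are not taken from any particular database; the computed value is derived in Python rather than by an XPath function.

```python
import xml.etree.ElementTree as ET

# Hypothetical event document; element and attribute names are illustrative only.
EVENT_XML = """
<events>
  <event type="order"><id>1</id><amount>25.00</amount></event>
  <event type="payment"><id>2</id><amount>40.50</amount></event>
  <event type="order"><id>3</id><amount>12.75</amount></event>
</events>
"""

root = ET.fromstring(EVENT_XML)

# Address a portion of the document: every <event> whose type attribute is "order".
order_events = root.findall("./event[@type='order']")
order_ids = [event.findtext("id") for event in order_events]

# Compute a value based on the content of the selected elements.
order_total = sum(float(event.findtext("amount")) for event in order_events)

print(order_ids, order_total)
```

A full XPath engine could express the sum directly in the query; the standard library module is used here only to keep the sketch self-contained.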
In generating reports, queries are run against the data in a database. Running queries against a database in which data is also being stored may cause performance issues because data is written and read at the same time. For example, a reporting tool may need to run selection criteria against data in an extensible markup language database. This type of direct querying is not possible because of the format of the data and because locking an entire table could affect the insertion of new events being received at the extensible markup language database. Further, the data for an event may be stored in a compressed format, which must be uncompressed before being processed by a reporting tool. Not all reporting tools can handle compressed data.
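The uncompression step mentioned above can be sketched as follows. The compression scheme is not identified here, so zlib is assumed purely for illustration, and the helper name load_event and the sample payload are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET


def load_event(stored: bytes) -> ET.Element:
    """Uncompress a stored event payload and parse it as XML.

    Assumes the payload was compressed with zlib; a real system may use
    a different scheme.
    """
    return ET.fromstring(zlib.decompress(stored))


# Hypothetical event as it might be held in compressed form.
raw_event = b'<event type="order"><id>1</id><amount>25.00</amount></event>'
stored_payload = zlib.compress(raw_event)

# A tool that cannot read compressed data would need this step performed first.
event = load_event(stored_payload)
print(event.get("type"), event.findtext("id"))
```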
Currently, DB2 9, a product available from International Business Machines Corporation, allows a user to store data in an extensible markup language column type and to query the data using structured query language (SQL). This type of solution allows users to query data in the extensible markup language database, but events are required to be stored in an uncompressed format. Other solutions allow a user to stage data to a normalized format. Staging data to a normalized format means placing the data, for use, into a format such as a flat format rather than leaving the data in extensible markup language.
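Staging to a flat format can be sketched as shown below, assuming an in-memory SQLite database as the staging target. The table name staged_event, its columns, the helper stage_events, and the sample events are all hypothetical; this is one possible arrangement under those assumptions, not a description of any existing product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical events as they might arrive in extensible markup language format.
EVENTS = [
    '<event type="order"><id>1</id><amount>25.00</amount></event>',
    '<event type="payment"><id>2</id><amount>40.50</amount></event>',
    '<event type="order"><id>3</id><amount>12.75</amount></event>',
]


def stage_events(conn: sqlite3.Connection, xml_events: list[str]) -> int:
    """Copy selected fields of each XML event into a flat staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged_event ("
        "event_id INTEGER PRIMARY KEY, event_type TEXT, amount REAL)"
    )
    rows = []
    for document in xml_events:
        event = ET.fromstring(document)
        rows.append(
            (int(event.findtext("id")), event.get("type"), float(event.findtext("amount")))
        )
    conn.executemany("INSERT INTO staged_event VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
staged_count = stage_events(conn, EVENTS)

# A reporting query now runs against the flat table rather than the XML store.
report = conn.execute(
    "SELECT event_type, COUNT(*), SUM(amount) FROM staged_event "
    "GROUP BY event_type ORDER BY event_type"
).fetchall()
print(report)
```

Because the report reads from the staging table, the query does not compete with inserts into the store that receives new events.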