The invention relates to a data processing system.
Data processing systems of the type with which the present invention is concerned comprise databases. A database is a collection of structured data for a particular application. The structure of the data is predefined. The data stored in a database may relate to various fields. For example, the data may relate to raw materials used in a process such as a chemical process. The elements each corresponding to a data entry in the database are interrelated with one another in accordance with a predefined structure. Another example of an application of databases is in the field of business information management.
Many businesses or other data processing systems generate a vast volume of data of various types. For example, a business may generate daily files containing records itemising every sale through every outlet; records itemising stock orders and inventories; records itemising taxes paid, and so on. As each process undertaken within an organisation is automated, the volume of data available in electronic form increases.
It would be desirable to collect all such data for analysis. To maintain flexibility for subsequent analysis, it is desirable to store the data in “raw” condition, without omitting or cumulating it (and hence losing information). This is referred to as “warehousing” the data—i.e. storing it in a data “warehouse”—a large store containing one or more databases of such records.
However, the formats used for sales records differ from those used for inventory or tax records, for example. It is therefore difficult to combine the data from such different sources within an organisation (or across organisations). It might be thought possible to use a common format for all records, but practical difficulties in devising an all-encompassing format in advance, and the inherent redundancy of such a format, make this unsuitable in practice.
Further, existing organisations (especially large organisations) are often necessarily diverse in the way they maintain records. A given product may need a different name, or a different formulation, in different territories, and similarly, an organisation may need to be differently structured in different territories.
Finally, existing organisations (especially large organisations) actually change their structures over time—incorporating new components (with new record systems) and divesting components over time, or changing internal organisational structure.
Thus, an existing data warehouse may be based on a collection of tables, one for each type of transaction for which multiple records are to be stored—for example, a table for daily sales of one product type; a table for weekly sales of bulk products of a different type; a table for monthly inventory records; and so on. Data in such tables are loaded into the data warehouse from external data sources. The tables are loaded by using loading routines which are specifically designed in accordance with the data structure of the respective external data source from which the tables is loaded, and the data structure of the database into which the tables are loaded. In other words, each loading routine is a unique interface between an external data source and the database. When the structure of one of the records changes, the operator is faced with the choice of opening a new table for the new structure and ceasing to use the old one, or of redesigning the structure of the previous table (or tables) stored for previous transactions and then reloading all such transactions (which may number large numbers of million records).
In the latter case, the tables loaded via the loading routines are then merged on the basis of an integrated data model (i.e. a model which allows combination of the data from different stored transactions, using data reflecting the structure of the organisation and/or the transactions). The integrated data model is pre-structured in accordance with the business requirements, and the format of the source data of the external data sources. The integrated data model is inflexible, i.e. it is designed to contain only data corresponding to its predefined structure. When the business changes, the data model must be redesigned and the data re-loaded as mentioned above.
A populated database may then be used to create an extract which contain selected data from the database and to display the selected data in a desired format, e.g. in a table, a graph, etc. The desired data is extracted from the database by using a data query routine. Such a data query routine also converts the extracted data into a required data format such that it can be displayed using known spread sheet software, for example.
FIG. 1 shows an example of a conventional data processing system. The conventional data processing system comprises three main elements, namely operational systems and external databases 1, a database 2, and data queries 3. The operational systems and external databases 1 contain the data which is to be loaded into the database 2. The data originates from external data sources 4, 5 and 6 each of which uses an individual source data model, as illustrated by the interconnected blocks in databases 4, 5 and 6, for storing the data. They comprise, for example, multiple sales terminals outputting sales records in predetermined formats; or the sales databases of each regional office of a large organisation.
In order to load the data from the data sources 4, 5 and 6 into the database 2, separate loading routines 7, 8 and 9 are employed respectively. The data in the database 2 is represented in accordance with an integrated data model 10. In order to convert the loaded data from its source data model representation into the integrated data model representation, a separate loading routine 7, 8 and 9 for each external data source 4, 5 and 6, respectively, is required. The integrated data model 10 is specifically designed for the inclusion of data from the external data sources 4, 5 and 6, the source data models of which are known in advance. If data from an additional external database is to be included in the database 2, a new integrated data model 10 has to be designed.
Data queries 3 are created in order to display a selected set of data from the database 2. Data queries 3 are created by loading the selected data via data query routines 11 and 12 into a suitable display software such as Microsoft Excel (RTM), for example, to display the data, as shown at blocks 13 and 14. On extraction of the selected data from the database, the data is converted into the format required by the display software.
As mentioned, when a database is populated, any changes to the business requirements, for example, on which basis the integrated data model is designed requires a new integrated data model to be created. Such a new integrated data model can be created redesigning the existing integrated data model, defining the (new and old) data sources from which data is to be loaded into the database, and adapting the data loading routines accordingly. The new database may then be completed by loading the data—an operation which may bring the database out of use for some time.
More commonly, however, new entities which reflect the change in business requirements are added to the existing integrated data model without changing the existent data. This can lead to a discrepancy between the “logical” data model of the data warehouse and its actual physical realisation.
Such systems encounter disproportionately high maintenance costs as new subject areas (entities) have to be added to the warehouse, or the entire design has to be changed completely to reflect the changed external business environment. Maintenance costs per year of 25% to 100% of the initial development costs are not uncommon. By way of comparison, in transaction processing systems the annual maintenance costs are typically 10% to 15% of the development costs.
This high ongoing cost for a data warehouse is a major contributing factor to why many data warehouse projects do not sustain existing business requirements. Organisations simply may not appreciate what level of investment can be necessary to deal with reflecting business and chronological changes. Indeed, with conventional data warehouse designs, it is questionable as to whether these can ever be satisfactorily reflected.
Accordingly, it is desirable to provide a data processing system which addresses one or more of the above disadvantages.