Businesses need to access historical data to support decision making. The historical data that accumulates over time is stored and used for purposes such as discerning trends from the last year's consumer purchases of a product, or determining whether a price increase affected sales figures for one year over a preceding year. When the decision makers come up with questions, computer programmers devise queries to obtain data from information stored by the business over time, in order to find information to answer the questions.
Businesses store data in two general ways. First, real-time information regarding transactions is stored. For example, online purchases may be entered continuously each day, and shipments may be made within hours of the order being placed. Changes may be made by the purchaser or by the supplier until the transaction is completed. Each order, each purchased item, and each change must be entered into a database by making updates and deletions to the database. Speed is therefore essential, and to maximize speed, the size of the database is limited to only what is necessary.
The second way in which data is stored is to support analysis of business operations over time. The quantity of information stored in such a database can be enormous as information is continuously being deposited. But since the transactions have been completed, the historical data is read-only. In order to reduce the number of records, historical data can be aggregated to a point in time. For example, if five transactions take place to purchase a total of eight units of the same item, the sales can be aggregated for the day to eight. Typically, such read-only, historical, aggregated data is stored in a data warehouse. A data warehouse is traditionally made up of facts which occur in many dimensions. A data warehouse is optimized specifically for aggregate queries.
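The daily aggregation described above can be illustrated with a minimal sketch. The table name `sale`, its columns, the date, and the item name are hypothetical and chosen only for illustration; the five rows stand in for the five transactions that together purchase eight units.

```python
import sqlite3

# Hypothetical transactional table: one row per completed purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_date TEXT, item TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("2024-01-15", "widget", 1),
        ("2024-01-15", "widget", 2),
        ("2024-01-15", "widget", 1),
        ("2024-01-15", "widget", 3),
        ("2024-01-15", "widget", 1),
    ],
)

# Aggregate to one row per item per day, as a warehouse load might do.
daily_totals = conn.execute(
    "SELECT sale_date, item, SUM(units) FROM sale GROUP BY sale_date, item"
).fetchall()
print(daily_totals)  # [('2024-01-15', 'widget', 8)]
```

Five transactional rows collapse into a single aggregated row, which is the reduction in record count that the paragraph describes.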
A data warehouse consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business. The factual data is the information that will be sought by a query. The fact information often consists of numerical additive measurements and can consist of many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects the dimensions or attributes of a business. Structured Query Language (SQL) queries then use joins between the fact and dimension tables and constraints on the data to return selected information.
Fact and dimension tables differ from each other only in their use within the data warehouse. The physical structure of fact and dimension tables and the SQL syntax used to create the tables are the same. The terms fact table and dimension table represent the roles these objects play in the logical schema. In terms of the physical database, a fact table has foreign key references to other tables that are usually dimension tables. A dimension table has a primary key that corresponds to the key of an operational database table.
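As a rough illustration of these roles, the following sketch builds one fact table and one dimension table and joins them in an aggregate query. The table names `sales_fact` and `category_dim`, their columns, and the sample rows are assumptions made for this example and are not taken from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: small, descriptive, keyed by a primary key.
CREATE TABLE category_dim (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Fact table: additive numeric measures plus a foreign key to the dimension.
CREATE TABLE sales_fact (
    sale_date   TEXT    NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category_dim(category_id),
    units_sold  INTEGER NOT NULL,
    revenue     REAL    NOT NULL
);
INSERT INTO category_dim VALUES (1, 'electronics'), (2, 'clothing');
INSERT INTO sales_fact VALUES
    ('2024-01-15', 1, 8, 800.0),
    ('2024-01-16', 1, 2, 200.0),
    ('2024-01-15', 2, 5, 100.0);
""")

# Join fact to dimension, constrain on date, and aggregate the measures.
rows = conn.execute("""
    SELECT d.name, SUM(f.units_sold), SUM(f.revenue)
    FROM sales_fact AS f
    JOIN category_dim AS d ON d.category_id = f.category_id
    WHERE f.sale_date >= '2024-01-01'
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)  # [('clothing', 5, 100.0), ('electronics', 10, 1000.0)]
```

Both tables are created with the same `CREATE TABLE` syntax; only the foreign key reference and the way the query uses each table distinguish the fact role from the dimension role. The join also returns category names rather than category numbers.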
Generally, two types of programmers support a business. An application programmer creates an object model of the business in order to create applications to support day-to-day operations. A transactional programmer, also referred to here as a database programmer, creates relational databases and maps data from the applications to the relational databases. Normally, an application programmer does not understand the operational database schema because it is created by object-relational mapping performed by the database programmer. The area common to both types of programmers is the object model created by the application programmer.
FIG. 3 depicts object model 300 of a store. The exemplary object model depicted in FIG. 3 captures the basic attributes of the store and the store's place in a larger company. The exemplary object model also captures basic information about the sales for the store and the items which are being sold. For example, in object model 300, class “Store” 310 has store attributes 312 comprising “storeid,” “name,” and “address.” Class “Region” 340 has region attributes 342 comprising “regionId” and “name.” Class “Division” 360 has division attributes 362 comprising “divisionId” and “name.” Class “Person” 350 has person attributes 352 comprising “ssn,” “name,” “sex,” “birthDate,” and “location.” Class “Invoice” 320 has invoice attributes 322 comprising “invoice number,” “data,” “balanceDue,” and “creditTerms.” Class “LineItem” 330 has LineItem attributes 332 comprising “lineItem Number,” “count,” and “unit price.” Class “Item” 370 has Item attributes 372 comprising “upcCode,” “retailPrice,” “color,” “size,” “image,” and “description.” Class “Category” 380 has Category attributes 382 comprising “categoryId,” “name,” and “description.”
Lines between the classes indicate a one-to-one or one-to-many relationship between classes. Line “sales” 390 relates class “Store” 310 and class “Invoice” 320. Line “items” 391 relates class “Invoice” 320 and class “LineItem” 330. Line “categories” 392 relates classes “LineItem” 330, “Item” 370, and “Category” 380. Line “region” 393 relates class “Store” 310 to class “Region” 340. Line “division” 394 relates class “Region” 340 to class “Division” 360. Line “manager” 395 relates class “Store” 310 and class “Person” 350. Line “regional manager” 396 relates class “Region” 340 and class “Person” 350. Line “divisionPresident” 397 relates class “Division” 360 and class “Person” 350. Line “subcategories” 398 relates class “Category” 380 back to itself.
In a typical top-down mapping, the exemplary object model would be transformed into a relational database with primary keys and foreign keys. In some mappings, JOIN tables would establish relationships shown by the lines between objects discussed above. In other instances, WHERE clauses would select the correct row for an item. Class shapes represent the attributes within a class. Table shapes represent the attributes with columns and rows. Top-down mapping translates the class shapes in the object model to table shapes in the relational database. This translation includes naming conventions and more complex transformations required to store object model classes, attributes and relationships in relational tables within a relational database schema. While a database programmer is familiar with the mechanics of translating class shapes to table shapes, an application programmer generally is not.
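The translation from class shapes to table shapes can be sketched as follows. This is a simplified illustration under stated assumptions, not the mapping used by any particular object-relational tool: the helper `table_ddl`, the `SQL_TYPES` lookup, the rule that the first attribute becomes the primary key, and the `<attribute>_<key>` naming of foreign key columns are all hypothetical. Only two classes from the object model of FIG. 3 are shown.

```python
import sqlite3
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Region:
    regionId: int
    name: str

@dataclass
class Store:
    storeid: int
    name: str
    address: str
    region: Region  # the "region" relationship between Store and Region

# Assumed mapping from attribute types to column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

def table_ddl(cls):
    """Translate one class shape into a CREATE TABLE statement."""
    columns = []
    for position, (attr, attr_type) in enumerate(get_type_hints(cls).items()):
        if attr_type in SQL_TYPES:
            column = f"{attr} {SQL_TYPES[attr_type]}"
            if position == 0:  # assumed convention: first attribute is the key
                column += " PRIMARY KEY"
        else:
            # A reference to another class becomes a foreign key column.
            target_hints = get_type_hints(attr_type)
            key = next(iter(target_hints))
            key_type = SQL_TYPES[target_hints[key]]
            column = (f"{attr}_{key} {key_type} "
                      f"REFERENCES {attr_type.__name__}({key})")
        columns.append(column)
    return f"CREATE TABLE {cls.__name__} ({', '.join(columns)})"

conn = sqlite3.connect(":memory:")
for model_class in (Region, Store):
    conn.execute(table_ddl(model_class))

store_columns = [row[1] for row in conn.execute("PRAGMA table_info(Store)")]
print(table_ddl(Store))
print(store_columns)
```

Even in this small case the generated column `region_regionId` does not appear anywhere in the object model, which hints at why a schema produced by the mapping can be unfamiliar to the application programmer who wrote the classes.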
It is desirable to design the data warehouse for responsiveness to the company decision makers. For the store in the example of FIG. 3, typical questions from decision makers that the database programmer could be called on to support are “what are the ‘electronics’ sales by division?”; “how did sales in ‘Texas’ do this quarter?”; and “what are the names of our best and worst selling categories?” These queries do not require access to all of the data captured in a transactional database depicted in the example of FIG. 3. Indeed, to answer these questions, much of the detail in the database can be ignored. What is really needed is an orthogonal view of the data optimized for getting relevant information to support the decision making. Moreover, the specifics of an item or a specific invoice are generally not required. Rather, it is the aggregation of values that is meaningful and manageable. Furthermore, reports answering the questions should provide information in an easy-to-use form. The reader of the report should not have to translate category numbers to names to make sense of a report. Thus, a data warehouse designed specifically to support decision making will speed processing and responsiveness. It would be desirable to have the application programmer create the data warehouse.
As explained above, the application programmer who creates the object model of the business, and the transactional (database) programmer who creates the relational databases, such as the transactional database and data warehouse, are rarely the same person. It would be desirable to have an application programmer create a data warehouse without having to learn the mapping skills of the database programmer. Existing tools allow a data warehouse to be described from the operational database. Therefore, the object-relational mapping created by the database programmer for the transactional database could be used by the application programmer, except that the application programmer normally lacks the skills to understand the operational database schema that the mapping produces. But the application programmer does understand the object model. Indeed, normally the application programmer has created the object model. Therefore, a need exists for a way to enable an application programmer without database programming skills to create a data warehouse. Specifically, it would be desirable to have a method, suited to an application programmer, for creating the data warehouse using the object model and the existing object-relational mapping of the transactional database. The method would have to create and automatically update the data warehouse without requiring the application programmer to reconstruct the relational data relationships of the transactional database. A further need is to create the data warehouse tailored to the reporting needs of the business.