1. Field of Use
The present invention relates to systems and methods for accessing databases and more particularly to systems and methods for populating data warehouses.
2. Prior Art
As is well known in the art, there are many visual tools for managing the process of populating the data tables of a data warehouse. The term "data warehouse" is generally used to describe a database containing data that was gathered from a variety of sources (e.g. existing production databases). For more information regarding the nature of a data warehouse, reference may be made to the article entitled, "Data Warehousing: An Introduction" by Grayce Booth which appeared in the May/June 1995 issue of the Bull S.A. technical journal entitled, "Technical Update."
The process of populating such data tables is typically carried out by a series of warehouse requests that an administrative tool executes. Each warehouse request has sub-components called "Events" that are used to complete a finite portion of the work of the particular request. Examples of such Events are Extract, Transfer, Transform and Load.
The data warehouse tables are typically designed during the warehouse design process. During this design process, the definitions of the source database tables from which all the data is gathered from existing production databases is captured. The results of the design phase are stored in a metadata repository.
Each warehouse request is typically developed by a system administrator using an interactive tool to describe the sequence of Events of each warehouse request that are necessary to create the data warehouse tables. During this process, the definitions of the database schemas are received from a metadata repository, and utilized by the interactive tool.
The tool is used to develop the sequence of "events" of the request that are to be executed during the warehouse request execution process. At the end of the process, the system administrator finally agrees to the request definition and is provided with the opportunity to schedule the time at which the request is to execute. The described "events" are used to drive the work of a warehouse creation server process. The server process interprets the entries of the visual request display and causes the execution of those processes necessary to execute the warehouse request. This generally involves extracting data from an existing database located in one system (i.e., source database), moving the data to a target system (i.e., the location of a target database), transforming the data to match the requirements of the target database, and then storing the data in the target system database.
An example of the above type of system is the distributed data warehouse (DDW) middleware described in the article entitled, "The Distributed Data Warehouse Solution" by Kirk Mosher and Ken Rosensteel that also appeared in the above referenced May/June 1995 of the Technical Update Journal. In this system, a key element of the system is the distributed data manager/data replication manager (DDM/DRM) component that provides a simple way for a system administrator to extract data from existing databases and then move the selected data to a data warehouse. The system administrator uses a SQL builder tool to describe the data to be extracted from the database. All data is conceptualized using a common relational model. The administrator generated SQL statement forms a request to the DDM/DRM component to extract the data from the particular database. The DDM/DRM component includes a metadata manager component that provides the system administrator with information about the warehouse data and allows the administrator to view and modify the metadata. The arrangement allows an administrator to automate and customize the entire process of data extraction, data transfer and data warehouse loading by developing the specific warehouse requests to be scheduled for execution.
A paper entitled, "Bull Warehouse Initiative" prepared by Wayne W. Eckerson, Patricia Seybold Group.COPYRGT. 1996 discusses the enhancement of a metadata repository called Design Manager developed by Transtar Inc. to work in tandem with other Bull warehouse components. The paper also discusses future enhancements of the Design Manager to enable administrators to perform warehouse mappings in the Windows client component of the Design Manager and store them in the repository in addition to using the Design Manager to store all data warehouse metadata, not just database schema and source-to-target mapping information. This information includes transformation rules, extraction, transfer-load statistics and events. The above described Design Manager repository would be enhanced to function as a full fledged data dictionary for metadata that works with other warehouse components.
But, the above described systems do not relieve an administrator from still having to perform time consuming tasks relating to the creation of warehouse requests. Also, from the above, it is seen that the warehouse request generation process can be quite time consuming. This is true because data warehouses typically have complex structures organized for efficiency of data retrieval. In generating the requests, data must be gathered from multiple database systems on multiple machine nodes. The database systems may be of different types or from different vendors. Thus, the gathering process can be quite complex. Once gathered, the data needs to be merged. But the merge can be complicated because of differences in the way the data has been recorded in the different systems. Finally, there is the physical efficiencies to manage. Large amounts of data must be moved efficiently. Processes must be done in parallel, when possible. The processes must be sequenced with dependencies, and carried out as soon as the dependent processes finish. Hence, a system administrator having detailed knowledge of the underlying data structures and the SQL programming language is still required to expend substantial time in constructing such requests.
Accordingly, it is a primary object of the present invention to provide a system and method for facilitating the warehouse request generation process.
It is a further more specific object of the present invention to provide a graphical interface that allows a system administrator to view the different components of warehouse requests for analysis prior to execution by the warehouse system.