Data is collected in databases to organize information and allow efficient access to that information. One type of database system typically used by businesses and other organizations is a data warehouse. A data warehouse is a repository that stores integrated information for efficient querying and analysis, and it generally combines many different databases, e.g., from across an entire organization. Information is extracted from different sources as it is generated or updated, translated into a common data model, and integrated with the existing data at the warehouse. When a user query is submitted to the warehouse, the needed information is provided with the differences between data formats already resolved. This makes it much easier and more efficient to run queries over data that originally came from different sources. Additional advantages of data warehousing include easy and efficient execution of complex queries, a single data model and query language for end users, simpler system design, and a reliable and safe data repository.
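The extract, translate, and integrate flow described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions only: the two source systems ("CRM" and "billing"), their field names, the `CustomerRecord` common model, and the in-memory dictionary standing in for the warehouse are all hypothetical and are not part of any particular warehouse product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical common data model used by the warehouse."""
    customer_id: str
    name: str
    country: str


def from_crm(row: dict) -> CustomerRecord:
    # Assumed CRM format: numeric id, split name fields, lowercase country code.
    return CustomerRecord(
        customer_id=str(row["id"]),
        name=f'{row["first_name"]} {row["last_name"]}',
        country=row["country_code"].upper(),
    )


def from_billing(row: dict) -> CustomerRecord:
    # Assumed billing format: string customer number, single padded name field.
    return CustomerRecord(
        customer_id=str(row["cust_no"]),
        name=row["full_name"].strip(),
        country=row["country"].upper(),
    )


def integrate(warehouse: dict, records) -> None:
    # Integrate with existing data: a new or updated record replaces
    # any earlier record that has the same customer_id.
    for rec in records:
        warehouse[rec.customer_id] = rec


def query_by_country(warehouse: dict, country: str) -> list:
    # A query sees only the common model; format differences are already resolved.
    return sorted(r.name for r in warehouse.values() if r.country == country)


warehouse = {}
integrate(warehouse, [from_crm(
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace", "country_code": "gb"})])
integrate(warehouse, [from_billing(
    {"cust_no": "2", "full_name": " Alan Turing ", "country": "GB"})])

print(query_by_country(warehouse, "GB"))
```

Because each source is translated before it is integrated, the query function does not need to know which system a record came from.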
Data systems other than data warehouses can also be used. For example, a “data mart” is a database, or collection of databases, similar to a data warehouse but usually smaller and focused on a particular subject or department in an organization; a data mart may be a subset of a data warehouse. Other data systems may provide for the transformation or loading of data, rather than storing data in databases.
Like other computing systems, a data warehouse is a collaboration of processors, memory, disk, operating system, database engine, applications, data model, and business requirements. In a business environment, a data warehouse is often connected to a corporate network to fulfill a number of essential functions, such as end-user connectivity, data transfers, backup and restore, remote management, and communication during extract, transform, and load (ETL) processes. Development of a data warehouse includes the development of systems to extract data from operational systems, plus the installation of a warehouse database system that gives managers flexible access to the data.
One problem with current data warehouses is that their components are difficult to balance, and that balance is difficult to maintain over time as the warehouse is upgraded or is given additional data to store. A balanced collaboration of components is essential for successful operation of a data warehouse: all of the components must be chosen to fit and integrate with each other for reasons of compatibility, performance, and reliability. If a proper balance is not maintained across the components of a data warehouse solution, the users of the warehouse may not obtain the benefit of the massively parallel functionality of database systems or get full value out of all the components of the system.
Accordingly, what is needed is an architecture for data warehouses or other data systems that promotes flexible and efficient operation over time and a proper balance throughout its components, such that the functionality of the system can be used efficiently and effectively. The present invention addresses such a need.