The present invention relates generally to the field of information storage and retrieval and, more particularly, to automating and simplifying information warehouse management system tasks using a high-level declarative information warehousing language and runtime system components to make system operational details transparent to the user.
Rapidly leveraging information analytics technologies (e.g., to mine the mounting information in structured and unstructured forms, derive business insights, and improve decision making) is becoming increasingly critical to today's business success. One of the key enablers of new information analytics technologies will be an information warehouse management system (IWMS) that processes different types and forms of information and that builds and maintains the information warehouse (IW) effectively. Although traditional multi-dimensional data warehousing techniques, coupled with the well-known extract, transform, and load (ETL) processes, may meet some of the requirements of an IWMS, they generally fall short in several major respects: 1) they often lack comprehensive support for both structured and unstructured data processing; 2) they usually are database-centric and require detailed database and warehouse knowledge to perform IWMS tasks, and hence are tedious and time-consuming to learn and operate; and 3) they are often inflexible and insufficient in coping with a wide variety of on-going IW maintenance tasks, such as adding new dimensions and handling regular and lengthy data updates with potential failures and errors.
Although data warehousing techniques, such as multi-dimensional data warehouse models and ETL processing, have been widely practiced in industry, their functionality is increasingly inadequate because of major changes in today's information dynamics, which call for comprehensive support for structured and unstructured data, simple and high-level IW operations, and on-going IW maintenance operations.
Existing data warehousing techniques are mainly designed to handle structured data. However, a large fraction of business enterprise data is in unstructured formats, e.g., call center problem tickets and customer complaints. Because an IWMS must handle both types of data, there is a need for comprehensive support for structured and unstructured data.
The prominent and rapid adoption of information analytics technologies, such as text mining and business intelligence (BI) tools, mandates simple and efficient IWMS that can quickly process various information sources and build information warehouses. Current data warehouse systems, however, are typically hard to use and require detailed database and data warehouse knowledge. Even with skilled staff, building a data warehouse often takes multiple person-weeks or even person-months to complete. With the need to support unstructured data, knowledge of and skills with full-text indexers and search engines have also become necessary. It is often unrealistic to assume that such knowledge and skills exist in the business world. For instance, organizations that want to use BI and text mining tools may not have strong database administrators (DBAs) for handling complex ETL processing. Thus, there is a need for IWMS that provide simple and high-level IW operations.
Existing ETL solutions for information warehouse management generally focus on the single aspect of building the information warehouse. It is often assumed that once the information warehouse is fully constructed and the data are completely loaded from the given set of data sources, the project is complete; in other words, that no subsequent modifications to the data warehouse will be needed. In practice, however, information is constantly changing. Users commonly adjust information warehouses by adding new dimensions to an existing IW or by modifying and loading new data on a regular basis. All such operations must preserve the overall integrity and consistency of the IW. Furthermore, because of the on-going data updating and loading, it becomes extremely important for the IWMS to cope with various forms of system failures and potential error conditions. Current IWMS generally have little or no support for fast failure recovery, error correction, or on-going IW schema changes. The common practice is either to rebuild the data warehouse from scratch each time changes are required or to perform lengthy and error-prone manual IW operations. Thus, there is a need for an IWMS that provides on-going IW maintenance operations.
As can be seen, there is a need for information warehouse management systems that provide comprehensive support for structured and unstructured data; simple and high-level warehouse operations; and on-going IW maintenance operations.