Advances in computer technology (e.g., microprocessor speed, memory capacity, data transfer bandwidth, software functionality, and the like) have generally contributed to increased computer application in various industries. Ever more powerful server systems, often configured as an array of servers, are commonly provided to service requests originating from external sources such as the World Wide Web. As local Intranet systems have become more sophisticated, thereby requiring servicing of larger network loads and related applications, internal system demands have grown accordingly. Simultaneously, the use of data analysis tools has increased dramatically as society has become more dependent on databases and similar digital information storage media. Such information is typically analyzed, or “mined,” to learn additional information regarding customers, users, products, and the like.
As such, much business data is stored in databases under the management of a database management system (DBMS). A large percentage of new database applications have been in a relational database environment. Such a relational database can further provide an ideal environment for supporting various forms of queries on the database. Accordingly, the use of relational and distributed databases for storing data has become commonplace, with distributed databases being databases wherein one or more portions of the database are divided and/or replicated (copied) to different computer systems and/or data warehouses.
A data warehouse is a nonvolatile repository that houses an enormous amount of historical data rather than live or current data. The historical data can correspond to past transactional or operational information. Data warehousing and associated processing mechanisms (e.g., On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), and Hybrid OLAP (HOLAP)) are widespread technologies employed to support business decisions and data analysis. Data warehouses are populated at regular intervals with data from one or more heterogeneous data sources, for example from multiple transactional or enterprise resource planning (ERP) systems. This aggregation of data provides a consolidated view of an organization from which valuable information can be derived. Though the sheer volume can be overwhelming, the organization of data can help ensure timely retrieval of useful information. For example, the organization of data in a data warehouse typically involves creation and employment of fact and dimension tables.
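By way of illustration only, the organization of data into fact and dimension tables can be sketched as follows. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are hypothetical and chosen solely for this example; an in-memory SQLite database stands in for an actual data warehouse.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables describe the "who/what/when" of each fact.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds numeric measures keyed by the dimensions.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        """
    )
    # Hypothetical sample data, for illustration only.
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "widget"), (2, "gadget")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 2, 15.0), (2, 1, 7.5)])
    return conn


def sales_by_product(conn):
    """Aggregate the fact table along the product dimension."""
    return conn.execute(
        """
        SELECT p.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name
        ORDER BY p.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_product(build_star_schema()))
```

In this sketch, the fact table carries only keys and measures, while descriptive attributes reside in the dimension tables, so that a query can aggregate along any dimension by way of a join.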
Extracting raw data from operational databases and transforming it into suitable or useful information is the function of these data warehouses and data marts. In such data warehouses and data marts, data is typically structured to satisfy decision support roles rather than operational needs. In general, before data is loaded into the target data warehouse or data mart, cryptic and conflicting codes should be resolved, and raw data translated into something more meaningful. Also, summary data that is useful for decision support, trend analysis, or other end-user needs should be pre-calculated. Ultimately, the data warehouse in general consists of an analytical database containing data that facilitates decision support. A data mart is similar to a data warehouse, but contains a subset of corporate data for a single aspect of business, such as finance, sales, inventory, or human resources. With data warehouses and data marts, useful information is retained at the disposal of decision-makers.
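As a minimal sketch of the transformation described above, the following example resolves conflicting source codes into a single meaningful value and pre-calculates summary data prior to loading. The code mapping `CODE_MAP`, the field names `dept_code` and `amount`, and the functions `transform` and `summarize` are assumptions introduced for illustration and do not correspond to any particular system.

```python
from collections import defaultdict

# Hypothetical mapping: two source systems use different codes
# for the same department.
CODE_MAP = {
    "FIN": "finance", "01": "finance",
    "SLS": "sales", "02": "sales",
}


def transform(raw_rows):
    """Resolve cryptic or conflicting codes and normalize amounts."""
    cleaned = []
    for row in raw_rows:
        department = CODE_MAP.get(row["dept_code"].strip().upper())
        if department is None:
            # Unresolved code: skipped here; an actual pipeline would
            # more likely log or quarantine such a row.
            continue
        cleaned.append({"department": department,
                        "amount": float(row["amount"])})
    return cleaned


def summarize(cleaned_rows):
    """Pre-calculate per-department totals for decision support."""
    totals = defaultdict(float)
    for row in cleaned_rows:
        totals[row["department"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    raw = [
        {"dept_code": "FIN", "amount": "100.0"},
        {"dept_code": "01", "amount": "50"},
        {"dept_code": " sls ", "amount": "20.5"},
        {"dept_code": "??", "amount": "1"},
    ]
    print(summarize(transform(raw)))
```

Separating the code-resolution step from the summarization step keeps each stage independently verifiable before data reaches the target data warehouse or data mart.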
One major difficulty associated with implementing data warehouses and data marts relates to cleansing of data. In general, noise associated with data can occur as data accumulates from a plurality of sources. For example, values can be mistyped or misinterpreted from such sources. Moreover, after a merge between data sources, outliers or dirty data are typically likely to remain undetected. Such anomalies can negatively affect user interaction with data warehouses.
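To make the notion of undetected outliers concrete, one simple way of flagging suspicious values after a merge can be sketched as follows. The technique shown, a modified z-score based on the median absolute deviation, is merely one common heuristic selected for this example and is not drawn from the description above; the function name `flag_outliers` and the default threshold of 3.5 are likewise assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), which are less distorted by the outliers
    themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values are identical; this heuristic
        # cannot rank deviations in that case.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical merged column in which one value was mistyped.
    merged = [10, 11, 9, 10, 12, 1000]
    print(flag_outliers(merged))
```

Such a check flags candidates for review only; whether a flagged value is in fact dirty data generally depends on the source from which it originated.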