Data stored in a database is usually transactional in nature. A database may be a transactional database or a relational database. A transactional database is simply a collection of transaction tables. A relational database adds the capacity to match tables together and perform other functions. Because relational databases provide ways to match and organize data, they can yield more insight. Hence, most database systems are relational by default (encompassing both transactional and relational capacity). These databases are intended to hold data in a format that allows it to be matched completely with other data in the same database or in outside databases. However, the data is not organized in a manner that makes its natural relationships apparent or easy to use. Rather, the relationships in the data are defined and maintained by the application running on top of the database. An individual can see the relationships in the data only if he or she already understands the database structure and the application functionality.
Database administrators (DBAs) often face the above limitation of data storage when performing data warehousing with conventional extract, transform, load (ETL) tools. An ETL tool extracts data from outside sources, transforms the extracted data to fit business needs, and loads the resulting data into a data warehouse, which may then be used for reporting and analysis. Conventional ETL tools use a technique known as On-Line Analytical Processing (OLAP). OLAP provides a capability for copying data from a production (application-driven) database into separate OLAP tables. While a production database tends to store the data in many small tables with few columns, OLAP tends to shift the production data into fewer, larger tables with many columns.
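The extract, transform, and load steps described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch under stated assumptions, not the behavior of any particular ETL product: the table names (customers, products, orders, sales_wide), the column names, and the sample rows are all hypothetical. It shows three small production tables being joined and copied into one wider reporting table.

```python
import sqlite3


def build_production_db():
    """Create a hypothetical production database made of small, narrow tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             product_id INTEGER, quantity INTEGER, order_date TEXT);
        INSERT INTO customers VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
        INSERT INTO products VALUES (10, 'Widget', 2.5), (11, 'Gadget', 10.0);
        INSERT INTO orders VALUES (100, 1, 10, 4, '2024-01-05'),
                                  (101, 2, 11, 1, '2024-01-06'),
                                  (102, 1, 11, 2, '2024-02-01');
        """
    )
    return conn


def run_etl(conn):
    """Copy the production rows into one wide table and return the row count."""
    cur = conn.cursor()
    # Extract and transform: join the narrow tables and derive a line total.
    rows = cur.execute(
        """
        SELECT o.order_id, o.order_date, c.name, c.region, p.name,
               o.quantity, o.quantity * p.unit_price
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN products p ON p.product_id = o.product_id
        ORDER BY o.order_id
        """
    ).fetchall()
    # Load: write the result into a single table with many columns.
    cur.execute(
        """
        CREATE TABLE sales_wide (order_id INTEGER, order_date TEXT,
                                 customer_name TEXT, region TEXT,
                                 product_name TEXT, quantity INTEGER, total REAL)
        """
    )
    cur.executemany("INSERT INTO sales_wide VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

In this sketch the relationships between customers, products, and orders are known only to the join written inside run_etl, which mirrors the point that the relationships live in the application rather than in the stored data.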
OLAP uses dimensions that represent relationship descriptors, categories, or drivers. Examples of dimensions include Time, Location, Product, Industry, and Account. Dimensions can be organized into “cubes”. A cube contains dimensions and a piece of data (typically a number) at each intersection of dimension selections. There are currently three main OLAP cube systems: ROLAP, in which the cube is virtual and is calculated on the fly from the OLAP tables themselves; MOLAP, in which a literal cube of just the dimensions and the intersection data is stored separately from an OLAP table, inside an OLAP table, or in the computer's RAM; and HOLAP, which is a hybrid of ROLAP and MOLAP.
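As a rough illustration of how a cube holds a number at each intersection of dimension selections, the following Python sketch contrasts a precomputed, in-memory cube (loosely in the spirit of MOLAP) with a value calculated on the fly from the underlying rows (loosely in the spirit of ROLAP). The sample rows, the dimension names, and the function names build_cube and query_on_the_fly are assumptions made for this example; real OLAP engines are considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical rows of an OLAP table: three dimensions and one numeric measure.
FACTS = [
    {"time": "2024-Q1", "location": "West", "product": "Widget", "amount": 10.0},
    {"time": "2024-Q1", "location": "West", "product": "Widget", "amount": 5.0},
    {"time": "2024-Q1", "location": "East", "product": "Gadget", "amount": 10.0},
    {"time": "2024-Q2", "location": "West", "product": "Gadget", "amount": 20.0},
]
DIMENSIONS = ("time", "location", "product")


def build_cube(facts, dimensions=DIMENSIONS, measure="amount"):
    """Precompute one summed cell per dimension intersection found in the rows."""
    cube = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)  # one member per dimension
        cube[key] += row[measure]
    return dict(cube)


def query_on_the_fly(facts, measure="amount", **selection):
    """Sum the measure directly from the rows that match the selected members."""
    return sum(
        row[measure]
        for row in facts
        if all(row[dim] == member for dim, member in selection.items())
    )
```

For example, build_cube(FACTS)[("2024-Q1", "West", "Widget")] and query_on_the_fly(FACTS, time="2024-Q1", location="West", product="Widget") both evaluate to 15.0; the first reads a stored cell, while the second recomputes the value from the rows each time it is asked.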
The cube system has helped to fill some of the gaps between relational databases and the natural relationships of data. However, cubes also have weaknesses. First, the cube system is still a relational system with perfectly matchable data. In fact, it is even more so than a relational database, because the cube treats each dimension equally. The natural relationships of data can still be expressed, but typically only through many small cubes, each with different dimensions, to capture a relationship. Second, because cubes rely on intersections, dimensions need to be few and small, or the process can create unwieldy cubes with many empty cells (a cube can contain every possible dimension intersection, even where no data exists). Cubes therefore tend to eliminate details that may be important but cannot be expressed in that format. Further, OLAP dimensions are not easily organized, nor are they easily matched across databases. Dimensions may share the same theme (such as “time”), but because their elements differ and are not naturally defined in the database, joining the dimensions together is difficult. Moreover, OLAP-based ETL tools cannot respond effectively to reporting needs, such as ad-hoc drill-down requests, because these tools cannot differentiate between high-level and low-level data.
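The empty-cell problem described above can be made concrete with a small count. The Python sketch below, which assumes a hypothetical set of four sales rows and a hypothetical helper named cube_sparsity, enumerates every possible intersection of the dimension members and reports how many of those cells actually hold data.

```python
from itertools import product

# Hypothetical rows; each row names one member from each of three dimensions.
SALES = [
    {"time": "Q1", "location": "West", "product": "Widget"},
    {"time": "Q2", "location": "East", "product": "Gadget"},
    {"time": "Q3", "location": "North", "product": "Gizmo"},
    {"time": "Q1", "location": "East", "product": "Widget"},
]


def cube_sparsity(facts, dimensions):
    """Return (total cells, populated cells, empty cells) for a full cube."""
    # Distinct members observed for each dimension.
    members = {d: sorted({row[d] for row in facts}) for d in dimensions}
    # Intersections that actually occur in the data.
    populated = {tuple(row[d] for d in dimensions) for row in facts}
    # Every possible intersection, whether or not data exists for it.
    all_cells = list(product(*(members[d] for d in dimensions)))
    empty = [cell for cell in all_cells if cell not in populated]
    return len(all_cells), len(populated), len(empty)
```

With three members in each of three dimensions, cube_sparsity(SALES, ("time", "location", "product")) returns (27, 4, 23): a full cube reserves 27 cells although only 4 hold data. Because the total is the product of the member counts, adding a dimension or enlarging one multiplies the number of cells, which is why dimensions need to be few and small.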