Online Analytical Processing (OLAP) is a technology that is used to organize large businesses databases and support business intelligence (BI). The online analytical processing generally requires pre-computation and storage of information in a data cube, which also may be referred to as an OLAP data cube. The multidimensional expressions (MDX) language may provide a specialized syntax for querying and manipulating the multidimensional data stored in the data cube. For example, a company might wish to summarize financial data by product, by time-period, and by city to compare actual and budget expenses. In this example, the product, time, city, and scenario (actual and budget) are the data's dimensions. Various other dimensions may be used in the data cubes.
The data cubes may be described as a multidimensional data set with an arbitrary number of dimensions. In general, each cell of the data cube may contain a number that represents some measure of a business, such as sales, profits, expenses, budget and forecast. The data cube data may be stored in a star schema or snowflake schema in a relational data warehouse or in a special-purpose data management system.
The BI tools may be used by companies to facilitate enhanced business decisions through reporting, analysis, and monitoring. The data cubes may be used as a form of caching to allow multiple reports to be stored in the data cubes. In various embodiments, the result is almost instant report execution regardless of the report size. The data cube can be created generically in order to serve as a cache for a number of different reports.
Many BI tools and services available may have limitations such as: (1) a single file storage with less than two billion rows per data cube; (2) ability to only scale-up with no distributed architecture; and (3) amount of time needed to run a query is over 24 hours. As more and more analysis data is available on Hadoop clusters, data consumers are requesting that such analysis data on Hadoop interface directly with BI platforms. Hadoop, also referred to as Apache Hadoop, is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project and licensed under Apache License 2.0. The Apache Hadoop framework includes Hadoop Distributed File System (HDFS) and the Hadoop MapReduce, as well as other components.