With the continued proliferation of information sensing devices (e.g., mobile phones, online computers, RFID tags, sensors, etc.), increasingly large volumes of data are collected for various business intelligence purposes. For example, the web browsing activities of online users are captured in various datasets (e.g., cookies, log files, etc.) for use by online advertisers in targeted advertising campaigns. Data from operational sources (e.g., point of sale systems, accounting systems, CRM systems, etc.) can also be combined with the data from online sources. Using traditional database structures (e.g., relational) to store such large volumes of data can result in database statements (e.g., queries) that are complex, resource-intensive, and time-consuming. Deploying multidimensional database structures enables more complex database statements to be interpreted (e.g., executed) with substantially less overhead. Some such multidimensional models and analysis techniques (e.g., online analytical processing or OLAP) allow a user (e.g., a business intelligence analyst) to view the data in “cubes” comprising multiple dimensions (e.g., product name, order month, etc.) and associated cells (e.g., each defined by a combination of dimensions), each cell holding a value that represents a measure (e.g., sale price, quantity, etc.). Further, with such large volumes of data from varying sources and with varying structures (e.g., relational, multidimensional, delimited flat file, document, etc.), the use of data warehouses and distributed file systems (e.g., Hadoop distributed file system or HDFS) to store and access data has increased. For example, an HDFS can be implemented for databases using a flat file structure with predetermined delimiters, and associated metadata (e.g., describing the keys for the respective delimited data values), to accommodate a broad range of data types and structures.
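Strictly as an illustrative sketch, and not as a description of any particular system, the following Python fragment shows how delimited flat-file records and their associated metadata might be read into cube cells keyed by a combination of dimensions. The metadata layout, the sample records, and the names `parse_record` and `build_cube` are all hypothetical and are assumed here only for illustration.

```python
from collections import defaultdict

# Hypothetical metadata describing the delimiter and the keys
# for the respective delimited data values.
METADATA = {
    "delimiter": "|",
    "keys": ["product_name", "order_month", "quantity", "sale_price"],
}

# Hypothetical flat-file records (one delimited line per order).
RECORDS = [
    "widget|2015-01|3|9.50",
    "widget|2015-01|2|9.50",
    "widget|2015-02|1|10.00",
    "gadget|2015-01|4|20.00",
]


def parse_record(line, metadata):
    """Split one delimited line and label each value with its key."""
    values = line.split(metadata["delimiter"])
    return dict(zip(metadata["keys"], values))


def build_cube(records, metadata, dimensions, measure):
    """Sum a measure into cells, each defined by a combination of dimensions."""
    cube = defaultdict(float)
    for line in records:
        row = parse_record(line, metadata)
        cell = tuple(row[d] for d in dimensions)
        cube[cell] += float(row[measure])
    return dict(cube)


# Cells defined by (product name, order month), holding total quantity.
cube = build_cube(RECORDS, METADATA, ["product_name", "order_month"], "quantity")

# The same records viewed along a single dimension (a roll-up by product).
by_product = build_cube(RECORDS, METADATA, ["product_name"], "quantity")
```

In this sketch, each dictionary key plays the role of a cell address and each value plays the role of the measure; choosing a different list of dimensions yields a different view of the same underlying records.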
While multidimensional models and data warehouses have evolved to accommodate larger volumes of data and more extensive insights into that data, users remain most familiar with the traditional data structures (e.g., relational), query languages (e.g., SQL), and associated analysis tools (e.g., Tableau, Excel, QlikView, etc.) that are commonly used for operational data. In contrast, manipulating data stored using multidimensional data models and data warehouses can require expert computer science skills. Users therefore desire to take full advantage of the multidimensional richness of a broad range of subject data and structures using a familiar relational data analysis environment. Legacy techniques can pre-process and/or transform the subject data (e.g., from the distributed file system or data warehouse) for presentation to the relational data analysis tools. However, such legacy techniques are implemented as batch processes and are limited in resource efficiency, data accuracy, schema flexibility, and other performance characteristics.
The problem to be solved is rooted in technological limitations of the legacy approaches. Improved techniques, and in particular, improved application of technology, are needed to address the problem of projecting a multidimensional data view of a subject database onto a relational data analysis environment to enable real-time data analyses. More specifically, the technologies applied in the aforementioned legacy approaches fail to achieve the sought-after capabilities of the herein-disclosed techniques for interpreting relational database statements using a virtual multidimensional data model. Thus, techniques are needed to improve the application and efficacy of various technologies as compared with the legacy approaches.