With the continued proliferation of information sensing devices (e.g., mobile phones, online computers, RFID tags, sensors, etc.), increasingly large volumes of data are collected for various business intelligence purposes. For example, the web browsing activities of online users are captured in various datasets (e.g., cookies, log files, etc.) for use by online advertisers in targeted advertising campaigns. Data from operational sources (e.g., point of sale systems, accounting systems, CRM systems, etc.) can also be combined with the data from online sources. Using traditional database structures (e.g., relational) to store such large volumes of data can result in database statements (e.g., queries) that are complex, resource-intensive, and time-consuming. Deploying multidimensional database structures enables more complex database statements to be interpreted (e.g., executed) with substantially less overhead. Some such multidimensional models and analysis techniques (e.g., online analytical processing or OLAP) allow a user (e.g., a business intelligence analyst) to view the data in “cubes” comprising multiple dimensions (e.g., product name, order month, etc.) and associated cells (e.g., defined by a combination of dimensions) holding a value that represents a measure (e.g., sale price, quantity, etc.). Further, with such large volumes of data from varying sources and with varying structures (e.g., relational, multidimensional, delimited flat file, document, etc.), the use of data warehouses and distributed file systems (e.g., Hadoop distributed file system or HDFS) to store and access data has increased. For example, an HDFS can be implemented for databases using a flat file structure with predetermined delimiters, and associated metadata (e.g., describing the keys for the respective delimited data values), to accommodate a broad range of data types and structures. Various query languages and query engines (e.g., Impala, SparkSQL, Tez, Drill, Presto, etc.) are available to users for querying data stored in data warehouses and/or distributed file systems.
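As one illustration of the cube concept described above, the following minimal Python sketch aggregates a measure into cells keyed by a combination of dimension values. The record fields, sample values, and the helper names `build_cube` and `roll_up` are hypothetical and chosen only for illustration; the sketch is not drawn from any particular OLAP engine.

```python
from collections import defaultdict

# Hypothetical order records; field names and values are illustrative only.
orders = [
    {"product_name": "widget", "order_month": "2024-01", "quantity": 3},
    {"product_name": "widget", "order_month": "2024-01", "quantity": 2},
    {"product_name": "widget", "order_month": "2024-02", "quantity": 1},
    {"product_name": "gadget", "order_month": "2024-01", "quantity": 4},
]


def build_cube(records, dimensions, measure):
    """Sum `measure` into cells keyed by the tuple of dimension values."""
    cube = defaultdict(float)
    for record in records:
        cell_key = tuple(record[d] for d in dimensions)
        cube[cell_key] += record[measure]
    return dict(cube)


def roll_up(cube, keep_index):
    """Aggregate away every dimension except the one at `keep_index`."""
    totals = defaultdict(float)
    for cell_key, value in cube.items():
        totals[cell_key[keep_index]] += value
    return dict(totals)


# Each cell is identified by (product_name, order_month) and holds a quantity.
quantity_cube = build_cube(orders, ("product_name", "order_month"), "quantity")

# Rolling up over order_month leaves total quantity per product.
quantity_by_product = roll_up(quantity_cube, 0)
```

In this sketch a cell such as `("widget", "2024-01")` holds the summed quantity for that product and month, and the roll-up shows how a coarser view can be derived from the same cells without issuing a new query against the underlying records.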
Unfortunately, multidimensional data model design environments for such distributed data systems can be limited at least in their design collaboration capabilities. Specifically, legacy approaches might merely support the design of multidimensional data models by modeling specialists using specialized tools installed locally on each specialist's computing device. In this environment, one specialist might pass control of a particular data model being designed to another specialist, but the two cannot concurrently work on a single shared instance of the model. Further, each specialist is limited to working on the computing devices operating the specialized tools (e.g., software). Also, in cases when two or more specialists work concurrently on separate copies of the same data model, multiple versions of the model can exist that need to be manually merged into a single version. Such legacy environments are inefficient in terms of utilization of computing and/or human resources. Further, such legacy approaches can introduce design conflicts and/or errors. For example, while design changes might be syntactically correct, semantic and/or other errors can arise from conflicting changes from various designers, the manual merge process, the incorporation of a change into the overall model, and/or other aspects inherent in the legacy approaches. Also, the need for improved multidimensional data model design collaboration continues to increase as the datasets in distributed data systems continue to grow. For example, a large global enterprise might have an extensive dataset modeled by one or more complex multidimensional data models having aspects that need to be continually managed by multiple designers for various purposes. In this case, legacy approaches having a few specialist designers using specialized tools operating on a few respective computing devices will be limited in achieving the data model design change cycle time desired by the enterprise.
The problem to be solved is rooted in technological limitations of the legacy approaches. Improved techniques, and in particular, improved application of technology, are needed to address the problem of providing concurrent error-free multidimensional data model design collaboration among multiple designers at various computing devices. More specifically, the technologies applied in the aforementioned legacy approaches fail to achieve the sought-after capabilities of the herein disclosed techniques for data model design collaboration using semantically correct collaborative objects. Thus, techniques are needed to improve the application and efficacy of various technologies as compared with the legacy approaches.