The present specification relates to data warehousing and, more specifically, to systems and methods for optimizing the preparation and use of data cubes (a/k/a quasi-cubes) based on inputs such as particular rules/decisions designed to size the cubes appropriately. The sizing is performed by effectively removing, from each instance/cycle of the aggregated schema, the static accumulation of unwarranted dimensions that are used neither for analysis nor for business intelligence reporting. This is achieved without disturbing the existing systems or the design of the source and aggregates, in either data structure or schema.
As should be understood by those of ordinary skill in the art, enterprise software systems include computer programs with business-related applications. These enterprise software systems often store data in data cubes for analytics or analysis reporting. A data cube can be defined as a multidimensional data storage and organization structure, which can contain data in aggregated form. The data in a data cube is divided into related groups called dimensions. A common example given to illustrate the dimensions of a data cube includes data related to a product sold by a company. One dimension may include data regarding the product sold, another dimension may include data indicating the purchasing customer, another dimension may include data related to the price of the product sold, another dimension may indicate the time the product was sold, another dimension may indicate the location of the store which sold the product, etc. Any individual dimension may or may not have a hierarchical structure and inherent dependencies, as designed by the business.
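By way of a non-limiting illustration, the product-sale example above can be sketched as a small aggregation routine. The sample rows, the names `SALES`, `DIMENSIONS`, and `build_cube`, and the choice to treat the sale amount as the aggregated measure are all assumptions made for this sketch and are not taken from any particular enterprise system.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, customer, location, date, amount).
SALES = [
    ("widget", "alice", "Boston", "2024-01-01", 10.0),
    ("widget", "bob", "Boston", "2024-01-01", 10.0),
    ("gadget", "alice", "Denver", "2024-01-02", 25.0),
    ("widget", "alice", "Boston", "2024-01-02", 10.0),
]

# Names of the dimensions, in the same order as the row columns.
DIMENSIONS = ("product", "customer", "location", "date")


def build_cube(rows, dims):
    """Sum the sale amount for every combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[p] for p in positions)
        cube[key] += row[-1]  # the last column is the measure (assumed)
    return dict(cube)


# A cube over all four dimensions keeps one cell per distinct combination.
full_cube = build_cube(SALES, DIMENSIONS)

# A cube over two dimensions holds the same totals in fewer cells.
by_product_location = build_cube(SALES, ("product", "location"))
```

In this sketch the four-dimension cube holds four cells, while the two-dimension cube holds two, which illustrates how the number of dimensions carried by a cube affects the number of aggregated cells it must store.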
Demand is growing for finer schedule granularity and for a wider variety of data classifications among data cubes in enterprise operations. A consequence of this demand is ever-increasing operational data sizes and degraded performance in every sphere of data mining activity in an enterprise. For example, the resulting increase in cube sizes affects the performance of scheduled updates and of the cube's refresh (data cubes are susceptible to a cumulative increase in storage size, because each cube is refreshed as a single object). Further, scheduled cube data is subject to varied duration requirements during reporting (although the selection of dimensions and measures is the user's choice, the recalculations must be performed according to that selection). The schedule is delivered with one or more cube aggregates as per the business requirements (e.g., a daily or hourly cube refresh).
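As a rough, non-limiting illustration of why cube sizes grow in this way, the number of cells in a fully populated cube is bounded above by the product of the cardinalities of its dimensions. The cardinalities and the helper name `max_cells` below are hypothetical values chosen for this sketch; real cubes are typically sparse and hold fewer cells than this bound.

```python
from math import prod


def max_cells(cardinalities):
    """Upper bound on cell count: the product of the dimension cardinalities."""
    return prod(cardinalities.values())


# Hypothetical number of distinct members in each dimension.
cardinalities = {
    "product": 1_000,
    "customer": 50_000,
    "location": 200,
    "date": 365,
}

full = max_cells(cardinalities)

# The same bound when one dimension is not carried in the cube.
without_customer = max_cells(
    {name: size for name, size in cardinalities.items() if name != "customer"}
)
```

Under these assumed cardinalities, the bound falls by a factor equal to the cardinality of the omitted dimension, which suggests why each additional dimension can have a multiplicative effect on cube size.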
There are existing conventional solutions (e.g., so-called “quasi-cubes”) that attempt to solve the above-referenced problem of overwhelming data cube sizes. The existing conventional solutions focus on improving the performance of the data cubes by actively filtering cube data while in use and on reducing storage by means of file compression. The existing conventional solutions, however, have certain negative effects on the performance of the reports, since the filtering or decompression degrades performance during reporting.
Accordingly, there is a continued need for a method and system for solving the above-referenced problem of overwhelming data cube sizes without the negative effects seen with conventional solutions.