The present disclosure relates to data visualization, and more specifically, to a computer-implemented method, a system and a computer program product for generating a data cube from data such as transaction data.
With the development of e-commerce, a massive number of transactions occur every day and huge volumes of transaction data are generated. The transaction data include various attributes of the transactions, and may be stored in a fact table. The attributes may include, for example, a time attribute indicating the times at which the transactions occur, a geographical location attribute indicating the geographical locations where the transactions occur, a transaction amount attribute indicating the amounts of the transactions, and the like.
In order to analyze the transaction data in the fact table and find a rule inherent in the data, such as a rule regarding purchases made by people in different countries or regions over time, the transaction data are often visualized. A data cube is widely used for such visualization. The data cube may have a plurality of dimensions associated with the attributes of the transaction data, with each dimension corresponding to one attribute. In each dimension of the data cube, the transaction data (or the transactions) are aggregated at a granularity so that a data distribution in that dimension may be obtained. For example, the data cube may include three dimensions, namely a time dimension, a geographical location dimension and a transaction amount dimension, which may correspond to the x, y, and z axes of a three-dimensional coordinate system. In the time dimension, the transaction data may be aggregated at a granularity of the days on which the transactions occur, so that a transaction data distribution in terms of days may be shown in this dimension. Similarly, in the geographical location dimension, the transaction data may be aggregated at a granularity of the cities in which the transactions occur, so that a transaction data distribution in terms of cities may be shown in this dimension.
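By way of illustration only, the aggregation described above may be sketched as follows. The sketch assumes a hypothetical in-memory fact table whose rows hold a timestamp, a city and a transaction amount, and it fixes the granularities at days and cities; the names FACT_TABLE and build_cube are introduced here for illustration and do not denote any particular implementation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical fact-table rows: (time attribute, geographical location attribute,
# transaction amount attribute). The values are made up for illustration.
FACT_TABLE = [
    (datetime(2023, 1, 1, 9, 15), "Berlin", 20.0),
    (datetime(2023, 1, 1, 17, 40), "Berlin", 35.0),
    (datetime(2023, 1, 1, 11, 5), "Paris", 12.5),
    (datetime(2023, 1, 2, 8, 30), "Paris", 40.0),
]


def build_cube(rows):
    """Sum transaction amounts into cells keyed by (day, city).

    The time dimension is aggregated at a day granularity and the
    geographical location dimension at a city granularity.
    """
    cube = defaultdict(float)
    for timestamp, city, amount in rows:
        cube[(timestamp.date(), city)] += amount
    return dict(cube)


cube = build_cube(FACT_TABLE)
for (day, city), total in sorted(cube.items()):
    print(day, city, total)
```

In this sketch, the two Berlin transactions of the same day fall into a single cell, whereas the two Paris transactions occur on different days and therefore remain in separate cells.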
A challenging problem in generating the data cube is to select a proper granularity for data aggregation in each dimension. Specifically, there may be many selectable granularities for each dimension. For example, since there are at least seven time units, namely seconds, minutes, hours, days, weeks, months and years, the time dimension may have seven selectable granularities, namely a second granularity, a minute granularity, an hour granularity, a day granularity, a week granularity, a month granularity and a year granularity, and the data may be aggregated at any one of the seven granularities in the time dimension. If the granularity is not properly selected, so that the data are not aggregated at a proper granularity, the rule inherent in the transaction data may not be clearly shown by the data cube.
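The effect of the selected granularity may be illustrated by the following sketch, which aggregates the same set of timestamps at each of the seven time granularities and counts the resulting groups. The timestamps, the names truncate and aggregate, and the convention that a week starts on Monday are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta

GRANULARITIES = ["second", "minute", "hour", "day", "week", "month", "year"]

# Hypothetical transaction times, made up for illustration.
TIMESTAMPS = [
    datetime(2023, 1, 2, 9, 15, 10),
    datetime(2023, 1, 2, 9, 15, 50),
    datetime(2023, 1, 2, 17, 40, 0),
    datetime(2023, 1, 5, 11, 5, 0),
    datetime(2023, 2, 10, 8, 30, 0),
]


def truncate(ts, granularity):
    """Map a timestamp to the start of the period that contains it."""
    if granularity == "second":
        return ts.replace(microsecond=0)
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return midnight
    if granularity == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return midnight - timedelta(days=midnight.weekday())
    if granularity == "month":
        return midnight.replace(day=1)
    if granularity == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(timestamps, granularity):
    """Count the transactions that fall into each period."""
    return Counter(truncate(ts, granularity) for ts in timestamps)


for g in GRANULARITIES:
    print(g, len(aggregate(TIMESTAMPS, g)))
```

With these sample timestamps, a fine granularity such as seconds leaves almost every transaction in its own group, whereas the year granularity collapses all of them into a single group; in either case the distribution over time is hardly visible.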
A conventional method of selecting the granularities for the dimensions of the data cube is for a user to manually set a series of rules, according to which the granularities are then selected. However, this method requires the user to input many rules, which is cumbersome. Moreover, if the user is not experienced, the rules set by the user may not be proper, with the result that the selected granularities, and accordingly the generated data cube, are not optimal.