Many applications gather and compile information or data. One example is modeling or simulation performed by a computer software system (a simulated system). The simulated system may run on thousands of computers and use tens of thousands of devices. Furthermore, the system may handle tens to hundreds of pieces of data while performing tasks. The tasks may be performed by hundreds of components on behalf of different users in various locations.
Consequently, a single simulation run can produce a large volume of data. It is typically desirable to draw one or more conclusions from the processed data. For example, one conclusion may be the average elapsed time the simulated system takes to process the data, which provides insight into how efficiently the simulated system is performing.
Data aggregations are summaries of larger collections of data. They are desirable because they are typically smaller than the original data and reveal patterns that may be difficult to observe in the larger, more detailed volume of original data.
Although smaller and more manageable than the original data, data aggregations can still be extremely large and cumbersome if a large amount of data is retained after aggregation. This is particularly the case when data views or queries are performed on the data aggregations. Examples of queries or views include requests to aggregate data based on one or more particular common features. To be of practical use, data aggregation should support such queries. However, supporting queries tends to make data aggregations larger, because they retain more data than may be needed to reach a conclusion or identify a pattern. Computing resource constraints should therefore be taken into account when limiting the size of data aggregations. Furthermore, the nature of the queries made against data aggregations frequently changes over time, either because of new external requirements or because of conclusions drawn from previous aggregations of the same or similar data.
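To illustrate how a query that aggregates data by common features affects the size of an aggregation, the following is a minimal Python sketch under stated assumptions. The record fields (`task`, `device`, `user`, `elapsed_ms`) and the function `aggregate` are hypothetical names chosen for illustration only; the technique shown is an ordinary group-by summary and is not presented as the method described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw record: one completed task in the simulated system.
Record = Dict[str, object]


def aggregate(records: Iterable[Record], keys: Tuple[str, ...]) -> Dict[Tuple, Dict[str, float]]:
    """Summarize records into one row per distinct combination of `keys`.

    Each row keeps only a count, a total, and a mean elapsed time,
    instead of every raw record in the group.
    """
    counts: Dict[Tuple, int] = defaultdict(int)
    totals: Dict[Tuple, float] = defaultdict(float)
    for rec in records:
        # The group key is the tuple of the record's values for the chosen features.
        group = tuple(rec[k] for k in keys)
        counts[group] += 1
        totals[group] += float(rec["elapsed_ms"])
    return {
        group: {
            "count": counts[group],
            "total_ms": totals[group],
            "mean_ms": totals[group] / counts[group],
        }
        for group in counts
    }


# Hypothetical sample data, invented for illustration.
records: List[Record] = [
    {"task": "load", "device": "disk0", "user": "alice", "elapsed_ms": 120},
    {"task": "load", "device": "disk1", "user": "bob", "elapsed_ms": 80},
    {"task": "load", "device": "disk0", "user": "bob", "elapsed_ms": 100},
    {"task": "solve", "device": "cpu0", "user": "alice", "elapsed_ms": 400},
    {"task": "solve", "device": "cpu1", "user": "alice", "elapsed_ms": 600},
]

overall = aggregate(records, ())  # empty key: one row, the overall average
by_task = aggregate(records, ("task",))  # coarse query: one row per task
by_task_device_user = aggregate(records, ("task", "device", "user"))  # fine query

print(overall[()]["mean_ms"])  # 260.0
print(len(by_task), by_task[("load",)]["mean_ms"])  # 2 100.0
print(len(by_task_device_user))  # 5
```

Grouping by `task` alone keeps two rows, while grouping by task, device, and user keeps five rows for five records, so the finer query retains essentially all of the original detail. An empty key tuple yields the single overall average elapsed time.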
A first solution to the problems of managing data aggregation is to limit the number of categories of data aggregation, for example by determining only how long each task took and how busy each device was. A second solution is to transfer data aggregations to much larger storage. The disadvantage of the first solution is that the resulting data aggregations tend not to be rich enough to answer particular queries or to support adequate pattern recognition. Although the second solution preserves rich data, it tends to be very slow. Therefore, it is desirable to control and provide data aggregation efficiently, using a flexible method that supports a large or potentially large volume of data.