With the growth of machine-generated data, mining that data for relevant information can become increasingly challenging. In some systems, a cost-effective way to store large amounts of data can be on disk drives. However, these relatively slow disk drives can make data mining difficult. The access latency associated with these drives can increase the time necessary to mine data. The increased time can delay the output of the mining operation and can prevent system resources that are occupied by the data mining operation from being used for other operations.
In an attempt to minimize latency issues inherent in using relatively slow disk drives, as well as other memory structures, some systems utilize input/output parallelism. For example, relatively large amounts of data may be stored simultaneously on multiple data storage devices. In another example, data in multiple storage devices may be accessed individually at the same time, with the output of each analysis sent to a central processing unit for final computing operations. Although input/output parallelism is used in a significant percentage of operating systems, its use may be limited by the manner in which the data is stored as well as by the type of data.
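By way of illustration only, the second example above can be sketched as follows. This is a minimal sketch and not an implementation from this disclosure: the in-memory lists stand in for separate storage devices, and the names PARTITIONS, count_errors, and parallel_error_count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for data held on three separate storage devices.
PARTITIONS = [
    ["error", "ok", "ok"],
    ["ok", "error", "error"],
    ["ok", "ok", "ok"],
]


def count_errors(partition):
    """Per-device analysis: count the 'error' records in one partition."""
    return sum(1 for record in partition if record == "error")


def parallel_error_count(partitions):
    """Analyze each partition concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # Each partition is analyzed independently of the others.
        partial_counts = list(pool.map(count_errors, partitions))
    # Final computing operation performed centrally on the partial outputs.
    return sum(partial_counts)


total_errors = parallel_error_count(PARTITIONS)
```

The sketch works only because each record can be analyzed without reference to any record in another partition, which is the property that may be absent for some data types.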
Temporal data (e.g., data associated with a time) is an example of a data type that may not be readily amenable to parallel operations. Temporal data typically includes dependencies within the data and from data to data. For example, the data may be a log file generated when a user accesses an online store to purchase an item. The user may perform some research, look at reviews, and then purchase the item. The data associated with each of those events may include a time component, e.g., when the user accessed the store, the time associated with the search process, and the like. The times associated with those events are related to (dependent on) each other. Further, the user may be one of many users accessing the online store. Thus, the time component of the one user may also be related to, e.g., dependent on, the time components of the other users.
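By way of illustration only, the dependency among the time components of a log can be sketched as follows. This is a minimal sketch under stated assumptions: the log contents, the tuple layout (user, timestamp in seconds, event), and the names LOG and gaps_per_user are all hypothetical and do not come from this disclosure.

```python
from collections import defaultdict

# Hypothetical online-store log: (user, timestamp in seconds, event).
LOG = [
    ("alice", 0, "visit"),
    ("bob", 5, "visit"),
    ("alice", 40, "read_review"),
    ("alice", 90, "purchase"),
    ("bob", 120, "purchase"),
]


def gaps_per_user(events):
    """Return, for each user, the time gaps between that user's consecutive events."""
    last_seen = {}
    gaps = defaultdict(list)
    for user, timestamp, _event in sorted(events, key=lambda e: e[1]):
        if user in last_seen:
            # Each gap depends on the same user's previous event.
            gaps[user].append(timestamp - last_seen[user])
        last_seen[user] = timestamp
    return dict(gaps)


# Analysis over the whole log.
whole = gaps_per_user(LOG)

# Analysis over a naive positional split, as if the log were spread
# across two storage devices and each part were analyzed on its own.
first_half = gaps_per_user(LOG[:3])
second_half = gaps_per_user(LOG[3:])
```

In this sketch the whole-log analysis finds two gaps for the first user and one for the second, whereas the two independently analyzed parts together find only one gap, because every gap that spans the split point is lost.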
A potential solution to alleviate the dependency and latency issues, as well as others not specifically mentioned herein, can be to reorganize data to enable efficient future computing operations such as data mining queries. For queries that are deemed important, users can build “one-off” or unique solutions tailored to a particular data mining operation. However, this approach does not readily provide for other data mining operations, and can actually preclude data mining operations if the arrangement of the data prevents proper query operation on the data. For example, the data may have been rearranged in a manner that facilitates a specific query, but the data may have been indexed in a manner that does not provide sufficient information to perform other queries.
It is with respect to these and other considerations that the disclosure made herein is presented.