There is tremendous growth in the amount of data generated in the world. For many years, much of this data was processed prior to being stored for analysis, based on anticipated data analysis needs. For instance, the data may have been summarized and/or converted from rawer formats to refined formats. Many aspects of the data were essentially discarded during pre-processing tasks. With decreasing storage costs and seemingly infinite capacity due to cloud services, there are fewer reasons to discard old data, and many reasons to persistently keep it. As a result, challenges have shifted from pre-processing data prior to storage and analysis, towards analyzing massive quantities of minimally processed data in rawer formats.
Mining a single massive dataset is non-trivial, but an even more challenging task is to cross-correlate and mine multiple datasets from various sources. For example, a datacenter may monitor data from thousands of components. The log formats and collection granularities of that data vary by component type and generation. Another challenge is that a large fraction of the world's data is considered to be “unstructured,” making it difficult to index and query using traditional database systems. Even if a dataset is considered to be structured, the specifics of the structure may evolve with time, for example, as a consequence of system upgrades or more/less restrictive data collection/retention policies.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.