The general problem to be solved is how to manage and analyze big data, e.g., data on the order of petabytes. Big data is broadly defined as data sets whose sizes exceed the ability of commonly used software tools to capture, curate, manage, and process the data within a reasonable amount of time. The world's information doubles approximately every two years. This information (or data) includes critical intelligence, but mining that intelligence becomes cost prohibitive and takes too long for many end-users and applications. Traditional data sets consist of narrow subsets of historical, structured, static data populating relational databases. Big data, by contrast, presents a particularly difficult problem for end-users because it is effectively unbounded in volume, may be both structured and unstructured, is frequently available in real time, and may be iterative. Such data exceeds what present relational database management systems can handle without significant processing, and that processing is time-consuming and ultimately leaves much of the data outdated and of limited value.
FIG. 1 illustrates an existing process for correlating data. As shown, multiple exemplary data sets D1 to D4 from various sources are loaded onto separate databases (DB1 to DB4) for separate access by an application or user (hereafter "user"). Additional contextual data sets D5 and D6, which augment the relevance of data sets D1 through D4, are accessed directly by the user. The user must correlate and process the data from all of these data sets to generate a result, such as a report. This report generation process is time-consuming and highly manual, requiring a large investment of time from the user.
A number of existing data warehousing techniques address the backend correlation challenges of scenarios such as the one depicted in FIG. 1. Dimensional modeling has been used to organize data in a data warehouse to increase its analytical value and to support end-user queries by adding dimension data to fact data to provide context. This is described in multiple papers by Ralph Kimball; see, for example, "A Dimensional Modeling Manifesto," Aug. 2, 1997, Kimball Group (www.kimballgroup.com), which is incorporated herein by reference in its entirety. Although dimensional modeling can structure the data to support end-user queries, it is applied only after the data has been saved in DB1 to DB4; that is, it is a backend technique.
Accordingly, there is a need in the art for an improved system and method for front-end processing of big data that makes the data available in near real time for alerting, query, and analysis by one or more users.