As human knowledge increases in scope and complexity, the amount of information generated therefrom grows at an astounding rate. The number of information-generating devices and methodologies continues to increase, as does the general connectivity of such devices and methodologies. As a result, the amount and complexity of data representing a given concept also tend to increase over time, thus accelerating the overall growth in the size and complexity of data sets across diverse applications. However, the ability to process “big data,” as such ponderously large and complex data sets are commonly known, is often limited by both time and computing resource constraints. Furthermore, “big data” also encompasses collections of data sets in which the individual data assets may not be overwhelmingly large or complex, but the total volume, velocity, variety, etc. of incoming data limit the ability of user entities to derive meaningful information from such data. These collective limitations constitute the fundamental problem underlying big data.
Methodologies for mitigating such constraints are in continuous development. For example, network-connected distributed computing systems, known as cloud computing systems, provide scalable, massively parallelized and aggregated compute resources, such as databases, disk storage, processing capability, and the like. Cloud computing systems provide users with access to considerably greater computing resources than those users might otherwise be able to privately procure and/or administer, thus enabling a far greater population of users with limited local resources to use, store and manipulate big data.
However, cloud computing is, by its nature, a “brute force” solution for dealing with the sheer scale of big data. Other logistical issues generally remain, such as the length of time involved in getting the data into and out of a cloud computing system in the first instance, as well as the abilities of enterprises, human entities, and the like to quickly and efficiently determine or ascertain the contents of big data. Such issues become increasingly important as emerging technologies and collaboration models enable greater numbers of, e.g., non-institutional users to participate in the creation, management and dissemination of the concepts underlying the data (as well as the data itself). Additionally, as the scope of information generation grows broader over time, the number of disparate data formats and types between data sets, and at times within a given data set, increases.