As geographic information systems (GIS) and associated technologies such as geospatial analysis increase in scope and complexity, the amount of data used therewith (and generated therefrom) grows at an astounding rate. The proportion of information-sensing devices and methodologies generating computer-readable geospatial data continues to increase, as does the general connectivity of such devices and methodologies. As a result, the amount and complexity of data representing a given geographic concept also tend to increase over time, thus accelerating the overall growth in the size and complexity of data sets across diverse applications, including GIS applications. However, the ability to process “big data,” as such ponderously large and complex data sets are commonly known, is often limited by both time and computing resource constraints. These collective limitations constitute the fundamental problem underlying big data and are particularly relevant to continuing innovation in GIS.
Methodologies for mitigating such constraints are in continuous development. For example, network-connected distributed computing systems, known as cloud computing systems, provide scalable, massively parallelized and aggregated computing resources, such as databases, disk storage, processing capability, and the like. Cloud computing systems give, e.g., geospatial data providers access to considerably greater computing resources than they might otherwise be able to privately procure and/or administer, thus making the use, storage and manipulation of the associated big data available to a far greater population of users with limited local resources.
However, cloud computing is, by its nature, a “brute force” solution for dealing with the sheer scale of big data. Other logistical issues, such as the length of time involved in getting the data into and out of a cloud computing system in the first instance, generally remain, and become increasingly important as emerging technologies and collaboration models (such as social networks, crowdsourcing and the like) enable greater numbers of, e.g., non-institutional users to participate in the creation, management and dissemination of the concepts underlying the data (as well as the data itself). Additionally, as the scope of geospatial data generation grows broader over time, the number of disparate data formats and types across data sets, and at times within a given data set, increases.