The present invention relates generally to a spatial-temporal big-data storage system, and more particularly, but not by way of limitation, to a system, method, and recording medium for a holistic distributed data storage solution for spatial-temporal data.
Mobile applications can generate an immense amount of spatial-temporal data, which calls for strong support from big data platforms. Conventional techniques attempt to meet this need by stacking a first distributed, non-relational database layer (e.g., HBase) above a second layer (e.g., the Hadoop Distributed File System (HDFS)). However, as general-purpose solutions, HBase and HDFS suffer from considerable inefficiencies when handling spatial-temporal data.
That is, there is a technical problem in the conventional techniques in that, because HBase indexes only a primary key, spatial-temporal data requires an encoding mechanism to translate the spatial and temporal information into unique keys. The encoding algorithm is crucially important for the efficiency of geometry queries, which are common in spatial-temporal applications. To reduce the number of HBase scan queries in the conventional techniques, the algorithm must preserve as much continuity as possible when translating high-dimensional data into the one-dimensional primary key space.
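By way of illustration only, the continuity requirement described above can be sketched with a Z-order (Morton) encoding, one well-known space-filling-curve technique. The text above does not name a particular encoding, so the choice of Z-order, the function names, the 16-bit resolution, and the one-hour time bucket are all assumptions made for this sketch.

```python
def interleave_bits(coords, bits_per_dim):
    """Interleave the bits of each integer coordinate into one Z-order key."""
    key = 0
    ndims = len(coords)
    for bit in range(bits_per_dim):
        for dim, value in enumerate(coords):
            # Copy bit `bit` of this coordinate into its slot in the combined key.
            key |= ((value >> bit) & 1) << (bit * ndims + dim)
    return key


def quantize(value, lo, hi, bits):
    """Map a number in [lo, hi] onto an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * cells)


def row_key(lon, lat, epoch_seconds, bits=16, time_bucket=3600):
    """Build a hypothetical row key from longitude, latitude, and time.

    Assumption: time is bucketed by the hour and wrapped to `bits` bits.
    The key is big-endian so byte-wise ordering matches numeric ordering.
    """
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    t = (int(epoch_seconds) // time_bucket) % (1 << bits)
    key = interleave_bits((x, y, t), bits)
    return key.to_bytes((3 * bits + 7) // 8, "big")
```

Because the bits of each dimension are interleaved, cells that are close in space and time tend to share a key prefix, so a geometry query can often be covered by a small number of contiguous key ranges rather than many separate scans. The locality is not perfect: a Z-order curve has discontinuities, which is why the choice of encoding matters.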
Also, there is a technical problem in the conventional techniques in that spatial-temporal applications naturally incubate moving “hot spots”. For example, tens of thousands of people simultaneously searching for nearby restaurant data and transportation data after July 4th fireworks generate a large number of requests hitting a single region server, creating a resource hot spot in the distributed storage system. The region split operation in HBase first generates reference files pointing back to the original StoreFile. The references are materialized only during the next compaction. Therefore, the servers hosting the two daughter regions (resulting from the region split) suddenly lose the data locality benefits, which further aggravates the resource shortage.
As a result of these technical problems, the conventional techniques are incapable of supporting spatial-temporal data with both range queries within a given geometry and efficient load balancing against moving hot spots.