With the explosive growth of network technologies, including the Internet and mobile devices, the amount of data to be saved, indexed, and retrieved has drastically increased. To store data for easy retrieval, databases have increased in size and complexity. However, the rate at which data is generated has created issues for storage-related technologies. One approach for storing data involves large “Big Data” systems, some of which may be managed in a NoSQL scheme, in which data is distributed across nodes and queries are generated, translated, and delegated to a number of nodes for local processing of data (e.g., as in Hadoop, HDFS, or HBase). Other approaches, such as conventional relational database techniques, tend to handle the increase in stored data with sheer computational power. Yet both approaches commonly rely on multiple computer systems working in concert to perform tasks.
Scaling across multiple computer systems necessitates global indexing of files across the system so that each of the computers in the system knows where each file resides. Global indexing creates a bottleneck that is not easily fixed. As more files are added and indexed, the entire system must be synced, which takes time and slows the system down. One approach to the scaling and indexing issue is to store files in sorted order when received, the rationale being that if the files are always sorted, then syncing is not necessary as each computer or computational node knows how to traverse the sorted index to locate a given file.
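The sorted-storage idea described above can be illustrated with a minimal sketch. It assumes a single in-memory list of file keys on one node; the class name `SortedFileIndex`, its methods, and the node labels are hypothetical and are not drawn from any particular system. Each key is inserted at its sorted position on arrival, so a lookup is a binary search rather than a consultation of a separately synchronized global index.

```python
import bisect


class SortedFileIndex:
    """Hypothetical per-node index that keeps file keys in sorted order."""

    def __init__(self):
        self._keys = []        # file keys, always kept sorted
        self._locations = []   # location of the file at the same position

    def add(self, key, location):
        # Find the sorted position for the key and insert it there,
        # so the index never needs a separate sorting or syncing pass.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._locations[i] = location  # key already present: update it
        else:
            self._keys.insert(i, key)
            self._locations.insert(i, location)

    def find(self, key):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._locations[i]
        return None


index = SortedFileIndex()
for name, loc in [("m.log", "node-2"), ("a.log", "node-1"), ("z.log", "node-3")]:
    index.add(name, loc)
```

Here `index.find("m.log")` locates the file by traversing the sorted keys alone. A real distributed store would also have to partition the key range across nodes, which this sketch leaves out.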
However, the sorted approach creates new issues for data records with many columns and for queries that specify those columns. For example, if a record has many columns and a query specifies several of them, the system may be able to sort on only one column at a time. Thus, a first column is sorted, then a second column, et cetera, until the target query data is parsed and ready to provide to a user or program as query results. This approach becomes slow and unmaintainable for large data stores having many columns and many rows, such as those that regularly occur in large enterprise networks or modern Internet environments.
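As a rough illustration of why this becomes slow, the sketch below answers a multi-column equality query by sorting on one column at a time. The function name, the record layout, and the sample data are assumptions made for the example and do not describe any specific system's implementation. Each queried column costs one full sort of the remaining candidates before the matching range can be located.

```python
from bisect import bisect_left, bisect_right


def query_one_column_at_a_time(records, criteria):
    """Narrow records by re-sorting on each queried column in turn.

    Returns the matching records and the number of sort passes performed.
    """
    candidates = list(records)
    sort_passes = 0
    for column, wanted in criteria.items():
        # One full sort of the remaining candidates per queried column.
        candidates.sort(key=lambda r: r[column])
        sort_passes += 1
        # With the column sorted, binary search finds the matching range.
        keys = [r[column] for r in candidates]
        lo = bisect_left(keys, wanted)
        hi = bisect_right(keys, wanted)
        candidates = candidates[lo:hi]
    return candidates, sort_passes


# Hypothetical sample records.
records = [
    {"user": "ann", "region": "eu", "status": "active"},
    {"user": "bob", "region": "us", "status": "active"},
    {"user": "cat", "region": "eu", "status": "closed"},
    {"user": "dan", "region": "eu", "status": "active"},
]
matches, passes = query_one_column_at_a_time(
    records, {"region": "eu", "status": "active"}
)
```

The number of sort passes grows with the number of columns in the query, and each pass is at least linearithmic in the number of remaining rows, which is the cost the paragraph above describes.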