A database consists of a collection of structured data. For example, a relational database, such as a SQL database, consists of a set of tables, each of which contains data associated with a particular subject. Each table includes columns and rows, with each column representing an attribute of the entries in the table, and each row representing a separate entry in the table. Generally, a SQL database, or other relational database, can contain any number of data columns and rows. SQL databases have, over the course of recent history, proven to be a feasible database technology for enterprise data storage. This is in part because the SQL schema allows robust and complex query execution plans to be executed.
However, SQL databases are not without drawbacks. One example where SQL databases prove sub-optimal is the handling of large-scale data. In particular, SQL databases do not work well in cases where a database is required to be scalable across multiple computing systems. For example, a SQL database is currently commonly stored in the form of an MDF file (as in Microsoft SQL Server) on an NTFS-based file system. MDF files have a particular structure that includes database tables and associated metadata. If a database table grows large, the MDF file containing that table must also grow large, and cannot easily be separated. As the table (and associated MDF file) grows, even queries directed only to that table can be delayed by the time required to build or update indexes into the table. Furthermore, the time required to parse the table to satisfy unindexed queries may become unwieldy. Overall, and for a host of reasons, use of MDF files can result in long delays between the time a client application submits a SQL query to the database and the time results are ultimately returned.
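The difference between indexed and unindexed queries described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in relational engine; it does not model MDF files or NTFS, and the table name `orders`, its columns, and the index name `idx_orders_customer` are hypothetical. The sketch asks the engine how it would execute the same query before and after an index exists.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)

# Hypothetical table, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the engine must scan every row.
plan_before = query_plan(conn, sql, (42,))

# Building the index costs time and storage that grow with the table.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, the engine can search instead of scanning.
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch relies only on whether the index name appears in the plan.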
Beyond SQL and other relational databases, a database can be stored in a variety of different ways, each of which greatly affects the performance of that database. For this reason, other organizational schemes for data have been attempted in recent history. For example, U.S. Patent Pub. No. 2011/0302151 discusses an implementation that uses a server interfaced to a number of node database management systems. In that implementation, a SQL interface on the server acts as a front-end to a map-reduce database, such as the Apache Hadoop data processing framework. The Apache Hadoop data processing framework then distributes specific, granular portions of the SQL query received at the SQL interface to database management systems located at each data node. The database management system at each node then processes the data, allowing for some parallelism across the nodes. However, even in such a system, each node is limited by the manner in which data is organized at that node, so the DBMS at each node suffers from the same scalability issues otherwise encountered in a single database. In addition, because a query may be distributed to one or more nodes, query result latency is affected by the time required to transfer data from and among the nodes, and is bounded below by the worst-case response time of the nodes addressed by that query. Furthermore, in many cases this approach may be cost-prohibitive, since each data node would be required to manage and execute its own database management system, which can involve substantial software and IT administration fees.
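The latency behavior described above can be made concrete with a small, simplified model. It is not taken from the cited publication: the node names, timing figures, and the `NodeResult` and `gather` helpers are hypothetical, and the model assumes the nodes work fully in parallel and ignores any coordination overhead at the server.

```python
from dataclasses import dataclass

@dataclass
class NodeResult:
    node: str
    rows: list
    processing_ms: int  # time the node's DBMS spent on its portion of the query
    transfer_ms: int    # time to return the partial result to the server

def gather(results):
    """Merge partial results from all nodes addressed by one query.

    Under the parallel-execution assumption, the query cannot complete
    before its slowest node has both processed and transferred its part.
    """
    merged = [row for r in results for row in r.rows]
    latency_ms = max(r.processing_ms + r.transfer_ms for r in results)
    return merged, latency_ms

# Hypothetical figures: node-b holds a large, poorly organized partition.
results = [
    NodeResult("node-a", [("alice", 3)], processing_ms=40, transfer_ms=10),
    NodeResult("node-b", [("bob", 7)], processing_ms=400, transfer_ms=20),
    NodeResult("node-c", [("carol", 5)], processing_ms=60, transfer_ms=10),
]

merged, latency_ms = gather(results)
print(merged)
print(latency_ms)
```

In this model, two of the three nodes respond within 70 ms, yet the query as a whole takes as long as the slowest node, which is the limitation the passage above identifies.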
For these and other reasons, improvements are desirable.