Databases are generally divided into two main types: relational or SQL databases, which are schema-based, and non-relational or NoSQL databases, which are schema-less. Relational databases are structured: data is defined and stored as related entities with attributes across tables (i.e., schemas), so structures (e.g., tables) must be defined before they can contain and work with the data. Different relational database management systems (RDBMSs) implement different data types. Some common RDBMSs include: SQLite3, which is an embedded open source RDBMS; MySQL/MariaDB, which is a widely used open source RDBMS; PostgreSQL, which is an ANSI SQL-compliant, open source object-relational DBMS; and Oracle, which is a widely used commercial RDBMS.
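By way of illustration only, the schema-based behavior described above can be sketched with SQLite through Python's standard sqlite3 module. The table names (customer, purchase, shipment), column names, and sample values are hypothetical and are not drawn from any particular system; the sketch merely shows that tables must be defined before data can be stored in or queried from them.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # The structures must be defined before any data can be stored in them.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE purchase ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount REAL NOT NULL)"
    )
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 2.5)],
    )
    conn.commit()
    return conn


def total_by_customer(conn: sqlite3.Connection) -> list:
    """Relate the two tables through the customer_id attribute and sum per customer."""
    query = (
        "SELECT c.name, SUM(p.amount) FROM customer AS c"
        " JOIN purchase AS p ON p.customer_id = c.id"
        " GROUP BY c.name ORDER BY c.name"
    )
    return conn.execute(query).fetchall()


def insert_into_undefined_table(conn: sqlite3.Connection) -> bool:
    """Return True if writing to a table that was never defined is rejected."""
    try:
        conn.execute("INSERT INTO shipment (id) VALUES (1)")
    except sqlite3.OperationalError:
        return True
    return False
```

In this sketch, the join in total_by_customer works only because both tables and their relating attribute were declared up front, and the attempt to write to the undefined shipment table is refused by the database rather than by application code.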
Compared to relational databases, non-relational databases are often document-oriented and distributed. NoSQL databases and management systems are schema-less and are not based on a single model (e.g., the relational model of RDBMSs), so each database can adopt a different model depending on its target functionality. NoSQL databases follow several different operational models, such as key/value-based, column-family-based, document-based, and graph-based models. Popular NoSQL databases currently include MongoDB, which is a document-oriented database, Oracle NoSQL, and Cassandra.
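As a rough, non-limiting sketch, the four models named above can be pictured with ordinary in-memory Python structures. The record (a user named Ada living in London), the key names, and the helper functions are all hypothetical; actual NoSQL products expose their own interfaces, which are not reproduced here.

```python
import json

# Key/value-based: an opaque value (here, serialized JSON text) found by one key.
key_value = {"user:1": json.dumps({"name": "Ada", "city": "London"})}

# Column-family-based: row key -> column family -> column -> value.
column_family = {
    "user:1": {"profile": {"name": "Ada"}, "location": {"city": "London"}}
}

# Document-based: a nested, self-describing document.
document = {"_id": 1, "name": "Ada", "address": {"city": "London"}}

# Graph-based: nodes plus labeled edges between them.
graph = {
    "nodes": {"u1": {"name": "Ada"}, "c1": {"name": "London"}},
    "edges": [("u1", "LIVES_IN", "c1")],
}


def city_from_key_value(key: str) -> str:
    # The value is opaque to the store, so it must be decoded before use.
    return json.loads(key_value[key])["city"]


def city_from_column_family(key: str) -> str:
    return column_family[key]["location"]["city"]


def city_from_document(doc: dict) -> str:
    return doc["address"]["city"]


def city_from_graph(node_id: str):
    # Follow the LIVES_IN edge from the given node to its target node.
    for source, label, target in graph["edges"]:
        if source == node_id and label == "LIVES_IN":
            return graph["nodes"][target]["name"]
    return None
```

Each helper answers the same question, yet each depends on the shape chosen for its model, which is why code written against one model does not carry over to another.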
NoSQL databases generally offer the benefits of scalability and easy expansion, suitability for big data applications, and the ability to run on low-cost hardware platforms. However, the NoSQL database environment is relatively immature compared to the RDBMS world. Modern SQL databases benefit from many years of development and optimization, and offer high performance and availability on virtually every common computer platform. Most critical data processing applications, therefore, use SQL RDBMS databases as part of an overall data processing system. In such applications, many different client devices may source data of different data types and may perform different functional operations. In such a case, each data module that accesses a database table may be implemented in upwards of a thousand lines of code. In an example where there are tens to hundreds of different database tables, hundreds of thousands of lines of code may be involved, requiring a significant investment in programming, debugging, and maintenance resources. Managing such a system requires a mechanism by which growth in the number of data types, and thus in the number of database tables, does not multiply the development effort.
Although the use of NoSQL databases is seen as one way of overcoming this challenge, one of the major disadvantages of NoSQL databases is that all data operations and management functions reside in the application code, so any change to the data requires changes to the application code as well as global changes to the data saved in the NoSQL database. In addition, it is very difficult to go into a set of NoSQL document data and perform focused queries to extract specific information. Thus, NoSQL databases are profoundly limited in their ability to manage these applications. For example, key-value store databases, such as Redis, can only store JSON documents as single bodies of text (a “text blob”). Complex sub-queries into a text body are not supported, and any query code must re-parse the entire JSON text body before it can be used. Similarly, document-store databases, like MongoDB, require custom query functions to retrieve each different kind of JSON document and data. Any change to a JSON document layout will break the query functions, thus requiring extensive rewrites of application code.
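The two limitations described above can be illustrated with a minimal sketch. A plain Python dictionary stands in for a key/value store, and no Redis or MongoDB client interface is used or implied; the function names (put, get_field, city_v1) and the two document layouts are hypothetical and chosen only for illustration.

```python
import json

# Stand-in for a key/value store: each value is one opaque body of JSON text.
store = {}


def put(key: str, doc: dict) -> None:
    """Serialize the whole document and save it as a single text blob."""
    store[key] = json.dumps(doc)


def get_field(key: str, path: list):
    """Read one nested field; the entire blob is re-parsed on every call."""
    doc = json.loads(store[key])
    for part in path:
        doc = doc[part]
    return doc


def city_v1(doc: dict) -> str:
    """Custom query function written against the first document layout."""
    return doc["address"]["city"]


# First layout: a single nested address object.
LAYOUT_V1 = {"name": "Ada", "address": {"city": "London"}}

# Revised layout: the address object became a list of addresses.
LAYOUT_V2 = {"name": "Ada", "addresses": [{"kind": "home", "city": "London"}]}
```

Under these assumptions, get_field("user:1", ["address", "city"]) must decode the full text body even though only one value is wanted, and city_v1 raises a KeyError when handed LAYOUT_V2, so the query function has to be rewritten whenever the layout changes.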
What is needed, therefore, is a way to reduce the programming overhead of working with data in SQL databases. What is further needed is a way to provide efficient query processing for documents and files in large-scale databases, in a manner that works on all JSON documents and does not change when document layouts change.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.