Large data sets may exist in various sizes and organizational structures. With big data comprising data sets larger than ever, the volume of data collected continues to grow as online and electronic transactions become more popular. For example, a single table may hold billions of records (also referred to as rows) and hundreds of thousands of columns' worth of data. In some instances, the large volume of data may be collected in a raw, unstructured, and undescriptive format. Traditional relational databases, however, may not be capable of sufficiently handling the size of the tables that big data creates.
As a result, the massive amounts of data in big data sets may be stored in numerous different types of data storage. Each data storage format typically has a different interface approach as well. Users therefore face the difficulty of learning the various interface protocols, each having its own query syntax, and of adapting programs to interact with multiple storage formats. In addition, different applications may operate best on different storage formats. For example, an application needing a near-instantaneous response time for a user experience may demand a platform designed to return fast query results.
Furthermore, users of the various available data storage formats often make copies of tables, query result sets, and/or data storage structures to support separate applications. The data in such copies may not be maintained and updated regularly. Because the copies are unmaintained, they may also carry incorrect access controls and/or incorrect metadata describing the data. Additionally, the copies may consume extra disk space.