Large data sets may exist in various sizes and organizational structures. With big data comprising data sets larger than ever, the volume of data collected continues to grow as online and electronic transactions become more popular. For example, billions of records (also referred to as rows) and hundreds of thousands of columns' worth of data may populate a single table. In some instances, the large volume of data may be collected in a raw, unstructured, and undescriptive format. Once the data is written to file, the file structure of big data storage formats is typically static. That is, adding and deleting columns may not be supported without creating a completely new copy of a big data table. Relational databases may support adding and deleting columns. However, traditional relational databases may not be capable of sufficiently handling the size of the tables that big data creates.
As a result, the massive amounts of data in big data sets may be stored in numerous types of data storage. Sensitive data may be copied and stored in various locations across the different types of data storage for various use cases. An additional copy of a table may be created each time a column is added or deleted. Consequently, the copies may consume terabytes of storage with duplicative data.
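The copy-on-change behavior described above can be illustrated with a minimal sketch. It assumes a write-once, row-oriented file, with CSV used purely as a stand-in for a big data storage format. The function names, file names, and column names are hypothetical and chosen only for illustration.

```python
import csv
import os
import tempfile


def write_table(path, header, rows):
    """Write a table once; the file layout is treated as static afterwards."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def add_column_by_copy(src_path, dst_path, column_name, default):
    """Add a column by rewriting every existing row into a brand-new file."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow(header + [column_name])
        for row in reader:
            # Every original value is duplicated in the new copy.
            writer.writerow(row + [default])


def demo():
    """Build a small table, add one column, and return both file sizes in bytes."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "transactions_v1.csv")  # hypothetical name
    copy = os.path.join(workdir, "transactions_v2.csv")      # hypothetical name
    rows = [[str(i), f"account-{i}", "19.99"] for i in range(1000)]
    write_table(original, ["id", "account", "amount"], rows)
    add_column_by_copy(original, copy, "region", "unknown")
    return os.path.getsize(original), os.path.getsize(copy)


if __name__ == "__main__":
    before, after = demo()
    print(f"original: {before} bytes, copy: {after} bytes, total: {before + after} bytes")
```

In this sketch, adding a single column leaves two files on disk, and the second file repeats every value of the first. At the scale of billions of rows, the same pattern may account for the duplicative storage noted above.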
Access control across the various copies and data storage formats may also prove difficult. Permissions for columns may change as columns are added and deleted. Similarly, an individual row may contain sensitive data that demands restricted access. Typical big data storage formats may not support controlling access at the row level.