Table data structures, and particularly tables in database management systems, are used to store large amounts of data. The demand for efficient data storage for a variety of data-intensive applications continues to grow. However, for many such data-intensive applications, table data structures have been assumed to be an inappropriate mechanism for storing much of the data generated or obtained by those applications. Furthermore, there appears to be little appreciation that the paradigms associated with table data structures would be very useful in those applications.
A table data structure paradigm can be very useful for storing large amounts of data. However, using a table data structure to store data in a distributed data management system poses its own challenges. One challenge is how to distribute the data among the machines in the distributed system so that a single data access does not involve accessing an excessively large number of files across the system, since accessing a large number of files can reduce the overall efficiency of the system.
Accordingly, it is highly desirable to provide a more efficient manner of storing the data of a table data structure across a distributed system.
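By way of illustration only, the file-access challenge described above can be sketched as follows. The sketch is not drawn from any particular system: the names `files_touched`, `scattered`, and `contiguous`, and the values of `NUM_FILES` and `ROWS_PER_FILE`, are hypothetical and are chosen solely to show how the placement of rows may affect the number of files a single access must open.

```python
# Hypothetical parameters, assumed for illustration only.
NUM_FILES = 16        # number of files across the distributed system
ROWS_PER_FILE = 100   # rows stored per file under contiguous placement


def files_touched(row_keys, assign):
    """Return the set of file ids holding at least one of the given rows."""
    return {assign(row) for row in row_keys}


def scattered(row):
    """Assumed placement: rows are spread round-robin over all files."""
    return row % NUM_FILES


def contiguous(row):
    """Assumed placement: adjacent rows are grouped into the same file."""
    return row // ROWS_PER_FILE


# A single access that reads 100 adjacent rows of the table.
scan = range(200, 300)
print(len(files_touched(scan, scattered)))   # every file is opened
print(len(files_touched(scan, contiguous)))  # only one file is opened
```

Under these assumptions, the same 100-row access opens all sixteen files when rows are scattered, but only one file when adjacent rows are kept together. This is one example of why the manner of distribution bears on overall efficiency.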