1. Technical Field
This invention pertains in general to data processing, and in particular to the separation of large amounts of data into smaller partitions to improve throughput and/or speed in subsequent analysis.
2. Description of Related Art
For processing purposes, data is often organized into tables having rows and columns. An individual entry in a table at the intersection of a row and column is referred to as a field. Once a table is produced, a consumer process may access selected rows of the table, but may not need to access all fields in the selected rows. However, the consumer process reads all data in the selected rows at least once, including fields that are not needed. These unneeded fields are then discarded. Reading unneeded fields and then discarding them is wasteful in terms of processing time and resources.
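The cost described above can be illustrated with a minimal sketch. The table contents, the column names, and the helper read_rows are hypothetical and serve only to show a consumer process that reads whole rows and then discards the fields it does not need.

```python
# Hypothetical row-oriented table: every field of a row is stored together.
COLUMNS = ["id", "name", "email", "balance", "notes"]
ROWS = [
    (1, "Ann", "ann@example.com", 120.0, "vip"),
    (2, "Bob", "bob@example.com", 35.5, ""),
    (3, "Cy", "cy@example.com", 0.0, "new"),
]


def read_rows(rows, columns, needed):
    """Read whole rows, keep only the needed fields, and count the waste."""
    keep = [columns.index(name) for name in needed]
    fields_read = 0
    result = []
    for row in rows:
        fields_read += len(row)  # the entire row is read, needed or not
        result.append(tuple(row[i] for i in keep))
    fields_discarded = fields_read - len(result) * len(keep)
    return result, fields_read, fields_discarded


# A consumer that needs only two of the five fields in each row.
projected, fields_read, fields_discarded = read_rows(
    ROWS, COLUMNS, ["id", "balance"]
)
print(fields_read, fields_discarded)
```

In this sketch the consumer reads 15 fields to obtain the 6 it needs, so 9 fields are read and then discarded.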
Some developers have attempted to reduce the waste involved in reading and discarding unneeded fields by altering the design of the data tables to more closely follow how the data tables will be used. This requires the developer to decide how to split the columns of the data into sets so that consumer processes read fewer unneeded fields. Conventionally, developers make these decisions when they establish the design of the data tables. However, if a new consumer is added, an existing consumer is retired, or the access pattern to the data changes in other ways, the fixed design no longer matches actual use and the system again suffers from the inefficiency of reading unneeded fields. In some cases, the developer may be able to manually redesign the layout of the data tables, which requires reshuffling the data and potentially requires changes to the software infrastructure.
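The weakness of a fixed column split can be sketched as follows. The layout, the consumer names, and the helper fields_read_per_row are assumptions made for illustration; the sketch further assumes that a consumer must read every column set that holds at least one column it needs.

```python
def fields_read_per_row(column_sets, needed):
    """Count the fields read per row when whole column sets must be read."""
    needed = set(needed)
    return sum(len(s) for s in column_sets if needed & set(s))


# Hypothetical layout chosen by a developer for two known consumers.
layout = [{"id", "balance"}, {"name", "email", "notes"}]

consumers = {
    "billing": {"id", "balance"},
    "mailing": {"name", "email"},
    "audit": {"id", "notes"},  # added after the layout was fixed
}

# Unneeded fields read per row by each consumer under the fixed layout.
waste = {
    name: fields_read_per_row(layout, need) - len(need)
    for name, need in consumers.items()
}
print(waste)
```

Under these assumptions the original consumers waste little (0 and 1 unneeded fields per row), while the later-added consumer spans both column sets and reads 3 unneeded fields per row, even though the layout itself has not changed.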