Computers are used to store and manage many types of data, and tabular data is one common form. Tabular data refers to any data that is logically organized into rows and columns. For example, word processing documents often include tables, and the data that resides in such tables is tabular data. All data contained in any spreadsheet or spreadsheet-like structure is also tabular data. Further, all data stored in relational tables, or similar database structures, is tabular data.
Logically, tabular data resides in a table-like structure, such as a spreadsheet or relational table. However, the actual physical storage of the tabular data may take a variety of forms. For example, the tabular data from a spreadsheet may be stored within a spreadsheet file, which in turn is stored in a set of disk blocks managed by an operating system. As another example, tabular data that belongs to a relational database table may be stored in a set of disk blocks managed by a database server.
How tabular data is physically stored can have a significant effect on (1) how much storage space the tabular data consumes, and (2) how efficiently the tabular data can be accessed and manipulated. If physically stored in an inefficient manner, the tabular data may consume more storage space than desired, and may be slow to retrieve, store, and/or update.
Often, the physical storage of tabular data involves a trade-off between size and speed. For example, a spreadsheet file may be stored compressed or uncompressed. If compressed, the spreadsheet file will be smaller, but the entire file will typically have to be decompressed when retrieved, and re-compressed when stored again. Such decompression and compression operations take time, resulting in slower performance.
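The size/speed trade-off described above can be illustrated with a minimal sketch. It assumes a CSV byte string as a stand-in for a spreadsheet file and Python's standard zlib module as the compressor; the helper names (serialize, deserialize, update_cell_compressed) and the sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io
import zlib


def serialize(rows):
    """Serialize rows to CSV bytes (a stand-in for a spreadsheet file)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


def deserialize(data):
    """Parse CSV bytes back into a list of rows."""
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))


def update_cell_compressed(blob, row, col, value):
    """Change one cell of a compressed file.

    The entire file is decompressed, modified, and re-compressed,
    even though only a single cell changes.
    """
    rows = deserialize(zlib.decompress(blob))
    rows[row][col] = value
    return zlib.compress(serialize(rows))


# Hypothetical sample data: a header row plus 1000 repetitive data rows.
rows = [["id", "status", "city"]]
rows += [[str(i), "active", "Springfield"] for i in range(1000)]

raw = serialize(rows)          # uncompressed form
blob = zlib.compress(raw)      # compressed form (smaller on this data)
updated = update_cell_compressed(blob, 5, 1, "inactive")

print(len(raw), len(blob))
```

In this sketch the compressed form is smaller than the uncompressed form, but the single-cell update touches every byte of the file twice, once to decompress and once to re-compress.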
The best compression/performance balance is particularly difficult to achieve when tabular data includes different types of data items. For example, a spreadsheet may include some columns that contain character strings, some columns that contain images, and yet other columns that contain binary Yes/No indications. The character strings may be highly compressible using a particular compression technique, but applying the same compression technique to the other types of data in the spreadsheet may yield no benefit. On the other hand, the images contained in the spreadsheet may be highly compressible using a compression technique that yields no benefit when used on character strings. Under circumstances such as these, whether the user chooses to compress the spreadsheet file using one of the techniques, or not at all, the result is inevitably sub-optimal.
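The following sketch illustrates why a single technique may fit one column type and not another. It is an illustration under stated assumptions, not a description of any particular system: zlib stands in for a general-purpose compressor, pseudo-random bytes stand in for image data that is already in an encoded, high-entropy form, and bit-packing stands in for a technique suited to Yes/No columns. All names and sample values are hypothetical.

```python
import random
import zlib


def pack_flags(flags):
    """Pack a list of booleans into bytes, 8 flags per byte."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_flags(data, count):
    """Recover the first `count` booleans from packed bytes."""
    return [bool(data[i // 8] & (1 << (i % 8))) for i in range(count)]


def ratio(original, stored):
    """Stored size divided by original size (lower is better)."""
    return len(stored) / len(original)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical column of repetitive character strings.
names = "\n".join(["Springfield", "Shelbyville", "Capital City"] * 500)
names = names.encode("utf-8")

# Assumption: high-entropy bytes as a stand-in for encoded image data.
pixels = bytes(rng.getrandbits(8) for _ in range(20000))

# Hypothetical Yes/No column, stored naively as one character per value.
flags = [rng.random() < 0.5 for _ in range(4000)]
flags_naive = "".join("Y" if f else "N" for f in flags).encode("ascii")

text_ratio = ratio(names, zlib.compress(names))     # large reduction
image_ratio = ratio(pixels, zlib.compress(pixels))  # no reduction
flag_ratio = ratio(flags_naive, pack_flags(flags))  # 1 bit per value

print(text_ratio, image_ratio, flag_ratio)
```

Under these assumptions the general-purpose compressor shrinks the string column substantially and leaves the image stand-in essentially unchanged, while the Yes/No column is reduced to one eighth of its naive size by a technique that would be meaningless for the other two columns.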
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.