Computers are used to store and manage many types of data, and tabular data is one common form of such data. Tabular data refers to any data that is logically organized into rows and columns. For example, word processing documents often include tables, and the data that resides in such tables is tabular data. All data contained in any spreadsheet or spreadsheet-like structure is also tabular data, as is all data stored in relational tables or similar database structures.
Logically, tabular data resides in a table-like structure, such as a spreadsheet or relational table, which may comprise an ordered arrangement of rows and columns. However, the tabular data may be physically stored in a variety of forms. For example, although the logical structure of the tabular data may be multidimensional, the tabular data may physically be stored in a linear format, such as row-major or column-major format. In row-major format, the column values of a row of the table-like structure are stored contiguously in persistent storage. By contrast, in column-major format, the values of a given column across multiple rows are stored contiguously.
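The difference between the two linear layouts can be illustrated with a minimal sketch. It assumes a small rectangular table held in memory and uses a Python list as a stand-in for persistent storage; the function names are hypothetical and are not taken from any particular storage format. The offset helpers show where cell (r, c) lands in each layout.

```python
from typing import Any, List, Sequence

Table = Sequence[Sequence[Any]]


def _check_rectangular(table: Table) -> int:
    """Return the column count, rejecting ragged tables (an assumption of this sketch)."""
    num_cols = len(table[0]) if table else 0
    if any(len(row) != num_cols for row in table):
        raise ValueError("every row must have the same number of columns")
    return num_cols


def to_row_major(table: Table) -> List[Any]:
    """Linearize so that the column values of each row are contiguous."""
    _check_rectangular(table)
    return [value for row in table for value in row]


def to_column_major(table: Table) -> List[Any]:
    """Linearize so that the values of each column, across all rows, are contiguous."""
    num_cols = _check_rectangular(table)
    return [row[c] for c in range(num_cols) for row in table]


def row_major_offset(r: int, c: int, num_cols: int) -> int:
    """Position of cell (r, c) in a row-major layout."""
    return r * num_cols + c


def column_major_offset(r: int, c: int, num_rows: int) -> int:
    """Position of cell (r, c) in a column-major layout."""
    return c * num_rows + r
```

For the table `[["Alice", 30, "NY"], ["Bob", 25, "SF"]]`, `to_row_major` yields `["Alice", 30, "NY", "Bob", 25, "SF"]`, while `to_column_major` yields `["Alice", "Bob", 30, 25, "NY", "SF"]`, placing all values of one column next to each other.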
As described in STRUCTURE OF HIERARCHICAL COMPRESSED DATA STRUCTURE FOR TABULAR DATA, a flexible and extensible structure, called a compression unit, may be used to physically store tabular data. For example, compression units may be used to store tabular data from spreadsheets, relational database tables, or tables embedded in word processing documents. Tabular data within compression units may be stored in row-major or column-major format.
Data within a compression unit may be compressed according to a variety of techniques, as described in COMPRESSION ANALYZER. A compression analyzer gives users high-level control over the selection process without requiring them to know the details of the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from that set of data and comparing the resulting compression ratios. The compression technique is then selected based, in part, on the compression ratios achieved during these tests on the sample data, and the selected technique is used to compress the data into a compression unit.
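The sample-based selection described above can be sketched as follows. This is a hypothetical illustration, not the method of COMPRESSION ANALYZER: the candidate list uses Python's standard `zlib`, `bz2`, and `lzma` codecs as stand-ins, the per-technique "relative cost" figures are invented placeholders for performance, and the linear scoring rule that blends compression ratio and cost according to the balance point is an assumption made only for this example. The compression ratios, by contrast, are measured by actually compressing the samples.

```python
import bz2
import lzma
import zlib
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidates: (name, compress function, relative cost).
# The cost values are illustrative assumptions, not measurements.
CANDIDATES: List[Tuple[str, Callable[[bytes], bytes], float]] = [
    ("zlib-1", lambda b: zlib.compress(b, 1), 1.0),
    ("zlib-9", lambda b: zlib.compress(b, 9), 1.0),
    ("bz2", bz2.compress, 3.0),
    ("lzma", lzma.compress, 4.0),
]


def select_technique(samples: Sequence[bytes], balance: float) -> str:
    """Choose a technique by test-compressing samples.

    balance = 0.0 favors maximum performance; 1.0 favors maximum compression.
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be in [0, 1]")
    raw = sum(len(s) for s in samples)
    if raw == 0:
        raise ValueError("samples must contain data")

    # Measure the compression ratio each candidate achieves on the samples.
    ratios = {}
    for name, compress, _cost in CANDIDATES:
        packed = sum(len(compress(s)) for s in samples)
        ratios[name] = raw / packed

    best_ratio = max(ratios.values())
    max_cost = max(cost for _, _, cost in CANDIDATES)

    def score(candidate: Tuple[str, Callable[[bytes], bytes], float]) -> float:
        name, _, cost = candidate
        # Normalize both terms to [0, 1] and blend them with the balance point.
        return balance * (ratios[name] / best_ratio) - (1.0 - balance) * (cost / max_cost)

    # max() keeps the first candidate among ties, so list order breaks ties.
    return max(CANDIDATES, key=score)[0]
```

With `samples = [b"region=west,amount=100\n" * 50, b"region=east,amount=250\n" * 50]`, a balance point of `0.0` selects the cheapest candidate, `"zlib-1"`, while a balance point of `1.0` selects whichever candidate achieved the highest measured compression ratio on those samples.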
After the tabular data has been compressed into a compression unit, the data may be modified. Described herein are techniques for improving how such modifications are performed.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.