Run-length encoding and bitmap encoding are popular compression schemes used in column stores to compress attribute values. These schemes are sensitive to tuple ordering and therefore require support for efficiently updating tuples at given offsets. Updating a column store relation with n tuples takes O(n) time. Although techniques have been proposed that amortize the update cost by buffering updates in a differential store, such techniques still use a table scan to apply the updates. Further, these techniques require the original relation to be decompressed and then re-compressed after the updates are applied, which adds to the time complexity.
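The sensitivity to tuple ordering and the cost of an offset-based update can be illustrated with a minimal sketch. The function names `rle_encode`, `rle_decode`, and `rle_insert` are hypothetical and chosen only for illustration; the run layout (a list of `[value, run_length]` pairs) is an assumption and does not describe any particular column store.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    return [[v, len(list(g))] for v, g in groupby(values)]


def rle_decode(runs):
    """Expand [value, run_length] pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


def rle_insert(runs, offset, value):
    """Insert `value` at tuple position `offset` without decompressing.

    The runs are scanned from the start to locate the offset, so the
    cost is linear in the number of runs (illustrative assumption: no
    auxiliary index over the runs exists).
    """
    pos = 0  # tuple offset at which the current run starts
    for i, (v, n) in enumerate(runs):
        if pos <= offset <= pos + n and v == value:
            runs[i][1] += 1  # same value: just lengthen the run
            return runs
        if offset == pos:
            runs.insert(i, [value, 1])  # new run on a run boundary
            return runs
        if pos < offset < pos + n:
            # Different value inside a run: split the run in two.
            runs[i:i + 1] = [[v, offset - pos], [value, 1],
                             [v, pos + n - offset]]
            return runs
        pos += n
    if offset != pos:
        raise IndexError("offset out of range")
    runs.append([value, 1])  # insert at the very end
    return runs


unsorted_column = ["b", "a", "b", "a", "b", "a"]
sorted_column = sorted(unsorted_column)
runs_unsorted = rle_encode(unsorted_column)  # one run per tuple
runs_sorted = rle_encode(sorted_column)      # one run per distinct value
```

In this sketch the same six values produce six runs in the unsorted order and two runs in the sorted order, which is why a well-compressed column needs updates to land at a specific offset rather than at the end of the relation.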
Since Stonebraker et al.'s seminal paper (M. Stonebraker et al. C-Store: a column-oriented DBMS. In VLDB, 2005; this and all other references are herein incorporated by reference for all purposes), column stores have become a preferred platform for data warehousing and analytics. A case study in (J. Krueger et al. Enterprise application-specific data management. In EDOC, 2010) shows that present enterprise systems are best served by column stores. Column stores compress attribute values in order to support fast reads. Data compression, however, presents a significant bottleneck for updating relations. Compression schemes such as run-length encoding and bitmap encoding are sensitive to the ordering of tuples, as shown in (D. Lemire et al. Sorting improves word-aligned bitmap indexes. CoRR, 2009). Hence, in order to incrementally maintain relations while achieving good compression, column stores have to support offset-based, or in-place, updates of tuples.
Prior art approaches have proposed different techniques to amortize the cost of applying updates to a column store relation in bulk. A central theme underlying certain prior art techniques is to maintain a differential store in addition to a read-optimized store. Updated tuples are buffered in the differential store and are subsequently merged with the read-optimized store using a merge scan. Although such a differential update mechanism amortizes the time to apply updates in bulk, it cannot avoid the linear cost of a merge scan.
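The linear cost of the merge scan can be seen in a minimal sketch of a differential update. The function name `merge_scan` and the representation of both stores as sorted lists of `(key, payload)` pairs with unique keys are assumptions made for illustration only; they are not taken from any specific prior art system.

```python
def merge_scan(read_store, diff_store):
    """Merge a sorted differential store into a sorted read-optimized store.

    Both inputs are lists of (key, payload) pairs sorted by key, with
    unique keys within each list (illustrative assumption). An entry in
    the differential store replaces the entry with the same key in the
    read-optimized store.
    """
    merged = []
    i = j = 0
    while i < len(read_store) and j < len(diff_store):
        read_key = read_store[i][0]
        diff_key = diff_store[j][0]
        if read_key < diff_key:
            merged.append(read_store[i])   # untouched tuple is still copied
            i += 1
        elif read_key > diff_key:
            merged.append(diff_store[j])   # newly inserted tuple
            j += 1
        else:
            merged.append(diff_store[j])   # buffered update wins
            i += 1
            j += 1
    merged.extend(read_store[i:])
    merged.extend(diff_store[j:])
    return merged


read_store = [(1, "a"), (3, "c"), (5, "e")]
diff_store = [(2, "b"), (3, "C")]
new_store = merge_scan(read_store, diff_store)
```

Even when the differential store holds only a few buffered tuples, every tuple of the read-optimized store is visited and copied, so the scan takes time proportional to n + m for n stored tuples and m buffered updates.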