CPC G06F 16/278 (2019.01) | 20 Claims |
1. A computer-implemented method for organizing data into segments to improve read performance in a distributed database system having at least a master node and a slave node, the slave node including a data node that hosts a plurality of tablets, the plurality of tablets being a part of a large distributed table managed by the master node, the method comprising:
instantiating corresponding components of a compaction engine respectively on the master node and on the slave node, wherein the master and the slave nodes are physically separated but communicatively coupled together via a network;
determining, by the compaction engine, a height of a tablet within a keyspace of the tablet based on a number of rowsets, in a plurality of rowsets included in the tablet, that have key ranges that overlap;
determining, by the compaction engine, a rowset width of each rowset in the keyspace of the tablet based on a percentage of the keyspace to which the rowset corresponds;
until a minimum operational cost is reached, iteratively calculating, by the compaction engine, an operational cost associated with compaction of two or more rowsets in the keyspace based on the height of the tablet and the rowset widths of the two or more rowsets;
selecting, by the compaction engine, two or more particular rowsets for compaction based on the two or more particular rowsets resulting in the minimum operational cost; and
performing, by the compaction engine through communicating instructions between the master node and the slave node over the network, a compaction of the two or more particular rowsets, the performed compaction resulting in a merger of the two or more particular rowsets,
wherein the compaction results in one or more of: reduction in the height of the tablet, removal of overlapping rowsets, and/or creation of smaller sized rowsets.
|
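The height, width, cost, and selection steps recited in claim 1 can be illustrated with a small sketch. It is not the claimed implementation: the `RowSet` type, the function names, the numeric key bounds, and in particular the cost formula (tablet height after the merge plus the sum of rowset widths after the merge) are assumptions chosen for illustration, since the claim says only that the cost is based on the height and the widths. The exhaustive search over candidate groups likewise stands in for the claimed iterative calculation, and the `max_group` limit is hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RowSet:
    name: str
    min_key: float  # inclusive lower bound of the rowset's key range
    max_key: float  # inclusive upper bound of the rowset's key range


def tablet_height(rowsets):
    """Largest number of rowsets whose key ranges overlap at any single key."""
    events = []
    for rs in rowsets:
        events.append((rs.min_key, 0))  # 0 = key range opens
        events.append((rs.max_key, 1))  # 1 = key range closes
    events.sort()  # at equal keys, opens sort before closes (closed ranges)
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def rowset_width(rs, lo, hi):
    """Fraction of the tablet keyspace [lo, hi] spanned by the rowset."""
    return (rs.max_key - rs.min_key) / (hi - lo)


def cost_after_merge(rowsets, group, lo, hi):
    """Assumed cost: height plus total width once `group` is merged into one rowset."""
    merged = RowSet("merged",
                    min(r.min_key for r in group),
                    max(r.max_key for r in group))
    remaining = [r for r in rowsets if r not in group] + [merged]
    return tablet_height(remaining) + sum(rowset_width(r, lo, hi) for r in remaining)


def select_for_compaction(rowsets, lo, hi, max_group=2):
    """Evaluate every group of 2..max_group rowsets and keep the cheapest one."""
    best_group, best_cost = None, float("inf")
    for size in range(2, max_group + 1):
        for group in combinations(rowsets, size):
            cost = cost_after_merge(rowsets, group, lo, hi)
            if cost < best_cost:
                best_group, best_cost = group, cost
    return best_group, best_cost


if __name__ == "__main__":
    tablet = [RowSet("A", 0, 40), RowSet("B", 10, 50),
              RowSet("C", 30, 60), RowSet("D", 70, 100)]
    print(tablet_height(tablet))  # 3: A, B and C all cover keys 30..40
    group, cost = select_for_compaction(tablet, 0, 100, max_group=3)
    print([r.name for r in group], cost)
```

In this example keyspace of 0 to 100, merging A, B, and C leaves two non-overlapping rowsets, so the tablet height drops from 3 to 1, which corresponds to the "reduction in the height of the tablet" and "removal of overlapping rowsets" outcomes named in the claim.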