The present invention relates to the field of data processing technology and, more specifically, to determining a data partition by binary code matching in order to perform data processing.
Currently, with the expansion of massive-data applications, enterprises and other customers that use data are no longer satisfied with the traditional approach of storing data on a single server or hard disk. When massive data is processed and analyzed, it must be partitioned for storage to improve processing efficiency and to optimize resource configuration. Splitting a large data table into smaller, individual data tables or pieces of data information for storage speeds up data processing, because a query then needs to scan only a fraction of the data instead of all of it. In addition, operations such as data maintenance, index building, and backup can be run more quickly.
In terms of partitioning direction, methods in the art for partitioning data mainly include horizontal partitioning and vertical partitioning. Horizontal partitioning divides a data table into multiple tables, each containing the same number of data columns but fewer data rows. For example, a data table containing one million rows (all data for one year) may be horizontally partitioned into 12 smaller tables, each containing the data for one month of that year (the same columns but fewer rows). A query that requires data for a specific month of that year can then be performed on the corresponding smaller table alone, without a full scan of the large data table. Conversely, in vertical partitioning the smaller tables contain the same number of data rows but fewer data columns, which can achieve an effect similar to that of horizontal partitioning.
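Horizontal and vertical partitioning can be illustrated with a minimal Python sketch over an in-memory table. The sample rows, the column names (`id`, `day`, `region`, `amount`), and the helper functions `horizontal_partition_by_month` and `vertical_partition` are hypothetical and chosen only for illustration. Repeating a key column in every vertical partition, so that the rows can later be rejoined, is a common convention assumed here rather than something stated above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample table: every row has the same columns.
rows = [
    {"id": 1, "day": date(2023, 1, 15), "region": "east", "amount": 120},
    {"id": 2, "day": date(2023, 1, 20), "region": "west", "amount": 75},
    {"id": 3, "day": date(2023, 4, 2), "region": "east", "amount": 300},
    {"id": 4, "day": date(2023, 12, 31), "region": "west", "amount": 50},
]


def horizontal_partition_by_month(table):
    """Split rows into up to 12 smaller tables keyed by month (1-12).

    Each partition keeps every column but holds fewer rows.
    """
    partitions = defaultdict(list)
    for row in table:
        partitions[row["day"].month].append(row)
    return dict(partitions)


def vertical_partition(table, column_groups, key="id"):
    """Split the columns into groups, producing one smaller table per group.

    Each partition keeps every row but holds fewer columns; the key
    column (an assumed convention) is repeated so rows can be rejoined.
    """
    return [
        [{key: row[key], **{col: row[col] for col in group}} for row in table]
        for group in column_groups
    ]


by_month = horizontal_partition_by_month(rows)
# A query for January scans only the January partition, not the whole table.
january_total = sum(r["amount"] for r in by_month[1])

narrow_tables = vertical_partition(rows, [["day"], ["region", "amount"]])
```

Here `january_total` is 195, obtained by scanning the two January rows rather than all four rows of the original table.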
In terms of the specific partitioning criterion, existing data partitioning methods include list partitioning, hash partitioning, range partitioning, and the like. In list partitioning, data is partitioned according to data values. For example, regional data may be partitioned so that data for an east region is placed in one data partition and data for a west region is placed in another. In range partitioning, data is partitioned according to ranges of data values. For example, data from January to March is placed in one partition, data from April to June in another, and so on. Because the volume of data that will fall into each range or enumerated value cannot be determined in advance, list partitioning and range partitioning are prone to causing imbalance in data volume among the data partitions. In hash partitioning, the data partition is determined by the value of a hash function. Although hash partitioning can distribute data evenly among partitions, it is difficult to select an appropriate hash function, and hash partitioning makes it difficult to migrate existing data.
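The three criteria can be contrasted in a minimal Python sketch. The region-to-partition map, the quarterly range boundaries, the skewed sample data, and the use of SHA-256 with modulo placement (partition = hash mod number of partitions) are all illustrative assumptions, not part of the described invention. The final lines show one common reason why migration is difficult under that assumed modulo placement: changing the partition count reassigns most existing keys.

```python
import hashlib
from collections import Counter

# Hypothetical list-partition map: each enumerated value gets a partition.
REGION_PARTITIONS = {"east": 0, "west": 1}


def list_partition(region):
    """List partitioning: the partition is chosen by the exact data value."""
    return REGION_PARTITIONS[region]


def range_partition(month):
    """Range partitioning by quarter: Jan-Mar -> 0, Apr-Jun -> 1, and so on."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month - 1) // 3


def hash_partition(key, num_partitions):
    """Hash partitioning with assumed modulo placement.

    SHA-256 is used only because it is deterministic across runs;
    it is an illustrative choice, not a recommended hash function.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical skewed data: most records fall in December, so the
# fourth-quarter range partition receives almost all of them.
skewed_months = [12] * 90 + [1] * 10
range_counts = Counter(range_partition(m) for m in skewed_months)

# Hash placement spreads 1,000 keys roughly evenly over 4 partitions...
keys = [f"customer-{i}" for i in range(1000)]
hash_counts = Counter(hash_partition(k, 4) for k in keys)

# ...but with modulo placement, adding a fifth partition moves most keys.
moved = sum(hash_partition(k, 4) != hash_partition(k, 5) for k in keys)
```

Under modulo placement, a key keeps its partition only when its hash value leaves the same remainder modulo 4 and modulo 5, which holds for about one key in five, so roughly 80% of the existing keys would have to be migrated.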