In the era of big data, data are constantly moved between systems to maximize their value. To construct an enterprise's data warehouse and business intelligence, it is common to synchronously extract data stored on various RDBMSs (relational database management systems such as, for example, MySQL™, Oracle™, PostgreSQL™, etc.) to an offline storage and computing platform for unified processing by, for example, Hadoop™ in an open source community, Open Data Processing Service (ODPS) by Alibaba™ Group, etc., as shown in FIG. 1. As shown in FIG. 2, data can also be migrated between different online systems (for example, from MySQL to Oracle).
As an illustrative example, a MySQL table includes 100 million rows of data. To extract certain information from such a huge volume of data with fast synchronization, multithreading extraction is needed. Assume that a primary key for the information extraction is associated with a specific name, with the specific name having a value range between “aa” and “zz,” and that the range is divided into three segments, as shown in FIG. 3. For multithreading extraction of data, two segment nodes within the value range between “aa” and “zz” can be acquired. Based on the segmentation, character strings can then be generated for multithreading data extraction. Under existing technologies, the boundary character strings (e.g., “aa” and “zz”) can be converted to minimal numbers to create a value range to be segmented. The value range can then be divided into segments of equal length to obtain the segment nodes. Based on the range segments, a plurality of extraction statements can be generated, which can then be used for multithreading data extraction.
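The existing approach described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular tool: the table name `users`, the column name `name`, the helper names, and the `%s` placeholder style are all hypothetical. It also assumes that every character lies in the Basic Multilingual Plane, so that each character fits one base-65536 digit, and that the database compares strings in code-point order.

```python
BASE = 65536  # assumption: one base-65536 digit per character (BMP only)

def str_to_int(s, width):
    """Read s as a base-65536 number, right-padded with zero digits to `width`."""
    n = 0
    for i in range(width):
        n = n * BASE + (ord(s[i]) if i < len(s) else 0)
    return n

def int_to_str(n, width):
    """Inverse of str_to_int for the same width."""
    chars = []
    for _ in range(width):
        n, digit = divmod(n, BASE)
        chars.append(chr(digit))
    return "".join(reversed(chars))

def segment_nodes(low, high, parts):
    """Return the parts - 1 interior boundaries of equal-length segments."""
    width = max(len(low), len(high))
    lo, hi = str_to_int(low, width), str_to_int(high, width)
    return [int_to_str(lo + (hi - lo) * i // parts, width) for i in range(1, parts)]

def extraction_statements(table, column, low, high, parts):
    """Build one parameterized range query per segment (hypothetical SQL shape)."""
    bounds = [low] + segment_nodes(low, high, parts) + [high]
    statements = []
    for i in range(parts):
        # Only the last segment includes its upper bound, so no row is read twice.
        upper_op = "<=" if i == parts - 1 else "<"
        sql = f"SELECT * FROM {table} WHERE {column} >= %s AND {column} {upper_op} %s"
        statements.append((sql, (bounds[i], bounds[i + 1])))
    return statements

nodes = segment_nodes("aa", "zz", 3)
statements = extraction_statements("users", "name", "aa", "zz", 3)
```

Each `(sql, params)` pair could then be handed to its own worker thread. Integer arithmetic is used here so that the conversion stays exact, which sidesteps rather than reproduces the conversion exceptions discussed next.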
There are certain problems with such an arrangement.
First, converting a character string to a minimal number (e.g., using a BigDecimal representation) can raise exceptions, which can cause the range segmentation, and therefore the associated data extraction, to fail.
Second, to avoid exceptions in the conversion of the character string, an adaptive algorithm can be selected in which approximate processing (for example, rounding-off) is performed. But approximate processing can destroy the precise mapping between a character string and the minimal number, so that the character string cannot be precisely converted back from the minimal number. Further, the character string to be converted back from a minimal number may have a length restriction, which can also destroy the precise mapping. As a result, wrong data can be extracted, because the character strings used for the multithreading extraction do not correspond to the primary key strings.
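The loss of the precise mapping can be illustrated with a small sketch. It uses a Python `float`, which keeps only 53 significant bits, as a stand-in for whatever approximate processing an adaptive algorithm might apply; the key value `"database"` and the helper names are hypothetical.

```python
BASE = 65536  # assumption: one base-65536 digit per character (BMP only)

def str_to_int(s):
    """Read s as a base-65536 number, one digit per character."""
    n = 0
    for ch in s:
        n = n * BASE + ord(ch)
    return n

def int_to_str(n, width):
    """Convert a number back to a string of exactly `width` characters."""
    chars = []
    for _ in range(width):
        n, digit = divmod(n, BASE)
        chars.append(chr(digit))
    return "".join(reversed(chars))

key = "database"
exact = str_to_int(key)

# Approximate processing: rounding through a float discards the low-order bits.
rounded = int(float(exact))
restored = int_to_str(rounded, len(key))

print(int_to_str(exact, len(key)) == key)  # exact conversion round-trips
print(restored == key)                     # rounded conversion does not
```

Only the leading characters survive the rounding, so a boundary string recovered this way no longer matches the original primary key value.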
Moreover, converting an alphabetic character string to a minimal number requires taking 65536 as the base number. Since the resulting value range of characters covers basically all European and American characters and most Asian characters, the minimal numbers may be mapped back to non-ASCII (American Standard Code for Information Interchange) characters.
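As a small worked example of this effect, the sketch below computes the midpoint of “aa” and “zz” in base 65536 and converts it back to a two-character string. The variable names are illustrative only.

```python
BASE = 65536

low = ord("a") * BASE + ord("a")    # "aa" as a base-65536 number
high = ord("z") * BASE + ord("z")   # "zz" as a base-65536 number
mid = (low + high) // 2

# Convert the midpoint back to a two-character string.
node = chr(mid // BASE) + chr(mid % BASE)

print(node.isascii())               # False: the second character is outside ASCII
print([hex(ord(c)) for c in node])
```

Although both endpoints are plain lowercase ASCII strings, the second character of the midpoint falls in a CJK code range, so the generated boundary may not resemble any key actually stored in the table.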
In addition, not all primary keys can be converted to minimal numbers for segmentation in this way. For example, such conversion is not suitable for an integer type, a time type, and the like in the RDBMS.
Accordingly, there is a need for a method and an apparatus for multithreading extraction of data from a database that can facilitate efficient, accurate and stable transmission of data between databases.