1. Technical Field
Present invention embodiments relate to the field of database management in networked computer systems (e.g., cloud computing environments), and more particularly to data migration between databases in a networked computer environment.
2. Discussion of the Related Art
Cloud computing is a type of network-based computing that provides shared computer processing resources and data to computers and other devices on demand. A cloud computing model enables ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, bandwidth, applications and services), which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in third-party data centers that may be located far from the user, at distances ranging from across a city to across the world. Sharing resources via a cloud computing infrastructure achieves coherence and economies of scale, similar to a utility (e.g., the electricity grid) delivered over a network.
At the heart of cloud computing is an infrastructure comprising a network of interconnected processing and storage nodes. A cloud computing infrastructure is an enhancement to predecessor grid computing infrastructures through the incorporation of one or more additional abstraction layers (e.g., a cloud layer), thus making disparate devices appear to an end-consumer as a single pool of seamless resources. These resources may include such things as physical or logical computing engines, servers and devices, device memory, and storage devices, among others.
Cloud computing models promote application hosting and various data storage options. Challenges exist, however, in that many databases hosted in cloud computing infrastructures follow different data storage, access and compression paradigms. For example, columnar database models traditionally use approximate Huffman encoding, prefix compression, and offset compression. Approximate Huffman encoding is a frequency-based compression technique that uses the fewest bits to represent the most common values. In other words, the most common values are compressed the most. Columnar databases build column compression dictionaries as part of the initial load operation on a column-organized table. Conventional columnar database models require scanning the data twice: once for an initial analysis phase and once for the load operation itself. The initial analysis phase may build histograms to track the frequency of data values across all columns, which consumes approximately 40% of the overall data migration processing time. Consequently, for pipes or other sources that can only be scanned once, columnar database models must first create a copy of the data at the destination, a time-consuming process that requires a significant amount of disk space.
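The two-phase load described above can be illustrated with a minimal sketch. It uses textbook Huffman coding rather than the approximate variant employed by columnar databases, and the function names (build_histogram, huffman_codes, encode_column) and the sample column are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def build_histogram(column):
    """Phase 1 (analysis): count how often each value occurs."""
    return Counter(column)


def huffman_codes(histogram):
    """Derive a prefix-free bit string per value; frequent values get shorter codes."""
    if not histogram:
        return {}
    if len(histogram) == 1:
        # A single distinct value still needs one bit.
        return {value: "0" for value in histogram}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), {value: ""}) for value, freq in histogram.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, extending their codes by one bit.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in left.items()}
        merged.update({v: "1" + c for v, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode_column(column, codes):
    """Phase 2 (load): replace each value with its bit string."""
    return "".join(codes[value] for value in column)


# Hypothetical sample column: "US" is the most common value.
column = ["US"] * 6 + ["DE"] * 3 + ["FR"] * 1
histogram = build_histogram(column)     # first scan of the data
codes = huffman_codes(histogram)
encoded = encode_column(column, codes)  # second scan of the data
```

Because the column is read once to build the histogram and again to encode it, a source that can only be consumed once (such as a pipe or a Python generator) would be exhausted after the first phase, which is why a copy of the data is needed in that case.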
Compression in traditional database systems is well known to improve performance. Data compression dictionaries are data structures containing information by which a first data representation (the original representation) is transformed to a second data representation that is smaller than the first. Many compression methods exist or have been proposed, including order-preserving string compression, hardware compatible data structures, data transformations, user candidate techniques, hybrid columnar tables, inverted index, and tuple map criterion. Accordingly, dictionaries created for different data structures, e.g., different structural formats of database tables, are quite different. Unfortunately, distinct data structures cannot share the same compression dictionary.
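As a rough illustration of why dictionaries built for different table formats are not interchangeable, the following sketch builds a simple dictionary over whole rows and, separately, one dictionary per column. The simple integer-code scheme, the helper names (build_dictionary, compress, decompress), and the sample rows are assumptions made for illustration and do not represent any particular database's dictionary format.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
    return dictionary


def compress(values, dictionary):
    """Transform the original representation into dictionary codes."""
    return [dictionary[value] for value in values]


def decompress(codes, dictionary):
    """Invert the dictionary to recover the original representation."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


# Hypothetical sample table with two columns.
rows = [("US", "red"), ("US", "blue"), ("DE", "red"), ("US", "red")]

# Row-organized table: one dictionary whose entries are whole rows.
row_dict = build_dictionary(rows)
row_codes = compress(rows, row_dict)

# Column-organized table: one dictionary per column, entries are single values.
columns = list(zip(*rows))
col_dicts = [build_dictionary(col) for col in columns]
col_codes = [compress(col, d) for col, d in zip(columns, col_dicts)]
```

In this sketch the row dictionary is keyed by tuples while each column dictionary is keyed by individual values, so codes produced with one cannot be decoded with the other even though both describe the same data.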
Improving the resource utilization efficiency of data migration between different table representations, e.g., loading data from a source row-based table to a destination column-based table, remains an ongoing research and engineering topic in the field of databases and database management.