The present invention relates to a computer-implemented method for dynamically and horizontally partitioning a table of a relational database.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Relational database systems have been the backbone of business applications for more than 20 years. They promised to provide companies with a management information system covering the core applications, including financials, sales, order fulfillment, manufacturing, and human resources, which run from planning, through business processes, to individually defined analytics. However, this goal has not been achieved. The more complex business requirements became, the more the focus was put on the so-called transactional processing part and designed the database structures accordingly. These systems are called OLTP systems (Online Transactional Processing). Analytical and financial planning applications were increasingly moved out to separate systems for more flexibility and better performance. These systems are called OLAP systems (Online Analytical Processing). In reality, parts of the planning process were even moved off to specialized applications mainly around spreadsheets.
Both systems, OLTP and OLAP, are based on the relational theory but with different technical approaches [W. H. Inmon. Building the Data Warehouse, 3rd Edition. John Wiley & Sons, Inc., New York, N.Y., USA, 2002]. For OLTP systems, tuples are arranged in rows which are stored in blocks. The blocks reside on disk and are cached in main memory in the database server. Sophisticated indexing allows fast access to single tuples, however accesses get increasingly slower as the number of requested tuples increases. For OLAP systems on the other hand, data are often organized in star schemas, where a popular optimization is to compress attributes (columns) with the help of dictionaries. After the conversion of attributes into integers, processing becomes faster. More recently, the use of column store databases for analytics has become quite popular. Dictionary compression on the database level and reading only the columns necessary to process a query speed up query processing significantly in the column store case.
The introduction of so-called data warehouses must be considered to be a compromise. The flexibility and speed gained had to be paid for with additional management of the extraction and loading of data, and controlling of the redundancy. For many years, the discussion seemed to be closed and enterprise data was split into OLTP and OLAP [C. D. French. “One Size Fits All” Database Architectures Do Not Work for DDS. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, Calif., May 22-25, 1995 [1], pages 449-450]. The OLTP is the necessary prerequisite for the OLAP. However only with the OLAP can companies understand their business and come to conclusions about how to steer and change course. When planned data and actual data are matched, business becomes transparent and decisions can be made. While centralized warehouses also handle the integration of data from many sources, it is still desirable to have OLTP and OLAP capabilities in one system which could make both components more valuable to their users.
The last 20 years, Moore's law enabled the enterprise system to grow both in functionality and volume [G. E. Moore. Cramming More Components Onto Integrated Circuits. Electronics, 38(8), 1965]. When the processor speed hit the 3 GHz level (2003) and further progress seemed to be distant, two developments helped out: unprecedented growth of main memory and massive parallelism through blade computing and multi-core CPUs [G. Koch. Discovering Multi-Core: Extending the Benefits of Moore's Law. Technology@Intel, (7), 2005]. While main memory was always welcome for e.g. caching and a large number of CPUs could be used for application servers, the databases for OLTP where not ideally suited for massive parallelism but stayed on SMP (symmetric multi processing) servers. The reasons were temporary locking of data storage segments for updates and the potential of deadlocks while updating multiple tables in parallel transactions. This is the main reason why for example R/3 from SAP ran all update transactions in a single thread and relied heavily on row level locking and super fast communication between parallel database processes (SMP). Some of the shortcomings could be overcome later by a better application design, but the separation of OLTP and OLAP remained unchallenged.
Early tests with in-memory databases of the relational type based on row storage did not show significant advantages over leading RDBMSs with equivalent memory for caching. Here the alternative idea to use column store databases for OLTP was born. Column storage was successfully used for many years in OLAP and really surged when main memory became abundant [M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-Store: A Column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, Aug. 30-Sep. 2, 2005, pages 553-564. ACM, 2005] [P. Boncz. Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications. 2002. PhD Thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands].