With the constant development of computer and network technologies, the demands placed on database technology keep rising. As the scale of online transaction processing applications (e.g., online trading applications) keeps expanding and the number of users keeps increasing, these applications generate ever more data and ever more highly concurrent transactions. As a result, scalability becomes a major obstacle to the development of these systems. Poor scalability adversely affects the throughput and performance of a system.
To tackle the scalability problem, many Web-based companies employ a cost-effective parallel database management system (hereinafter referred to as DBMS for short), e.g., Greenplum Database, and partition the data and workload across a large number of shared-nothing nodes (e.g., commodity servers). However, the scalability of online transaction processing (OLTP) applications on these DBMSs depends on the existence of an optimal database partition design, which defines how an application's data and workload are partitioned across nodes in a cluster, and how queries and transactions on these data are routed to the nodes. This design in turn determines the number of transactions that access data stored on each node, in particular the number of distributed transactions, and how skewed the load distribution is across the cluster. Optimizing these two factors is critical to scaling complex systems. Hence, without a proper design, a parallel DBMS will perform no better than a single-node system due to the overhead caused by blocking, inter-node communication, and load balancing.
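The two factors named above, the share of distributed transactions and the load skew, can be made concrete with a small sketch. It is illustrative only and is not taken from any particular DBMS. It assumes integer keys, a simple modulo placement rule, and a skew measure defined as the busiest node's load divided by the mean load. The names node_for and evaluate_partitioning are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def node_for(key: int, num_nodes: int) -> int:
    """Assumed placement rule: route an integer key to a node by modulo."""
    return key % num_nodes


def evaluate_partitioning(
    workload: Iterable[List[int]], num_nodes: int
) -> Tuple[float, float]:
    """Return (fraction of distributed transactions, load skew).

    Each transaction is the list of keys it accesses. A transaction is
    counted as distributed when its keys live on more than one node.
    Skew is max node load / mean node load (1.0 means perfectly even).
    """
    load = Counter()
    total = 0
    distributed = 0
    for txn in workload:
        total += 1
        nodes = {node_for(key, num_nodes) for key in txn}
        if len(nodes) > 1:
            distributed += 1
        for node in nodes:
            load[node] += 1  # one unit of work per node touched
    if total == 0:
        return 0.0, 0.0
    mean_load = sum(load.values()) / num_nodes
    skew = max(load.values()) / mean_load if mean_load else 0.0
    return distributed / total, skew


# Hypothetical workload of four transactions on a 4-node cluster.
sample = [[0, 4], [1, 5], [2, 3], [0, 8]]
print(evaluate_partitioning(sample, 4))
```

In this sample only the transaction touching keys 2 and 3 spans two nodes, while node 0 serves two transactions and every other node serves one. A different placement rule would change both numbers, which is why the partition design matters.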