As technology advances, partitioned tablespaces are widely adopted as an effective way to manage larger volumes of data.
On a partitioned tablespace, there are two common types of secondary indexes that a user may create: 1) Non-Partitioned (Secondary) Index (NPI or NPSI, and also referred to as a global index by some Database Management Systems (DBMSs)) or 2) a Data-Partitioned Secondary Index (DPSI, also referred to as a local index by some DBMSs). NPI may be described as one massive index created that spans all partitions. DPSI may be described as a group of many smaller indexes, where one index is created for each partition. Comparing the two types of secondary indexes, DPSI allows the user to manage the index and data at the partition level. Therefore, DPSI provides better data availability and better utility performance.
DPSI was first introduced to improve data availability during utility maintenance. With conventional systems, in terms of query performance, a DPSI is generally not as effective as an NPI. As a result, users can not fully adopt DPSI to take advantage of its utility benefits. Typically, customers use NPIs to maintain query performance.
The following is an example Structured Query Language (SQL) statement:
SELECT *FROMCUSTOMER C,AREACODES AWHEREC.STATE = A.STATE
With the above SQL statement, there is a 2-table join between tables CUSTOMER and AREACODES, and the join predicate is CUSTOMER.STATE=AREACODES.STATE. For this example, a DPSI is created on AREACODES.STATE. For each probing value from the outer table (CUSTOMER.STATE), processing stripes through multiple partitions of the inner table (AREACODES.STATE) to find qualified rows. Then, the striping pattern is repeated for each next probing from the outer table. Because of this striping pattern, there is no sequential pattern detection, the benefit of index look-aside is decreased, and random Input/Output (I/O) is increased, which leads to more Central Processing Unit (CPU) getpage calls. Then, the query suffers from longer response time and CPU time.
This is a random I/O problem related to a standard nested loop join. Thus, customers keep the NPIs around in order to maintain optimal query performance for joins. This limits the adoption of DPSIs for customers with large partitioned tables.
However, there are potential solutions to avoid random I/O on the inner partitioned table by using other join methods such as Hash Join or Merge-Scan (Sort-Merge) Join. However, these join techniques may require significant resources given that they require a materialization of the inner table for joining very large tables. Thus, existing hash join or merge scan join may not scale due to insufficient resources or due to reprocessing the inner table to accommodate insufficient memory (as is common with hash join).
Some systems offer partition level join operations involving hash join that support bushy-tree join permutation logic or DBMSs that provide a comprehensive hash join implementation.