Various types of databases are used to store and retrieve data. For example, a relational database is organized as tables, with each table capable of holding zero or more rows of data. One or more columns in a table can be designated as a primary key (i.e., a value that uniquely identifies each row). The data in one table can be related to the data in another table by relating a column in the first table to a primary key column of the second table (such a referencing column is commonly called a foreign key). Each table can also have one or more associated indexes.
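The table, primary key, and cross-table relation described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for a full RDBMS; the table names (customers, orders) and column names are hypothetical and do not come from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: each has a primary key; orders.customer_id
# relates an order row to the primary key of a customers row.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# Follow the relation from orders to customers with a join.
rows = conn.execute(
    "SELECT o.order_id, c.name "
    "FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
).fetchall()

# An order that refers to a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In this sketch the join returns the single order together with its customer's name, and the second insert fails because no customers row has the key 999.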
A database index is a data structure built on a subset of a table's columns, such as the primary key, that can be used to locate a row, or to check whether it exists, without searching the entire table. At the storage device level, a database stores its data in pages. For example, the rows of a table are stored in fixed-size data pages (e.g., each table can have multiple data pages), whereas indexes can be stored in index pages. Groups of pages are called extents.
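The difference between searching the entire table and searching through an index can be observed in a query plan. The sketch below again assumes SQLite via Python's sqlite3 module; the items table and the idx_items_sku index are hypothetical names. It does not show page or extent layout, which is internal to the storage engine, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no index on sku at first.
conn.execute("CREATE TABLE items (item_id INTEGER, sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i, f"SKU-{i:05d}", i % 10) for i in range(1000)],
)

query = "SELECT 1 FROM items WHERE sku = 'SKU-00042'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no usable index: the table itself is scanned
conn.execute("CREATE INDEX idx_items_sku ON items (sku)")
plan_after = plan(query)   # the planner can now search idx_items_sku

found = conn.execute(query).fetchone() is not None
```

Printing plan_before and plan_after shows the planner switching from a scan of items to a search that names idx_items_sku.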
A relational database management system (RDBMS) can include software used to manage a relational database. Typically, Structured Query Language (SQL) is the programming language used to create, read, update, and delete (CRUD) data stored in the tables of a database. An SQL command can be considered a query.
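As a brief illustration of the four CRUD operations expressed as SQL commands, the following sketch runs one statement of each kind against SQLite through Python's sqlite3 module. The employees table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)

# Create: add a row.
conn.execute("INSERT INTO employees VALUES (1, 'Lin', 'Sales')")

# Read: fetch the row back.
after_create = conn.execute(
    "SELECT name, dept FROM employees WHERE emp_id = 1"
).fetchone()

# Update: change one column of the row.
conn.execute("UPDATE employees SET dept = 'Support' WHERE emp_id = 1")
after_update = conn.execute(
    "SELECT dept FROM employees WHERE emp_id = 1"
).fetchone()

# Delete: remove the row.
conn.execute("DELETE FROM employees WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```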
One common task in data warehouse environments using relational databases is large-scale loading and merging of data across tables. The data can be loaded into the data warehouse through SQL queries such as INSERT INTO ... SELECT or MERGE INTO statements. In various examples, while the data is being loaded, the indexes defined on the target table of these SQL statements should be updated at the same time in order to maintain data integrity and keep the indexes useful for subsequent queries. However, existing approaches insert the rows one by one and update the indexes for each row in a non-BULK insert mode (e.g., the indexes are updated serially). For large data loads, serial inserts and serial index updates can be inefficient.
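The two statement shapes discussed above, a one-row-at-a-time insert and a single INSERT INTO ... SELECT, can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, the staging and target table names are hypothetical, and SQLite has no MERGE statement, so only INSERT INTO ... SELECT is shown. SQLite still maintains each index row by row internally, so the sketch shows how a load is expressed and that the indexes stay usable afterward; it does not demonstrate a bulk or parallel index update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table holding the data to be loaded.
conn.execute("CREATE TABLE staging (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [(i, f"R{i % 5}", float(i)) for i in range(10_000)],
)
conn.commit()

def make_target(name):
    # Target table with a primary key and one secondary index on region.
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.execute(f"CREATE INDEX idx_{name}_region ON {name} (region)")

# Row-by-row load: one INSERT statement per source row, with the
# target's indexes updated as each row arrives.
make_target("target_serial")
source_rows = conn.execute("SELECT id, region, amount FROM staging").fetchall()
for row in source_rows:
    conn.execute("INSERT INTO target_serial VALUES (?, ?, ?)", row)
conn.commit()

# Set-based load: a single INSERT INTO ... SELECT statement.
make_target("target_set")
conn.execute("INSERT INTO target_set SELECT id, region, amount FROM staging")
conn.commit()

serial_count = conn.execute("SELECT COUNT(*) FROM target_serial").fetchone()[0]
set_count = conn.execute("SELECT COUNT(*) FROM target_set").fetchone()[0]

# The secondary index remains usable for a subsequent query on region.
r0_count = conn.execute(
    "SELECT COUNT(*) FROM target_set WHERE region = 'R0'"
).fetchone()[0]
```

Both loads end with the same rows in the target table. The set-based form hands the whole load to the database in one statement, which is the form a bulk-capable engine would be able to optimize.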