1. Field
Embodiments of the invention relate to enhancing data store backup times.
2. Description of the Related Art
Relational DataBase Management System (RDBMS) software may use a Structured Query Language (SQL) interface. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
An RDBMS uses relational techniques for storing and retrieving data in a relational database. Relational databases are computerized information storage and retrieval systems. Relational databases are organized into tables that consist of rows and columns of data. The rows may also be called tuples or records. A database typically has many tables, and each table typically has multiple records and multiple columns.
A table in a database can be accessed using an index. An index is an ordered set of references (e.g., pointers) to the records in the table. The index is used to access each record in the table using a key (i.e., one of the fields or attributes of the record, which corresponds to a column). The term “key” may also be referred to as “index key”. Without an index, finding a record requires a scan (e.g., a linear scan) of the entire table. Indexes provide an alternative technique for accessing data in a table. Users can create indexes on a table after the table is built. An index is based on one or more columns of the table.
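The difference between a scan and an index lookup can be sketched as follows. This is a minimal in-memory illustration, assuming a table held as a Python list of (row_id, key, payload) tuples and an index held as a sorted list of (key, row_id) references; the names `scan_lookup`, `build_index`, and `index_lookup` are hypothetical, and a real database index is a disk-based tree rather than a list.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical record layout: (row_id, key_value, payload).
Record = Tuple[int, int, str]


def scan_lookup(table: List[Record], key: int) -> Optional[Record]:
    """Without an index: examine every record in turn (linear scan)."""
    for record in table:
        if record[1] == key:
            return record
    return None


def build_index(table: List[Record]) -> List[Tuple[int, int]]:
    """Build an ordered set of (key, row_id) references to the table's records."""
    return sorted((record[1], record[0]) for record in table)


def index_lookup(index: List[Tuple[int, int]], key: int) -> Optional[int]:
    """With an index: binary-search the ordered keys and return the row_id."""
    # (key,) sorts before every (key, row_id), so this finds the first match.
    pos = bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None


table = [(0, 42, "a"), (1, 7, "b"), (2, 19, "c")]
idx = build_index(table)
row_id = index_lookup(idx, 19)
```

The scan touches records until it finds a match, while the index lookup needs only a logarithmic number of comparisons on the ordered keys before following the reference to the record.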
A query may be described as a request for information from a database based on specific conditions. A query typically includes one or more predicates. A predicate may be described as an element of a search condition that expresses or implies a comparison operation (e.g., A=3).
Some databases have a tablespace architecture, where a tablespace may be described as a logical entity containing tables and indexes. In certain databases, a tablespace may be described as a data file that is used to map the logical schema of the database to disk. In a database, data rows are stored in tables, and the tables exist in the tablespaces. The tablespaces are made up of one or more containers. A container may be described as a physical unit of storage (e.g., similar to a file, a filesystem or a raw device).
In certain databases, there are two types of tablespaces: System Managed Storage (SMS) and Database Managed Storage (DMS). In certain databases, the object that is used to map the logical schema of the database to disk is called a tablespace. In certain other databases, this object is called a data file. A tablespace may be mapped to storage objects using containers.
In some databases, indexes are represented by a fully linked tree. A fully linked tree may be described as a tree in which nodes may contain a “nextNode” pointer and a “prevNode” pointer. These pointers are the pool-relative page identifiers (PageIDs) of the nodes that come immediately before or immediately after the current node on the same level. Having these extra pointers is useful in doing “lookaheads” in the B-tree (BTree) structure when prefetching. The pointers are easy to maintain and contain useful information.
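A lookahead over the nextNode pointers can be sketched as below. This is an illustrative model only, assuming a buffer pool represented as a dictionary from PageID to node; the class `Node` and the functions `link_level` and `lookahead` are hypothetical names, and the PageIDs used are arbitrary values rather than anything defined in this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A tree node; next_node/prev_node hold PageIDs of same-level neighbours."""
    page_id: int
    keys: List[int] = field(default_factory=list)
    next_node: Optional[int] = None  # PageID of the node immediately after
    prev_node: Optional[int] = None  # PageID of the node immediately before


def link_level(nodes: List[Node]) -> None:
    """Set nextNode/prevNode pointers for nodes on one level, in key order."""
    for left, right in zip(nodes, nodes[1:]):
        left.next_node = right.page_id
        right.prev_node = left.page_id


def lookahead(pool: Dict[int, Node], start: int, depth: int) -> List[int]:
    """Follow nextNode pointers to list PageIDs a prefetcher could request."""
    pages: List[int] = []
    current = pool[start].next_node
    while current is not None and len(pages) < depth:
        pages.append(current)
        current = pool[current].next_node
    return pages


# PageIDs need not be contiguous; the pointers carry the ordering.
leaves = [Node(7, [1, 2]), Node(12, [3, 4]), Node(30, [5, 6])]
link_level(leaves)
pool = {node.page_id: node for node in leaves}
upcoming = lookahead(pool, 7, 2)
```

Because each node already records its neighbour's PageID, the prefetcher can issue reads for upcoming pages without first descending from the parent node again.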
FIG. 1 illustrates, in a block diagram, a prior art fully linked tree 100. In FIG. 1, a root node 110 has three branches to three intermediate nodes 110, 130, 150. Intermediate node 130 has branches to leaf nodes 132, 134, 136. Leaf node 134 has slots 138, and one of the slots points to a key and a row identifier (RID) 140. The RID has a page field that points to a data page 160 and a slot field that points to a slot in a set of slots 162 on the data page 160. The slot that is being pointed to, in turn, points to a data record 164.
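The two-step RID resolution described for FIG. 1 (page field to data page, slot field to slot, slot to record) can be sketched as follows. The sketch assumes, for illustration only, that each slot stores an (offset, length) pair locating the record within the page's byte area; `RID`, `DataPage`, and `fetch_record` are hypothetical names, not an actual storage format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RID:
    """Row identifier: the data page, and the slot on that page."""
    page: int
    slot: int


@dataclass
class DataPage:
    # Assumed slot directory: slot number -> (byte offset, length) of a record.
    slots: List[Tuple[int, int]]
    records: bytes


def fetch_record(pages: Dict[int, DataPage], rid: RID) -> bytes:
    """Follow RID.page to the data page, then RID.slot to the data record."""
    page = pages[rid.page]
    offset, length = page.slots[rid.slot]
    return page.records[offset:offset + length]


pages = {4: DataPage(slots=[(0, 5), (5, 3)], records=b"helloabc")}
record = fetch_record(pages, RID(page=4, slot=1))
```

The indirection through the slot lets a record move within its page while its RID, and therefore every index entry that points to it, stays unchanged.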
FIG. 2 illustrates, in a block diagram, prior art block indexes 200. Block indexes are similar to “normal” (i.e., traditional) indexes, but block indexes have pointers to blocks instead of to individual data records (e.g., data record 164 in FIG. 1). A block may be described as a group of consecutive pages that have the same key values in all dimensions. In FIG. 2, the blocks are: East 97, East 98, North 99, South 99, and West 00. For example, the records in the West 00 block 230 are from the West region and from the year 2000. In FIG. 2, the region block index 210 and the year block index 220 point to these blocks.
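A minimal sketch of the two block indexes of FIG. 2, assuming each block is identified by a hypothetical integer block ID and tagged with its (region, year) values. Each index maps a dimension value to the set of blocks holding that value, so it stores one entry per block rather than one per record; a query on both dimensions can intersect the two sets. The function names are illustrative only.

```python
from typing import Dict, Set, Tuple

BlockIndex = Dict[str, Set[int]]


def build_block_indexes(
    blocks: Dict[int, Tuple[str, str]]
) -> Tuple[BlockIndex, BlockIndex]:
    """Build a region block index and a year block index over the blocks."""
    region_index: BlockIndex = {}
    year_index: BlockIndex = {}
    for block_id, (region, year) in blocks.items():
        region_index.setdefault(region, set()).add(block_id)
        year_index.setdefault(year, set()).add(block_id)
    return region_index, year_index


def blocks_for(region_index: BlockIndex, year_index: BlockIndex,
               region: str, year: str) -> Set[int]:
    """Return the blocks matching both dimension values."""
    return region_index.get(region, set()) & year_index.get(year, set())


# The five blocks shown in FIG. 2, with hypothetical block IDs 0 through 4.
blocks = {0: ("East", "97"), 1: ("East", "98"), 2: ("North", "99"),
          3: ("South", "99"), 4: ("West", "00")}
region_idx, year_idx = build_block_indexes(blocks)
west_00 = blocks_for(region_idx, year_idx, "West", "00")
```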
A table and an associated index may be in the same tablespace or in different tablespaces. FIG. 3 illustrates, in a block diagram, a prior art table 300 and two indexes 310, 320 that are stored in one index tablespace 330.
Databases allow tablespace backup and restore. To back up a table, the tablespace containing the table is backed up. When a tablespace is backed up, any indexes that the tablespace contains are included in the backup of the tablespace.
Due to differences (e.g., in time and system resources) between backing up a 100 Gigabyte (100 G) database and a 1 Terabyte (1 T) database, it may be inefficient to back up the entire tablespace.
Assume there is a table that is 500 G in size, and the table has four indexes. In this example, each of the indexes contains all columns of the table, in different orders for performance. For this example, the table and the four indexes are stored in one tablespace. When a backup of the table is requested, instead of just the 500 G table being backed up, the whole tablespace is backed up, which results in backing up the 500 G table and four 500 G indexes, for a total of 2.5 T of data to back up. This results in operational challenges and requires large amounts of system resources, especially for daily backups.
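The arithmetic in this example can be checked with a short calculation. The function name `backup_sizes_gb` is hypothetical, and the sketch assumes decimal units (1 T = 1000 G), as the 2.5 T total implies.

```python
from typing import List, Tuple


def backup_sizes_gb(table_gb: int, index_gbs: List[int]) -> Tuple[int, int]:
    """Return (table-only size, whole-tablespace size) in gigabytes."""
    table_only = table_gb
    whole_tablespace = table_gb + sum(index_gbs)
    return table_only, whole_tablespace


# 500 G table plus four indexes of 500 G each, all in one tablespace.
table_only, whole = backup_sizes_gb(500, [500] * 4)
print(f"table only: {table_only} G, tablespace: {whole} G = {whole / 1000} T")
# prints "table only: 500 G, tablespace: 2500 G = 2.5 T"
```

In this example the tablespace backup moves five times as much data as the table alone, and that factor grows with each additional full-width index stored in the same tablespace.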
Some systems allow selective backup. For example, in order to back up the contents of a large hard drive onto a medium that is too small to back up everything, a user can choose to back up some files and leave the rest at risk (i.e., without being backed up). This may be done using some of the same backup programs that create full backups. For example, some users choose to back up pictures, documents, and other unique files, and expect to reinstall the operating system and applications, if the hard drive crashes or is corrupted (e.g., by a virus). In a WINDOWS® operating system environment, if the user backs up the “My Documents” folder, the user will get most of the unique files. (WINDOWS is a registered trademark of Microsoft Corporation in the United States and/or other countries.)
Also, if the user backs up to a Universal Serial Bus (USB) flash drive or other small-capacity medium, the user decides which files to back up. If the user collects digital photographs and music, the user may not be able to fit everything in the “My Documents” folder onto such backup media. Thus, the question becomes how to select files for backup.
Some systems provide more automated ways of selecting files to back up. For example, differential and incremental backups are automated selective backups. With such backups, there is a full backup initially. Then, after some files change, differential and incremental backups are used to back up the files that have changed since the full backup.
With reference to incremental backup, if an image backup or a full backup has been created, the operating system has marked those files as backed up. Later, the user saves new files to the hard drive or modifies old ones. These changes are not in the backup. Some systems offer the option of automatically backing up the files that have changed since the most recent backup. Once this incremental backup is complete, these new and changed files are marked as backed up, so that when the user does another incremental backup, these files will not be backed up again. Each incremental backup is relatively small and quick to make, but, to fully restore the hard drive, the user has to restore the most recent full or image backup and then follow that by restoring each of the incremental backups in the order they were recorded. The user can later shorten this list of backups to restore by making a new image or full backup.
The differential backup works like the incremental backup, except that, when the smaller backup is done, the new and changed files are not marked as backed up. This means that each subsequent differential backup backs up all the files that have been added or changed since the most recent full backup.
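The only difference between the two schemes is whether a backup clears the "backed up" mark. The sketch below models that mark as a per-file boolean, similar in spirit to an archive attribute; the class `FileSet` and its methods are hypothetical and do not correspond to any particular backup program.

```python
from typing import Dict, List


class FileSet:
    """Tracks a per-file flag meaning 'new or changed since marked as backed up'."""

    def __init__(self, names: List[str]) -> None:
        self.changed: Dict[str, bool] = {name: True for name in names}

    def modify(self, name: str) -> None:
        self.changed[name] = True  # new or modified files are not backed up yet

    def full_backup(self) -> List[str]:
        backed_up = sorted(self.changed)
        for name in backed_up:
            self.changed[name] = False  # every file is now marked as backed up
        return backed_up

    def incremental_backup(self) -> List[str]:
        backed_up = sorted(n for n, flag in self.changed.items() if flag)
        for name in backed_up:
            self.changed[name] = False  # marked, so the next incremental skips them
        return backed_up

    def differential_backup(self) -> List[str]:
        # Marks are left unchanged, so each differential repeats every file
        # changed since the full backup.
        return sorted(n for n, flag in self.changed.items() if flag)


inc = FileSet(["a", "b", "c"])
inc.full_backup()
inc.modify("a")
first_inc = inc.incremental_backup()
inc.modify("b")
second_inc = inc.incremental_backup()

diff = FileSet(["a", "b", "c"])
diff.full_backup()
diff.modify("a")
first_diff = diff.differential_backup()
diff.modify("b")
second_diff = diff.differential_backup()
```

After the same two changes, the second incremental contains only `b`, while the second differential contains both `a` and `b`, which is why differential backups keep growing until the next full backup.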
Incremental backups are more thorough. For example, a user can restore files that were deleted or overwritten after the last full backup, but before the most recent incremental backup. However, finding a specific file amid a collection of incremental backups can be challenging. Some backup programs offer indexes that help find specific files. Differential backups offer simpler full restores (i.e., just restore the most recent full backup and the most recent differential backup). The user may lose files, however, if the files were created after the most recent full backup and deleted or overwritten before the most recent differential backup.
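The restore sequences described above can be expressed as a small function over a chronological backup history. This is a sketch under the simplifying assumption that a history uses either incremental or differential backups after a full backup, not both; `restore_chain` is a hypothetical name.

```python
from typing import List


def restore_chain(history: List[str]) -> List[int]:
    """Given backup kinds in chronological order ('full', 'incremental',
    'differential'), return the indices of the backups to restore, oldest first."""
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to start from")
    last_full = fulls[-1]
    later = range(last_full + 1, len(history))
    diffs = [i for i in later if history[i] == "differential"]
    incs = [i for i in later if history[i] == "incremental"]
    if diffs and incs:
        raise ValueError("mixed incremental/differential history not modelled")
    chain = [last_full]
    if diffs:
        chain.append(diffs[-1])  # only the most recent differential is needed
    else:
        chain.extend(incs)  # every incremental, in the order it was recorded
    return chain


incremental_plan = restore_chain(["full", "incremental", "incremental", "incremental"])
differential_plan = restore_chain(["full", "differential", "differential", "differential"])
```

The incremental plan restores all four backups, whereas the differential plan restores only the full backup and the last differential.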
In high-activity On-Line Transaction Processing (OLTP) systems, incremental backups can be very large because many changes are made to the database.
Thus, there is a need for enhancing data store (e.g., database) backup times.