Large-scale distributed data storage systems, such as an EMC Greenplum® massively parallel processing (MPP) database, may store very large volumes of data.
A full backup of such a database takes considerable time and storage space. In many typical use scenarios, large portions of a database do not change between backups. For example, in an EMC Greenplum® database certain types of tables are configured to only have data added to them. These include append-only (AO) tables, to which new rows may be added but whose pre-existing rows are not modified, and column-oriented (CO) tables, which are append-only tables with column orientation. A column-oriented table stores its content on disk by column rather than by row. In either case, if the tables are partitioned, data is appended only to the newest partition, so the older partitions are never modified and do not have to be backed up again.
Performing an incremental backup that includes only data that has changed since the last backup can significantly reduce the time and space required for the backup. However, in an MPP or other large-scale distributed database, the time and cost associated with determining which data and/or metadata has changed, and which has not, could be very high.
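
To make the decision behind an incremental backup concrete, the following is a minimal Python sketch rather than a description of how any Greenplum utility works. It assumes, purely for illustration, that a row count per partition was recorded at the last backup and that, because the tables are append-only, a differing row count indicates a changed partition. The function name `partitions_to_back_up` and the partition names are hypothetical.

```python
from typing import Dict, List


def partitions_to_back_up(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return the partitions that are new or whose row count differs
    from the count recorded at the last backup (illustrative assumption)."""
    return sorted(
        name for name, rows in current.items()
        if previous.get(name) != rows
    )


# Hypothetical state recorded at the last backup: partition name -> row count.
last_backup = {"sales_2011": 1000, "sales_2012": 1200, "sales_2013": 300}

# Hypothetical state now: rows were appended to the newest existing
# partition and one new partition was created.
now = {"sales_2011": 1000, "sales_2012": 1200, "sales_2013": 450, "sales_2014": 10}

changed = partitions_to_back_up(last_backup, now)
print(changed)  # ['sales_2013', 'sales_2014']
```

In this toy example the comparison itself is trivial. In a distributed database, the cost lies in collecting the current state for every table and partition across the system.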