Massively parallel processing (MPP) database management systems scale by distributing data partitions to servers and running transactions in parallel. A single transaction can be processed in parallel on multiple servers. Such parallel processing presents challenges to transaction management, multi-version concurrency control (MVCC), and recovery.
A global transaction manager (GTM) supports atomicity, consistency, isolation, and durability (ACID) compliant transactions in an MPP database. The GTM provides a global transaction identification number (ID) that uniquely identifies each transaction in the system. When a transaction involving multiple servers commits, a two-phase commit is conducted to ensure that processing of the transaction has completed on all of the servers. The GTM also offers a global snapshot of active transactions to support MVCC, a fundamental mechanism for achieving high concurrency in which readers do not block writers and writers do not block readers. In MVCC, when a database record is updated, it is not replaced by the updated record. Instead, a new version of the record is created. Both the old and new versions exist in the system, so readers and writers of the same record avoid blocking each other. Each reader accesses the correct version based on the snapshot taken when its transaction or statement starts and on the transaction IDs stored in the header of the record, which identify the transactions that updated it. When those updating transactions (those performing an insert, update, or delete) committed before the snapshot was taken, their versions are visible.
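The snapshot-based visibility rule described above can be illustrated with a minimal sketch. It is not the implementation of any particular system: the names `Snapshot`, `RecordVersion`, `committed_before`, and `is_visible` are hypothetical, the record header is reduced to two transaction IDs, and a plain set of committed IDs stands in for the transaction status log.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical global snapshot of active transactions."""
    next_xid: int            # first transaction ID not yet assigned at snapshot time
    active: FrozenSet[int]   # IDs of transactions still running at snapshot time


@dataclass
class RecordVersion:
    """One version of a record; the header is reduced to two transaction IDs."""
    value: str
    created_by: int                   # transaction that wrote this version
    deleted_by: Optional[int] = None  # transaction that deleted or superseded it


def committed_before(xid: int, snap: Snapshot, committed: Set[int]) -> bool:
    """True if `xid` committed before the snapshot was taken.

    `committed` is an assumed stand-in for the transaction status log.
    """
    return xid < snap.next_xid and xid not in snap.active and xid in committed


def is_visible(version: RecordVersion, snap: Snapshot, committed: Set[int]) -> bool:
    """A version is visible if its creator committed before the snapshot
    and no deleter had committed before the snapshot."""
    if not committed_before(version.created_by, snap, committed):
        return False
    if version.deleted_by is None:
        return True
    return not committed_before(version.deleted_by, snap, committed)


# Transaction 10 inserted "v1"; transaction 12 later updated it to "v2".
committed = {10, 12}
old = RecordVersion("v1", created_by=10, deleted_by=12)
new = RecordVersion("v2", created_by=12)

during = Snapshot(next_xid=13, active=frozenset({12}))  # taken while 12 was running
after = Snapshot(next_xid=14, active=frozenset())       # taken after 12 committed

visible_during = [v.value for v in (old, new) if is_visible(v, during, committed)]
visible_after = [v.value for v in (old, new) if is_visible(v, after, committed)]
```

In this sketch a reader holding the earlier snapshot continues to see the old version while the writer creates the new one, so neither blocks the other. Every check consults both the snapshot and the status log, which hints at why the snapshot must accompany each transaction or statement.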
Taking a snapshot and transferring it to the servers for each transaction or statement can make the GTM a performance bottleneck. The visibility check, which relies on transaction IDs and a transaction status log such as Clog in PostgreSQL, is often complicated because time information is not used to determine the order in which events occurred.