Generating snapshots of a distributed database may be difficult due, in part, to the database not being strongly consistent across its various nodes. That is, at any one time, data changes on one or more of the nodes may not be fully synchronized with other nodes and are therefore inconsistent with those other nodes. Additionally, snapshots are difficult because it is impossible to capture the states of all nodes at exactly the same time without freezing data changes on the nodes while the snapshot is generated, and it is not practicable to freeze a large database for the amount of time needed to generate a snapshot. Moreover, in a distributed database, each data item usually has multiple copies. To improve space utilization, the snapshot should eliminate this redundancy and contain only one copy of each data item. Therefore, to generate a consistent, deduplicated snapshot, each node is typically scanned multiple times to ensure consistency, which involves a relatively large amount of time and processing power.
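
The redundancy and inconsistency problems described above may be illustrated with a minimal sketch. The sketch is illustrative only and rests on assumptions not stated above: each node is modeled as an in-memory mapping from a key to a (version, value) pair, the names `naive_snapshot` and `deduplicated_snapshot` are hypothetical, and "highest version wins" is an assumed rule for choosing among copies. It does not address changes that arrive while the nodes are being scanned.

```python
from typing import Dict, List, Tuple

# Hypothetical model of one node's contents: key -> (version, value).
Replica = Dict[str, Tuple[int, str]]


def naive_snapshot(replicas: List[Replica]) -> List[Tuple[str, int, str]]:
    """Scan every node once and keep every copy that was seen."""
    return [(key, ver, val) for r in replicas for key, (ver, val) in r.items()]


def deduplicated_snapshot(replicas: List[Replica]) -> Replica:
    """Keep one copy per key, choosing the highest version seen (assumed rule)."""
    newest: Replica = {}
    for r in replicas:
        for key, (ver, val) in r.items():
            if key not in newest or ver > newest[key][0]:
                newest[key] = (ver, val)
    return newest


# Three nodes holding copies of the same data; the second node has not yet
# received the latest change to key "a", so the copies are inconsistent.
replicas = [
    {"a": (2, "x2"), "b": (1, "y1")},
    {"a": (1, "x1"), "b": (1, "y1")},
    {"a": (2, "x2")},
]

naive = naive_snapshot(replicas)          # redundant and conflicting copies
dedup = deduplicated_snapshot(replicas)   # one copy per key
```

In this sketch the naive scan stores five entries for two keys, including two different versions of key "a", whereas the deduplicated result stores one entry per key. Because the nodes are not frozen, a single pass of this kind may still miss a change made during the scan, which is one reason each node is typically scanned multiple times.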