A virtual machine in a virtual computing infrastructure can run on a host device that comprises physical hardware and virtualization software. One or more applications running within the virtual machine can generate data that may be stored on one or more virtual disks. In many cases, when backing up a virtual machine, it is desirable to parse the file system of the virtual machine to identify the files that have changed and to build a catalog of those changed files. This is true even if the backup is a full virtual machine backup that backs up the virtual machine at the block level. However, parsing a file system that contains a large number of files requires substantial computing resources, such as processor time and storage cycles (input/output operations, or I/Os). One option is to install a driver inside the virtual machine that tracks the changed files, but customers are usually reluctant to install such drivers.
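The cost of building a changed-file catalog by parsing a file system can be illustrated with a minimal sketch. It assumes a mounted file system reachable at a hypothetical path `root` and uses file modification time as the change criterion. The function name `build_changed_file_catalog` and the catalog fields are illustrative assumptions, not part of any particular backup product.

```python
import os


def build_changed_file_catalog(root, last_backup_time):
    """Walk `root` and catalog files modified after `last_backup_time`.

    Hypothetical helper for illustration only. Returns a tuple
    (catalog, files_examined), where files_examined counts every file
    that had to be inspected, changed or not.
    """
    catalog = []
    files_examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One metadata lookup per file, even for unchanged files.
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            files_examined += 1
            if info.st_mtime > last_backup_time:
                catalog.append({
                    "path": os.path.relpath(path, root),
                    "size": info.st_size,
                    "mtime": info.st_mtime,
                })
    catalog.sort(key=lambda entry: entry["path"])
    return catalog, files_examined
```

In this sketch the number of metadata lookups grows with the total number of files rather than with the number of changed files, which is the resource cost described above.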
In addition, even if a storage appliance is designed to support snapshot-based backups, a substantial number of I/Os are used in generating the snapshots and in traversing a file system to determine file associations and structure for the changed blocks in the snapshots. This is especially true for virtual machine snapshots in a virtual machine file system (VMFS), because some storage arrays support snapshots, but usually not at the virtual machine level. The substantial number of disk I/Os required to perform snapshot-based backups, if a storage appliance supports them at all, can negatively impact the performance of the storage appliance. Further, the longer a snapshot-based backup takes, the less often such a backup can be performed.