A virtual machine that comprises a plurality of content files may be ingested and backed up to a storage system. The storage system may create an index of the content files. The virtual machine may be backed up to the storage system a plurality of times, and the storage system is configured to store the different versions of the virtual machine. The different versions of the virtual machine may include different versions of the content files. To determine which content files have changed between virtual machine versions, conventional systems read the entire contents of a first version and a second version of the virtual machine and determine the differences between the two versions. This is a time-consuming and resource-intensive process because a virtual machine may comprise a large amount of data (e.g., 100 TB).
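The conventional full-read comparison described above may be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not an implementation of any particular system: each virtual machine version is modeled as an in-memory mapping from file path to file contents, and the function name `changed_files_full_read` and the sample paths are hypothetical.

```python
import hashlib
from typing import Dict, List


def changed_files_full_read(old: Dict[str, bytes], new: Dict[str, bytes]) -> List[str]:
    """Return the paths that differ between two versions.

    Every byte of every file in both versions is read and hashed, which is
    why the cost grows with the total size of the virtual machine.
    Hypothetical helper for illustration only.
    """
    old_digests = {path: hashlib.sha256(data).hexdigest() for path, data in old.items()}
    new_digests = {path: hashlib.sha256(data).hexdigest() for path, data in new.items()}

    # A path counts as changed if it was added, removed, or its digest differs.
    all_paths = old_digests.keys() | new_digests.keys()
    return sorted(p for p in all_paths if old_digests.get(p) != new_digests.get(p))


# Assumed sample data standing in for two backed-up versions.
v1 = {"/etc/hosts": b"127.0.0.1 localhost\n", "/var/log/app.log": b"start\n"}
v2 = {
    "/etc/hosts": b"127.0.0.1 localhost\n",
    "/var/log/app.log": b"start\nstop\n",
    "/tmp/new.txt": b"hello\n",
}

print(changed_files_full_read(v1, v2))  # ['/tmp/new.txt', '/var/log/app.log']
```

In this sketch the unchanged file `/etc/hosts` is still read and hashed in both versions, which reflects why the approach scales with the full size of the virtual machine rather than with the amount of changed data.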
Other systems read through the metadata associated with a virtual machine. The metadata associated with a content file of the virtual machine may include a timestamp. The timestamp may be compared with the timestamps associated with the virtual machine versions to determine when the content file was modified. However, the metadata associated with a virtual machine volume may comprise approximately five percent of the virtual machine volume (e.g., approximately 5 TB of metadata for a 100 TB volume). For large virtual machine volumes, reading through the metadata to determine which content files have changed based on their timestamps is therefore still a time-consuming and resource-intensive process.
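The timestamp-based approach described above may be sketched as follows. This is a simplified illustration under assumptions: the `FileMetadata` record, its `modified_at` field, and the helper names are hypothetical, and timestamps are modeled as integer seconds. The second helper works through the five percent estimate for a 100 TB volume.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical per-file metadata record."""
    path: str
    modified_at: int  # assumed: seconds since the epoch


def changed_since(entries: Iterable[FileMetadata], previous_backup_at: int) -> List[str]:
    """Return paths modified after the previous backup.

    Every metadata entry is still visited, so the cost grows with the
    amount of metadata even though file contents are not read.
    """
    return sorted(e.path for e in entries if e.modified_at > previous_backup_at)


def metadata_bytes(volume_bytes: int, metadata_percent: int = 5) -> int:
    """Estimate metadata size as a whole-number percentage of the volume."""
    return volume_bytes * metadata_percent // 100


TB = 10**12  # decimal terabyte, assumed for the estimate

entries = [
    FileMetadata("/etc/hosts", modified_at=1_000),
    FileMetadata("/var/log/app.log", modified_at=2_500),
    FileMetadata("/tmp/new.txt", modified_at=3_000),
]

print(changed_since(entries, previous_backup_at=2_000))  # ['/tmp/new.txt', '/var/log/app.log']
print(metadata_bytes(100 * TB) // TB)  # 5
```

Although no file contents are read, the scan still visits one record per content file, so for a 100 TB volume it may need to traverse on the order of 5 TB of metadata.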