Tar archives are a common way of collecting multiple data files into a single file for storage, distribution, and the like. To ensure that a tar archive has not been corrupted or tampered with during storage or transmission, it is common to use a verification function that reads the tar archive and generates original verification data, such as a checksum, based on the contents of the tar archive. The tar archive may then be distributed from the originator to a first downstream entity along with the original verification data. The first downstream entity receives the tar archive and runs the same verification function against the received tar archive to generate new verification data, which is compared to the original verification data. If the new verification data does not match the original verification data, it may be assumed that the received tar archive differs in some manner from the original tar archive, and the received tar archive may be rejected. If the new verification data does match the original verification data, the data files and raw data in the tar archive may be extracted, stored on a storage device, and then utilized as appropriate.
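The generate-then-compare workflow described above can be sketched as follows. This is a minimal illustration, not any particular product's implementation; it assumes SHA-256 as the checksum algorithm, and the function names are hypothetical.

```python
import hashlib


def compute_checksum(path, chunk_size=65536):
    """Generate verification data (here, a SHA-256 digest) over the
    raw bytes of a tar archive, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, original_checksum):
    """Run the same verification function against a received archive and
    compare the new verification data to the original verification data."""
    return compute_checksum(path) == original_checksum
```

A downstream entity would call `verify()` with the checksum shipped alongside the archive; a `False` result indicates the received archive differs from the original and may be rejected.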
Subsequently, the first downstream entity may send the tar archive to a second downstream entity. The first downstream entity may either forward the original copy of the tar archive received from the originator, or create a new tar archive from the extracted files of the original tar archive. Unfortunately, even when the extracted files have not changed, a newly generated tar archive may not match the original tar archive bit for bit and thus may fail a verification test. This is because one or more pieces of information maintained in a tar archive may change over time, such as the ownership of a file, the access time of a file, or the like. Additionally, different tar archive generation utilities may differ slightly in exactly how they generate a tar archive, producing slightly different archives even from identical files. Thus, if the first downstream entity generates a new tar archive and sends it to the second downstream entity, the tar archive received by the second downstream entity may not match the original tar archive bit for bit, and the verification will fail.
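The sensitivity of archive bytes to file metadata can be demonstrated directly: archiving the same file contents with different timestamps yields archives with different checksums, because the tar header records the modification time. The sketch below also shows one possible mitigation, pinning the varying header fields to fixed values, which is offered only as an illustrative assumption (the helper names and the normalization choices are hypothetical, not drawn from the text above).

```python
import hashlib
import os
import tarfile


def archive_checksum(tar_path):
    """SHA-256 digest over the raw archive bytes."""
    with open(tar_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def make_archive(tar_path, src, normalize=False):
    """Create an uncompressed tar archive containing a single file.

    With normalize=True, metadata fields that commonly vary between
    archive generations (mtime, uid/gid, owner names) are pinned to
    fixed values so identical contents yield identical archive bytes.
    """
    def norm(info):
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    with tarfile.open(tar_path, "w") as tar:
        tar.add(src, arcname=os.path.basename(src),
                filter=norm if normalize else None)
```

Without normalization, merely touching a file's modification time changes the resulting archive's checksum even though the file contents are unchanged, which is exactly the failure mode described above.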
To eliminate this problem, an entity may retain copies of any tar archives that may need to be subsequently redistributed. Because the files in the tar archive must be extracted from the tar archive and stored on a storage device for use, the data is duplicated: the data exists in the copy of the tar archive, and the data exists on a storage device as separate files. This duplication of data wastes computer storage and may increase the processing requirements of other computer processes, such as a backup process that backs up the storage device.