Data ingestion is the process of bringing data into a system. The data may come from multiple sources, may arrive in different formats, and may be substantial in volume. Backup systems typically ingest and re-ingest a large amount of metadata into a database on a daily basis in order to store location information and content information for the files being backed up. Generally, the metadata contains predominantly the same content day after day, along with a small proportion of new or changed content. Repeatedly ingesting the same data is costly and, in most cases, unwarranted.

Some traditional ingestion systems may first query the database to determine whether certain metadata is already stored in the database. Some conventional backup systems may use an image of a previous backup to identify any changes in the data before ingesting the data, which may limit the system to organizing metadata on a per-backup basis or a per-client basis.

A bloom filter is a probabilistic data structure that is used to test whether an element is a member of a set. False positive results are possible with a bloom filter, but false negatives are not. Use of a bloom filter in an ingestion system is generally inappropriate because the filter can only return a result that particular metadata “may be inside the set” or “definitely is not inside the set”; it does not return a definitive result that an element is inside the set. Traditional backup systems, in which integrity and performance are important, therefore find little value in using conventional bloom filters for the purpose of ingest optimization.
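
By way of illustration only, the membership semantics of a conventional bloom filter may be sketched as follows. This is a minimal sketch rather than the implementation of any particular backup or ingestion system: the class name `BloomFilter`, the method name `might_contain`, the bit-array size, the number of hash functions, the use of salted SHA-256 digests, and the example file paths are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter over strings (illustrative sketch only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Hypothetical defaults; real systems size these from the expected
        # number of elements and the tolerable false positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely is not inside the set".
        # True means only "may be inside the set".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("/home/alice/report.txt")
seen.might_contain("/home/alice/report.txt")  # True: no false negatives

# A deliberately undersized filter (a single bit) forces a false positive.
tiny = BloomFilter(num_bits=1, num_hashes=1)
tiny.add("/home/alice/report.txt")
tiny.might_contain("/home/bob/never-ingested.txt")  # True, though never added
```

In this sketch a `False` result could safely be acted upon, whereas a `True` result would still have to be confirmed against the database, which is why such a filter alone cannot tell an ingestion system that particular metadata is already stored.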