Data deduplication involves analyzing a dataset or file to identify and remove redundant data. Removing redundant data saves storage space and can make subsequent data processing more efficient and less resource intensive. The data deduplication process itself, however, can be resource intensive. Data can, for example, be conventionally deduplicated using a Bloom filter. With a Bloom filter, several hash operations are performed on an identifier associated with each received data item, and each hash result sets or checks a bit in a shared bit array. Multiple hash operations are necessary to reduce collisions, and the bit array must be large enough to keep collisions rare, so the filter can require a large amount of storage space. Additionally, despite the reduction in collisions provided by performing multiple hashes, collisions still occur. These collisions produce false positives: a data item that has not been seen before can be identified as a duplicate and discarded, so data is lost in the deduplication process.
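The conventional approach described above can be illustrated with a minimal sketch. It assumes string identifiers, and the names `BloomFilter`, `deduplicate`, `num_bits`, and `num_hashes`, the use of salted SHA-256 to derive the several hash operations, and the default sizes are all hypothetical choices made for illustration. The sketch shows where the storage cost (the bit array) and the false positives (all of an item's bits already set by other items) come from.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: a bit array plus k hash operations."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Storage cost: one bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, identifier: str):
        # Assumption: the k hash operations are SHA-256 salted with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{identifier}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, identifier: str) -> None:
        # Set the bit chosen by each hash operation.
        for pos in self._positions(identifier):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, identifier: str) -> bool:
        # True if every chosen bit is set: either a real duplicate or a
        # false positive caused by collisions with other identifiers.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(identifier)
        )


def deduplicate(identifiers, num_bits=1024, num_hashes=3):
    """Keep the first occurrence of each identifier the filter has not reported."""
    bloom = BloomFilter(num_bits, num_hashes)
    kept = []
    for ident in identifiers:
        if bloom.might_contain(ident):
            # Treated as a duplicate; on a false positive, unique data is dropped.
            continue
        bloom.add(ident)
        kept.append(ident)
    return kept
```

Real duplicates are always caught, because an identifier's own bits remain set after it is added. The data loss comes from collisions: with `num_bits=1`, every identifier maps to the same bit, so `deduplicate(["a", "b", "c"], num_bits=1, num_hashes=1)` returns `["a"]`, discarding `"b"` and `"c"` even though they are unique. Larger bit arrays and more hash operations make such collisions rarer, at the cost of more storage.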