Connections between computers are becoming ubiquitous. As a result, ways for computers to interact and participate in shared tasks are becoming increasingly common. One area in which computers increasingly communicate and co-operate is data de-duplication.
De-duplication can involve chunking, hashing, and indexing an object. When multiple computers participate in de-duplication, more than one of them may perform the chunking, hashing, and indexing. The computers may exchange raw data, chunked data, and hashes associated with chunked data, among other things.
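For illustration only, the following Python sketch shows chunking, hashing, and indexing on a single computer. It assumes fixed-size chunks, SHA-256 hashes, and an in-memory dictionary serving as the index; the function names are hypothetical and are not drawn from the description above.

```python
import hashlib


def chunk_object(data: bytes, chunk_size: int) -> list[bytes]:
    """Split an object into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def hash_chunk(chunk: bytes) -> str:
    """Identify a chunk by the hex SHA-256 digest of its contents (assumed hash)."""
    return hashlib.sha256(chunk).hexdigest()


def index_object(data: bytes, chunk_size: int, index: dict[str, bytes]) -> list[str]:
    """Chunk, hash, and index an object.

    Each distinct chunk is stored once in `index`, keyed by its hash. The
    returned list of hashes (the object's "recipe") is enough to rebuild it.
    """
    recipe = []
    for chunk in chunk_object(data, chunk_size):
        digest = hash_chunk(chunk)
        index.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return recipe


def rebuild(recipe: list[str], index: dict[str, bytes]) -> bytes:
    """Reassemble an object from its recipe of chunk hashes."""
    return b"".join(index[d] for d in recipe)


if __name__ == "__main__":
    index: dict[str, bytes] = {}
    data = b"ABCD" * 4 + b"EFGH"
    recipe = index_object(data, 4, index)
    print(len(recipe), len(index))  # 5 chunks referenced, 2 unique chunks stored
    assert rebuild(recipe, index) == data
```

Here five chunks are referenced but only two are stored, which is the saving that de-duplication aims for.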
Because different computers may have been configured differently, and because their configurations may have evolved over time, the computers involved in a shared action may operate according to different rules concerning how to chunk, hash, and/or index data. Likewise, because different computers may have participated in different communications and actions, different data may be indexed and/or available at each computer.
Both communications bandwidth and memory can be saved by not re-communicating raw data between computers that already have indexed, de-duplicated copies of that raw data. However, when the computers have operated under different rules, it has conventionally been difficult, if possible at all, for them to recognize that they all hold de-duplicated copies of the raw data. Even if the computers agreed on some standard or minimal chunk size for communicating raw data, unnecessary communications may have occurred because the standard or minimal chunk size was likely not the most efficient chunk size available. Inefficient and/or mismatched chunk sizes persist in conventional systems because the computers have not conventionally negotiated an efficient and/or matching chunk size. Additionally, even if the computers agreed to communicate some mixture of raw data and hashes of that raw data, unnecessary communications may still have occurred because neither efficiency nor consistency was negotiated.
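The mismatch problem can be illustrated with a short Python sketch. It assumes fixed-size chunking, SHA-256 hashes, and a hypothetical negotiation rule that selects the largest chunk size supported by both computers; none of these choices, nor the function names, is prescribed by the description above.

```python
import hashlib
from typing import Optional


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Fixed-size chunking followed by SHA-256 hashing of each chunk (assumed scheme)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def negotiate_chunk_size(sender_sizes: set[int], receiver_sizes: set[int]) -> Optional[int]:
    """Hypothetical rule: pick the largest chunk size both computers support."""
    common = sender_sizes & receiver_sizes
    return max(common) if common else None


def chunks_to_send(data: bytes, chunk_size: int, receiver_index: set[str]) -> list[int]:
    """Return positions of chunks whose hashes the receiver has not already indexed."""
    return [i for i, h in enumerate(chunk_hashes(data, chunk_size))
            if h not in receiver_index]


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1024 bytes the receiver already holds
    # The receiver previously indexed the data using 256-byte chunks.
    receiver_index = set(chunk_hashes(data, 256))

    # Mismatched rules: the sender chunks at 128 bytes, so no hash matches
    # and every chunk would be re-sent as raw data.
    print(len(chunks_to_send(data, 128, receiver_index)))  # 8

    # Negotiated rules: both sides agree on 256-byte chunks, so nothing is re-sent.
    size = negotiate_chunk_size({64, 128, 256}, {256, 512})
    print(size, len(chunks_to_send(data, size, receiver_index)))  # 256 0
```

Although the receiver already holds every byte, the 128-byte hashes never match its 256-byte index, so all eight chunks would be re-sent; agreeing on a common chunk size first lets the receiver recognize the data it already has.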