Today, people often utilize computing devices (or systems) for a wide variety of purposes. Users can operate their computing devices to, for example, interact with one another, create content, share information, and access information. In some instances, computing devices can be used to generate hash values, or digests, for content items (e.g., strings of text and/or files, such as text files, software programs, program files, executable files, etc.). Each hash value that is generated for a content item can provide a compact, fixed-length digital fingerprint of the item. The digital fingerprint, or hash value, may be rendered as a hexadecimal string, for example. Some examples of hash functions include the MD5 message-digest algorithm and SHA-1.
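By way of illustration only, the following minimal sketch shows one way a hash value could be generated for a string or a file and rendered as a hexadecimal string. It assumes Python's standard `hashlib` module; the helper names `hex_digest` and `file_hex_digest`, the default algorithm, and the chunk size are hypothetical choices made for this example and are not drawn from any particular implementation.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hash of `data` as a lowercase hexadecimal string."""
    # hashlib.new accepts algorithm names such as "md5" or "sha1".
    return hashlib.new(algorithm, data).hexdigest()


def file_hex_digest(path: str, algorithm: str = "md5",
                    chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns the empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    print(hex_digest(b"hello world"))          # 32 hexadecimal characters
    print(hex_digest(b"hello world", "sha1"))  # 40 hexadecimal characters
```

In this sketch an MD5 digest is 128 bits (32 hexadecimal characters) and a SHA-1 digest is 160 bits (40 hexadecimal characters), regardless of the size of the content item.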
Hash values can be used to determine whether one content item is identical to another. For example, a hash value for a file “test1.txt” may be “595f44fec1e92a71d3e9e77456ba80d1”. If a hash value generated for a different file “test2.txt” is also “595f44fec1e92a71d3e9e77456ba80d1”, then a determination can be made, with very high probability, that the two files have identical content. Under conventional approaches, however, hash values typically do not provide a measure of similarity between content items. That is, even a small difference between two content items can produce entirely different hash values, so the hash values generated for two similar content items can vary greatly, thereby making similar content items difficult to identify.
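The behavior described above can be sketched as follows, again for illustration only. The example assumes MD5 as provided by Python's `hashlib` module, which is consistent with the 32-character hexadecimal values shown above; the helper names `md5_hex` and `differing_bits` and the sample strings are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` (UTF-8 encoded) as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hexadecimal digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = "The quick brown fox jumps over the lazy dog"
exact_copy = "The quick brown fox jumps over the lazy dog"
near_copy = "The quick brown fox jumps over the lazy dog."  # one added character

# Identical content always yields identical hash values.
print(md5_hex(original) == md5_hex(exact_copy))  # True

# Nearly identical content yields an unrelated hash value.
print(md5_hex(original) == md5_hex(near_copy))   # False

# Number of differing bits out of 128; this says nothing about how
# similar the two underlying strings are.
print(differing_bits(md5_hex(original), md5_hex(near_copy)))
```

Although the two strings differ by a single character, a large share of the 128 digest bits typically differs, so comparing conventional hash values reveals only whether two content items match exactly, not how closely they resemble one another.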