Computing devices typically use advanced algorithms to represent data objects (e.g., images, audio files, and text documents) as vectors. Each vector includes multiple dimensions, and each dimension represents a feature of the data object. One use for these vectors is identifying matching or similar data objects. For example, a distance function is used to identify the vectors that are closest to a target vector representing a target data object (e.g., a k-nearest neighbors search). Vectors near the target vector indicate that the corresponding data objects either match or are similar to the target data object. While effective, these methods are resource intensive for large data sets.
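By way of illustration only, the nearest-vector search described above can be sketched as follows. The sketch assumes a Euclidean distance function and an exhaustive comparison against every stored vector; the names `euclidean` and `k_nearest` and the sample values are hypothetical and are not drawn from any particular system.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(target, vectors, k):
    """Return the indices of the k vectors closest to target, nearest first."""
    # Exhaustive search: the distance function is evaluated once per vector.
    order = sorted(range(len(vectors)), key=lambda i: euclidean(target, vectors[i]))
    return order[:k]


# Hypothetical three-dimensional feature vectors.
vectors = [
    [0.9, 0.1, 0.4],
    [0.2, 0.8, 0.5],
    [0.85, 0.15, 0.35],
    [0.1, 0.9, 0.6],
]
target = [0.88, 0.12, 0.38]

print(k_nearest(target, vectors, k=2))  # prints [0, 2]
```

Because this sketch evaluates the distance function against every stored vector, its cost grows with both the number of vectors and the number of dimensions.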
Current improvements include converting the floating-point values in the vectors to binary values, thereby reducing the size and complexity of the vectors. A Hamming distance between the converted target vector and each of the other converted vectors is then determined to identify similar vectors. The Hamming distance is the number of positions at which two binary strings of equal length differ. A subset of vectors having a Hamming distance below a threshold is identified as candidate vectors. The system then applies the distance functions only to this smaller subset of candidate vectors, thereby reducing resource usage.
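By way of illustration only, the two-stage approach described above can be sketched as follows. The conversion rule is an assumption, since no particular rule is specified: each value greater than a cutoff of 0.0 becomes a 1 bit and every other value becomes a 0 bit. The names `binarize`, `hamming`, and `search`, the Euclidean distance function, and the sample values are likewise hypothetical.

```python
import math


def binarize(vector, cutoff=0.0):
    """Pack a vector into an integer, one bit per dimension (assumed rule)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > cutoff else 0)
    return bits


def hamming(a, b):
    """Number of bit positions at which two packed binary vectors differ."""
    return bin(a ^ b).count("1")


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(target, vectors, hamming_threshold, k):
    """Filter by Hamming distance, then rank the candidates by full distance."""
    target_bits = binarize(target)
    # Stage 1: inexpensive bitwise comparison selects the candidate vectors.
    # A real system would likely precompute the binary form of each vector.
    candidates = [
        i for i, v in enumerate(vectors)
        if hamming(target_bits, binarize(v)) < hamming_threshold
    ]
    # Stage 2: the distance function runs only on the smaller candidate subset.
    candidates.sort(key=lambda i: euclidean(target, vectors[i]))
    return candidates[:k]


# Hypothetical four-dimensional feature vectors.
vectors = [
    [0.9, -0.1, 0.4, -0.7],
    [-0.2, 0.8, -0.5, 0.3],
    [0.7, -0.3, 0.5, 0.2],
    [0.8, -0.2, 0.3, -0.5],
]
target = [0.85, -0.15, 0.35, -0.65]

print(search(target, vectors, hamming_threshold=2, k=2))  # prints [0, 3]
```

In this sketch the vector at index 1 differs from the target in all four bit positions and is discarded by the Hamming comparison, so the distance function is never evaluated for it.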
While these methods represent improvements, an additional reduction in resource usage is desirable. Accordingly, additional technical improvements are needed.