Many signal processing, machine learning and data mining applications require comparing signals to determine how similar the signals are, according to some similarity, or distance metric. In many of these applications, the comparisons are used to determine which of the signals in a cluster of signals is most similar to a query signal.
A number of nearest neighbor search (NNS) methods are known that use distance measures. The NNS, also known as a proximity search, or a similarity search, determines the nearest data in metric spaces. For a set S of data (cluster) in a metric space M, and a query q∈M, the search determines the nearest data s in the set S to the query q.
In some applications, the search is performed using secure multi-party computation (SMC). SMC enables multiple parties, e.g., a server computes a function of input signals from one or more client to produce output signals for the client(s), while the inputs and outputs are privately known only at the client. In addition, the processes and data used by the server remain private at the server. Hence, SMC is secure in the sense that neither the client nor the server can learn anything from each other's private data and processes. Hence, hereinafter secure means that only the owner of data used for multi-party computation knows what the data and the processes applied to the data are.
In those applications, it is necessary to compare the signals with manageable computational complexity at the server, as well as a low communication overhead between the client and the server. The difficulty of the NNS is increased when there are privacy constraints, i.e., when one or more of the parties do not want to share the signals, data or methodology related to the search with other parties.
With the advent of social networking, Internet based storage of user data, and cloud computing, privacy-preserving computation has increased in importance. To satisfy the privacy constraints, while still allowing similarity determinations for example, the data of one or more parties are typically encrypted using additively homomorphic cryptosystems.
One method performs the NNS without revealing the client's query to the server, and the server does not reveal its database, other than the data in the k-nearest neighbor set. The distance determination is performed in an encrypted domain. Therefore, the computational complexity of that method is quadratic in the number of data items, which is significant because of the encryption of the input and decryption of the output is required A pruning technique can be used to reduce the number of distance determinations and obtain linear computational and communication complexity, but the protocol overhead is still prohibitive due to processing and transmission of encrypted data.
Therefore, it is desired to reduce the complexity of performing hashing computations, while still ensuring the privacy of all parties involved in the process.
The related application Ser. No. 12/861,923, describes a method that uses non-monotonic quantizers for hierarchical signal quantization and locality sensitive hashing. To enable the hierarchical operation, relatively larger values of a sensitivity parameter A enable coarse accuracy operations on a larger range of input signals, while relatively small values of parameter enable fine accuracy operations on similar input signals. Therefore, the sensitivity parameter decreases for each iteration.
As described therein, the most important parameter to select is the sensitivity parameter. This parameter controls how the hashes distinguish signals from each other. If a distance measure between pairs of signals is considered, (the smaller the distance, the more similar the signals are), then Δ determines how sensitive the hash is to distance changes. Specifically, for small Δ, the hash is sensitive to similarity changes when the signals are very similar, but not sensitive to similarity changes for signals that are dissimilar. As Δ becomes larger, the hash becomes more sensitive to signals that are not as similar, but loses some of the sensitivity for signals that are similar. This property is used to construct a hierarchical hash of the signal, where the first few hash coefficients are constructed with a larger value for Δ, and the value of Δ is decreased for the subsequent values. Specifically, using a large Δ to compute the first few hash values allows for a computationally simple rough signal reconstruction or a rough distance estimation, which provides information even for distant signals. Subsequent hash values obtained with smaller Δ can then be used to refine the signal reconstruction or refine the distance information for signals that are more similar.
That method is useful for hierarchical signal quantization. However, that method does not preserve privacy.