The creation, storage, and use of large amounts of digital data, such as records, media files, and genome information, have become nearly ubiquitous in today's world of processor-based devices used by businesses and individuals. The trend is to outsource data storage to public clouds (e.g., Internet-based storage area networks (SANs), network attached storage (NAS) systems, federated storage system platforms, etc.) for their cost effectiveness and superior scalability. However, cloud-based storage is not without risk. For example, data breaches, whether from inadvertent release or from malicious attacks, are not uncommon. Much of the data held by cloud-based storage systems is sensitive (e.g., financial, genomic, or multimedia data), so such breaches heighten concerns about threats to individuals' privacy.
Accordingly, encryption has been used to protect the data. In particular, data is often encrypted before it is stored in cloud-based, or even local, storage systems to ensure confidentiality. Encrypting data before outsourcing its storage to a cloud-based system generally renders the data useless if attackers steal it or if it is inadvertently released. However, the same encryption typically prevents the storage system, or any other system, from performing many kinds of useful computations and operations on the stored data.
Some prior attempts have been made to support operations on encrypted data, such as data stored by cloud-based storage systems. For example, systems designed for keyword queries over encrypted data (e.g., systems that can execute some SQL queries directly over encrypted data records) include the CryptDB system developed at MIT (see Raluca Ada Popa, Catherine Redfield, Nickolai Zeldovich, and Hari Balakrishnan, CryptDB: protecting confidentiality with encrypted query processing. In Proc. of ACM SOSP, 2011), Encrypted BigQuery developed by Google, and an encrypted cloud database system developed by SAP (see Florian Kerschbaum, Searching over encrypted data in cloud systems. In Proc. of ACM SACMAT, 2013). However, these prior attempts focus on data in ordinary forms, such as text and numbers, and do not address similarity join query processing.