Privacy control can be defined as the ability of individuals to determine when, how, and to what extent information about themselves is revealed to others. In this way, the usage of private data remains in context, and the data is used exclusively for the purpose the data owner has in mind. Privacy is usually protected by both legal and technological means. Legal instruments, such as data protection directives and fair information practices, can enforce privacy protection on a large scale. Yet, this approach is mostly reactive, as regulations are typically defined after technologies are put in place. To avoid this issue, Privacy-Enhancing Technologies (PETs) [1-3] can be incorporated into the design of new systems in order to protect individuals' data. PETs protect privacy by eliminating or obfuscating personal data, thereby preventing misuse or involuntary loss of data, without affecting the functionality of the information system [4].
Their objective is to make it difficult for a malicious entity to link information to specific users. In order to obfuscate personal data, PETs often rely on cryptographic primitives, such as anonymous authentication and encryption.
Genomics is becoming the next significant challenge for privacy. The price of a complete genome profile has plummeted below $200 for genome-wide genotyping (i.e., the characterization of about one million common genetic variants), which is offered by a number of companies (located mostly in the US). Whole genome sequencing is also offered through the same direct-to-consumer model (but at a higher price). This low cost of DNA sequencing will break the physician/patient connection, because private citizens (from anywhere in the world) can have their genome sequenced without involving their family doctor. This can open the door to all kinds of abuse, not yet fully understood.
As a result of the rapid evolution in genomic research, substantial progress is expected in terms of improved diagnosis and better preventive medicine. However, the impact on privacy is unprecedented, because (i) genetic diseases can be unveiled, (ii) the propensity to develop specific diseases (such as Alzheimer's) can be revealed, (iii) a volunteer who agrees to have his genome made public (as has already happened) can leak substantial information about his ethnic heritage and about the genomic data of his relatives (possibly against their will), and (iv) complex privacy issues can arise if DNA analysis is used for criminal investigations and insurance purposes. Such issues could lead to genetic discrimination (e.g., ancestry discrimination or discrimination due to geographic mapping of people). Even though the Genetic Information Nondiscrimination Act (GINA), which prohibits the use of genomic information in health insurance and employment, attempted to solve some of these problems in the US, such laws are very difficult to enforce.
An even more severe case, currently not widely considered, is one in which a malicious party mounts a cross-layer attack by combining privacy-sensitive information about a person retrieved from different sources (e.g., genomic data, location, and online social networks), thus creating the opportunity for a large variety of fraudulent uses of such data. For example, as stated in the Personal Genome Project (PGP) consent form [5], a malicious party could make synthetic DNA of a person and plant it at a crime scene to falsely accuse him.
In this hypothetical situation, the attacker can make his accusation stronger if he has the location patterns of the person to be blamed and hence knows that the person was close to the crime scene at the time of the crime. Similarly, an attacker can easily obtain information on close relatives of a target from online social network data, thus effectively increasing the potential access to the target's genomic data if his relatives' DNA has been sequenced. In other words, even if the person has perfect privacy for his own genome, an attacker with access to the DNA sequences of his relatives can obtain significant information about the person's DNA sequence.
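To make the kinship leakage concrete, the following minimal sketch computes what an attacker could infer about a target's genotype at a single variant from the genotypes of both parents. It assumes a simplified single-locus Mendelian model (no mutation, no linkage between variants), and the function name and the 0/1/2 genotype encoding are illustrative choices rather than part of the proposed schemes.

```python
from fractions import Fraction


def child_genotype_distribution(mother: int, father: int) -> dict:
    """Return P(child genotype) at one biallelic variant.

    Genotypes are encoded as the number of copies of the minor allele
    (0, 1, or 2). This encoding is an assumption made for illustration.
    """
    for g in (mother, father):
        if g not in (0, 1, 2):
            raise ValueError("genotype must be 0, 1, or 2")

    # Each parent transmits one of its two alleles with equal probability,
    # so the chance of passing on the minor allele is (copies / 2).
    p_m = Fraction(mother, 2)
    p_f = Fraction(father, 2)

    return {
        0: (1 - p_m) * (1 - p_f),
        1: p_m * (1 - p_f) + (1 - p_m) * p_f,
        2: p_m * p_f,
    }


if __name__ == "__main__":
    # Both parents homozygous: the child's genotype is fully determined.
    print(child_genotype_distribution(2, 2))
    # Both parents heterozygous: the child's genotype remains uncertain.
    print(child_genotype_distribution(1, 1))
```

Under this model, some parental combinations determine the target's genotype completely even though the target never disclosed it, while others only narrow the possibilities.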
Even though, at this stage, the field of genomics is generally free from serious attacks, it is likely that the above threats will become more serious as the number of sequenced individuals grows. Such was the case with the Internet, which was initially run and used by well-intentioned researchers. However, once it became more widely used, it became plagued by countless attacks such as spyware, viruses, spam, botnets, and Denial-of-Service attacks. Therefore, the need to adapt PETs to personal genomic data will only grow with time, as they are key tools for preventing an adversary from linking particular genomic data to a specific person or from inferring privacy-sensitive genomic data about a person.
It is obvious that users need to reveal personal and even privacy-sensitive information for genomic tests (e.g., paternity tests, disease-susceptibility tests, etc.). Nevertheless, they want to control how this information is used by the service providers (e.g., medical units such as healthcare centers or pharmaceutical companies, depending on the type of the test). Currently, the companies and hospitals that perform DNA sequencing store the genomic data of their customers and patients. Of course, tight legislation regulates their activities, but it is extremely difficult for them to protect themselves against the misdeeds of a hacker or a disgruntled employee. In a non-adversarial scenario, however, making use of this data requires legitimate professionals (e.g., physicians and pharmacists) to access the data in some way. Therefore, new architectures and protocols are needed to store and process this privacy-sensitive genomic data, while still enabling its utilization by the service providers (e.g., medical units).
In this work, our goal is to protect the privacy of users' genomic data while enabling medical units to access the genomic data in order to conduct medical tests or develop personalized medicine methods. In a medical test, a medical unit checks for different health risks (e.g., disease susceptibilities) of a user by using specific parts of his genome. Similarly, to provide personalized medicine, a pharmaceutical company tests the compatibility of a user with a particular medicine, or a pharmacist checks the compatibility of a given medicine (e.g., an over-the-counter drug) with a given user. In both scenarios, in order to preserve his privacy, the user does not want to reveal his complete genome to the medical unit or to the pharmaceutical company. Moreover, in some scenarios, it is the pharmaceutical companies that do not want to reveal the genetic properties of their drugs. To achieve these goals, we propose to store the genomic data at a Storage and Processing Unit (SPU) and to conduct the computations on the genomic data by using homomorphic encryption and proxy encryption, thereby preserving the privacy of the genomic data.
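As an illustration of how an SPU could compute on genomic data it cannot read, the sketch below uses a toy additively homomorphic cryptosystem. It assumes the textbook Paillier scheme as a stand-in, which is not necessarily the construction used in the proposed schemes, and it omits the proxy encryption step entirely. The key size is far too small for real use, and the variant encoding, weights, and function names are hypothetical.

```python
import math
import secrets


def keygen(p: int = 2**31 - 1, q: int = 2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (insecure demo size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv, c: int) -> int:
    """Recover the plaintext using the private key (n, lam, mu)."""
    n, lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def spu_weighted_sum(n: int, ciphertexts, weights) -> int:
    """Computed at the SPU: Enc(sum_i w_i * m_i) without any decryption.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to a
    power w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc


if __name__ == "__main__":
    pub, priv = keygen()
    variants = [0, 1, 2, 1]   # hypothetical per-variant allele counts
    weights = [3, 5, 2, 7]    # hypothetical risk weights of a test
    stored = [encrypt(pub, v) for v in variants]   # what the SPU holds
    result = spu_weighted_sum(pub, stored, weights)
    print(decrypt(priv, result))  # 0*3 + 1*5 + 2*2 + 1*7 = 16
```

In this sketch the SPU only ever handles ciphertexts and public weights, and only the holder of the private key learns the aggregated test result, not the individual variants.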
The rest of the paper is organized as follows. In the rest of this section, we discuss the challenges in genomic privacy and summarize the related work on genomic privacy. In Section 2, we describe our proposed schemes for privacy-preserving medical tests and personalized medicine. Furthermore, we analyze the level of privacy provided by the proposed schemes for different design and genomic criteria. Then, in Section 3, we discuss the implementation of the proposed schemes and present their complexity and security evaluations.
Finally, in Section 4, we conclude the paper and discuss new research directions on genomic privacy.