Processing of large amounts of data may be carried out by a service provider for a service user for various reasons. For example, an owner of large quantities of healthcare data might prefer to pay for cloud storage and computing resources instead of carrying the cost of the storage and processing hardware. The service user can grant the service provider access to the data in order to run algorithms that extract additional value from it, for example to train a statistical model or a deep learning algorithm. A model or algorithm trained in this way will later be able to process working data to extract information, for example to make predictions.
Data privacy provisions require that neither the service provider nor an unauthorized person such as an eavesdropper or cyber-intruder is able to make use of the service user's data, for example with the aim of exposing confidential content. The service user also needs to be certain that no other party will be able to use the data for illicit purposes, for example to run other analytical tools on the data or to use the models trained on the service user's data to generate commercial benefits.
The established way of dealing with sensitive data such as patient records is to anonymize the data before sending it to a service provider with the aim of training and developing new analytics methods such as statistical models, prediction models or computer-assisted diagnostic tools. Often, it is not sufficient to anonymize only the patient name; it is also necessary to hide other data fields that would permit patient identification by an intruder. Such data fields may include patient contact data, age, weight, height, DNA data, medical images, laboratory values, diseases and therapy history. However, this approach creates additional problems: for example, concealing such data makes it unavailable for training and learning algorithms, so that the accuracy of an analytics model will suffer significantly.
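The concealment step described above can be illustrated with a minimal Python sketch. The field names and the sample record are hypothetical and are chosen only to show that removing identifying fields also removes them from the data available for training; this is not a complete anonymization procedure.

```python
# Hypothetical set of identifying fields; real records would differ.
IDENTIFYING_FIELDS = {"name", "contact", "age", "weight", "height"}


def anonymize(record: dict) -> dict:
    """Return a copy of the record without the identifying fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFYING_FIELDS}


# Hypothetical patient record, for illustration only.
record = {
    "name": "Jane Doe",
    "contact": "jane@example.org",
    "age": 54,
    "weight": 70.5,
    "height": 168,
    "diagnosis_code": "E11",
}

anonymized = anonymize(record)
# Only the non-identifying fields remain available to a training algorithm.
print(anonymized)
```

In this sketch, age, weight and height are no longer present in the anonymized record, so a model trained on it cannot use them as input features.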
While sensitive data can be encrypted before it is transferred between service user and service provider, an eavesdropper might still be able to decrypt the intercepted data and access the content. Another weak link in this setup is that the service provider must decrypt the received input data before feeding it to a model or analytics tool. At this stage, the data is vulnerable to theft by an unauthorized person at the service provider end. Furthermore, a model or tool trained on that content may be used by an unauthorized person.