Genetic testing is moving from detection of Single Nucleotide Polymorphisms (SNPs), which are isolated individual chemical differences in the genetic code, to Whole Genome Sequencing (WGS), which records every base pair in a genetic sequence. Currently, companies are focusing on creating devices that can affordably produce whole genome sequences for individuals. It is expected that in the next three years, devices will be commercially available that can sequence an entire genome for less than $500 in less than one day. The primary industry focus today is on developing the sequencing technology, biochemistry, and first-stage genomic data processing (raw data processing and base-calling statistical processing).
According to some embodiments, a method is described for performing trusted computations on human genomic or other data. The described method includes: receiving a set of genomic or other data and an executable diagnostic computer program designed to operate on genomic or other data; evaluating authenticity of the executable diagnostic computer program; evaluating authenticity of at least a portion of the set of data; and when the authenticity evaluations are satisfactory, executing the computer program upon at least a portion of the set of data. According to some embodiments, diagnostic results that are useful in a medical diagnosis are generated based on the execution of the computer program. The method can also include certifying the authenticity of the results. The evaluation of authenticity of the diagnostic computer program can include verifying a digital signature packaged with the received diagnostic computer program. Similarly, the evaluation of authenticity of the genomic or other data can include verifying a digital signature packaged with the data. According to some embodiments, the method also includes maintaining privacy associated with the set of data based on one or more privacy policies.
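By way of illustration only, the verify-then-execute flow described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the described implementation: an HMAC-SHA256 tag stands in for a true public-key digital signature (the Python standard library has no asymmetric signing), the diagnostic program is assumed to be Python source that defines a function named `diagnose`, and the names `sign`, `verify`, and `run_trusted` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Produce an HMAC-SHA256 tag (a stand-in for a digital signature)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(sign(key, payload), signature)


def run_trusted(program_src: bytes, program_sig: bytes, program_key: bytes,
                data: bytes, data_sig: bytes, data_key: bytes):
    """Execute the program on the data only if both authenticity checks pass."""
    if not verify(program_key, program_src, program_sig):
        raise PermissionError("diagnostic program failed authenticity check")
    if not verify(data_key, data, data_sig):
        raise PermissionError("data failed authenticity check")
    # Assumption: the program is Python source defining diagnose(data).
    # A real system would run it in an isolated, secured environment.
    namespace = {}
    exec(program_src.decode("utf-8"), namespace)
    return namespace["diagnose"](data)
```

In this sketch, a program or data set whose tag does not match is rejected before any code is executed, mirroring the ordering of the evaluation and execution steps in the described method.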
According to some embodiments, a trusted computing system is described that includes: a secure storage system configured to store at least a portion of a set of data and a computer program for operating on the data; and a secure processing system programmed and configured to evaluate the authenticity of the computer program, to evaluate the authenticity of at least a portion of the set of data, and when the authenticity evaluations are satisfactory, to run the computer program on at least a portion of the set of data.
According to some embodiments, an executable diagnostic computer program is described that includes: a diagnostic algorithm configured to execute on at least a portion of a data set so as to generate therefrom diagnostic results (e.g., results that are useful in a medical diagnosis); and a digital signature configured to aid in demonstrating the authenticity of the executable program. According to some embodiments, the computer program can also be packaged with: metadata that describes the diagnostic algorithm, an intended use of the algorithm, and one or more precautions associated with the algorithm; a technical description of the inputs to the algorithm that are expected in order to generate the useful diagnostic results; and/or information describing aspects of expected output from the diagnostic algorithm.
According to some embodiments, a method of generating packaged genomic data is described that includes: receiving genomic data from a DNA-sequencing device; encrypting the received genomic data; generating a digital signature which will facilitate subsequent verification of the genomic data; and packaging the generated digital signature with the encrypted genomic data. The digital signature can be generated using a private key associated with the DNA-sequencing device and/or a private key associated with the sequencing facility.
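By way of illustration only, the encrypt, sign, and package steps might be sketched as follows. Several points here are assumptions rather than part of the description: the encryption routine is supplied by the caller as a callable (the standard library offers no cipher, so none is implemented here), HMAC-SHA256 stands in for a private-key signature, the tag is computed over the encrypted data, and the JSON layout and the names `package_genomic_data` and `verify_package` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Optional


def _tag(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 tag as hex (a stand-in for a private-key signature)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def package_genomic_data(raw: bytes,
                         encrypt: Callable[[bytes], bytes],
                         device_key: bytes,
                         facility_key: Optional[bytes] = None) -> str:
    """Encrypt the data, sign the ciphertext, and bundle both together."""
    ciphertext = encrypt(raw)  # caller-supplied cipher (assumption)
    package = {
        "ciphertext": base64.b64encode(ciphertext).decode("ascii"),
        "device_signature": _tag(device_key, ciphertext),
    }
    # Optionally add a second signature for the sequencing facility.
    if facility_key is not None:
        package["facility_signature"] = _tag(facility_key, ciphertext)
    return json.dumps(package, sort_keys=True)


def verify_package(packaged: str, key: bytes, field: str) -> bool:
    """Recompute the tag over the ciphertext and compare in constant time."""
    package = json.loads(packaged)
    if field not in package:
        return False
    ciphertext = base64.b64decode(package["ciphertext"])
    return hmac.compare_digest(_tag(key, ciphertext), package[field])
```

Signing the ciphertext lets a recipient check the package without first decrypting it; an implementation could equally sign the unencrypted data, and the description above does not specify which.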
According to some embodiments, a method of operating on one or more sets of genomic data is described that includes: securely receiving one or more sets of genomic data; associating permission information with each set of genomic data, the permission information having been specified by an owner of the genomic data; receiving an algorithm to operate on genomic data; receiving a request to run the algorithm on one or more sets of received genomic data; authenticating the request; checking permissions associated with a set of genomic data; and allowing the algorithm to access or use the set of genomic data if allowed by the permissions.
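By way of illustration only, the permission-gated flow described above might be sketched as follows. The class and method names (`GenomicRecord`, `GenomicStore`, `run`), the shared-token form of request authentication, and the representation of owner permissions as a set of allowed algorithm names are all assumptions made for this sketch.

```python
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class GenomicRecord:
    owner: str
    data: str
    # Permission information specified by the owner (assumed form:
    # names of algorithms the owner allows to use this record).
    allowed_algorithms: Set[str] = field(default_factory=set)


class GenomicStore:
    def __init__(self, requester_tokens: Dict[str, bytes]):
        self._tokens = requester_tokens
        self._records: Dict[str, GenomicRecord] = {}

    def add_record(self, record_id: str, record: GenomicRecord) -> None:
        self._records[record_id] = record

    def run(self, requester: str, token: bytes, algorithm_name: str,
            algorithm: Callable[[str], object],
            record_ids: Iterable[str]) -> Dict[str, object]:
        # Authenticate the request before touching any data.
        expected = self._tokens.get(requester)
        if expected is None or not hmac.compare_digest(expected, token):
            raise PermissionError("request could not be authenticated")
        results = {}
        for record_id in record_ids:
            record = self._records[record_id]
            # Check the owner's permissions for each record separately.
            if algorithm_name in record.allowed_algorithms:
                results[record_id] = algorithm(record.data)
        return results
```

In this sketch, records whose permissions do not name the requested algorithm are skipped, so the algorithm never receives their data.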
As used herein, the term “genomic data” generally refers to data expressing, representing, or derived from the entirety or a portion of a genome or genome sequence. This data may include, for example, information encoded in chemical structures such as DNA, mRNA, and proteins as well as related regulatory information such as methylation status.
As used herein, the term “genome” refers to an organism's hereditary information. A genome is encoded in DNA or RNA, and may be represented as mRNA or as protein sequences derived from these nucleic acid sequences. The term “genome” can include both genes and non-coding sequences. When applied to a specific organism, the term “genome” can refer to genomic data from normal cells (including mitochondrial DNA) and also genomic data from related cells such as tumors and other organisms of the microbiome.