The present (circa 2015) price of sequencing an individual's genome may have dropped dramatically. Such individual genome sequencing may open a new era of genome-wide association studies (GWAS) (as well as other less-than-full-genome genetic studies) based on a plurality of such individuals' sequenced genomes (or portions thereof), along with their associated information (e.g., medical records) stored in various databases. Such individual genome sequencing may also open a new era of personalized medicine, in which preventive and/or therapeutic interventions for complex diseases may be tailored to individuals based on their particular genetic information.
However, because of the wealth of information that may be learned from, or that inherently associates with, the individual's own sequenced genome, handling of such individual genome sequence information may carry inherent risks of certain abuses. For example, the individual's genome sequence information itself may act as a unique “fingerprint” allowing the individual to be identified from their own genome sequence information. Thus, the handling of the individual's genome sequence information may provide opportunities for privacy breaches and/or intrusion into the individual's privacy. Some countries and/or states, by law, mandate that such sensitive and identifying information be managed, stored, transmitted, disclosed, published, processed, handled, and the like in particular manners that mitigate against such privacy abuses. For example, in the United States, there is a federal law known as the Health Insurance Portability and Accountability Act (HIPAA). HIPAA may establish standards for privacy and security of health information, as well as standards for electronic data interchange (EDI) of such health information. HIPAA may specify a list of 18 identifiers as Protected Health Information (PHI) that must be encrypted by law, stored only in encrypted form, and transmitted only through secure means. Biometric identifiers may be included in this list of 18 identifiers. Commonly, biometric identifiers may comprise an individual's fingerprints. Biometric identifiers may also comprise an individual's DNA sequences. For example, use of the individual's DNA sequences to identify the individual may be depicted in FIG. 2 and FIG. 3 and discussed in the portions of the disclosure addressing those figures. HIPAA has specified two different de-identification techniques to minimize re-identification of a given individual.
In particular, a safe harbor method may require removal of all 18 identifiers, with no actual knowledge that any residual information could identify an individual. Since the biometric identifier, one of the 18 identifiers, has to be removed, the safe harbor method is not suitable for genetic studies, such as, but not limited to, GWAS. On the other hand, an expert determination method may apply statistical, mathematical, and/or scientific principles such that the treated health information carries only a very small risk of re-identifying an individual. This may comprise various data cleansing and/or anonymizing methods to minimize re-identification of any given individual. One example may be anonymizing a geographic location, such as an individual's address, by retaining only the state of the address before transmission of such address information to others. However, HIPAA does not provide explicit or specific instructions for anonymizing the biometric identifier. Note that other nations, states, and/or regions may have laws similar to HIPAA that may require that certain results be achieved when dealing with biometric identifiers in order to protect individuals' privacy and minimize the potential for genetic abuse and/or genetic discrimination.
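By way of a non-limiting illustration, the address example above (retaining only the state before transmission) may be sketched in Python as follows. The `Address` type, its field names, the sample values, and the `generalize_to_state` function are hypothetical and assumed solely for this sketch; they are not taken from HIPAA or from any particular system, and an actual expert determination may involve considerably more than this single generalization step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical address record; field names are assumptions."""
    street: str
    city: str
    state: str
    zip_code: str


def generalize_to_state(address: Address) -> dict:
    """Return a record that retains only the state of the address.

    Street, city, and ZIP code are dropped so that the transmitted
    location is coarser than the original.
    """
    return {"state": address.state}


# Illustrative (made-up) record and its generalized form.
record = Address("12 Example St", "Springfield", "IL", "62701")
generalized = generalize_to_state(record)
```

In this sketch, only `generalized` would be transmitted to others, while the full `record` would remain with the original holder.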
There is then a need, by law and from the individual's perspective, for methods and/or systems for one or more of: managing, storing, transmitting, disclosing, publishing, processing, handling, and/or the like of genome sequence information such that an ability to learn the individual's identity is minimized or mitigated against.
In another example, the individual's genome sequence information may provide a means to associate various predispositions and/or active phenotypes with that individual. Others (e.g., third parties, such as employers, insurance carriers, educational institutions, and/or the like) may use such information to discriminate against the individual. For example, such discrimination could arise in the employment context and/or in the context of admission into various programs, schools, insurance coverage, and/or the like. There is then a need to prevent, minimize, and/or mitigate against such discrimination.
U.S. Pat. No. 8,019,620 issued to Miller et al. teaches an integrated platform for privacy management of electronic medical records, encompassing the entire life cycle of privacy management. U.S. Pat. No. 8,326,849 issued to El Emam et al. teaches a method, system and computer memory for optimally de-identifying a dataset of medical records where a lattice of information may be determined to define the anonymization strategies. U.S. Pat. No. 7,823,207 issued to Evenhaim teaches a privacy preserving data-mining protocol for querying privacy-sensitive micro-data. However, these platforms, systems, methods, and/or protocols were not specifically designed to manage or process genome sequences and associated information, and are poorly equipped to do so in a way that achieves the desired goals. With about 3 billion base pairs in each individual's genome, portions of which may be of varying degrees of uniqueness, very specific methods and/or systems must be used to achieve the desired goals of preventing, minimizing, and/or mitigating against identification of a given individual; and/or of preventing, minimizing, and/or mitigating against discrimination.
U.S. Pat. No. 8,589,437 issued to Khomenko et al. teaches a system for separating identifying data from personal data in which a set of mapping data is introduced to associate a first set of stored identifying data, such as account data, with a second set of stored personal data, such as phenotype data and genotype data. U.S. Pat. No. 8,600,683 issued to George teaches methods and systems for obtaining, processing, and managing sequence data in which a unique identifier is used to store the original sequence in one database and the same unique identifier is used to index information for identifying the source of the sequence in another database. However, these systems and methods do not teach how to further separate and associate the genotype data itself.
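For orientation only, the general idea of linking separately stored identifying data and personal data through a shared unique identifier may be sketched in Python as follows. This is a generic illustration under stated assumptions and is not the implementation taught by either of the patents cited above. The function `split_record`, the two in-memory dictionaries standing in for separate databases, the field names, and the sample values (including the genotype string, which is a made-up placeholder) are all hypothetical.

```python
import secrets


def split_record(record: dict, identifying_keys: set) -> tuple:
    """Split one record into identifying and personal parts.

    A random link identifier is generated so that the two parts can be
    stored separately and re-associated only by a party holding both
    stores.
    """
    link_id = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
    identity = {k: v for k, v in record.items() if k in identifying_keys}
    personal = {k: v for k, v in record.items() if k not in identifying_keys}
    return link_id, identity, personal


# Two dictionaries stand in for two separate databases (an assumption).
identity_store = {}
personal_store = {}

# Made-up sample record; all values are placeholders.
record = {
    "name": "Jane Doe",
    "account": "A-001",
    "genotype": "rs0000001:AG",
    "phenotype": "example trait",
}

link_id, identity, personal = split_record(record, {"name", "account"})
identity_store[link_id] = identity
personal_store[link_id] = personal
```

Note that in this sketch the genotype data remains stored as a single unit in `personal_store`; the sketch does not further separate the genotype data.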
There is a need in the art for methods and/or systems for processing genome sequence information and associated information in a manner that achieves the desired goals of preventing, minimizing, and/or mitigating against identification of a given individual; and/or of preventing, minimizing, and/or mitigating against discrimination. Accordingly, methods and/or systems for anonymizing at least a portion of a given genome sequence and/or at least a portion of associated information are needed.
It is to these ends that the present invention has been developed.