Although a number of cancer therapies are known or in development for treating various forms of cancer, it is difficult or impossible to predict which cancer therapies (including cancer drugs) will be effective in treating a particular cancer type in a particular patient. In the last decade there has been a rapid increase in evidence showing how one or a few markers affect the treatment of various cancers. Unfortunately, this information has been exceedingly difficult to generalize across patients, particularly to patients who do not possess the specific values of the markers examined.
Further, there is a confusing, and sometimes conflicting, variety of markers and marker values relevant to patient drug response. Many markers are DNA based. Deoxyribonucleic acid (DNA) is the molecule that makes up the genome. The human genome has about 3 billion base pairs organized into 22 pairs of autosomes (chromosomes 1 through 22) and one pair of sex-determining chromosomes (X and Y). DNA is built from four bases: adenine (A), guanine (G), thymine (T), and cytosine (C). A gene is a stretch of DNA whose bases are arranged in a specific sequence, and each gene is associated with one or more traits of the organism, such as eye color or height. Only about 2% of the human genome encodes the roughly 23,000 genes; the remaining 98% is not well understood and has historically been described as “junk DNA”.
The variation in genome sequence between any two individuals is estimated at roughly 0.1%. This small variation at various positions across the human genome is believed to account for many of the visible differences among individuals and to play a role in health, disease, and aging. Some genes are more critical than others to the normal functioning of the human body; currently about 4,800 genes are known to have clinical relevance. Some DNA base positions are so critical that even a single-base modification or substitution, called a single nucleotide variation (SNV), can cause disease with one or more manifestations. For example, the CFTR gene is well established to be associated with cystic fibrosis, with many known mutations, and many mutations in the BRCA1 and BRCA2 genes are well established to be associated with breast cancer. Such SNVs are called mutations. Variations can also span more than one base position, such as multi-base insertions or deletions (indels) and translocations of large regions of the genome. Such changes are called structural variations.
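To make the distinction between SNVs, indels, and structural variations concrete, the following is a minimal sketch that classifies a variant from its reference and alternate alleles, in the style of a VCF record. The `Variant` type, the `classify` function, and the 50-base cutoff for structural variants are illustrative assumptions (50 bases is a common convention, not something stated here). Translocations need breakpoint information beyond REF/ALT and are not handled.

```python
from dataclasses import dataclass

# Assumed size cutoff: events of >= 50 bases are often treated as structural
# variants. This is a convention chosen for illustration only.
SV_MIN_LENGTH = 50

@dataclass(frozen=True)
class Variant:
    chrom: str  # chromosome name, e.g. "chr1" (hypothetical example values)
    pos: int    # 1-based position on the reference genome
    ref: str    # reference allele
    alt: str    # alternate (observed) allele

def classify(v: Variant) -> str:
    """Return a coarse variant class based only on the REF/ALT alleles."""
    ref, alt = v.ref.upper(), v.alt.upper()
    if ref == alt:
        return "reference"  # no change relative to the reference genome
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"  # single-base substitution
    size = abs(len(ref) - len(alt))
    if size == 0:
        return "MNV"  # multi-base substitution with no length change
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size >= SV_MIN_LENGTH:
        return f"structural ({kind})"
    return f"indel ({kind})"

print(classify(Variant("chr1", 1000, "G", "A")))     # hypothetical SNV
print(classify(Variant("chr1", 1000, "ACTG", "A")))  # hypothetical 3-base deletion
```

Classifying on allele lengths alone keeps the sketch short; a real pipeline would also normalize and left-align indels before classifying them.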
Structural variations also include copy number variations (CNVs), which occur when the number of copies of a region of the human genome deviates from its normal number of two (for a diploid genome). Any of these variations occurring in clinically sensitive regions of the genome can cause disease of varying severity, depending on the function of the genes involved.
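One common way to detect a CNV is to compare sequencing read depth over a region against a sample assumed to carry the normal two copies. The following is a minimal sketch under that assumption. The function names, the 0.5-copy tolerance, and the read counts are hypothetical. Real CNV callers also correct for GC content, mappability, and (in tumors) sample purity, and regions on the X and Y chromosomes in males have a baseline of one copy rather than two.

```python
NORMAL_COPIES = 2  # diploid baseline for autosomal regions, as in the text

def estimate_copy_number(region_reads: int, total_reads: int,
                         ref_region_reads: int, ref_total_reads: int) -> float:
    """Estimate copy number from read depth normalized by library size,
    relative to a reference sample assumed to carry two copies."""
    if ref_region_reads == 0 or total_reads == 0:
        raise ValueError("need nonzero reference depth and library size")
    # 2 * (region_reads / total_reads) / (ref_region_reads / ref_total_reads),
    # rearranged so integer inputs are multiplied before a single division.
    return (NORMAL_COPIES * region_reads * ref_total_reads
            / (total_reads * ref_region_reads))

def classify_cnv(copy_number: float, tolerance: float = 0.5) -> str:
    """Label a region as gain, loss, or normal relative to two copies."""
    if copy_number >= NORMAL_COPIES + tolerance:
        return "gain"
    if copy_number <= NORMAL_COPIES - tolerance:
        return "loss"
    return "normal"

# Hypothetical counts: equal library sizes, region has 1.5x the reference depth.
cn = estimate_copy_number(150, 1_000_000, 100, 1_000_000)
print(cn, classify_cnv(cn))
```

With these made-up counts the region is estimated at three copies and labeled a gain. Normalizing by total reads keeps a deeper-sequenced sample from looking like a copy-number gain.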
Genetic markers, whose values may be identified by sequencing such as next-generation sequencing, have been proposed to help identify therapy response in cancer patients. Next-generation sequencing (NGS) refers to recent advances that sequence deoxyribonucleic acid (DNA) in a massively parallel manner, faster and more cheaply than earlier methods such as Sanger sequencing, without loss of accuracy. Whereas sequencing the first complete human genome (with a working draft announced in 2000) took 13 years and cost about 3 billion USD, roughly $1 per base, many laboratories around the world can now use NGS to sequence human genomes routinely for about $1,000 per genome, or roughly $0.0000003 per base of the genome. NGS technology is agnostic to the origin of the DNA (i.e., the source organism). Several vendors offer NGS technology, with Illumina leading the market through products including the X10, HiSeq, MiSeq, and NextSeq. Life Technologies' Ion Torrent platform is the second-largest player, and various smaller vendors are attempting to break into this market.
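The per-base costs follow from dividing the total cost by the roughly 3 billion bases of the human genome. The following is a small worked sketch of that arithmetic. The 30x coverage figure is a hypothetical assumption added to show that the cost per base actually sequenced is lower than the cost per base of the genome, because each position is read many times.

```python
GENOME_BASES = 3_000_000_000  # approximate human genome size, from the text

def cost_per_base(total_cost_usd: float, coverage: float = 1.0) -> float:
    """Cost divided by genome size times sequencing coverage.
    coverage=1.0 gives cost per base of the genome; higher values give
    cost per base actually sequenced."""
    return total_cost_usd / (GENOME_BASES * coverage)

hgp = cost_per_base(3_000_000_000)           # first human genome: ~3 billion USD
ngs = cost_per_base(1_000)                   # routine NGS genome: ~1000 USD
ngs_30x = cost_per_base(1_000, coverage=30)  # hypothetical 30x coverage

print(f"First genome: ${hgp:.2f} per base")
print(f"NGS genome:   ${ngs:.10f} per genome base")
print(f"NGS at 30x:   ${ngs_30x:.12f} per sequenced base")
print(f"Reduction:    {hgp / ngs:,.0f}-fold")
```

The first two lines come out to about $1.00 and about $0.00000033 per base, a roughly 3,000,000-fold reduction.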
Because every individual's genome is unique, there is no such thing as a “normal genome” or “standard genome”. However, to serve as a reference, free and open reference genome databases are made available to the scientific research community (academic and commercial) by the National Institutes of Health (NIH) in the USA. The human genome assembly builds (the stable hg19 build and the newer hg38 build) and their annotations are periodically updated, and external groups further enhance them by adding their own annotations to the genome builds released by NIH.
What is needed are rapid and accurate methods and apparatuses for determining which therapies may be effective (and/or ineffective) in treating a particular patient.
Described herein are methods and apparatuses that may assist a caregiver (e.g., physician, nurse, etc.) in treating a cancer patient by predicting which drugs may be effective in treating that patient. As will be described in greater detail below, these methods and apparatuses may improve on traditional clinical genomics and may combine it with other marker information to provide powerful and accurate predictions for patient therapy.
Clinical genomics is the application of NGS and other genomic technologies for clinical utility, such as diagnosis of a disease at the molecular (DNA, RNA, or protein) level. Most currently published clinical genomics methods involve one or more of the following steps: identifying genes or regions of the genome of interest for a particular disease or group of diseases (target region selection); designing a method to capture only the target DNA regions of interest (target capture); amplifying the captured target DNA by polymerase chain reaction (PCR); preparing libraries for NGS; performing NGS to generate millions of short reads of the same length (around 75 to 150 bases each); and aligning the short reads from NGS to the human reference genome hg19, allowing a reasonable number of mismatches to account for SNVs and small indels. Many open-source and commercial algorithms and tools are available to perform this alignment step. From the aligned regions of the genome, all SNVs, indels, and CNVs are called by comparison against the reference genome databases; many algorithms and tools are likewise available for this variant-calling step. Annotation and interpretation of the variants called in the previous step may also be performed, integrating information from literature-curated databases of variant-disease associations for previously published variants. To discover novel (previously unknown) clinically relevant variants, various tools are used to predict the possible functional or clinical impact of the variants. From these steps, it would be beneficial to generate clear, concise, and precise clinical reports that physicians can readily interpret to make clinical decisions. An example of a workflow for clinical-genomics-based diagnostics is illustrated in FIGS. 1A-1B. Additional descriptions and examples are provided below.
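The alignment, variant-calling, and annotation steps can be illustrated with a toy sketch. It uses exhaustive ungapped alignment with a mismatch limit, a per-position base pileup, and a simple depth and allele-fraction filter for SNV calls. It is not how production tools work: real pipelines use indexed aligners (such as BWA) and statistical variant callers. The reference sequence, reads, thresholds, function names, and the `KNOWN_VARIANTS` entry standing in for a curated database are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

def align_read(read: str, reference: str, max_mismatches: int = 2) -> Optional[int]:
    """Ungapped alignment: return the 0-based offset with the fewest mismatches
    (leftmost on ties), or None if every offset exceeds max_mismatches."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos

def pileup(reads: List[str], reference: str, max_mismatches: int = 2) -> Dict[int, Counter]:
    """Count the bases observed at each reference position across aligned reads."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        pos = align_read(read, reference, max_mismatches)
        if pos is None:
            continue  # unaligned reads are simply discarded in this sketch
        for i, base in enumerate(read):
            counts[pos + i][base] += 1
    return counts

def call_snvs(counts, reference, min_depth=3, min_fraction=0.3):
    """Call SNVs where a non-reference base has enough depth and allele fraction.
    Returns (1-based position, ref base, alt base, alt reads, depth) tuples."""
    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        if depth < min_depth:
            continue
        for base, n in sorted(counts[pos].items()):
            if base != reference[pos] and n / depth >= min_fraction:
                calls.append((pos + 1, reference[pos], base, n, depth))
    return calls

# Hypothetical stand-in for a literature-curated variant-disease database.
KNOWN_VARIANTS = {(11, "T", "C"): "made-up entry: reported pathogenic"}

def annotate(calls):
    """Attach a database note to known variants; flag the rest as novel."""
    return [(c, KNOWN_VARIANTS.get(c[:3], "novel: predict functional impact"))
            for c in calls]

reference = "GATTACACGTTGCAAGCTTCCGGA"
reads = ["GATTACAC", "ACACGTCG", "ACGTCGCA", "GTCGCAAG", "TTGCAAGC", "CTTCCGGA"]
for call, note in annotate(call_snvs(pileup(reads, reference), reference)):
    print(call, note)
```

Three of the four reads covering position 11 carry C instead of the reference T, so the sketch calls a T>C SNV there and matches it against the made-up database entry. Any call without a database match would be flagged for functional-impact prediction.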