1. Field of the Invention
This invention relates to means for determining variants of an expected nucleic acid sequence in the population formed when a substrate nucleic acid sequence is replicated, transcribed, edited, or transformed in similar ways. More particularly, the invention relates to the determination of the formation of expected sequences or variants upon such transformations to establish a better understanding and utilization of the role of such variants in genetic processes.
2. Background Art
All life forms have specific genomes that are based either on DNA or RNA (referred to collectively herein as nucleic acids). During cell processing, nucleic acids derived from the genome or other sources are frequently and routinely copied (replicated), transcribed, edited (in the case of eukaryotic organisms), and eventually translated into proteins. During such processing, there is normally an “expected” nucleic acid sequence that results from “normal” replication, transcription or editing, but frequently one or more variants of the expected sequence are formed either in addition to the expected sequence or in place of it. The formation of such variants can result from “errors” in the replication, transcription or editing processes, or may possibly result from normal cell processes, e.g. when genes overlap.
Knowledge of the formation of one or more variants of an expected sequence when nucleic acid transformations take place can provide useful information for scientists in various fields. For example, the formation of sequence variants may be indicative of disease in a particular individual or may help to explain a particular cellular process.
For example, in higher life forms (based on eukaryotic cells), including humans, the primary RNA transcript formed directly from a genetic DNA sequence undergoes editing by the cell before it is used as a template for protein formation (translation). During this process, non-coding regions of the gene (introns) are removed and coding regions (exons) are spliced together to form a functional mRNA template used for protein synthesis. Often, in nature, this editing process leads to the formation of different versions of mRNA (often referred to as “splice variants”). These different versions or variants may differ in the number and/or order of the exons incorporated into the mRNA, as well as in the nucleotides present across the junctions or splice points of adjacent exons.
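The way in which a single gene can give rise to several splice variants may be illustrated by a minimal sketch. The exon names and nucleotide sequences below are hypothetical, and the sketch assumes the simplest case only, in which the first and last exons are always retained and any internal exon may be skipped; it does not model reordered exons or variation at the splice junctions.

```python
from itertools import combinations

def exon_skipping_variants(exons):
    """List exon combinations that keep the first and last exon
    and retain any subset of the internal exons, in original order."""
    first, *internal, last = exons
    variants = []
    # Start with all internal exons kept, then progressively skip more.
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            variants.append((first, *kept, last))
    return variants

def splice(variant, sequences):
    """Join the exon sequences of one variant into a single mRNA string."""
    return "".join(sequences[name] for name in variant)

# Hypothetical exons and sequences, for illustration only.
exons = ["E1", "E2", "E3", "E4"]
sequences = {"E1": "AUGGCU", "E2": "GAUCCA", "E3": "UUCGGA", "E4": "AAGUAA"}

variants = exon_skipping_variants(exons)
for variant in variants:
    print("-".join(variant), splice(variant, sequences))
```

Even under this restricted assumption, a gene with four exons yields four distinct mRNA templates, and the number doubles with each additional internal exon.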
It is estimated that more than 70% of gene expression events involve splice variation, resulting in variations in the expressed proteins and undermining the established concept that one gene leads to only one version of a protein. The presence or level of specific splice variants may be the cause or an indicator of a disease, disorder, pathological condition or normal condition. Understanding the distribution of splice variants in various tissues is extremely important for understanding the physiological function of genes, for targeting pharmaceuticals in drug discovery and drug evaluation, and for diagnostic purposes.
The formation of variants in this way is not limited to the transcription of DNA to mRNA in eukaryotic cells. Even in prokaryotic cells (where there are no introns and thus no editing of RNA transcripts), there may be minor variations (mutations) in some of the transcripts. Moreover, during cell division in both prokaryotic and eukaryotic cells, DNA-to-DNA copies are made, involving a number of enzymes (e.g. RNA primase, DNA polymerase, exonuclease, ligase, etc.). Although these steps have built-in proof-reading mechanisms, such replication may result in variations in the copies of DNA formed, leading to various abnormalities or diseases (Ref. 1, see References at the end of this disclosure).
A common way of determining the transcription products of the cells of a tissue is to employ a DNA micro-array (often called a “gene chip”). In this method, short probes (each representing a gene) are printed on and secured to a slide or chip. The fluorescently labeled cDNA of the tested tissue is hybridized to the micro-array. The level of the resulting signal corresponds to the relative expression level of the associated nucleic acid segment in the population. However, such micro-arrays do not account for the variants, among the population formed, that carry such nucleic acid segments. What is important is therefore not merely the presence or absence of a nucleic acid segment, but the determination and characterization of the nucleic acid sequence species that carry that segment. For example, if a nucleic acid segment (such as an exon) determines a drug binding site on the resultant protein, it is essential to know whether that exon is carried by a specific mRNA, so that the protein will fold properly into a structure conducive to binding of the drug.
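The limitation of segment-level measurement may be illustrated by a minimal sketch. The transcript populations and copy numbers below are hypothetical, and the sketch assumes an idealized array in which each probe reports the total number of transcripts carrying its exon. Under that assumption, two different populations of sequence species can produce identical signals.

```python
from collections import Counter

def segment_signal(population):
    """Sum, for each exon, the copies of every transcript that carries it.

    `population` maps a transcript (a tuple of exon names) to its copy number.
    """
    signal = Counter()
    for exons, copies in population.items():
        for exon in set(exons):
            signal[exon] += copies
    return signal

# Hypothetical population A: exons E2 and E3 occur together or not at all.
population_a = {("E1", "E2", "E3", "E4"): 5, ("E1", "E4"): 5}
# Hypothetical population B: exons E2 and E3 are mutually exclusive.
population_b = {("E1", "E2", "E4"): 5, ("E1", "E3", "E4"): 5}

signal_a = segment_signal(population_a)
signal_b = segment_signal(population_b)

print(dict(signal_a))
print(dict(signal_b))
print("same per-exon signal:", signal_a == signal_b)
```

In both populations each probe reports the same level, yet no transcript in population B carries exons E2 and E3 together, so the segment-level signal alone cannot establish which sequence species are present.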
In the recent past, computational modeling has been adopted as an alternative method for predicting protein structure, which is based on the amino acid sequences (primary structures) of proteins. Technical limitations of determining an amino acid sequence directly from the protein have made it necessary to predict the amino acid sequence from the nucleotide sequence of the mRNA template. The existence of splice variants formed during gene expression interferes with such predictions and has created two main problems for the accurate prediction of protein structures: (a) variation in the exon composition of the various mRNA transcripts formed, and (b) errors in splicing itself at the exon/exon junctions of mRNA templates.
Presently, there is no method available to accurately determine all of the different transcripts that may be simultaneously expressed from a gene.
A method of determining all the sequence variants resulting from nucleic acid transformations would therefore be of practical significance.