The references to be discussed throughout this document are set forth solely for the information described therein prior to the filing date of this document, and nothing herein is to be construed as an admission, either express or implied, that the references are "prior art" or that the inventor is not entitled to antedate such descriptions by virtue of prior invention or priority based on earlier filed applications.
I. Introduction
The amplification of desired portions or entire sequences of DNA and RNA finds utility in a variety of fields, from criminal investigations (where DNA obtained from crime scene samples is compared with the DNA from an accused individual), to archeology (where the DNA of ancient plants, animals, sub-human species and humans is analyzed), to paternity analysis (where the DNA from the offspring and a possible parent are comparatively analyzed), to genetic analysis (where the DNA of individuals is analyzed for an indication of the possibility of genetic variation which is indicative of a particular disease state). Amplification of the nucleic acid sequence is most typically necessary because whatever DNA may be present from the source is extremely limited, such that in order to properly analyze such DNA, many more copies of the original DNA are required.
The ability to amplify nucleic acid sequences is relatively recent (1985), but the impact of this ability has been phenomenal--without such amplification, most of the foregoing exemplary applications would not be possible. As the areas in which DNA amplification is used have expanded, the requirements placed upon various amplification techniques have changed. Accordingly, a very real and ongoing need exists for highly specific amplification techniques.
II. The Genetic Code
(a) Background Information
Deoxyribonucleic acid ("DNA") and ribonucleic acid ("RNA") are long, thread-like macromolecules, DNA comprising a chain of deoxyribonucleotides, and RNA comprising a chain of ribonucleotides. A "nucleotide" consists of a nucleoside and one or more phosphate groups; a "nucleoside" consists of a nitrogenous base linked to a pentose sugar; a "pentose sugar" comprises five carbon atoms. In a molecule of DNA, the pentose sugar is "deoxyribose" and the nitrogenous base can be adenine ("A"), guanine ("G"), thymine ("T") or cytosine ("C"). In a molecule of RNA, the pentose sugar is "ribose", and the nitrogenous bases are the same as in DNA, except that uracil ("U") replaces thymine. The specific sequence of the nitrogenous bases encodes genetic information, or, the "blueprint" for life.
Double stranded DNA consists of two "complementary" strands of nucleotide chains which are held together by (relatively) weak hydrogen bonds--these bonds can be "broken" by, e.g., heating the DNA, changing the salt concentration of a fluid surrounding the DNA, or chemical manipulation; this process is referred to as "denaturation". By lowering the temperature, adjusting anew the salt concentration or removing/neutralizing the chemical, the two strands of DNA have a tendency to re-form in their original, or approximately original, state. The bases of each DNA strand selectively bind to each other: A always bonds with T, and C always bonds with G. Thus, the sequence "ATCG" of a first strand lies immediately opposite a complementary sequence "TAGC". This is referred to as "complementary base pairing" and the process of complementary base pairing is referred to as "hybridization".
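By way of illustration only, the pairing rule described above can be sketched in a few lines of Python. The function names `complement` and `hybridizes` are hypothetical and are introduced solely for this example; the sketch assumes two strands of equal length aligned position by position and ignores strand orientation, partial matches and reaction conditions.

```python
# Pairing rules as stated in the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base lying opposite each position of the given strand."""
    return "".join(PAIRS[base] for base in strand)


def hybridizes(strand_a: str, strand_b: str) -> bool:
    """True when strand_b lies opposite strand_a with every base correctly paired."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b


if __name__ == "__main__":
    print(complement("ATCG"))          # the sequence opposite ATCG
    print(hybridizes("ATCG", "TAGC"))  # fully complementary pair
    print(hybridizes("ATCG", "TAGG"))  # one mismatched base
```

In this simplified model a single mismatched base is sufficient for `hybridizes` to return False; in practice, hybridization with mismatches can occur depending on conditions.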
Three types of RNA (messenger RNA, mRNA; transfer RNA, tRNA; ribosomal RNA, rRNA) are associated with translation of the genetic information encoded in the DNA into designated amino acids, which are the building blocks for polypeptides and proteins; each of twenty naturally occurring amino acids is encoded by various groupings of three nucleotides, this grouping being referred to as a codon. Thus, the primary sequence of a protein consists of amino acids assembled on ribosomes based on codons defined by mRNA. Proteins are necessary to the development, maintenance and existence of living organisms; the presence, or absence, of certain proteins in different cells/tissues can be indicative of the presence, or absence, of, e.g., certain biological functions of the aforementioned cells/tissues.
Genetic information is generally transferred as follows: DNA → RNA → amino acid/protein. Not every region of a DNA molecule is translated by RNA into protein; those regions that are translated are referred to as "genes." Expression of genes, therefore, serves to control the transition of hereditary characteristics by specifying the eventual proteins produced from a gene, or genes.
(b) Mutations in the Genetic Code
DNA macromolecules are chemically quite similar to each other. A and G are quite similar in chemical composition, and C, T and U are equally similar. Thus, in a specified sequence, substitutions may occur: "transitions" of an A for a G or a C for a T, and likewise "transversions" of an A or G for a C or T (or vice versa). When such a substitution occurs within a codon such that the amino acid encoded thereby remains the same, then the substitution can be referred to as a "silent" substitution, i.e., the nucleotides are different but the encoded amino acid is the same. However, other substitutions can alter the amino acid encoded by the codon; when the nucleotide alteration results in a chemically similar amino acid, this is referred to as a "conservative" alteration, while a chemically different amino acid resulting from the alteration is referred to as a "non-conservative" alteration. Non-conservative alterations of amino acids can result in a molecule quite unlike the original protein molecule.
A protein that has had its amino acids altered can be referred to as a "mutant", "mutation" or "variant." Mutations occur naturally and can have positive, negative or neutral consequences on the organism experiencing such a mutation. Similarly, genes that have had sections altered (e.g., by insertion or deletion of DNA sequence(s)) are mutations; thus, by definition, the proteins expressed by such a mutated gene can have positive, negative or neutral consequences on the organism.
By way of example, the gene responsible for the disease cystic fibrosis (a genetically inherited disorder affecting children and young adults and which is clinically manifested by the obstruction of the airways by thick, sticky mucus and subsequent infection) comprises 250,000 nucleotides, which encode a protein of 1480 amino acids (the protein is referred to as "cystic fibrosis transmembrane conductance regulator", or "CFTR"). When this gene is compared with the gene of individuals who do not have CF (i.e., individuals who have a "normal" gene), a frequent difference evidenced is that a single codon is deleted relative to the "normal" gene, which results in the loss of a single amino acid relative to the "normal" protein. However, this CFTR gene mutation accounts for only about 70% of those individuals who have CF; there are at least about 170 different CFTR gene mutations which account for the remaining 30%.
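The distinction between transitions and transversions drawn above can be illustrated by a minimal Python sketch. The function name `classify_substitution` is hypothetical; the grouping of A and G as one chemical class (purines) and of C, T and U as the other (pyrimidines) follows the similarity noted in the text. The sketch classifies a single-base change only and does not attempt to determine whether the change is silent, conservative or non-conservative.

```python
# A and G are chemically similar (purines); C, T and U are chemically similar (pyrimidines).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a single-base substitution as 'none', 'transition' or 'transversion'."""
    if original == replacement:
        return "none"
    pair = {original, replacement}
    # A transition stays within one chemical class; a transversion crosses classes.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


if __name__ == "__main__":
    for old, new in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(old, "->", new, classify_substitution(old, new))
```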
III. The Structural Formation of DNA/RNA Macromolecular Strands
While the sequence of the nitrogenous bases of the DNA and RNA macromolecule encodes genetic information, the sugar and phosphate groups perform a structural role, forming the backbone of the molecule (typically, the phosphate group is attached to the fifth carbon, "C-5" or "5'", hydroxyl group ("OH") of the pentose sugar). Specifically, a 3'-hydroxyl group of a first nucleotide is linked to a 5'-hydroxyl group of a second, adjacent nucleotide. The linkage between the two pentose sugars is via a phosphodiester bond. Based upon this linkage protocol, one end ("terminus") of the nucleotide chain has a 5'-terminus and the other end has a 3'-terminus; linkage of two nucleotides occurs only when a hydroxyl group is present at the 3'-terminus. By convention, the base sequence of nucleotide chains is written in a 5' to 3' direction, i.e., 5'-ATCG-3' (SEQ ID NO:1) (the complementary chain is oriented in an anti-parallel fashion, i.e., the complementary chain is written in a 3' to 5' direction, i.e., 3'-TAGC-5'; SEQ ID NO:2).
The formation of the phosphodiester bond between deoxynucleotides is brought about by the enzyme "DNA-dependent DNA polymerase" (for ribonucleotides, the enzyme is "DNA-dependent RNA polymerase"). In order for DNA polymerase to synthesize a macromolecule of DNA (i.e., "elongation" of the DNA macromolecule), the following components are required: (1) a single stranded DNA molecule, referred to as a "template"; (2) a (typically) short DNA strand, having a free 3'-hydroxyl group, which is hybridized to a specific site on the template, the short strand being referred to as a "primer"; and (3) free deoxyribonucleotide triphosphates ("dNTP"), i.e., deoxyadenosine 5'-triphosphate ("dATP"), deoxycytidine 5'-triphosphate ("dCTP"), deoxyguanosine 5'-triphosphate ("dGTP") and deoxythymidine 5'-triphosphate (typically abbreviated, by convention, as "TTP" but for purposes of consistency, abbreviated herein as "dTTP"). DNA polymerase elongates the primer in a single direction, i.e., from the 3'-end of the primer. The primer hybridizes to the template at a region where there can be the requisite complementary base pairing such that the DNA polymerase is capable of bringing about the formation of the phosphodiester bond between the 3'-hydroxyl group of the primer and an "incoming" dNTP which is complementary to the next base on the template. Thus, if the sequence of the template is 5'-ATCG-3' (SEQ ID NO:1) and the primer is 3'-GC-5' (SEQ ID NO:3), the next nucleotide to be added to the 3'-terminus of the primer has the base A (complementary to T on the template) via the formation of a phosphodiester bond, mediated by DNA polymerase, between the dATP and the 3'-hydroxyl group of the G nucleotide on the primer. This process continues (typically) until a complete complement for the template is generated.
While the DNA polymerase enzyme functions principally to elongate a primer strand, the enzyme "ligase" functions principally to repair single-strand breaks by the formation of the phosphodiester bonds between two adjacent nucleotides which are hybridized to a unitary single strand. Thus, if the sequence of the unitary single strand is 5'-ATCG-3' (SEQ ID NO:1) and a break, or "nick", has occurred between the A and the G of the complementary strand hybridized thereto, 3'-TA·GC-5' (where "·" indicates such a break) (SEQ ID NO:4), the ligase enzyme can "repair" the nick by the formation of a phosphodiester bond between the A and G. Beneficially, ligase (typically) cannot mediate the formation of such a phosphodiester bond if, inter alia, one of the nucleotides is not complementary to the nucleotide on the unitary strand; i.e., if the sequence of the unitary strand is ATCG and two other strands have the sequence TA and TC (the T of the TC strand cannot hybridize to the C of the ATCG strand), ligase cannot mediate the formation of a phosphodiester bond between TA and TC.
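The elongation example above (template 5'-ATCG-3', primer 3'-GC-5') can be traced with the following Python sketch, offered as an idealized model rather than a description of enzyme kinetics. The function name `extend_primer` is hypothetical. The sketch assumes that the template is written 5' to 3', that the primer is written 3' to 5', and that the primer is hybridized flush with the 3'-end of the template, so that each incoming base is added at the left-hand (3') end of the primer string.

```python
# Pairing rules as stated in the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template_5to3: str, primer_3to5: str):
    """Elongate a primer from its 3'-end along the template.

    Returns the full product (written 3' to 5') and the list of bases added,
    in the order in which they were incorporated.
    """
    n = len(primer_3to5)
    # Assumption: the primer sits opposite the last n bases of the template.
    for t, p in zip(template_5to3[-n:], primer_3to5):
        if PAIRS[t] != p:
            raise ValueError("primer is not complementary to the template site")
    product = primer_3to5
    added = []
    # Walk along the template toward its 5'-end, one base at a time.
    for i in range(len(template_5to3) - n - 1, -1, -1):
        incoming = PAIRS[template_5to3[i]]
        added.append(incoming)
        product = incoming + product  # new base joins the primer's 3'-terminus
    return product, added


if __name__ == "__main__":
    product, added = extend_primer("ATCG", "GC")
    print("bases added in order:", added)
    print("product 3'-" + product + "-5'")
```

For the template and primer of the example, the first base incorporated is A (opposite the T of the template), followed by T, yielding the complete complement 3'-TAGC-5'.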
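The ligation behavior described above can likewise be illustrated by a short Python sketch. The function name `can_ligate` is hypothetical, and the model is deliberately simplified: it assumes the unitary strand is written 5' to 3', that the two fragments are written 3' to 5' and lie adjacent to one another across the entire unitary strand, and that ligation is permitted only when every base of both fragments is correctly paired.

```python
# Pairing rules as stated in the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def can_ligate(unitary_5to3: str, fragment_a_3to5: str, fragment_b_3to5: str) -> bool:
    """True when both adjacent fragments are fully complementary to the unitary strand."""
    joined = fragment_a_3to5 + fragment_b_3to5
    if len(joined) != len(unitary_5to3):
        return False  # the fragments do not abut across the whole strand
    return all(PAIRS[t] == b for t, b in zip(unitary_5to3, joined))


if __name__ == "__main__":
    print(can_ligate("ATCG", "TA", "GC"))  # nicked complement of the example
    print(can_ligate("ATCG", "TA", "TC"))  # T of the TC fragment faces C
```

In this model the fragments TA and GC can be joined on the strand ATCG, whereas TA and TC cannot, because the T of the TC fragment lies opposite a C.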
IV. Amplification Techniques
There are currently several available techniques for the amplification of nucleic acids. A well known amplification technique is referred to as the "Polymerase Chain Reaction", or "PCR." Mullis, K., et al. "Specific Enzymatic Amplification of DNA In Vitro: The Polymerase Chain Reaction." Cold Spring Harbor Symposia on Quant. Biol. 51:263-273 (1986). In the PCR protocol, the template double stranded DNA is denatured (resulting in single strands A and B, "SS-A" and "SS-B"); two primers, one having a sequence complementary to a portion of SS-A, and one having a sequence complementary to a portion of SS-B, selectively hybridize to their respective complementary strands. In the presence of DNA polymerase and dNTPs, each primer will be elongated to form complements to the original SS-A and SS-B. Thus, at the end of one such "cycle", the number of "copies" of each strand increases to two--during the next cycle, then, there are two SS-A and two SS-B, each capable of being "copied" as described above. This process is referred to as "exponential" amplification, which means, in essence, that with each cycle, the number of copies doubles. I.e., theoretically after about 20 cycles, over one million copies are generated (2^20 = 1,048,576).
Several practical problems exist with PCR. First, extraneous sequences along the two templates can hybridize with the primers; this results in co-amplification due to such non-specific hybridization. As the level of amplification increases, the severity of such co-amplification also increases. Second, because of the ability of PCR to readily generate millions of copies for each initial template, accidental introduction of the end-product of a previous reaction into other samples easily leads to false-positive results. Third, PCR does not, in and of itself, allow for detection of single-base changes, i.e., the protocol does not, in and of itself, allow for discrimination between "normal" and "mutational" sequences.
An alternative to PCR is the so-called "Ligase Chain Reaction", or "LCR". Barany, F. "Genetic disease detection and DNA amplification using thermostable ligase." Proc. Natl. Acad. Sci. 88:189-193 (1991). This technique amplifies a specific target exponentially, based upon utilization of four primers, two for each single strand of the original double stranded template. Each primer pair hybridizes in an adjacent fashion to each single strand of the template, and ligase covalently joins each primer at the region of adjacent hybridization. As with PCR, the resulting products serve as template (along with the original template) in the next cycle, thus leading to exponential amplification with each cycle. Beneficially, LCR can be utilized to detect mutations, and in particular, single nucleotide mutations--if the primers are designed as complements to the non-mutated version of, e.g., a gene, such that each primer is adjacent to a point where a known mutation can occur, and the template includes such mutation, the ligase cannot covalently couple the two primers that have hybridized thereto.
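The arithmetic of exponential amplification can be checked with a brief Python sketch. The function names `ideal_copies` and `cycles_to_reach` are hypothetical, and the sketch assumes theoretical, perfectly efficient doubling in every cycle, which is not achieved in practice.

```python
def ideal_copies(initial_copies: int, cycles: int) -> int:
    """Theoretical copy number after the given number of cycles (doubling each cycle)."""
    return initial_copies * 2 ** cycles


def cycles_to_reach(initial_copies: int, target_copies: int) -> int:
    """Smallest number of cycles whose theoretical yield meets or exceeds the target."""
    cycles = 0
    while ideal_copies(initial_copies, cycles) < target_copies:
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(ideal_copies(1, 20))            # theoretical yield from one template after 20 cycles
    print(cycles_to_reach(1, 1_000_000))  # cycles needed to exceed one million copies
```

Under this idealization, a single template first exceeds one million copies at the twentieth cycle, since nineteen cycles yield only 524,288 copies.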
A problem associated with LCR is that, by definition, the procedure requires four primers which can result in non-specific "blunt-end ligation" of the primers without the need for the presence of target. I.e., there is preferential hybridization of the primers to their respective primer complements rather than the target sequence due to the utilization (most typically) of excess molar concentration of the primers. These double-stranded blunt-end fragments are capable of being ligated even in the absence of target DNA sequences. This can lead to high background signal or false-positive results.
Related to LCR is the so-called "Oligonucleotide Ligation Assay", or "OLA". Landegren, U., et al., Science 241:1077-1080 (1988). The OLA protocol relies upon the use of two primers capable of hybridizing to a single strand of a target in an adjacent manner. OLA, like LCR, is particularly suited for the detection of point mutations. Unlike LCR, however, OLA does not result in exponential amplification but rather, "linear" amplification, i.e., at the end of each cycle, only a single end-product (the covalently coupled primers) is produced. A problem associated with OLA, then, is the lack of exponential amplification.
Combining PCR and OLA has been reported as a method of detection. Nickerson, D. A., et al., "Automated DNA diagnostics using an ELISA-based oligonucleotide ligation assay." Proc. Natl. Acad. Sci. USA 87:8923-8927 (1990). As reported, the target DNA was exponentially amplified using PCR followed by detection of the amplified target using OLA.
A problem associated with such combinations is that they inherit any problems associated with PCR, plus, by definition, multiple, and separate, processing steps are required.
Additional amplification techniques have been described. See, for example, International Publication No. WO 90/01069, "Process for amplifying and detecting nucleic acid sequences" (1990). In this protocol, as with LCR, two sets of primers are utilized; however, the primers are designed such that upon hybridization to SS-A and SS-B, "gaps" exist between the hybridized primers. These gaps are then "repaired" (filled) with complementary dNTPs (as mediated by DNA polymerase) such that when the gaps are repaired, ligase covalently joins the "repaired" primer to the other primer. Thus, at the end of each cycle, each single strand has a complement capable of serving as a target during the next cycle; thus, this procedure results in exponential amplification.
While this protocol avoids the LCR problem of non-specific blunt end ligation in the absence of target, unlike LCR this protocol does not allow for single base mutational detection. In addition, a critical difficulty in using this technique is the need to design the oligonucleotide primers such that the "gap" can be "repaired" with only a subset of the dNTPs. I.e., the gap cannot comprise all four of the bases such that only a maximum of three of the four dNTPs can be added to the reaction vessel.
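The design constraint described above, namely that the gap must be fillable with fewer than all four dNTPs, can be expressed as a simple check. The following Python sketch is illustrative only; the function names `dntps_needed_for_gap` and `gap_is_usable` are hypothetical, and the sketch assumes the gap is specified by the template bases lying opposite it.

```python
# Pairing rules as stated in the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dntps_needed_for_gap(gap_template: str) -> set:
    """Return the set of dNTPs required to fill a gap opposite the given template bases."""
    return {"d" + PAIRS[base] + "TP" for base in gap_template}


def gap_is_usable(gap_template: str) -> bool:
    """True when the gap can be filled with at most three of the four dNTPs."""
    return len(dntps_needed_for_gap(gap_template)) < 4


if __name__ == "__main__":
    for gap in ["AAG", "ATCG"]:
        print(gap, sorted(dntps_needed_for_gap(gap)), gap_is_usable(gap))
```

A gap opposite the template bases AAG requires only dTTP and dCTP and so satisfies the constraint, whereas a gap opposite ATCG requires all four dNTPs and does not.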
The foregoing is to be considered as representative rather than exhaustive. As can be appreciated from the foregoing, however, certain of the benefits associated with the amplification protocols also contribute to drawbacks in utilization thereof. Ideally, then, an amplification protocol that is both sensitive (such as PCR) and specific (such as LCR) would enhance the ability to detect and amplify nucleic acid sequences.