Nucleic acids may be either in the form of deoxyribonucleic acids (DNA) or in the form of ribonucleic acids (RNA). DNA and RNA are high molecular weight polymers formed from many nucleotide building blocks. Each nucleotide is composed of a base (a purine or a pyrimidine), a sugar (either ribose or deoxyribose) and a molecule of phosphoric acid. DNA is composed of the sugar deoxyribose and the bases adenine (A), guanine (G), cytosine (C) and thymine (T).
The nucleotides are assembled into a linear chain to form the genetic code. Each sequence of three nucleotides can be "read" as the code for one amino acid through the process of translation (DNA must first be converted into RNA through the process of transcription). By varying the combination of bases in each three-base sequence, different amino acids are coded for. By linking various three-base sequences together, a sequence of amino acids can be specified, which forms proteins and polypeptides. The entire coding unit for one protein is referred to as a gene. There can be one or more copies of a gene in an organism. Some genes are present in hundreds or thousands of copies; others are present only as a single copy.
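The reading of three-base sequences described above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a coding-strand DNA sequence read in frame from its first base; the names `CODON_TABLE`, `transcribe` and `translate` are illustrative only and do not come from the text.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop codon. (Assumption: the standard code applies.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the RNA sequence matching a coding DNA strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna: str) -> str:
    """Read a coding DNA strand three bases at a time, stopping at a stop codon."""
    seq = coding_dna.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("ATGGCCTGA")` reads ATG (methionine), GCC (alanine) and then the stop codon TGA. Changing a single base in a codon, such as GAG to GTG, changes the encoded amino acid from glutamic acid to valine.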
Regardless of the number of copies, genes are linked together in an organism to form higher structural units referred to as chromosomes in higher organisms. In some lower organisms, genes may occur in extrachromosomal units referred to as plasmids. Genes need not be linked directly to each other in an end-to-end fashion. Certain noncoding regions (i.e., introns: sequences of bases that do not translate into amino acids) may occur between genes or within a gene. The arrangement of nucleotides in an organism determines its genetic makeup, which may be referred to as its genome (hence, DNA isolated from an organism is referred to as genomic DNA).
DNA in most organisms is arranged in the form of a duplex wherein two strands of DNA are paired together in the familiar double helix. In this model, hydrogen bonds are formed between A and T and between C and G on the paired strands. Thus, on one strand, the sequence ATCG (5'→3') will have on its complementary strand the sequence TAGC (3'→5'). Both strands, however, contain the same genetic code only in a complementary base-paired manner. One could read, therefore, either strand of DNA in order to determine the genetic sequence coded for.
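The base-pairing rule above (A with T, C with G) can be sketched in a few lines of Python. The function names are illustrative assumptions. `complement` returns the paired strand base by base, so it reads 3'→5' when the input is given 5'→3'; `reverse_complement` returns the same paired strand written in the conventional 5'→3' direction.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
_PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the paired strand, position by position (opposite polarity)."""
    return strand.upper().translate(_PAIRING)


def reverse_complement(strand: str) -> str:
    """Return the paired strand written 5'->3'."""
    return complement(strand)[::-1]
```

With the example from the text, `complement("ATCG")` gives `"TAGC"`, and `reverse_complement("ATCG")` gives `"CGAT"`, which is the same strand read from its own 5' end.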
For a further description of the organization, structure and function of nucleic acids, see Watson, Molecular Biology of the Gene, W. J. Benjamin, Inc. (3rd ed. 1976), especially Chapters 6-14.
Understanding and determining the genetic sequence of nucleic acids present in a sample is important for many reasons. First, a number of diseases are genetic in the sense that the nucleotide sequence for a "normal" gene is in some manner changed. Such a change could arise by the substitution of one base for another. Given that three bases code for a single amino acid, a change in one base (referred to as a point mutation) could result in a change in the amino acid which, in turn, could result in a defective protein being made in a cell. Sickle cell anemia is a classic example of such a genetic defect caused by a change in a single base in a single gene. Other examples of diseases caused by single gene defects include Factor IX and Factor VIII deficiency, adenosine deaminase deficiency, purine nucleotide phosphorylase deficiency, ornithine transcarbamylase deficiency, argininosuccinate synthetase deficiency, beta-thalassemia, α1-antitrypsin deficiency, glucocerebrosidase deficiency, phenylalanine hydroxylase deficiency and hypoxanthine-guanine phosphoribosyltransferase deficiency. Still other diseases, such as cancers, are believed to be caused by the activation, increase in copy number and/or removal of suppression of genes known to be present in the genome (referred to as oncogenes). Examples of oncogenes believed to be relevant to certain cancers include N-myc for neuroblastomas, retinoblastomas and small cell lung cancers and c-abl for chronic myelogenous leukemia. For a further description of the relevance of oncogenes to the diagnosis of cancers and for a listing of specific oncogenes, see Weinberg, Sci. Amer., Nov. 1983, Slamon et al., Science, 224:256 (1984), and U.S. Pat. Nos. 4,699,877 and 4,918,162.
Second, in addition to changes in the sequence of nucleic acids, there are genetic changes that occur on a structural level. Such changes include insertions, deletions and translocations along a chromosome and include increased or decreased numbers of chromosomes. In the former instance, such changes can result from events referred to as crossing over where strands of DNA from one chromosome exchange various lengths of DNA with another chromosome. Thus, for example, in a "normal" individual, the gene for protein "X" might reside on chromosome 1; after a crossing over event, that gene could now have been translocated to chromosome 4 (with or without an equal exchange of DNA from chromosome 4 to chromosome 1) and the individual may not produce X.
In the instance of increased or decreased chromosome number (referred to as aneuploidy), instead of a "normal" individual having the correct number of copies of each chromosome (e.g., two of each in humans [other than the X and Y chromosomes]), a different number occurs. In humans, for example, Down's syndrome is the result of having three copies of chromosome 21 instead of the normal two copies. Other aneuploid conditions result from trisomies involving chromosomes 13 and 18.
Third, infectious diseases can be caused by parasites, microorganisms and viruses, all of which have their own nucleic acids. The presence of these organisms in a sample of biological material often is determined by a number of traditional methods (e.g., culture). Because each organism has its own genome, however, if there are genes or sequences of nucleic acids that are specific to a single species (to several related species, to a genus or to a higher level of relationship), the genome will provide a "fingerprint" for that organism (or species, etc.). Examples of viruses to which this invention is applicable include HIV, HPV, EBV, HSV, Hepatitis B and C and CMV. Examples of microorganisms to which this invention is applicable include bacteria and more particularly include H. influenzae, mycoplasma, legionella, mycobacteria, chlamydia, candida, gonococci, shigella and salmonella.
In each example set forth above, by identifying one or more sequences that are specific for a disease or organism, one can isolate nucleic acids from a sample and determine if that sequence is present. A number of methods have been developed in an attempt to do this.
While it is critical that one or more sequences specific for a disease or organism be identified, it is not important to the practice of this invention what the target sequences are or how they are identified. The most straightforward means to detect the presence of a target sequence in a sample of nucleic acids is to synthesize a probe sequence complementary to the target nucleic acid (instrumentation, such as the Applied BioSystems 380B, is presently used to synthesize nucleic acid sequences for this purpose). The synthesized probe sequence then can be applied to a sample containing nucleic acids and, if the target sequence is present, the probe will bind to it to form a reaction product. In the absence of a target sequence and barring nonspecific binding, no reaction product will be formed. If the synthesized probe is tagged with a detectable label, the reaction product can be detected by measuring the amount of label present. Southern blotting is one example where this method is used.
A difficulty with this approach, however, is that it is not readily applicable to those instances where the number of copies of the target sequence present in a sample is low (i.e., less than 10^7). In such instances, it is difficult to distinguish signal from noise (i.e., true binding between probe and target sequences from nonspecific binding between probe and non-target sequences). One way around this problem is to increase the signal. Accordingly, a number of methods have been described to amplify the target sequences present in a sample.
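As a rough illustration of probe-based detection, the following Python sketch treats hybridization as exact string matching: a probe "binds" wherever the sample strand contains the probe's reverse complement. This is a deliberate simplification that ignores mismatches, nonspecific binding and reaction conditions, and the function name `probe_binding_sites` is hypothetical.

```python
# Watson-Crick pairing used to derive the sequence a probe would bind.
_PAIRING = str.maketrans("ACGT", "TGCA")


def probe_binding_sites(sample: str, probe: str) -> list[int]:
    """Return 0-based positions in `sample` (5'->3') where `probe` (5'->3')
    would pair perfectly, i.e. where its reverse complement occurs."""
    target = probe.upper().translate(_PAIRING)[::-1]
    sample = sample.upper()
    sites = []
    start = sample.find(target)
    while start != -1:
        sites.append(start)
        # Advance by one base so overlapping occurrences are also found.
        start = sample.find(target, start + 1)
    return sites
```

For a sample strand `"GGATCGTT"` and a probe `"CGAT"`, the probe's reverse complement `"ATCG"` occurs at position 2, so a reaction product would be expected there. An empty list corresponds to the case in which no target sequence is present.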
One of the best known amplification methods is the polymerase chain reaction (referred to as PCR), which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159. Briefly, in PCR, two primer sequences are prepared which are complementary to regions on opposite complementary strands of the target sequence. An excess of deoxynucleoside triphosphates is added to a reaction mixture along with a DNA polymerase (e.g., Taq polymerase). If the target sequence is present in a sample, the primers will bind to the target and the polymerase will cause the primers to be extended along the target sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the target to form reaction products, excess primers will bind to the target and to the reaction products, and the process is repeated.
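Because each temperature cycle lets the reaction products serve as new templates, the number of target copies grows roughly geometrically. The Python sketch below models this under an idealized assumption that is not stated in the text: every cycle multiplies the copy number by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The function names and the `efficiency` parameter are illustrative.

```python
def amplified_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of amplification.

    Assumes each cycle multiplies the copy number by (1 + efficiency);
    real reactions plateau as primers and nucleotides are consumed.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies: int, threshold: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles after which the idealized copy number
    is at least `threshold`."""
    if initial_copies <= 0 or efficiency <= 0:
        raise ValueError("initial_copies and efficiency must be positive")
    cycles = 0
    copies = float(initial_copies)
    while copies < threshold:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles
```

Under these assumptions, a sample holding 100 copies of a target would need `cycles_to_reach(100, 10**7)` cycles of perfect doubling to reach 10^7 copies. Lower efficiencies raise that number accordingly.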
Another method for amplification is described in EPA No. 320,308, published Jun. 14, 1989, which is the ligase chain reaction (referred to as LCR). In LCR, two complementary probe pairs are prepared, and in the presence of the target sequence, each pair will bind to opposite complementary strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and then serve as "target sequences" for ligation of excess probe pairs. U.S. Pat. No. 4,883,750 describes a method similar to LCR for binding probe pairs to a target sequence but does not describe an amplification step.
A still further amplification method is described in PCT Appl. No. PCT/US87/00880, published Oct. 22, 1987, and is referred to as the Qbeta Replicase method. In this method, a replicative sequence of RNA which has a region complementary to that of a target is added to a sample in the presence of an RNA polymerase. The polymerase will copy the replicative sequence which can then be detected.
Still other amplification methods are described in GB Appl. No. 2 202 328, published Sep. 21, 1988, and in PCT Appl. No. PCT/US89/01025, published Oct. 5, 1989. In the former application, "modified" primers are used in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified by labelling with a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme). In the latter application, an excess of labelled probes is added to a sample. In the presence of the target sequence, the probe binds and is cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe. Cleavage of the labelled probe signals the presence of the target sequence. Finally, U.S. patent application Ser. No. 07/648,257, filed Jan. 31, 1991, discloses and claims a method for amplifying nucleic acid (referred to as strand displacement amplification) which comprises target generation prior to amplification in which restriction enzymes are employed.
Each of the above-referenced amplification methods benefits from access to the desired nucleic acid sequence to be amplified. In addition, the need to generate amplifiable target fragments with defined 5'- and 3'-ends (i.e., ending at specific nucleotide positions) is a continually sought-after goal.