As is well known, a strand of DNA is composed of four different nucleotides, distinguished by their bases: adenine, thymine, cytosine and guanine, designated A, T, C and G, respectively. For each strand of DNA, there is a complementary strand in which A pairs with T, and C pairs with G. A specific sequence of bases that codes for a protein is referred to as a gene, and that gene is segmented into regions that determine protein composition (exons) and regions that do not contribute to protein composition (introns). An exon can vary in length from about 30 basepairs (bp) to thousands of basepairs. For purposes of the present invention, however, the primary concern is with the coding regions of the gene (exons), though the test procedures of the invention can readily be modified, where desired, to screen for carcinogenic mutations in the introns as well, and for other purposes.
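By way of illustration only, the pairing rule just stated (A with T, C with G) can be sketched in a few lines of Python. The function names `complement` and `reverse_complement` are hypothetical and chosen for this sketch; they are not part of any procedure described herein.

```python
# Base pairing as stated in the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a strand, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATCG"))
    print(reverse_complement("ATCG"))
```

The reversed form is the one normally used when comparing a primer against the opposite strand, because the two strands of a duplex run in opposite directions.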
DNA diagnostic testing of genes with mutational defects is important for ascertaining individual susceptibility to particular diseases, classification of a disease into a therapeutically relevant sub-group, carriership (and potential transfer) or prenatal presence of birth defects, and other important purposes. The gold standard in DNA diagnostic testing for the presence of mutations is DNA sequencing, which involves the complete decoding of the gene. This, however, is time-consuming and inefficient. It is time-consuming because, despite numerous ongoing attempts to simplify and greatly accelerate the process, all currently used routine sequencing systems are still based on the principle originally developed by Sanger and colleagues (as described, for example, in Molecular Pathology, Heim and Silverman, 1990, pages 7-10) and are only partially automated. It is inefficient because, instead of simply determining differences between entire DNA fragments, every DNA fragment must be completely decoded before any differences from normal (mutations) are revealed. The fact that the exact location of a disease-causing mutation can differ from one individual to the next, moreover, precludes the possibility of testing only for frequently occurring known mutations. Indeed, the many different mutations that may convert a healthy gene into a diseased gene make it necessary in each case to inspect the entire gene at all possible positions for mutations (termed gene scanning) rather than for only a few frequently occurring ones (mutation screening). Mutation screening methods are relatively simple and cost-efficient. Outside DNA sequencing, potential gene scanning systems are scarce and their cost-efficiency is questionable.
Recently, a method was disclosed by one of the applicants of the present patent application for comparative scanning of 100-600 basepair (bp) gene fragments by multiplex PCR amplification followed by two-dimensional electrophoretic separation in polyacrylamide gels on the basis of both size and basepair sequence; "Multiplex Co-amplification of 24 Retinoblastoma Gene Exons After Pre-amplification By Long-Distance PCR," Jan Vijg and Daizong Li, Nucleic Acids Research, 1996, Vol. 24, No. 1, p. 538-9; "Two-Dimensional DNA Typing", Jan Vijg, Molecular Biotechnology, pages 275, on, Vol. 4, 1995; and in copending U.S. patent application Ser. No. 08/471,249, filed Jun. 6, 1995 for Method Of And Apparatus For Diagnostic DNA Testing, now U.S. Pat. No. 5,814,49, issued Sep. 29, 1998. Separation on the basis of basepair sequence can be accomplished by denaturing gradient gel electrophoresis (DGGE). This multiplexing technique for analyzing predetermined gene exons derived from DNA involves adding primer pairs surrounding successive groups of the gene exons, followed by effecting long-distance PCR amplifications thereof in a common tube or vessel (multiplex long-PCR) to achieve relatively long resulting amplicons; adding further primer pairs surrounding each of the gene exons or parts thereof, and then effecting multiplex PCR amplifications thereof in the common tube or vessel with relatively short resulting amplicons; and electrophoretically separating the gene fragments. By size separation, mutations representing deletions or insertions varying from several to many basepairs can be detected. In DGGE, point mutations, such as basepair substitutions, are also detectable. This is due to the tendency of double-stranded DNA fragments to melt at a point in the gradient where the temperature equals the melting temperature of the lowest-melting domain of the fragment.
In the process of PCR amplification, the mutational target fragments (e.g., gene exons) are surrounded by primers, i.e., short (about 20 bases) single-stranded DNA fragments. Primers are chosen in such a way that they are complementary to (bind to) positions at the left and right boundaries or ends of the target fragments. By using appropriate enzymes that extend each primer inwards, towards each other, the mutational target can be copied. This can be repeated a great number of times in a so-called thermocycler, a machine that first heats up the DNA sample, thereby separating the single strands, and then cools it down, which results in annealing of the primers to their target sequences and the subsequent enzymatic extension of the primers by polymerase enzymes. The net result is an amplification of the fragment in between the primers of typically one million times. This provides enough target DNA to detect the fragment by using a DNA-binding dye, after electrophoretic separation, without the use of radioactive tracers, the rest of the DNA now being a relatively small amount and invisible.
The positioning of the primers is critical because, for such short sequences, there is ample opportunity to bind elsewhere in the complex DNA molecules that form the starting material of the test. This would lead to the copying of fragments other than the ones of interest. Positioning of primers is even more critical in denaturing gradient gel electrophoresis, where each fragment must have an optimal melting temperature in order to allow all possible mutations to be detected. It is common practice to couple one of the two primers surrounding a gene target fragment to a GC-rich clamp sequence about 30 basepairs long. This clamp is very stable and functions as the highest melting domain; that is, the part of the DNA molecule that keeps the fragment together. This is important because once a fragment migrates off the gel it can no longer be detected. In PCR-DGGE it is just as important, furthermore, that the target fragment consists of one single domain (flat throughout the gene fragment) that has a lower stability than the GC-clamp. In that case, it will melt earlier than the clamp, resulting in a structure that is partially double-stranded (the clamp) and partially single-stranded (the target fragment): a so-called branched structure. Such a fragment will be greatly retarded in the gel. Typically, the exact position where such a fragment melts (and thus halts its migration) is completely dependent on its sequence. With a fragment of, say, 500 basepairs, a single basepair difference will lead to a migrational difference, which can be employed to detect mutations in such fragments by comparison with a control (wildtype) fragment. Hence, in contrast to DNA sequencing, DGGE does allow comparative scanning of whole fragments for mutational differences without the need to completely decode each molecule.
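The single-domain requirement can be illustrated, very crudely, with a sliding-window GC profile. The Python sketch below uses local GC fraction as a stand-in for local stability; this is an assumption made only for illustration and is not a melting-temperature calculation. The window size, the flatness threshold `max_spread` and the function names are likewise hypothetical.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int = 20, step: int = 5) -> List[Tuple[int, float]]:
    """GC fraction of successive windows, as (start position, fraction) pairs."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, last_start + 1, step)]


def looks_single_domain(target: str, clamp: str,
                        window: int = 20, max_spread: float = 0.25) -> bool:
    """Crude proxy check: the target's GC profile is roughly flat and
    every window is less GC-rich than the clamp."""
    fractions = [gc for _, gc in gc_profile(target, window)]
    is_flat = max(fractions) - min(fractions) <= max_spread
    below_clamp = max(fractions) < gc_fraction(clamp)
    return is_flat and below_clamp


if __name__ == "__main__":
    clamp = "GC" * 15                 # a 30-base, fully GC clamp (illustrative)
    even_target = "ATGCATTA" * 10     # uniform composition
    uneven_target = "AT" * 20 + "GC" * 20
    print(looks_single_domain(even_target, clamp))
    print(looks_single_domain(uneven_target, clamp))
```

A fragment that fails this kind of check contains a GC-rich stretch that could act as a second, more stable domain; an actual test design would rely on a proper melting-profile calculation rather than on GC content alone.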
Carrying out DGGE in two dimensions (2-D) rather than in one increases the efficiency of the system as well as its reliability. It is more efficient because, in a 2-D gel, many more fragments can be analyzed simultaneously than in a 1-D separation. It is more reliable because every fragment can be defined by both its melting temperature and its size. A disadvantage, however, resides in its increased complexity, which demands close attention to the design of the test. Since the design of a DGGE test itself is not trivial (primers must be chosen in such a way that the amplified fragments represent a single domain, as discussed above), the 2-D principle adds a dimension in complexity as well as in resolution. A typical design of PCR primers, moreover, must take into consideration many other variables inherent to the PCR process, such as primer annealing temperatures.
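One of the PCR variables mentioned above, primer annealing temperature, is often approximated for short primers by the widely used Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The Python sketch below applies that rule to check whether a set of primers falls within a common temperature band. The rule is a rough estimate, and the 5-degree tolerance and the function names are assumptions made for this illustration only.

```python
from typing import Iterable


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule:
    2 degrees per A or T, 4 degrees per G or C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def compatible(primers: Iterable[str], max_spread: int = 5) -> bool:
    """True if all estimated melting temperatures lie within max_spread degrees."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms) <= max_spread


if __name__ == "__main__":
    pair = ["ATGCATGCATGCATGCATGC", "AATTGGCCAATTGGCCAATT"]
    print([wallace_tm(p) for p in pair])
    print(compatible(pair))
```

In a multiplex reaction all primer pairs share one annealing step, which is why a check of this kind is applied across the whole primer set rather than pair by pair.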
As disclosed in the above-cited Vijg papers and the said patent application, the two-step PCR process enables many different exons to be amplified simultaneously in the same reaction. In this method, first, groups of target fragments (e.g., groups of exons) are amplified as large 5-40 kb amplicons, for example, by long-distance PCR (an improved form of PCR in which more efficient polymerase enzymes are employed that generate longer fragments). Second, with these amplicons as templates, large numbers of individual target sequences can now be amplified simultaneously in the same reaction vessel under a single set of experimental conditions. Normally it is very difficult to find one set of reaction conditions under which multiple fragments specified by multiple primer pairs are amplified simultaneously (i.e., multiplex PCR). Probably because the pre-amplification by long-distance PCR increases the amount of target sequences relative to the rest of the complex genomic DNA, flexibility with respect to the subsequent PCR amplification of the target sequences is much greater than normal, which permits extensive multiplexing; i.e., co-amplification of different target fragments in the same reaction.
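The first step of the two-step process, grouping exons into long-distance PCR amplicons, can be pictured with the following Python sketch. It assumes that each exon is described by a name and start and end coordinates on the gene, and it uses a simple greedy rule with a 40 kb ceiling taken from the upper end of the 5-40 kb range mentioned above. The greedy rule, the data layout and the name `group_exons` are assumptions for illustration and are not the grouping procedure of the cited work.

```python
from typing import List, Tuple

# (name, start, end) in basepair coordinates along the gene; hypothetical layout.
Exon = Tuple[str, int, int]


def group_exons(exons: List[Exon], max_span: int = 40_000) -> List[List[Exon]]:
    """Greedily pack exons, in genomic order, into groups whose total span
    (first exon start to last exon end) does not exceed max_span."""
    groups: List[List[Exon]] = []
    for exon in sorted(exons, key=lambda e: e[1]):
        if groups and exon[2] - groups[-1][0][1] <= max_span:
            groups[-1].append(exon)   # still fits in the current long amplicon
        else:
            groups.append([exon])     # start a new long amplicon
    return groups


if __name__ == "__main__":
    exons = [("e1", 0, 200), ("e2", 10_000, 10_150),
             ("e3", 60_000, 60_300), ("e4", 75_000, 75_100)]
    for group in group_exons(exons):
        print([name for name, _, _ in group])
```

Each resulting group would then serve as the template for the second, short-amplicon multiplex step; a practical design would also have to place the long-PCR primers themselves, which this sketch does not attempt.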
This method of extensive multiplexing has greatly economized the process of template preparation in genetic testing over earlier techniques for inspecting for mutations after PCR amplifying the many distributed exons of the often very large disease genes. As an example, it is now possible to generate a collection of as many as 25 fragments corresponding to 26 exons of the tumor suppressor gene RB1 in one single two-step PCR reaction. After the two-step multiplex PCR amplification, a single 2-dimensional electrophoretic separation, as the third step, is sufficient to resolve all these fragments and detect all possible mutations as variations in fragment spot position. With this system, a genetic testing method has become available that is both highly accurate (in detecting all possible mutations) and cost-efficient.
The only remaining drawback is the absence of a rapid means of designing an optimal test for one or more individual genes involved in a particular disease. The human genome, indeed, contains an estimated 100,000 different genes, many of which might ultimately prove to be involved in one or more diseases. Designing a set of PCR primers for many different genes and/or gene combinations that fulfills the criteria for optimal (multiplex) PCR, optimal denaturing gradient electrophoretic separation and 2-D distribution is not trivial. As shown in later-described FIG. 1, the computer-assisted test design of the invention must provide an optimal 2-D genetic test for one or multiple genes, with predicted primer and GC-clamp positions and lengths that meet optimal melting criteria, PCR criteria and 2-D spot distribution criteria.
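To make the notion of a 2-D spot distribution criterion concrete, the Python sketch below flags pairs of fragments that would lie too close together in both dimensions. It assumes that each fragment is summarized by a size in basepairs and a predicted melting temperature; the two minimum-gap thresholds and the name `crowded_pairs` are hypothetical values chosen for illustration and are not criteria taken from the invention.

```python
from itertools import combinations
from typing import List, Tuple

# (name, size in bp, predicted melting temperature in deg C); hypothetical layout.
Spot = Tuple[str, int, float]


def crowded_pairs(spots: List[Spot],
                  min_size_gap: int = 10,
                  min_tm_gap: float = 0.5) -> List[Tuple[str, str]]:
    """Return pairs of fragments closer than both thresholds, i.e. pairs
    that neither the size dimension nor the melting dimension separates."""
    clashes = []
    for (n1, s1, t1), (n2, s2, t2) in combinations(spots, 2):
        if abs(s1 - s2) < min_size_gap and abs(t1 - t2) < min_tm_gap:
            clashes.append((n1, n2))
    return clashes


if __name__ == "__main__":
    spots = [("a", 200, 70.0), ("b", 205, 70.2), ("c", 400, 70.1)]
    print(crowded_pairs(spots))
```

A pair is reported only when it is unresolved in both dimensions, reflecting the point made earlier that each fragment in a 2-D gel is defined by both its size and its melting temperature.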
The present invention is directed to solving this problem and, through a computer-assisted procedure, to designing multiplex PCR/2-D electrophoretic tests for one or more genes semi-automatically and/or automatically.