Heritable information in biological systems is stored in macromolecules of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). These molecules are polymers made up of four different base residues. Each base is attached to a sugar (deoxyribose in DNA, ribose in RNA) and a phosphate group, which together form the backbone of the molecule. Each base-sugar-phosphate unit is called a nucleotide. The sequence of these nucleotides determines the content of the biological information. Generally, this information is used to manufacture proteins (polypeptides), which in turn are made up of chains of amino acid residues.
Electrophoresis is a common technique for separating these molecules by size. At physiological pH, the proton from the phosphate group in the backbone of the nucleic acid dissociates, giving molecules of DNA and RNA a negative charge. Thus, when placed in a constant electric field and the appropriate buffer, they migrate towards the anode (the positive electrode).
A heterogeneous sample of DNA or RNA consisting of molecules of various sizes, however, cannot be separated in free solution because of the homogeneity of the phosphate backbone. This gives strands of nucleic acids nearly identical charge-to-mass ratios, regardless of their lengths. As a result, all the strands in the sample would migrate towards the anode at the same rate and no separation would occur.
Electrophoresis is therefore commonly used in conjunction with a semisolid gel made of agarose (a polysaccharide extracted from seaweed) or of polyacrylamide (a synthetic polymer) under a constant electric field (i.e. standard gel electrophoresis). In both cases, the gel forms minute pores that act as a molecular sieve through which the DNA or RNA molecules can migrate. This permits the separation of these molecules on the basis of size. Essentially, smaller molecules migrate more quickly through the gel than larger molecules because they can work their way through the pores with greater ease. Larger molecules have greater difficulty passing through the pores and thus migrate towards the anode at a slower rate. The pore size can be controlled by the investigator by changing the concentration of the agarose or acrylamide in the gel. This permits the investigator to resolve fragments of nucleic acid as large as 30,000 nucleotides on an agarose gel in a constant electric field. Nucleic acid fragments up to 500 nucleotides long and differing in length by only one nucleotide can be resolved using a polyacrylamide gel.
Pulsed-field gel electrophoresis (PFGE) is commonly used to separate large fragments of DNA (30,000 nucleotides to 10 million nucleotides) and is particularly useful for separating entire chromosomes. PFGE is similar to standard gel electrophoresis except that the DNA sample is subjected to an electric field applied from different angles. Short electrical pulses from various angles allow very large DNA fragments to work their way through the pores. PFGE permits very large DNA fragments to be sorted according to their lengths, with the smaller fragments migrating more quickly towards the anode than the larger fragments.
Gel electrophoresis can also be used to separate proteins according to length. Proteins, in their native form, fold into various conformations forming globular or cylindrical structures. Thus, to separate these molecules according to their length, they must be unfolded, or denatured. Protein denaturation can be accomplished by adding a strong detergent to the electrophoresis solution. The detergent is amphipathic, that is, one end of the molecule is hydrophobic, while the other end is hydrophilic (and carries a negative charge at neutral pH). In solution, the hydrophobic end of the detergent interacts with each amino acid while the hydrophilic end protrudes from the protein. The protein is forced into a linear conformation by the mutual repulsion of the hydrophilic ends of the detergent. Under these conditions, the denatured proteins have a similar charge-to-mass ratio, much like nucleic acids. Therefore, the same principle used to separate nucleic acids can also be used to separate proteins according to length. Proteins are typically separated using standard gel electrophoresis in a polyacrylamide gel.
To determine the length of unknown fragments of DNA, RNA, or protein, a standard is loaded into a well cast in the gel. The standard contains fragments of DNA, RNA, or protein with known lengths and is run simultaneously with the unknown samples loaded in separate wells. By measuring the distance each known fragment has migrated from the originating well, one can construct a best-fit curve and determine the length of each unknown fragment.
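The best-fit curve described above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes that the logarithm of fragment length falls roughly linearly with migration distance over the range of the standard, which is a common approximation but is not stated in the text. The function names and the ladder values are hypothetical.

```python
import math

def fit_standard_curve(distances_mm, sizes):
    """Least-squares fit of log10(size) = slope * distance + intercept.

    Assumption: log10(size) is approximately linear in migration distance.
    """
    n = len(distances_mm)
    logs = [math.log10(s) for s in sizes]
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_size(distance_mm, slope, intercept):
    """Read the length of an unknown fragment off the fitted curve."""
    return 10 ** (slope * distance_mm + intercept)

# Hypothetical standard: known fragment lengths and the distance each migrated.
ladder_distances = [10.0, 20.0, 30.0, 40.0, 50.0]
ladder_sizes = [8000, 4000, 2000, 1000, 500]

slope, intercept = fit_standard_curve(ladder_distances, ladder_sizes)
unknown = estimate_size(25.0, slope, intercept)
print(round(unknown))
```

The estimate is only as reliable as the fit; an unknown band that migrated outside the range spanned by the standard would be an extrapolation and should be treated with caution.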
Restriction endonucleases have revolutionized molecular biology. These enzymes cleave DNA at short, specific sequences. They allow the investigator to cut large DNA fragments into smaller, more manageable fragments. Currently, there are several hundred restriction endonucleases which recognize and cleave different sequences. Because of their sequence recognition specificity, their sites serve as important landmarks when studying an unknown fragment of DNA. A map of these sites, also known as a restriction enzyme map, is fundamental when analyzing DNA. DNA fragments of the same length but containing different sequences may produce different sets of DNA fragments when cut with one or more restriction endonucleases. When these fragments are fractionated by gel electrophoresis, a unique banding pattern may be observed. This banding pattern can serve as a molecular fingerprint which can be used to differentiate DNA fragments that have the same length but are composed of different sequences.
Restriction enzyme maps are particularly important for DNA cloning, DNA mutagenesis, genetic mapping and engineering. They establish the "blueprint" for the DNA from which more detailed experimentation can take place.
Restriction enzyme maps can be constructed by several techniques, most of which rely on data obtained from gel electrophoresis. Two or more restriction enzyme sites can be mapped by "inspection", provided the restriction enzymes create a limited number of fragments (i.e. fewer than 7). Essentially, an investigator compares the banding patterns obtained after complete digestion with each restriction endonuclease alone (enzyme A or B) with the pattern from a sample of DNA completely digested with both restriction endonucleases together (enzyme A and B). From these data, the investigator can deduce plausible maps which fit the data.
An alternative approach is to digest a sample of DNA with a single restriction endonuclease such that each fragment of DNA is cut once by the enzyme, despite the presence of multiple recognition sequences. Such a digest is termed a partial DNA digestion. This reaction creates a population of fragments of varying sizes. When these fragments are viewed after gel electrophoresis, the banding pattern is essentially meaningless due to the numerous bands. However, when these DNA fragments are immobilized onto a positively charged membrane (such as nitrocellulose or nylon), all the desired fragments can be identified by DNA hybridization using a labelled probe which is complementary in sequence to one of the termini of the original DNA fragment. The fragments which contain this common terminus can be detected using autoradiography (X-ray film). By calculating the size difference between adjacent bands, one can generate a map for that particular restriction endonuclease.
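The arithmetic of the partial digestion approach can be illustrated with a short sketch. It assumes idealized data: every detected band shares the probed terminus, so each band length is the distance from that terminus to one cut site (or to the far end of the uncut fragment), and sizes are exact. The function name and the band values are hypothetical.

```python
def map_from_partial_digest(band_sizes, full_length):
    """Derive a restriction map from end-labelled partial-digest bands.

    Returns (sites, intervals): the cut-site positions measured from the
    probed terminus, and the distances between consecutive landmarks
    (terminus, each site, far end).
    """
    sizes = sorted(set(band_sizes))
    # Any band shorter than the intact fragment ends at a cut site.
    sites = [s for s in sizes if s < full_length]
    boundaries = [0] + sites + [full_length]
    intervals = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return sites, intervals

# Hypothetical autoradiograph: band lengths in nucleotides.
bands = [5000, 1200, 3100, 4000]
sites, intervals = map_from_partial_digest(bands, full_length=5000)
print(sites)      # [1200, 3100, 4000]
print(intervals)  # [1200, 1900, 900, 1000]
```

With measured rather than exact band sizes, the differences between adjacent bands inherit the measurement error of both bands, so closely spaced sites are the hardest to place.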
Hybridization is a powerful technique for detecting the presence of a sequence in nucleic acid. DNA and RNA are composed of four different base residues: cytosine, guanine, adenine, and thymine (or uracil, in the case of RNA). These bases follow pairing rules: adenine pairs with thymine (or uracil), while cytosine pairs with guanine. DNA and RNA strands are polar, that is, strands run from 5' to 3'. DNA, and in some cases RNA, can be found as double-stranded structures. These structures consist of oppositely oriented strands held together by base pairing (i.e. one strand runs 5' to 3', while its complementary strand runs 3' to 5' with respect to the first strand). Thus, to detect a certain sequence in a sample of nucleic acid, one can generate a probe containing the complementary sequence which has been labelled. Under appropriate hybridization conditions, the probe will base pair with the targeted sequence. Thus, fragments containing the target sequence can be identified by hybridization.
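The pairing rules and strand polarity described above can be expressed in a few lines. This sketch treats hybridization as exact matching of a DNA probe against the reverse complement of the target, which ignores mismatches and the effect of hybridization conditions; the function names and sequences are illustrative only.

```python
# DNA pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def probe_binding_sites(target_strand, probe):
    """Return 0-based positions on target_strand (written 5' to 3') where
    the probe can base pair perfectly along its whole length."""
    target = target_strand.upper()
    site = reverse_complement(probe)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions

print(reverse_complement("ATGC"))                   # GCAT
print(probe_binding_sites("TTGCATAAGCAT", "ATGC"))  # [2, 8]
```

Because the strands are antiparallel, the probe is reversed as well as complemented before searching; complementing alone would look for the wrong sequence.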
In many situations, more than one plausible map can be deduced from the data obtained from complete single and double digestions. Thus, it is possible that the investigator may not have deduced the correct map. In addition, this approach becomes very tedious with each additional band created by the digestion because the number of possible combinations increases factorially. However, computers are particularly well suited for handling such tasks.
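The following sketch shows why the search space grows factorially and how a computer might enumerate it. It is a brute-force illustration under strong assumptions (a linear DNA molecule, exact fragment sizes, complete digestion), not a description of any particular prior art system; all names and values are hypothetical.

```python
from itertools import permutations

def cut_positions(fragment_order):
    """Cut-site positions implied by laying fragments end to end."""
    positions, total = [], 0
    for length in fragment_order[:-1]:
        total += length
        positions.append(total)
    return positions

def double_digest_maps(frags_a, frags_b, frags_ab):
    """Try every ordering of the single-digest fragments and keep those
    whose combined cut sites reproduce the double-digest fragments."""
    target = sorted(frags_ab)
    total = sum(frags_a)
    maps = set()
    # n fragments can be ordered in up to n! ways, for each enzyme.
    for order_a in set(permutations(frags_a)):
        sites_a = cut_positions(order_a)
        for order_b in set(permutations(frags_b)):
            sites_b = cut_positions(order_b)
            bounds = sorted(set([0, total] + sites_a + sites_b))
            pieces = sorted(b - a for a, b in zip(bounds, bounds[1:]))
            if pieces == target:
                maps.add((tuple(sites_a), tuple(sites_b)))
    return sorted(maps)

# Hypothetical digests of a 10-unit fragment.
for sites_a, sites_b in double_digest_maps([2, 3, 5], [4, 6], [2, 2, 1, 5]):
    print("enzyme A sites:", sites_a, "enzyme B sites:", sites_b)
```

For this example the search returns two maps that are mirror images of each other, which is the minimum ambiguity inherent in the data; other data sets can yield several genuinely different maps.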
A collection of clones containing small fragments of genomic DNA, collectively representing the entire genome, is known as a genomic library. One can construct a high-resolution genomic map by mapping individual clones by partial digestion and assembling or ordering the clones based on their restriction enzyme maps (or "fingerprints"). For example, two clones containing DNA from the identical region of the genome will have the same fingerprint. If they were from completely different regions of the genome, they would likely have very distinct fingerprints. However, if the clones contain overlapping regions of DNA, they would share similarities in their maps. In the latter situation, overlapping clones can be assembled together to construct a contig (a composite map). Eventually, multiple contigs can be assembled into a map representing the entire genome.
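A simplified sketch of fingerprint-based grouping follows. It assumes each clone's fingerprint is a non-empty list of exact fragment sizes and that two clones overlap when they share at least a chosen fraction of fragments; the threshold, the scoring rule, and all names are assumptions made for illustration. It groups clones into candidate contigs but does not order them.

```python
def shared_fraction(fp_a, fp_b):
    """Fraction of the smaller fingerprint that also appears in the other."""
    a, b = set(fp_a), set(fp_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def group_into_contigs(fingerprints, min_shared=0.5):
    """Group clones whose fingerprints overlap, directly or through a
    chain of overlapping clones (simple union-find)."""
    names = list(fingerprints)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shared_fraction(fingerprints[a], fingerprints[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fingerprints (fragment sizes in nucleotides).
clones = {
    "c1": [500, 1200, 800],
    "c2": [1200, 800, 2300],
    "c3": [800, 2300, 950],
    "c4": [3000, 150, 700],
}
print(group_into_contigs(clones))  # [['c1', 'c2', 'c3'], ['c4']]
```

Clones c1 and c3 share only one fragment, yet they fall in the same group because each overlaps c2. Real gel data would need a size tolerance rather than exact matching, since two measurements of the same fragment rarely agree exactly.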
Mapping entire genomes by partial digestion has become a relatively popular approach. However, this approach is laborious due to the immense quantity of experimental data, and thus computer assistance would be of great benefit. To facilitate speed and accuracy of analysis, several automated systems have been developed.
Prior art systems have been designed to operate on images of electrophoresis gels in the form of digital images consisting of a matrix of values. There are at least two designs. One is based on a scanner to capture a gel image as a graphic file. Computer software is used to analyze the location of the bands and calculate the unknown molecular weight. The second design uses a digitizer, which requires users to point to bands using a pen. Again, software interprets these distances to calculate the molecular weights.
The scanner design (or densitometer) is expensive. Some scanner equipment is capable of capturing the bands directly from a gel placed on a UV box, while other equipment involves digitizing a photographic image (a second generation reproduction, resulting in loss of image quality). As cameras have different sensitivities, faint bands which might otherwise be detected by the human eye may not be picked up by a scanner or by a pre-scanner camera (where a photo is scanned).
Calculation of unknown molecular weights from gel analysis software uses an algorithm which identifies bands by contrasting dark/light spots with the background colour. The user must first assign the lane locations before the computer finds these "bands". Once the computer has marked the bands on the screen, the user must check to see if the computer has mistakenly identified an artifact as a band. This is a significant problem when the gel has a lot of background artifacts, or where the bands are not easily distinguishable from the background. Quite often there is no clear demarcation between the spot and its surrounding background using conventional boundary determination techniques. Consequently, when using a fully automated system, the researcher may be misled as to the results of the analysis since the system may erroneously utilize values which do not represent valid information about the desired spot to form a basis for generating its analytical output data. Thus, there exists a need for providing visual confirmation to the user as to the exact boundaries of the spot values that were used during the analysis.
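The contrast-based band finding described above, and its sensitivity to background, can be illustrated with a one-dimensional sketch. It assumes a lane has already been reduced to a list of intensity values along the direction of migration, takes the median as the background level, and reports any run of values exceeding the background by a chosen contrast as a band. This is an illustrative simplification, not the algorithm of any specific prior art system; the names and numbers are hypothetical.

```python
from statistics import median

def summarise(profile, start, end):
    """Describe one band by its boundaries and its brightest position."""
    peak = max(range(start, end + 1), key=lambda i: profile[i])
    return {"start": start, "end": end, "peak": peak}

def find_bands(profile, min_contrast):
    """Report contiguous runs that exceed the background by min_contrast."""
    background = median(profile)  # assumption: most of the lane is background
    bands, start = [], None
    for i, value in enumerate(profile):
        above = value - background >= min_contrast
        if above and start is None:
            start = i
        elif not above and start is not None:
            bands.append(summarise(profile, start, i - 1))
            start = None
    if start is not None:
        bands.append(summarise(profile, start, len(profile) - 1))
    return bands

# Hypothetical lane profile: one strong band and one faint spot.
lane = [10, 11, 10, 40, 80, 45, 10, 12, 30, 10, 11]
print(find_bands(lane, min_contrast=25))  # strong band only
print(find_bands(lane, min_contrast=15))  # faint spot at index 8 also reported
```

Whether the faint spot at index 8 is a real band or an artifact cannot be decided from the threshold alone, which is why reporting the start and end boundaries used for each band lets the user confirm visually what the software actually measured.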
Thus, calculating the molecular weights involves many steps which require human assistance, including scanning, assigning lanes (although some systems do this automatically), and checking whether artifacts have been mistaken for bands. Scanners also take up a large amount of bench space, and the digitized images take up a lot of computer memory space.
In the second design, the digitizing method, the user places a picture or autoradiograph (not a gel) onto a tablet. The user then uses a pen or other device to point to bands on the picture. The position of the pen is then sent to a computer and interpreted by software, which performs the molecular weight calculations. Examples of this design of system are disclosed in U.S. Pat. No. 4,970,672 and U.S. Pat. No. 4,592,089.
This approach is expensive. It also suffers from a lack of repeatability, as it is difficult to point to a band at exactly the same position every time, because the operator relies only on human accuracy in two-dimensional positioning.
Therefore, what is needed is an apparatus and methodology which is inexpensive yet overcomes the limitations of the devices known in the prior art.