Deoxyribonucleic acid (DNA) is the primary genetic material. DNA is an informational molecule which encodes all of the proteins which make up a living organism. It serves in this capacity in all living organisms.
DNA consists of two intertwined polynucleotide chains, the double helix. Each chain is a polymer made up of nucleotides. The nucleotide constituents consist of a sugar, a phosphate and a nitrogenous heterocyclic base. Interchain pairing between the bases, via hydrogen bonding, holds the two chains together in the helical coil.
The nucleotides of DNA contain the sugar 2-deoxyribose and are designated deoxyribonucleotides. Nucleotides which contain the sugar D-ribose are called ribonucleotides: these are the building blocks of ribonucleic acid (RNA), the intermediary transcript of DNA which serves as the actual template for protein synthesis.
In all nucleotides the sugar moiety is attached to the nitrogenous base via the glycosidic carbon (the 1' carbon of the sugar). This combination of sugar and base is called a nucleoside. Phosphorylation of a nucleoside at the 5' carbon of the sugar gives a nucleotide. The backbone of the DNA polymer is formed of 3',5'-phosphodiester bridges between 2'-deoxynucleoside 5'-monophosphates.
The nucleotides of DNA differ only in the nitrogenous base. There are two types of nitrogenous bases in nucleotides, pyrimidines and purines. The pyrimidines are uracil, thymine and cytosine (abbreviated U, T, and C respectively). Nucleotides containing uracil are found primarily in RNA whereas thymine is found in DNA. The major purines are adenine and guanine (abbreviated A and G respectively). In the DNA helix, complementary DNA chains are held together by base pairing. The sugar-phosphate backbones are on the outside of the DNA molecule and the purine and pyrimidine bases are on the inside. Adenine (A) can pair only with thymine (T) while guanine (G) can pair only with cytosine (C).
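The pairing rules above determine one strand completely from the other. A minimal Python sketch of that relationship follows; the function names are illustrative only and do not come from the text.

```python
# Base-pairing rules as stated above: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base that pairs with each position, in the same order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction.

    The two chains of the helix run antiparallel, so the partner strand
    is the complement read backwards.
    """
    return complement(strand)[::-1]

if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
```

Each purine (A or G) is always opposite a pyrimidine (T or C), which is why the mapping above has exactly four entries.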
The genetic information of DNA is stored in the linear sequence of the four nucleotides. Most nucleotides along a strand of DNA make up genes which code for specific polypeptides. The nucleotide sequence of DNA is read in groups of three nucleotides. Each triplet is a "code-word", or codon, for an amino acid. As there are 4 different nucleotides in DNA (distinguished by the bases A, T, G, and C) there are 4 x 4 x 4 = 64 different codons. These codons comprise the entire genetic code. Most of the codons designate an amino acid; some serve as start and stop signals for protein translation. The genetic code is degenerate because there is more than one codon for most of the amino acids. For example, the amino acid alanine is coded for by the DNA triplets CGA, CGG, CGT and CGC, read 3' to 5' on the template strand. (In RNA, the corresponding codons are GCU, GCC, GCA and GCG.)
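The codon count and the alanine example can be checked with a short Python sketch. It assumes that the DNA triplets quoted for alanine are template-strand triplets written 3' to 5', which is the reading under which they correspond base by base to the RNA codons given; the names below are illustrative.

```python
from itertools import product

DNA_BASES = "ATGC"

# Every ordered triplet of the four bases: 4 * 4 * 4 possibilities.
ALL_CODONS = ["".join(triplet) for triplet in product(DNA_BASES, repeat=3)]

# Assumption: the DNA triplet lies on the template strand (3' to 5'),
# so each base is replaced by its RNA partner (A->U, T->A, G->C, C->G).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template_triplet(triplet: str) -> str:
    """Return the RNA codon transcribed from a template-strand triplet."""
    return "".join(TEMPLATE_TO_RNA[base] for base in triplet)

ALANINE_DNA = ["CGA", "CGG", "CGT", "CGC"]
ALANINE_RNA = [transcribe_template_triplet(t) for t in ALANINE_DNA]

if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(ALANINE_RNA)
```

The four alanine codons differ only in the third position, which illustrates the degeneracy described above.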
Several techniques have been developed for determining the nucleotide sequence of DNA. Among the more widely practiced are the methods of Maxam and Gilbert and of Sanger.
In the DNA sequencing technique of Maxam and Gilbert, a segment of DNA is labeled at one end with radiolabeled phosphate. The labeled DNA is divided into four samples and each sample is treated with a chemical that specifically destroys one or two of the four bases in the DNA. The "nicked" molecules are then treated with piperidine which breaks the DNA backbone at the site where the base has been destroyed. This generates a series of labeled fragments the lengths of which depend on the distance of the destroyed base from the labeled end of the segment. The labeled polynucleotides are separated according to size on an acrylamide gel. The gel is autoradiographed and the patterns of bands on the X-ray film determine which base was destroyed to produce each radioactive fragment. From this information the position of the destroyed bases can be determined and the overall sequence of the DNA deduced.
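The way band positions reveal the sequence can be sketched in Python. The sketch assumes the commonly described set of four reactions (G only, A+G, C+T, and C only), which the text above does not name, and it assumes an idealized gel in which every cleavage product is resolved. Fragment length is counted as the number of bases between the labeled end and the destroyed base.

```python
# Assumed reaction set: each lane destroys one or two of the four bases.
REACTIONS = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}

def cleavage_ladder(seq: str, destroyed: str) -> set:
    """Lengths of labeled fragments when bases in `destroyed` are removed.

    Destroying the base at index i leaves a labeled fragment of i bases.
    Index 0 is skipped: no labeled polynucleotide remains in that case.
    """
    return {i for i, base in enumerate(seq) if base in destroyed and i > 0}

def run_lanes(seq: str) -> dict:
    """Simulate the four chemical treatments on one end-labeled segment."""
    return {name: cleavage_ladder(seq, bases) for name, bases in REACTIONS.items()}

def read_gel(lanes: dict, length: int) -> str:
    """Deduce the bases at positions 1..length-1 from the band pattern."""
    calls = []
    for i in range(1, length):
        if i in lanes["G"]:
            calls.append("G")      # band in both the G and A+G lanes
        elif i in lanes["A+G"]:
            calls.append("A")      # band in A+G lane only
        elif i in lanes["C"]:
            calls.append("C")      # band in both the C and C+T lanes
        elif i in lanes["C+T"]:
            calls.append("T")      # band in C+T lane only
        else:
            calls.append("N")      # no band: base could not be called
    return "".join(calls)

if __name__ == "__main__":
    segment = "TGCATGGAC"
    print(read_gel(run_lanes(segment), len(segment)))
```

In this simplified model the base nearest the label is not recovered, so the read corresponds to the segment minus its first base.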
The DNA-sequencing technique of Sanger is an enzymatic procedure which entails the synthesis of radiolabeled DNA polynucleotides from the DNA strand to be sequenced. Chain-terminating dideoxynucleoside triphosphates are used to stop synthesis at a particular nucleotide. A Sanger sequencing reaction includes a DNA strand to be sequenced, a labeled primer complementary to the end of that strand, a carefully controlled ratio of one particular dideoxynucleotide to its normal deoxynucleotide, and the other 3 deoxynucleotides. When DNA polymerase is added, normal polymerization begins from the primer; when a dideoxynucleotide is incorporated, the chain is terminated. This results in a series of labeled polynucleotides whose lengths depend upon the location of a particular base relative to the end of the DNA strand.
Four separate polymerase reactions are conducted, each containing a different dideoxynucleotide. The radiolabeled fragments are separated by size on an acrylamide gel. The pattern of polynucleotides gives the DNA sequence.
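The chain-termination logic can be sketched in Python. The sketch is idealized: it assumes that every possible termination product appears in its lane, it counts fragment length in bases added beyond the primer, and it takes the template as written 3' to 5' starting just past the primer site. All names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_lanes(template_3to5: str) -> dict:
    """Fragment lengths seen in each of the four dideoxynucleotide reactions.

    The polymerase builds the complement of the template. In the reaction
    containing dideoxy-X, chains stop wherever X is incorporated, so the
    lane for X holds one fragment length per X in the new strand.
    """
    new_strand = "".join(PAIR[base] for base in template_3to5)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_lanes(lanes: dict) -> str:
    """Read the synthesized strand from shortest fragment to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))

if __name__ == "__main__":
    lanes = sanger_lanes("TACGGT")
    print(lanes)
    print(read_lanes(lanes))
```

The sequence read from the gel is that of the newly synthesized strand; the template sequence follows from it by the base-pairing rules.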
The methods of Sanger and Maxam and Gilbert are convenient ways of sequencing single fragments of DNA of about 400 base pairs or less in length. To sequence large pieces of DNA, overlapping fragments of suitable length must be generated and sequenced individually. The overlapping sequence information among fragments provides the sequential relationship of the fragments so that their relative order can be assigned. From this information, the entire sequence of the parent piece of DNA can be pieced together.
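Ordering fragments by their overlapping sequence can be sketched as a greedy merge in Python. This is one simple strategy chosen for illustration, not a method stated in the text; it assumes error-free reads from a single strand, and the `min_len` threshold is a hypothetical parameter.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining fragments share no sufficient overlap
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags

if __name__ == "__main__":
    pieces = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(pieces))
```

If the function returns more than one string, the fragments supplied did not overlap enough to assign a single relative order.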
With these approaches it is apparent that as the length of the DNA strand to be sequenced increases, it becomes less likely that the fragments obtained will overlap by enough bases to rule out a chance match between unrelated sequences. Consequently, the number of randomly generated fragments of the strand necessary for accurate sequencing increases. For DNA strands even two orders of magnitude shorter than a human chromosome, the number of randomly generated fragments required is immense. Thus, the technique is impractical for sequencing molecules of this size.
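The scaling problem can be made concrete with a rough Python estimate. The model is an assumption, not taken from the text: bases are treated as independent and equally likely, so two unrelated sequences agree over k bases with probability (1/4)^k, and the strand length, fragment length and redundancy used in the example are illustrative figures only.

```python
import math

def chance_match_probability(k: int) -> float:
    """Probability that two unrelated k-base stretches agree by chance."""
    return 0.25 ** k

def expected_spurious_overlaps(n_fragments: int, k: int) -> float:
    """Expected number of ordered fragment pairs with a chance k-base overlap."""
    return n_fragments * (n_fragments - 1) * chance_match_probability(k)

def fragments_for_coverage(strand_length: int, fragment_length: int,
                           coverage: float) -> int:
    """Fragments needed so each base is sequenced `coverage` times on average."""
    return math.ceil(coverage * strand_length / fragment_length)

if __name__ == "__main__":
    # Illustrative figures: a 1,000,000-base strand, 400-base fragments,
    # and tenfold average redundancy.
    n = fragments_for_coverage(1_000_000, 400, 10)
    print(n)
    print(expected_spurious_overlaps(n, 10))
    print(expected_spurious_overlaps(n, 20))
```

Because the number of fragment pairs grows with the square of the fragment count, longer overlaps are needed to keep chance matches rare as the strand gets longer.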
The Sanger technique of sequencing long strands of DNA requires DNA cloning procedures because DNA polymerization requires a primer. This requirement is met by cloning the fragment into a vector so that it is contiguous with a region of known sequence, for which a complementary primer can be provided. Because of this dependency on cloning technology, the procedure is difficult to automate.