The genetic information of the cell is stored and transmitted in the nucleotide sequence of deoxyribonucleic acid (DNA). Expression of this information requires transcription of DNA into messenger ribonucleic acid (mRNA) molecules, which carry specific and precise information to the cytoplasmic sites of protein synthesis. In eukaryotic cells, mRNAs are synthesized in the nucleus, often as larger precursor molecules called heterogeneous nuclear RNA (hnRNA).
Cytoplasmic mRNA has several identifying characteristics. In eukaryotic cells, mRNA is usually monocistronic and encodes only one polypeptide. The 5' end is capped with a specific structure in which 7-methylguanosine is linked through a 5'-5' triphosphate bridge to the 5' end of the messenger sequence. A 5' untranslated region, which may be quite short or hundreds of nucleotides long, separates the cap from the translational start site, which contains an AUG codon. The leader sequences of most vertebrate mRNAs are 20 to 100 nucleotides long. Usually the translational start site is the first AUG encountered as the message is read from the 5' to the 3' end. The sequences that encode the polypeptide begin at, and are contiguous with, this initiation signal. The polypeptide-encoding sequences continue until a specific translational termination site is reached. This is followed by a 3' untranslated sequence of about 100 nucleotides, after which the mRNA terminates in a polyadenylate tail.
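The first-AUG rule and the region layout described above can be sketched as a small parser. This is an illustrative sketch only: the function name `split_mrna` and the `min_polya` threshold are hypothetical, the cap is not represented in the string, and trimming the tail by stripping trailing A residues is a simplification that would also remove any A residues at the very end of the 3' untranslated region.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(seq: str, min_polya: int = 10) -> dict:
    """Partition a eukaryotic mRNA (cap omitted) into leader, coding
    sequence, 3' untranslated region and poly(A) tail.

    Assumes the start site is the first AUG read 5' -> 3' and that the
    coding sequence runs in frame to the first stop codon.
    """
    seq = seq.upper()
    # Count trailing A residues; treat them as a tail only if long enough.
    tail_len = len(seq) - len(seq.rstrip("A"))
    if tail_len < min_polya:
        tail_len = 0
    body = seq[: len(seq) - tail_len]

    start = body.find("AUG")  # first AUG encountered 5' -> 3'
    if start == -1:
        raise ValueError("no AUG start codon found")

    # Read codons in frame from the AUG until a stop codon is reached.
    for i in range(start, len(body) - 2, 3):
        if body[i:i + 3] in STOP_CODONS:
            end = i + 3
            break
    else:
        raise ValueError("no in-frame stop codon found")

    return {
        "utr5": body[:start],
        "cds": body[start:end],
        "utr3": body[end:],
        "polya": seq[len(body):],
    }
```

For example, `split_mrna("GGCACC" + "AUGGCUUUCUAA" + "GCCUUGC" + "A" * 15)` assigns `GGCACC` to the leader, `AUGGCUUUCUAA` to the coding sequence, `GCCUUGC` to the 3' untranslated region and the 15 A residues to the tail.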
Prokaryotic mRNA differs from eukaryotic mRNA in a few details. The 5' terminus is not capped but retains the terminal triphosphate from the initiation of its synthesis by RNA polymerase. Most prokaryotic mRNAs are polycistronic, encoding several polypeptides, and can include more than one initiating AUG sequence. In each case, a ribosome-positioning sequence is located about 10 nucleotides upstream of the AUG initiation signal. An untranslated sequence follows the last coding sequence, but there is no polyadenylate tail.
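Because a polycistronic message carries several start sites, each marked by an upstream ribosome-positioning sequence, candidate cistron starts can be located by checking the region roughly 10 nucleotides upstream of every AUG. This is a hedged sketch: the motif `AGGAGG` (a Shine-Dalgarno-like core), the 15-nucleotide `window` and the 3-nucleotide `min_gap` are hypothetical parameters chosen for illustration, and real ribosome-binding sites vary in sequence and spacing.

```python
def find_cistron_starts(mrna: str, motif: str = "AGGAGG",
                        window: int = 15, min_gap: int = 3) -> list:
    """Return 0-based positions of AUG codons preceded by `motif`.

    The motif must lie entirely within the stretch that begins `window`
    nucleotides upstream of the AUG and ends `min_gap` nucleotides
    before it. Motif and spacing values are illustrative assumptions.
    """
    mrna = mrna.upper()
    starts = []
    pos = mrna.find("AUG")
    while pos != -1:
        upstream = mrna[max(0, pos - window):max(0, pos - min_gap)]
        if motif in upstream:
            starts.append(pos)
        pos = mrna.find("AUG", pos + 1)  # continue scanning 3'-ward
    return starts
```

On a constructed two-cistron message whose third AUG lacks an upstream motif, only the first two AUGs are reported:

```python
seq = ("CC" + "AGGAGG" + "UUUAAC" + "AUGGCUUAA" + "GCGCGCGCGCGC"
       + "AGGAGG" + "UUUAAC" + "AUGAAACCCUGA" + "U" * 16 + "AUGCCCUAA")
find_cistron_starts(seq)  # [14, 47]
```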
The translational start site of eukaryotic mRNA is also called a Kozak sequence (Kozak, M., 1987, Nucl. Acids Res. 15(20):8125-8148, herein incorporated by reference). An optimal Kozak sequence has the form:
(TCC) GCC (A/G)CC ATG G (SEQ ID NO:1)
AGATCTTTATGGACC (SEQ ID NO:2)
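The consensus of SEQ ID NO:1 can be expressed as a regular expression, with the ambiguous (A/G) position written as the character class `[AG]`. In this sketch the parenthesized leading triplet is treated as optional and is not matched; the name `find_kozak_sites` is hypothetical, and RNA input is accepted by converting U to T.

```python
import re

# Core of the optimal form GCC (A/G)CC ATG G; the leading (TCC) is ignored.
KOZAK_CORE = re.compile(r"GCC[AG]CCATGG")


def find_kozak_sites(seq: str) -> list:
    """Return 0-based positions of the A of ATG for each exact match
    to the core consensus GCC(A/G)CCATGG."""
    seq = seq.upper().replace("U", "T")
    # "GCC[AG]CC" spans 6 bases, so the ATG begins 6 bases into the match.
    return [m.start() + 6 for m in KOZAK_CORE.finditer(seq)]
```

For instance, `find_kozak_sites("ttGCCACCATGGcc")` returns `[8]`, while SEQ ID NO:2 contains no exact match to this core.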
The most highly conserved position in this motif is the purine (most often an A) three nucleotides upstream of the ATG start codon, that is, at position -3 when the A of ATG is numbered +1 (Kozak, M., 1987, J. Mol. Biol. 20:947-950, herein incorporated by reference). Upstream ATG codons occur in fewer than 10% of vertebrate mRNAs; oncogene transcripts are a notable exception (Kozak, M., 1987, Nucl. Acids Res. 15:8125-8148). Both naturally occurring and synthetic translational start sites of the Kozak form can be used in the production of polypeptides by molecular genetic techniques (Kozak, M., 1996, Mamm. Genome 7:563-574).
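The two checks discussed above, the nucleotide at position -3 and the presence of upstream ATG codons, can be sketched as follows. The sketch also inspects position +4 (the G following ATG in SEQ ID NO:1) and applies a commonly used strong/adequate/weak labeling; that three-way labeling and the function names `start_context` and `upstream_atgs` are assumptions made for illustration, not part of the cited sources.

```python
PURINES = {"A", "G"}


def start_context(seq: str, atg: int) -> str:
    """Label the context of the ATG whose A is at 0-based index `atg`.

    With the A of ATG numbered +1, position -3 is seq[atg - 3] and
    position +4 is seq[atg + 3]. Assumed labeling: purine at -3 and G
    at +4 -> "strong"; only one of the two -> "adequate"; neither -> "weak".
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg - 3] if atg >= 3 else ""
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else ""
    has_purine = minus3 in PURINES
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


def upstream_atgs(seq: str, atg: int) -> list:
    """Return 0-based positions of ATG codons lying 5' of the start at `atg`."""
    seq = seq.upper().replace("U", "T")
    return [i for i in range(atg) if seq[i:i + 3] == "ATG"]
```

For example, in `"CATGCCACCATGG"` the start at index 9 has A at -3 and G at +4, so `start_context` returns `"strong"`, and `upstream_atgs` reports the upstream ATG at index 1.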