Cells grow and differentiate, carry out their structural or metabolic roles, participate in organismal development, and respond to their environment by altering their gene expression. Cellular functions are controlled by the timing and amount of expression attributable to thousands of individual genes. The regulation of expression is metabolically vital in that it conserves energy and prevents the synthesis and accumulation of intermediates such as RNA and incomplete or inactive proteins when the gene product is not needed.
Regulatory protein molecules are essential in the control of gene expression. These molecules turn individual genes or groups of genes on and off in response to various inductive mechanisms of the cell or organism; act as transcription factors by determining whether or not transcription is initiated, enhanced, or repressed; and splice transcripts as dictated in a particular cell or tissue. Although regulatory molecules interact with short stretches of DNA scattered throughout the entire genome, most gene expression is regulated near the site at which transcription starts or within the open reading frame of the gene being expressed. The regulated stretches of the DNA can be simple and interact with only a single protein, or they can require several proteins acting as part of a complex in order to regulate gene expression.
The double helix structure and repeated sequences of DNA create external features which can be recognized by the regulatory molecules. These external features are hydrogen bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and regular, repeated stretches of sequence which cause distinct bends in the helix. Such features provide recognition sites for the binding of regulatory proteins. Typically, these recognition sites are fewer than 20 nucleotides in length, although multiple sites may be adjacent to each other and each may exert control over a single gene. Hundreds of these DNA sequences have been identified, and each is recognized by a different protein or complex of proteins that carries out gene regulation.
The regulatory protein molecules or complexes recognize and bind to specific nucleotide sequences of upstream (5') nontranslated regions, which precede the first translated exon of the open reading frame (ORF); of intron junctions, which occur between the many exons of the ORF; and of downstream (3') untranslated regions, which follow the ORF. The regulatory molecule surface features are extensively complementary to the surface features of the double helix. Even though each individual contact between the protein(s) and the helix may be relatively weak (hydrogen bonds, ionic bonds, and/or hydrophobic interactions), the 20 or more contacts occurring between the protein and DNA result in a highly specific and very strong interaction.
Families of Regulatory Molecules
Many of the regulatory molecules incorporate one of a set of DNA-binding structural motifs, each of which contains either α helices or β sheets and binds to the major groove of DNA. Seven of the structural motifs common to regulatory molecules are helix-turn-helix, homeodomains, zinc finger, steroid receptor, β sheets, leucine zipper, and helix-loop-helix.
The helix-turn-helix motif is constructed from two α helices connected by a short chain of amino acids, which constitutes the "turn". The two helices interact with each other to form a fixed angle. The more carboxy-terminal helix is called the recognition helix because it fits into the major groove of the DNA. The amino acid side chains of the helix recognize the specific DNA sequence to which the protein binds. The remaining structure varies a great deal among the regulatory proteins incorporating this motif. The helix-turn-helix configuration is not stable without the rest of the protein and will not bind to DNA without other peptide regions providing stability. Other peptide regions also interact with the DNA, increasing the number of unique sequences a helix-turn-helix can recognize.
Many sequence-specific DNA binding proteins actually bind as symmetric dimers to DNA sequences that are composed of two very similar half-sites, also arranged symmetrically. This configuration allows each protein monomer to interact in the same way with the DNA recognition site and doubles the number of contacts with the DNA. This doubling of contacts greatly increases the binding affinity while only doubling the free energy of the interaction. Helix-turn-helix motifs always bind to DNA that is in the B-DNA form.
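The effect of doubling the contacts can be illustrated numerically. The following is a minimal sketch, assuming the standard thermodynamic relation K = exp(-ΔG/RT) between binding free energy and the equilibrium association constant. The monomer free energy of -6 kcal/mol and the function name `association_constant` are hypothetical values chosen only for illustration; they do not come from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def association_constant(delta_g_kcal, temp_k=298.15):
    """Equilibrium association constant from binding free energy: K = exp(-dG / RT)."""
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))

# Hypothetical free energy for the contacts made by one monomer (illustrative only).
monomer_dg = -6.0
# A symmetric dimer doubles the contacts and therefore doubles the free energy.
dimer_dg = 2 * monomer_dg

k_monomer = association_constant(monomer_dg)
k_dimer = association_constant(dimer_dg)

print(f"monomer K ~ {k_monomer:.3g} per molar")
print(f"dimer   K ~ {k_dimer:.3g} per molar")
```

Because the free energy appears in the exponent, doubling it squares the association constant, which is why a twofold change in free energy corresponds to a far larger change in affinity.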
The homeodomain motif is found in a special group of helix-turn-helix proteins that are encoded by homeotic selector genes, so called because the proteins encoded by these genes control developmental switches. For example, mutations in these genes cause one body part to be converted into another in the fruit fly, Drosophila. These genes have been found in every eukaryotic organism studied. The helix-turn-helix region of different homeodomains is always surrounded by the same structure, but not necessarily the same sequence, and the motif is always presented to DNA the same way. This helix-turn-helix configuration is stable by itself and, when isolated, can still bind to DNA. It may be significant that the helices in homeodomains are generally longer than the helices in most helix-turn-helix regulatory proteins. The portions of the motif which interact most directly with DNA differ between these two families. Detailed examples of DNA-protein binding are described in Pabo, C. O. and R. T. Sauer (1992; Ann. Rev. Biochem. 61:1053-95).
A third motif incorporates zinc ions into the crucial portion of the protein. These proteins are most often referred to as having zinc fingers, although their structure can be one of several types. Proteins in this family often contain tandem repeats of the 30-residue zinc finger motif, which includes the sequence pattern Cys-X(2 or 4)-Cys-X(12)-His-X(3-5)-His, where X is any amino acid and the parenthesized number is the count of residues. Each of these regulatory proteins has an α helix and an antiparallel β sheet. Two histidines in the α helix and two cysteines near the turn in the β sheet interact with the zinc ion, which holds the α helix and the β sheet together. Contact with the DNA is made by the arginine preceding the α helix, and by the second, third, and sixth residues of the α helix. When this arrangement is repeated as a cluster of several fingers, the α helix of each finger can contact and interact with the major groove of the DNA. By changing the number of zinc fingers, the specificity and strength of the binding interaction can be altered.
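The zinc finger sequence pattern described above (two cysteines separated by 2 or 4 residues, then 12 residues, then two histidines separated by 3 to 5 residues) can be expressed as a regular expression over one-letter amino acid codes. The following is a minimal sketch; the function name `find_zinc_fingers` and the test sequence are hypothetical, and the sequence is synthetic rather than taken from a real protein. A pattern match of this kind only flags candidates and does not establish that a region folds into a zinc finger.

```python
import re

# One-letter translation of Cys-X(2 or 4)-Cys-X(12)-His-X(3-5)-His,
# where X stands for any amino acid residue.
ZINC_FINGER = re.compile(r"C(?:[A-Z]{2}|[A-Z]{4})C[A-Z]{12}H[A-Z]{3,5}H")

def find_zinc_fingers(protein_seq):
    """Return (start, matched_text) for each non-overlapping match of the pattern."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(protein_seq.upper())]

# Synthetic sequence built to fit the pattern (not a real protein).
finger = "C" + "PE" + "C" + "GKAFSRSDELTR" + "H" + "QRT" + "H"
example = "MK" + finger + "TGEKP"

hits = find_zinc_fingers(example)
print(hits)
```

Scanning a protein with tandem repeats of the motif would return one entry per finger, since `finditer` reports non-overlapping matches from left to right.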
The steroid receptors are a family of intracellular proteins that include receptors for steroids, retinoids, vitamin D, thyroid hormones, and other important compounds. The DNA binding domain of these proteins contains about 70 residues, eight of which are conserved cysteines. The steroid receptor motif forms a structure in which two α helices are packed perpendicularly to each other, forming more of a globular shape than a finger. Each helix has a zinc ion which holds a peptide loop against the N-terminal end of the helix. The first helix fits into the major groove of DNA, and side chains make contacts with edges of the DNA base pairs. The steroid receptor proteins, like the helix-turn-helix proteins, form dimers that bind the DNA. The second helix of each monomer contacts the phosphate groups of the DNA backbone and also provides the dimerization interface. In some cases, multiple choices can exist for heterodimerization which produces another mechanism for fine-tuning the regulation of numerous genes.
Another family of regulatory protein molecules uses a motif consisting of a two-stranded antiparallel β sheet to recognize the major groove of DNA. The exact DNA sequence recognized by the motif depends on the amino acid sequence in the β sheet from which the amino acid side chains extend and contact the DNA. In two prokaryotic examples of the β sheet, the regulatory proteins form tetramers when binding DNA.
The leucine zipper motif commonly forms dimers and has a 30-40 residue motif in which two α helices (one from each monomer) are joined to form a short coiled-coil. The helices are held together by interactions among hydrophobic amino acid side chains (often on heptad-repeated leucines) that extend from one side of each helix. Beyond this, the helices separate, and each basic region contacts the major groove of DNA. Proteins with the leucine zipper motif can also form either homodimers or heterodimers, thus extending the specific combinations available to activate or repress expression.
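The heptad spacing of the leucines can be illustrated with a simple scan for leucine residues that recur every seventh position. The following is a minimal sketch under stated assumptions: the function name `heptad_leucine_runs`, the threshold of four repeats (roughly the lower end of a 30-40 residue motif), and the test sequence are all hypothetical choices made for illustration. It is a spacing heuristic only and not a validated predictor of coiled-coil formation.

```python
def heptad_leucine_runs(protein_seq, min_repeats=4):
    """Return (start_index, repeat_count) for maximal runs of leucine spaced 7 residues apart."""
    seq = protein_seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip positions that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count = 0
        pos = start
        # Walk forward in steps of 7 while the residue is still leucine.
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs

# Synthetic sequence (not a real protein): four heptads, each starting with leucine.
example = "MKR" + "LAEKVRQ" * 4 + "GS"
print(heptad_leucine_runs(example))
```

Indices are zero-based, so a run reported at index 3 begins at the fourth residue of the sequence.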
Yet another important motif is the helix-loop-helix (HLH), which consists of a short α helix connected by a loop to a longer α helix. The loop is flexible and allows the two helices to fold back against each other. The α helices bind both to DNA and to the HLH structure of another protein. The second protein can be the same (producing homodimers) or different (producing heterodimers). Some HLH monomers lack sufficient α helix to bind DNA, but they can still form heterodimers which can serve to inactivate specific regulatory proteins.
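The way homodimers and heterodimers extend the number of available regulatory combinations can be shown by simple enumeration. The following is a minimal sketch that assumes, purely for illustration, that every monomer in a family can pair with every other member; real families pair selectively, so this gives an upper bound. The monomer names "A", "B", and "C" and the function name `possible_dimers` are hypothetical.

```python
from itertools import combinations_with_replacement

def possible_dimers(monomers):
    """List every unordered pairing of monomers, including self-pairs (homodimers)."""
    return list(combinations_with_replacement(monomers, 2))

# Hypothetical monomer names, used only for illustration.
monomers = ["A", "B", "C"]
dimers = possible_dimers(monomers)
homodimers = [d for d in dimers if d[0] == d[1]]
heterodimers = [d for d in dimers if d[0] != d[1]]

print(len(homodimers), "homodimers,", len(heterodimers), "heterodimers")
```

Under this assumption, n monomers yield n(n+1)/2 distinct dimers, so the number of possible complexes grows faster than the number of proteins.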
Hundreds of regulatory proteins have been identified to date, and more are being characterized in a wide variety of organisms. Most regulatory proteins have at least one of the common structural motifs for making contact with DNA, but several important regulatory proteins, such as the product of the p53 tumor suppressor gene, do not share their structure with other known regulatory proteins. Variations on the known motifs and new motifs have been and are currently being characterized (Faisst, S. and S. Meyer (1992) Nucl. Acids Res. 20: 3-26).
Although binding of DNA to a regulatory protein is very specific, there is no way to predict the exact DNA sequence to which a particular regulatory protein will bind or the primary structure of a regulatory protein for a specific DNA sequence. Thus, interactions of DNA and regulatory proteins are not limited to the motifs described above. Other domains of the proteins often form crucial contacts with the DNA, and accessory proteins can provide important interactions which may convert a particular protein complex to an activator or a repressor or may prevent binding (Alberts, B. et al. (1994) Molecular Biology of the Cell, Garland Publishing Co, New York, NY pp.401-74).
Diseases and Disorders Related to Gene Regulation
Many neoplastic growths in humans can be traced to problems of gene regulation. Malignant growth of cells may be the result of excess transcriptional activator or loss of an inhibitor or suppressor (Cleary ML (1992) Cancer Surv. 15:89-104). Alternatively, gene fusion may produce chimeric loci with switched domains, such that the level of activation is no longer correct for the gene specificity of that factor.
The cellular response to infection or trauma is beneficial when gene expression is appropriate. However, when hyper-responsivity or another imbalance occurs for any reason, improper or insufficient regulation of gene expression may cause considerable tissue or organ damage. This damage is well documented in immunological responses to allergens, heart attack, stroke, and infections (Harrison's Principles of Internal Medicine, 13/e, © (1994) McGraw Hill, Inc. and Teton Data Systems Software). In addition, the accumulation of somatic mutations and the increasing inability to regulate cellular responses are seen in the prevalence of osteoarthritis and the onset of other disorders associated with aging.
The discovery of new human regulatory protein molecules which are important in disease development and the polynucleotides encoding them satisfies a need in the art by providing new compositions which are useful in the diagnosis, prevention and treatment of diseases associated with cell proliferation, particularly immune responses and cancers.