This invention relates to the regulation of gene expression.
In living systems much of the work of the cell is carried out by proteins according to the blueprint specified in the cell's genome. Genetic engineering, using recombinant DNA techniques, can alter the workings of the cell in two main ways. Firstly, it can modify the types of protein the cell produces: by modifying the DNA that codes for a particular protein, by altering or deleting that code (so that the protein is no longer produced), or by inserting entirely new coding segments (so as to produce a protein not normally found in the cell at all). Secondly, in some cases it is possible, by changing the DNA in the neighbourhood of the coding sequence, to increase or reduce the amount of protein produced, or to change the circumstances under which the cell produces it.
The present invention relates primarily to techniques of the second type, and provides means whereby the production of proteins by the cell may be more readily regulated as desired.
It is well known that genes have a regulatory region which controls their expression. Typically, regulation is effected by the binding of a protein present in the cell to a short DNA sequence (binding site) upstream of the start of transcription. Binding of the protein to the binding site may either inhibit or promote expression of the downstream gene. The mechanisms vary: in some cases the bound protein may overlap or occupy the site needed by RNA polymerase to initiate messenger RNA production, so that expression is inhibited; in other cases the bound protein may prevent the binding of a different protein which would otherwise inhibit expression. Alternatively, binding of the protein may facilitate RNA polymerase binding, thereby stimulating gene expression. The classic case of gene regulation is that of the bacteriophage lambda repressor, which binds to specific DNA sequences on the lambda genome and, under the appropriate physiological conditions, can repress or activate gene expression. Several examples in the literature (e.g. Brent, R. and Ptashne, M. (1985), Cell, 43, 729-736) show that DNA binding, protein dimerisation and gene modulation are separable functions.
One class of protein regulators comprises those which bind to double-stranded DNA as dimers. They comprise discrete domains. Part of one such domain is cognate to a specific DNA site (defined by a specific sequence of bases, some of which are essential and others optional), recognising or interacting with it. Whilst bases in either strand may be involved in contacts with the DNA binding domain of the protein, the protein binding site can be unambiguously defined by the base sequence of either DNA strand within that site. Two such protein monomers, in the form of a dimer, will bind to a locus comprising two such sites. A single such site is insufficient for binding of either the dimer or the monomer. The sites are adjacent and generally have identical or similar defining sequences, but on complementary strands of the DNA. The two similar defining sequences on the complementary strands thus have opposite orientations in space, and the binding locus defined by the adjacent sequences accordingly has two-fold rotational (dyad) symmetry.
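The two-fold rotational symmetry described above means that reading the site 5' to 3' on one strand gives the same sequence as reading it 5' to 3' on the complementary strand, i.e. the site equals its own reverse complement. The following is a minimal sketch of that test, assuming a perfectly symmetric site written with the four bases A, C, G and T only; natural sites, whose optional positions may differ, are often only approximately symmetric, and the function names are illustrative. GAATTC is used because it is a well-known perfectly symmetric sequence; the second sequence is the one quoted later in this Specification.

```python
# Watson-Crick pairing for the four DNA bases (ambiguity codes not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(seq: str) -> bool:
    """True if both strands carry the same 5'->3' sequence, so the duplex
    looks the same after a 180-degree rotation about its central axis."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


for site in ("GAATTC", "ATTCAAGAATTTTGT"):
    print(site, has_dyad_symmetry(site))
```

Under this strict definition an odd-length sequence can never be perfectly symmetric, because its central base would have to be its own complement; symmetric sites of odd length therefore tolerate a central base pair outside the symmetry.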
It might in principle be possible to use such dimers to engineer artificial regulation of gene expression, but this would be subject to disadvantages. Such symmetrical DNA sequences (which may, for example, be twenty to thirty base pairs long) are relatively rare in nature. It may be awkward or inconvenient to insert such sequences into the genome of the organism. Moreover, control of gene expression will be limited, since it will depend on only one factor, namely the concentration of the protein monomer in the cell. The use of bacterial repressor homodimers to regulate expression of artificially engineered genes introduced into yeast, mammalian and invertebrate cells has already been demonstrated (Brent, R. and Ptashne, M. (1984), Nature, 312, 612-615; Smith, G. M. et al. (1988), EMBO J., 7, 3975-3982; Hu, M. C.-T. and Davidson, N. (1987), Cell, 48, 555-566; Brown, M. et al. (1987), Cell, 49, 603-612; Hu, M. C.-T. and Davidson, N. (1988), Gene, 62, 301-313). In these situations, control by heterodimers would be advantageous, because expression could be adjusted by varying the level of production of either protein monomer (see later). In these cases, control of gene expression by the homodimer or the heterodimer depends upon the introduction of the appropriate binding site(s) upstream of the target gene. There are situations where this is not currently feasible for technical reasons (e.g. in hexaploid plants, where six gene copies would need to be altered in the same cells) or for ethical reasons (e.g. in the human genome), and in these situations control by heterodimers is a more attractive and realistic alternative. Since heterodimers do not require dyad symmetry, any stretch of double-stranded DNA of the appropriate length can be regarded as a binding site for a heterodimer, the sites for the individual monomers being adjacent and on complementary strands.
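The argument that any stretch of DNA can serve as a heterodimer binding site can be made concrete by splitting a site into its two monomer half-sites: one read 5' to 3' on the top strand, the other read 5' to 3' on the complementary strand. A homodimer needs the two half-sites to match; a heterodimer, built from two monomers with different specificities, does not. This is a minimal sketch under stated assumptions: the half-site length `k` and the treatment of any central bases as a spacer are hypothetical parameters, not values taken from this Specification, and the function names are illustrative.

```python
# Watson-Crick pairing for the four DNA bases (ambiguity codes not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def half_sites(site: str, k: int) -> tuple[str, str]:
    """Split a dimer binding site into two monomer half-sites of length k.

    Left half-site: first k bases, read 5'->3' on the top strand.
    Right half-site: last k bases, read 5'->3' on the complementary strand.
    Bases between the two halves are treated as a spacer (an assumption).
    """
    site = site.upper()
    if not 0 < k <= len(site) // 2:
        raise ValueError("k must be between 1 and len(site) // 2")
    return site[:k], reverse_complement(site[-k:])


def suits_homodimer(site: str, k: int) -> bool:
    """A homodimer binds identical half-sites; a heterodimer need not."""
    left, right = half_sites(site, k)
    return left == right


print(half_sites("ATTCAAGAATTTTGT", 7))       # hypothetical split, k = 7
print(suits_homodimer("ATTCAAGAATTTTGT", 7))
```

With the illustrative choice `k = 7` (leaving one central base as spacer), the 15-base sequence quoted in the next paragraph gives the half-sites ATTCAAG and ACAAAAT. Because they differ, a homodimer of a single specificity could not occupy both, whereas a heterodimer pairing one monomer for each half-site could. Where the real half-site boundaries lie for the heterodimer of this Specification is not assumed here.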
Thus, by appropriate choice of binding specificities, a suitable heterodimer can in principle be directed to any specific DNA sequence in the controlling regions of the target genes without modifying the gene itself. For example, the human corticotropin releasing factor gene contains the DNA sequence 5'ATTCAAGAATTTTGT3' at position 49 (in the published sequence; ref. HUMCRF in the GenBank database), which is a binding site for the heterodimer described in detail in this Specification.
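Locating such a site in a regulatory region amounts to scanning both strands of the DNA for the target sequence. The sketch below is a naive scan and assumes that positions are counted from 1 on the top strand, in the style of sequence database numbering. The toy sequences are hypothetical constructions for illustration only; they are not the HUMCRF sequence, which is not reproduced here, and `find_site` is an illustrative name.

```python
# Watson-Crick pairing for the four DNA bases (ambiguity codes not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_site(dna: str, site: str) -> list[tuple[int, str]]:
    """Return (1-based top-strand position, strand) for each occurrence of `site`.

    A '-' hit means the site lies on the complementary strand, i.e. its
    reverse complement appears on the top strand. Overlapping hits are kept.
    A site equal to its own reverse complement is reported once, as '+'.
    """
    dna, site = dna.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i + 1, "+"))
        elif window == rc:
            hits.append((i + 1, "-"))
    return hits


SITE = "ATTCAAGAATTTTGT"
print(find_site("GGG" + SITE + "CCC", SITE))                    # hypothetical toy sequence
print(find_site("TT" + reverse_complement(SITE) + "A", SITE))   # site on the other strand
```

A brute-force window comparison keeps the example transparent; for scanning long genomic sequences, or for sites with optional positions, a dedicated string-matching or position-weight approach would be more appropriate.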