Gene transcription in eukaryotes is largely controlled by a myriad of DNA-binding factors. Generally, such factors are required for the initiation of transcription. Moreover, these factors may activate other transcription factors that control downstream processes and produce long-term changes in gene expression. Thus, cellular differentiation and proliferation can be regarded as an integrated process involving the concerted and sequential action of transcription factors that determine the specific biology of each cell type.
Abnormal patterns of gene expression are the hallmark of several disorders and diseases. Such patterns may result from inappropriate expression, overexpression, or deficient expression of one or more genes, and they can have deleterious consequences. For example, overexpression of a growth factor can lead to a tumorigenic condition or a hyperproliferative disorder. In recognition of the importance of these processes, efforts to control gene expression are a major thrust of the biotechnology industry. For the most part, these efforts focus on supplying extracellular signals that indirectly affect gene transcription or on controlling post-transcriptional events. Where the transcription factors that control particular genes have been identified, compounds that directly affect transcription are sought. However, this highly desirable approach to controlling gene expression has been hampered by the small number of transcription factors discovered so far.
A typical human cell expresses at least 10,000 genes. As many as 20% of these genes are believed to encode transcription factors. Each cell type could, therefore, express 2000 or more different transcription factors. However, only 10 to 20% of these factors have been identified. Likewise, only a fraction of the protein-binding sites have been identified. Several approaches for selecting the binding sites of individual DNA-binding proteins have been described (Irvine et al., J. Mol. Biol. 222:739, 1991; Blackwell and Weintraub, Science 250:1104, 1990; and Thiesen and Bach, Nucleic Acids Res. 18:3203, 1990). These methods use single target macromolecules or nucleic acids to select binding sites from totally or partially randomized oligonucleotide duplexes. The consensus DNA-binding sites for several proteins have been determined using these approaches (Wright and Funk, Trends Biochem. Sci. 18:77, 1993). Another approach, called CASTing (Funk et al., Proc. Natl. Acad. Sci. USA 89:9484, 1992), has been used to identify transcription factors that bind adjacent to, and cooperate with, a specific factor for which an antibody is available. All these techniques suffer from major disadvantages: identifying binding sites or transcription factors with them is time-consuming and tedious, often requires large amounts of biological material, or depends on the availability of mutants. In general, these techniques yield only one new factor or one new binding site at a time.
The present invention provides a direct approach for the simultaneous isolation and characterization of binding sites recognized by a large cross-section of the transcription factors present in any cell type. These binding sites in turn facilitate the isolation of the corresponding transcription factors and provide other related advantages.