In eucaryotic species, protein synthesis is accomplished by a dual-step process in which the cell's genetic sequence is first transcribed into a messenger RNA sequence, which in turn is translated into a specific peptide. The first step of this process, wherein RNA is synthesized on a DNA template, is known as transcription, and results in the synthesis of the mRNAs that carry the information for protein synthesis, as well as the transfer, ribosomal and other RNA molecules that have structural and catalytic functions within the cell. These RNA molecules are synthesized by one or more enzymes, RNA polymerases, which make RNA copies of the DNA sequence. However, before this synthesis can begin, the free RNA polymerase molecules must come into contact and bind with a very specific DNA sequence, called the promoter, that contains the start site for RNA synthesis and signals where RNA synthesis should begin.
Although the process of transcription is similar in eucaryotes and procaryotes, it is more complex in eucaryotic species. For example, whereas the procaryotic RNA polymerases can bind directly to their promoters, the eucaryotic enzymes can bind to their promoters only in the presence of additional protein factors already on the DNA. Thus, one or more sequence-specific DNA-binding proteins must be bound to the DNA to form a functional promoter. These protein factors are called transcription factors and are necessary for the initiation of RNA synthesis.
The proliferation and differentiation of eucaryotic cells are complex phenomena initiated by a vast number of extracellular signals. These may include, for example, various soluble factors such as pharmaceuticals, toxins or extracellular and intracellular byproducts; matrix proteins; and adhesion molecules. These signals are passed intracellularly through a variety of signal transduction processes and lead to the activation of a set of early response genes, some of which encode transcription factors, thereby leading to the initiation of a cascade of gene-protein interactions and ultimately to long-term alterations in gene expression.
The importance of transcription factor cascades in cell proliferation and differentiation is a major focus of study in modern biology. For instance, studies of transcriptional control mechanisms underlying spatially restricted transcription in the early embryonic development of Drosophila [see A. Kane, Development 101:1 (1987)], lineage specification in muscle [see B. Buckingham, Curr. Opin. in Genetics and Dev. 4:745 (1994)] and nerve cells [see Lee et al., Science 268:836 (1995)], mammary cell differentiation [see Despraz et al., Mol. Cell Biol. 15:3398 (1995)], and many other examples illustrate that gene-protein interactions initiated by key transcription factors bring about long-term changes in gene expression. These changes often involve the activation of other transcription factors which control downstream processes. Cellular differentiation and proliferation can, therefore, be regarded as an integrated process involving the concerted and sequential action of transcription factors that determine the specific biology of the cell type.
A typical human cell expresses about 10,000 genes. By extrapolation from lower organisms, about 2-20% of these genes are believed to encode transcription factors. Each cell type could, therefore, express as many as 200-2000 different transcription factors. Identifying transcription factors has been a time-consuming and tedious task that often requires large amounts of biological material, or depends upon the availability of mutants, and in general has yielded one new factor per investigation. There is a need, therefore, to provide alternative means to identify and study the transcription factors per se, as well as to allow investigation into their biological function.
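The range of 200-2000 transcription factors quoted above follows from simple arithmetic on the two figures given in the text. A minimal sketch of that calculation is shown here; the function name and its parameters are illustrative assumptions and are not part of the specification.

```python
def transcription_factor_range(expressed_genes, low_percent, high_percent):
    """Return the (low, high) estimate of genes encoding transcription factors.

    Percentages are passed as whole numbers so the arithmetic stays in integers.
    """
    low = expressed_genes * low_percent // 100
    high = expressed_genes * high_percent // 100
    return low, high


# Figures taken from the text: about 10,000 expressed genes per cell,
# of which about 2-20% are believed to encode transcription factors.
low, high = transcription_factor_range(10_000, 2, 20)
print(f"{low}-{high} transcription factors per cell type")
```

With the figures from the text, the lower and upper bounds come out at 200 and 2000, matching the range stated above.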
To overcome the problems with conventional methods, a direct approach is provided herein for the comprehensive isolation of binding sites recognized by a large cross-section of the transcription factors present in any cell type, based upon their ability to bind DNA in a sequence-specific manner. The potential applications of this approach in preparing cell type specific binding site sub-libraries are also described in this specification.
In the present invention, a number of sequence-specific DNA binding properties of transcription factors were exploited in an approach for the direct isolation of binding sites recognized by a large number of protein factors. These binding sites serve as efficient probes for isolation of the cognate factors. The approach is rapid, and can be reiterated to derive progressively more information from the products of each study.