Many proteins involved in regulating genome expression, chromosomal replication and cellular proliferation function through their ability to bind specific sites in the genome. Transcriptional activators, for example, bind to specific promoter sequences and recruit chromatin modifying complexes and the transcription apparatus to initiate RNA synthesis. The remodeling of gene expression that occurs as cells move through the cell cycle, or when cells sense changes in their environment, is effected in part by changes in the DNA-binding status of transcriptional activators. Distinct DNA-binding proteins are also associated with centromeres, telomeres, and origins of DNA replication, where they regulate chromosome replication and maintenance. Although considerable knowledge of many fundamental aspects of gene expression and DNA replication has been obtained from studies of DNA-binding proteins, an understanding of these proteins and their functions is limited by our knowledge of their binding sites in the genome.
Proteins which bind to a particular region of DNA can be detected using known methods. However, a need exists for a method which allows examination of the binding of proteins to DNA across the entire genome of an organism.
The present invention relates to a method of identifying a region (one or more) of a genome of a cell to which a protein of interest binds. In the methods described herein, DNA binding protein of a cell is linked (e.g., covalently crosslinked) to genomic DNA of a cell. The genomic DNA to which the DNA binding protein is linked is identified and combined or contacted with DNA comprising a sequence complementary to genomic DNA of the cell (e.g., all or a portion of a cell's genomic DNA, such as one or more chromosomes or chromosome regions) under conditions in which hybridization between the identified genomic DNA and the sequence complementary to genomic DNA occurs. Region(s) of hybridization are region(s) of the genome of the cell to which the protein of interest binds. The methods of the present invention are preferably performed using living cells.
In one embodiment, proteins which bind DNA in a cell are crosslinked to the cellular DNA. The resulting mixture, which includes DNA bound by protein and DNA which is not bound by protein, is subjected to shearing conditions. As a result, DNA fragments of the genome crosslinked to DNA binding protein are generated, and the DNA fragment (one or more) to which the protein of interest is bound is removed from the mixture. The resulting DNA fragment is then separated from the protein of interest and amplified, using known methods. The DNA fragment is combined with DNA comprising a sequence complementary to genomic DNA of the cell, under conditions in which hybridization between the DNA fragment and a region of the sequence complementary to genomic DNA occurs; and the region of the sequence complementary to genomic DNA to which the DNA fragment hybridizes is identified. The identified region (one or more) is a region of the genome of the cell, such as a selected chromosome or chromosomes, to which the protein of interest binds.
In a particular embodiment, the present invention relates to a method of identifying a region of a genome (such as a region of a chromosome) of a cell to which a protein of interest binds, wherein the DNA binding protein of the cell is crosslinked to genomic DNA of the cell using formaldehyde. DNA fragments of the crosslinked genome are generated and the DNA fragment to which the protein of interest is bound is removed or separated from the mixture, such as through immunoprecipitation using an antibody that specifically binds the protein of interest. This results in separation of the DNA-protein complex. The DNA fragment in the complex is separated from the protein of interest, for example, by subjecting the complex to conditions which reverse the crosslinks. The separated DNA fragment is amplified using ligation-mediated polymerase chain reaction (LM-PCR), and then fluorescently labeled. The labeled DNA fragment is contacted with a DNA microarray comprising a sequence complementary to genomic DNA of the cell, under conditions in which hybridization between the DNA fragment and a region of the sequence complementary to genomic DNA occurs. The region of the sequence complementary to genomic DNA to which the DNA fragment hybridizes is identified by measuring fluorescence intensity, and the fluorescence intensity of the region of the sequence complementary to genomic DNA to which the DNA fragment hybridizes is compared to the fluorescence intensity of a control. Fluorescence intensity in a region of the sequence complementary to genomic DNA which is greater than the fluorescence intensity of the control in that region of the sequence complementary to genomic DNA marks the region of the genome in the cell to which the protein of interest binds.
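The comparison of fluorescence intensities described in this embodiment can be illustrated with a minimal Python sketch. It assumes that intensities for the labeled fragment and for the control have already been measured for each array region; the function name `bound_regions`, the region identifiers, and the intensity values are hypothetical and are not taken from the method itself. No normalization or statistical treatment is shown.

```python
from typing import Dict, List


def bound_regions(
    sample_intensity: Dict[str, float],
    control_intensity: Dict[str, float],
) -> List[str]:
    """Return array regions whose sample intensity exceeds the control intensity.

    Both arguments map a region identifier (hypothetical naming) to a measured
    fluorescence intensity. Regions lacking a control value are skipped
    because no comparison can be made for them.
    """
    bound = []
    for region, sample_value in sample_intensity.items():
        control_value = control_intensity.get(region)
        if control_value is None:
            continue  # no control measurement for this region
        if sample_value > control_value:
            bound.append(region)
    return bound


# Illustrative values only (assumed, not measured data).
sample = {"spot_A": 5200.0, "spot_B": 810.0, "spot_C": 1500.0}
control = {"spot_A": 900.0, "spot_B": 850.0, "spot_C": 1500.0}

print(bound_regions(sample, control))  # ['spot_A']
```

In this sketch a region with an intensity equal to that of the control is not reported, consistent with the requirement that the intensity be greater than that of the control.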
Also encompassed by the present invention is a method of determining a function of a protein of interest which binds to the genomic DNA of a cell. In this method, DNA binding protein of the cell is crosslinked to the genomic DNA of the cell. DNA fragments of the genome crosslinked to DNA binding protein are then generated, as described above, and the DNA fragment (one or more) to which the protein of interest is bound is removed from the mixture. The resulting DNA fragment is then separated from the protein of interest and amplified. The DNA fragment is combined with DNA comprising a sequence complementary to genomic DNA of the cell, under conditions in which hybridization between the DNA fragment and a region of the sequence complementary to genomic DNA occurs; and the region of the sequence complementary to genomic DNA to which the DNA fragment hybridizes is identified. This identified region is a region of the genome of the cell to which the protein of interest binds. The identified region is characterized and the characteristic of the identified region indicates the function of the protein of interest (e.g., a regulatory protein such as a transcription factor; an oncoprotein).
The present invention also relates to a method of determining whether a protein of interest which binds to genomic DNA of a cell functions as a transcription factor. In one embodiment, DNA binding protein of the cell is crosslinked to the genomic DNA of the cell. DNA fragments of the crosslinked genome are generated and the DNA fragment to which the protein of interest is bound is removed from the mixture. The resulting DNA fragment is separated from the protein of interest and amplified. The DNA fragment is combined with DNA comprising a sequence complementary to genomic DNA of the cell, under conditions in which hybridization between the DNA fragment and a region of the sequence complementary to genomic DNA occurs. The region of the sequence complementary to genomic DNA to which the DNA fragment hybridizes is identified, wherein if the identified region of the genome is a regulatory region, then the protein of interest is a transcription factor.
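The final determination in this embodiment, whether an identified region is a regulatory region, can be sketched as a simple interval overlap test. The sketch below assumes that the identified regions and a set of annotated regulatory regions are both available as genomic coordinates; the tuple layout, the function names, and all coordinates are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# (chromosome, start, end) with a half-open interval [start, end); assumed layout.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def binds_regulatory_region(
    identified: List[Interval],
    regulatory: List[Interval],
) -> bool:
    """True if any identified region overlaps any annotated regulatory region."""
    return any(overlaps(region, reg) for region in identified for reg in regulatory)


# Illustrative coordinates only (assumed, not real annotations).
identified_regions = [("chrII", 1000, 1600), ("chrIV", 52000, 52500)]
regulatory_regions = [("chrII", 1500, 2000), ("chrVII", 300, 900)]

print(binds_regulatory_region(identified_regions, regulatory_regions))  # True
```

A result of `True` in this sketch corresponds to the case in which the identified region of the genome is a regulatory region.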
The methods described herein facilitate the dissection of the cell's regulatory network of gene expression across the entire genome and aid in the identification of gene function.