Genome sequences contain the information necessary to control gene expression programs and to specify proteins and other gene products. DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression (Jacob et al. J Mol Biol 3, 318-56 (1961); Kellis et al. Nature 423, 241-54 (2003); Cliften et al. Science 301, 71-6 (2003)). Substantial portions of the genome sequence are believed to be regulatory (Pritsker et al. Genome Res 14, 99-108 (2004); Wang et al. Bioinformatics 19, 2369-80 (2003); Blanchette et al. Nucleic Acids Res 31, 3840-2 (2003); Iyer et al. Nature 409, 533-8 (2001); Ren et al. Science 290, 2306-9 (2000)), but the DNA sequences that actually contribute to the regulatory code are ill-defined. In contrast, the triplet code used to translate nucleotide sequences into protein molecules is well known (Lee et al. Science 298, 799-804 (2002); Lieb et al. Nat Genet 28, 327-34 (2001); Roth et al. Nat Biotechnol 16, 939-45 (1998)). Knowledge of the genome's transcriptional regulatory code could provide new insights into the principles that govern global gene regulation.
Comparative genomics has recently been used to identify functional sequence elements in the yeast genome (Pritsker et al. Genome Res 14, 99-108 (2004); Wang et al. Bioinformatics 19, 2369-80 (2003); Liu et al. Nat Biotechnol 20, 835-9 (2002); Bailey et al. Proc Int Conf Intell Syst Mol Biol 3, 21-9 (1995)). Comparative analysis of the genome sequences of multiple yeast species revealed phylogenetically conserved sequences, and these sequences were used to facilitate the identification of genes and putative regulatory elements. Conserved sequence information alone does not reveal, however, which subset of these sequences is bound by transcriptional regulators, the identity of the regulators that bind them, or the conditions under which the regulators occupy their binding sites.
Therefore, there is a need to develop novel methods and algorithms for identifying the biologically active DNA-binding sites bound by transcriptional regulators in vivo.