Gene expression is a major determinant of a cell's phenotype, and in turn, DNA-protein interactions largely determine the patterns of gene expression. Modification of gene expression so as to affect phenotype is a major aim of biology and medicine, be it in industrial microbes, experimental models, pathogens or to tackle human disease. In this context the cis-regulatory sequences within the genome, the DNA component of the transcriptional machinery, are attractive targets for intervention. In comparison to tackling the proteins, working with DNA is far easier: sequencing is highly automated and relatively inexpensive, and DNA can be easily manipulated and readily amplified or synthesized. DNA-based therapies have also emerged as an exciting new class of therapeutic agents. In comparison to traditional pharmaceuticals, natural products or small molecules, DNA is an attractive type of therapeutic agent as it can be: designed by a rational process, most simply by examination of sequence data; cheaply manufactured at scale, by chemical synthesis of oligonucleotides or biological replication; predicted to have low toxicity, as DNA is a ‘natural’ compound that does not, in and of itself, typically induce immunogenic responses, and whose specificity can be controlled by the sequence of the DNA-based therapy; and developed with greatly reduced R&D expenditure, as all stages of conventional drug development (target identification, lead compound discovery, medicinal chemistry) are truncated.
The challenge then becomes how to identify the key cis-regulatory elements. The technologies and expertise that have developed in parallel with the genome sequencing projects, such as massively parallel gene expression analysis using DNA microarrays and the use of bioinformatics to annotate the genome databases, are not sufficient either to identify all of the cis-regulatory elements or to ascribe function to those that are known. In 2003, the National Institutes of Health in the US launched the ENCODE project to catalogue the cis-regulatory elements in 1% of the human genome (Science (2004) 306: 636-640; Nature (2007) 447: 799-816) and to develop high-throughput technologies as discovery platforms. The procedures developed included the use of chromatin immunoprecipitation, probing of hypersensitivity to in vivo digestion by DNaseI (with DNA microarrays, high-throughput quantitative PCR and genomic libraries) and the development of new algorithms for bioinformatic detection. While these techniques have the potential to greatly accelerate the rate of discovery of cis-regulatory elements, they will not necessarily lead to their functional characterization: the output of the project is a comprehensive catalogue of these elements. Such work is likely to reveal, for the majority of elements, their tissue-specificity, their distance from a gene and the type of trans-acting factor that binds to them, but this will not be sufficient to determine what the biological function of each element actually is. A further drawback common to all of these procedures is that they rely on the genome of the organism having been sequenced. Furthermore, the approach using hypersensitivity to DNaseI digestion has the extra disadvantage that it is specific for eukaryotic cells, as, in this context, DNaseI is a probe of chromatin structure, and no comparable structure exists in prokaryotes.
Moreover, there is as yet no means of rapidly screening a large number of sequences for potential cis-regulatory elements.