DNA carries biological information at a variety of levels. Some of this information is evident from well-defined sequence features such as the genetic code and certain reproducible transcription factor binding sites. Other, perhaps equally important, information in the genome may involve less precise sequence rules and longer-range interactions that are likely to be more difficult to detect and understand.
One approach to dissecting the information encoded in the genome is to search for non-random features in the DNA sequence. Protein-coding constraints, for example, produce a non-random distribution in the usage of base triplets. Likewise, certain transcription factors tend to recognize multiple target sequences within a short interval, producing a localized increase in the incidence of one or more motifs. An intriguing means of investigating non-random features of DNA sequence involves searching for the periodic appearance of specific sequence elements (see Trifonov, 1989; Mirsky, 2004, which are herein incorporated by reference in their entireties). Previous analyses of this type have identified, among other non-random features, a strong tendency toward 3n repeats in coding sequences (due to non-random usage of the genetic code and of amino acid sequences) and a periodicity of 10-11 bp in the occurrence of AA/TT dinucleotides. The latter periodicity has been observed rather strikingly in sequences that display intrinsic curvature in vitro (e.g., Koo et al., 1986; Ulanovsky et al., 1987; Goodsell and Dickerson, 1994, which are herein incorporated by reference in their entireties). AA/TT dinucleotides are also unusual in being less flexible under certain circumstances than other dinucleotides (e.g., Nelson et al., 1987, which is herein incorporated by reference in its entirety) and in their apparent ability to contribute to the lateral positioning of nucleosomes along DNA (e.g., Satchwell et al., 1986, which is herein incorporated by reference in its entirety).
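The search for periodic appearance of AA/TT dinucleotides described above can be illustrated with a short sketch. The following Python code is offered only as a hypothetical illustration, not as the method of any cited study: it tallies how often two AA/TT dinucleotides occur a given distance apart and compares those tallies with a composition-matched shuffled background. An excess at spacings near 10, 20, and 30 bp relative to the background would be consistent with a ~10-bp periodicity. The function names (`aa_tt_positions`, `spacing_counts`, `shuffled_background`), the 30-bp spacing window, and the use of a simple mononucleotide shuffle as the null model are all assumptions made for this example.

```python
from __future__ import annotations

import random
from collections import Counter


def aa_tt_positions(seq: str) -> list[int]:
    """Return 0-based start positions of every AA or TT dinucleotide.

    Overlapping occurrences are kept, so "AAA" yields positions 0 and 1.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("AA", "TT")]


def spacing_counts(seq: str, max_spacing: int = 30) -> dict[int, int]:
    """Count pairs of AA/TT dinucleotides whose start positions differ by d,
    for every d in 1..max_spacing (hypothetical helper for illustration)."""
    pos = aa_tt_positions(seq)  # already sorted in ascending order
    counts = Counter()
    for i, p in enumerate(pos):
        j = i + 1
        # Positions are sorted, so stop once the gap exceeds the window.
        while j < len(pos) and pos[j] - p <= max_spacing:
            counts[pos[j] - p] += 1
            j += 1
    return {d: counts[d] for d in range(1, max_spacing + 1)}


def shuffled_background(seq: str, max_spacing: int = 30,
                        n_shuffles: int = 20, seed: int = 0) -> dict[int, float]:
    """Mean spacing counts over randomly shuffled copies of seq.

    Assumed null model: a mononucleotide shuffle, which keeps base
    composition but destroys any positional (periodic) structure.
    """
    rng = random.Random(seed)
    letters = list(seq.upper())
    totals = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        totals.update(spacing_counts("".join(letters), max_spacing))
    return {d: totals[d] / n_shuffles for d in range(1, max_spacing + 1)}


if __name__ == "__main__":
    # Synthetic example: one AA step every 10 bp in a GC-only background.
    demo = "AAGCGCGCGC" * 20
    observed = spacing_counts(demo, max_spacing=30)
    background = shuffled_background(demo, max_spacing=30)
    for d in (5, 10, 15, 20):
        print(d, observed[d], round(background[d], 1))
```

Applied to real genomic sequence, the observed count at each spacing would typically be reported as a ratio to the background. A dinucleotide-preserving shuffle may be a more appropriate null model, since a mononucleotide shuffle does not retain local dinucleotide composition.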
Although bulk genome periodicity analysis has substantial power to detect patterns in sequence, the biological significance of these patterns has remained unclear, in part because of the difficulty of identifying the functional consequences of individual sequence characteristics. From this perspective, well-characterized model systems such as C. elegans provide a tool of considerable value. C. elegans has been reported to exhibit a strong 10.n-base periodicity signal (e.g., VanWye et al., 1991; Widom, 1996; Fukushima et al., 2002, which are herein incorporated by reference in their entireties) and is among the most extensively characterized organisms, both in structure (a complete genome sequence) and in function (individual genetic studies as well as whole-genome expression and phenotypic analyses).
Notwithstanding extensive study of sequence patterns and their possible effects on biological function, much remains to be explored in determining the relationship between sequence patterns and functional activity. The present invention is based on such a study, using C. elegans as a model system applicable to eukaryotes in general.