It is known that intergenic and intronic regions comprise most of the genomic sequence of higher organisms. These regions are collectively referred to as the “non-coding region” of an organism's genome, as opposed to the “gene-” or “protein-coding region” of the genome. Although recent work has suggested that the intergenic and intronic regions play a regulatory role, their true function remains, for the most part, elusive. The search for conserved motifs, presumed to be regulatory and control signals, in regions upstream of the 5′ untranslated regions (5′UTRs) of genes has been the focus of research activities for many years.
More recently, researchers began studying the 3′ untranslated regions (3′UTRs) of genes, where they discovered conserved regions and showed them to be functionally significant, in direct analogy to the cis-motifs of promoter regions. Large-scale comparative analyses also allowed researchers to study conservation in the vicinity of genes and elsewhere in the genome with great success. However, these studies were carried out on only a handful of organisms at a time because of the magnitude of the necessary computations.
The analysis of 3′UTRs intensified further after it was discovered that they contain binding sites targeted by short interfering ribonucleic acids (RNAs) that induce post-transcriptional control of the corresponding gene's expression through either messenger RNA (mRNA) degradation or translational inhibition. Accumulating evidence that non-coding RNAs control developmental and physiological processes, and that a considerable part of the human genome is transcribed, has helped researchers identify “functional” elements in areas of the genome that are not associated with protein-coding regions.
Thus, techniques for efficiently and effectively identifying and associating non-coding regions with gene coding regions of a genome would be desirable.