1. The Field of the Invention
The present invention relates to self-hybridizing nucleic acid elements which may regulate a number of molecular genetic events. More specifically, the present invention relates to methods of identifying and locating resonating, self-hybridizing nucleic acid elements in a rapid and efficient manner.
2. The Relevant Technology
Proteins are substances which play many roles in living organisms, including regulatory functions. Proteins are polymers made up of subunits called amino acids. There are twenty amino acids which, depending on their arrangement, dictate a protein's biological activity.
In the nucleus and mitochondria of a cell, molecules of deoxyribonucleic acid ("DNA") carry the information required to make all the proteins needed by the body. DNA is a polymer of smaller molecules called nucleotides or bases. Four nucleotides make up DNA: adenine (A), cytosine (C), guanine (G), and thymine (T). Generally, DNA exists as two polymers or strands (double-stranded) held together by interstrand hydrogen bonds between nucleotides on the two strands. Adenine pairs with thymine to form hydrogen bonds, and cytosine pairs with guanine. This type of bonding is referred to as Watson and Crick base pairing.
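The pairing rules above can be illustrated with a minimal Python sketch. The names `WC_PAIR` and `complement_strand` are hypothetical and chosen only for this illustration; the sketch assumes a strand is written 5' to 3' as a string over the letters A, C, G, and T.

```python
# Watson and Crick partners for the four DNA bases.
WC_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(strand: str) -> str:
    """Return the partner strand, written 5' to 3' (the reverse complement)."""
    # The two strands run antiparallel, so the input is read in reverse
    # while each base is swapped for its pairing partner.
    return "".join(WC_PAIR[base] for base in reversed(strand.upper()))

print(complement_strand("ATGC"))  # prints GCAT
```

Applying the function twice returns the original strand, which reflects the symmetry of the pairing rules.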
In certain circumstances, the nucleotides in a single molecule of DNA will pair with other nucleotides in the same molecule to form intrastrand bonds. For example, when a DNA sequence contains inverted repeated sequences (known as "palindromes"), hydrogen-bonded hairpin structures may form. As is well known in the art, these structures may contain "loops" of nucleotides that do not participate in the intrastrand bonding. Stem and loop structures are known to function in regulating a number of molecular genetic events by, e.g., interacting with DNA binding regulatory proteins.
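As an illustration only, a simple search for inverted repeats capable of forming a stem and loop might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the present invention: the function name `find_hairpins`, the fixed stem length, and the loop-size limits are all hypothetical, and only exact Watson and Crick complementarity is considered.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5' to 3'."""
    return "".join(WC[b] for b in reversed(seq))

def find_hairpins(seq: str, stem_len: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, loop_len, left_stem) for each exact inverted repeat.

    A hit means the stem_len bases starting at stem_start can pair with a
    downstream segment of the same strand, leaving loop_len unpaired bases.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        left = seq[i:i + stem_len]
        target = reverse_complement(left)  # what the right arm must read
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the candidate right arm
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, loop, left))
    return hits

# Hypothetical test sequence: stem GACT, loop TTTT, stem AGTC.
print(find_hairpins("GACTTTTTAGTC"))  # prints [(0, 4, 'GACT')]
```

A brute-force scan of this kind finds only perfect stems of one length; it does not handle mismatches, bulges, or permutations of such structures.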
While a DNA molecule may contain all the necessary information to make a given protein, DNA does not serve as a direct template for making proteins. Instead, in a process called transcription, DNA serves as a template for the synthesis of a structurally related intermediate molecule called ribonucleic acid ("RNA"). RNA is essentially an imprint of DNA. RNA and DNA, however, differ in three major ways: (1) the sugar backbone of RNA is ribose rather than deoxyribose; (2) RNA generally exists as a single polymer (single-stranded); and (3) the nucleotide thymine found in DNA is replaced by the nucleotide uracil (U).
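The relationship between a DNA template and its RNA transcript can be sketched in a few lines of Python. The function name `transcribe` is hypothetical, and the sketch assumes the template strand is supplied 5' to 3'; it models only base complementarity, not the enzymatic process.

```python
# RNA base that pairs with each DNA template base (U appears in place of T).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template: str) -> str:
    """Return the RNA, written 5' to 3', complementary to a DNA template strand."""
    # The transcript is antiparallel to the template, so the template is
    # read in reverse while each base is replaced by its RNA partner.
    return "".join(TEMPLATE_TO_RNA[b] for b in reversed(template.upper()))

print(transcribe("CAT"))  # prints AUG
```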
There are several subclasses of RNA molecules, including messenger RNA, transfer RNA, and ribosomal RNA. Messenger RNA ("mRNA") is responsible for delivering the information coded by the DNA in the nucleus to the cell's protein machinery in the cytoplasm. In a process called translation, cells synthesize proteins using mRNA as a template (i.e., cells "translate" the nucleic acid language into amino acid language). Three nucleotides (a "codon") within the mRNA define, or code for, a specific amino acid. The codons that code for specific amino acids are known and are referred to as the genetic code.
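Reading an mRNA three nucleotides at a time can be illustrated with the following minimal sketch. The table `CODON_TABLE` is a deliberately small excerpt of the standard genetic code (one-letter amino acid symbols, `*` for a stop codon), and the function name `translate` and the placeholder `X` for codons outside the excerpt are assumptions made for this illustration.

```python
# Excerpt of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",              # methionine (also the usual start codon)
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGU": "G", "GGC": "G",  # glycine
    "AAA": "K",              # lysine
    "UGG": "W",              # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X: not in excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGUUUGGCUAAAAA"))  # prints MFG
```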
Over the years, scientists have discovered that RNA participates in other biological processes. To perform these biological processes, single-stranded RNA molecules typically fold to form secondary or tertiary structures. The secondary and tertiary structures are usually maintained by conventional Watson and Crick base pairing between nucleotides found throughout the single-stranded RNA. Increasingly, however, scientists are discovering that RNA tertiary structure is often held together by non-conventional base pairing (e.g., G-U) and base triples (e.g., U-A-U).
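The distinction between conventional and non-conventional RNA base pairs can be expressed as a small lookup, sketched below. The names `WATSON_CRICK`, `WOBBLE`, and `can_pair` are hypothetical, and only the G-U pair mentioned above is treated as non-conventional; base triples are not modeled.

```python
# Conventional RNA base pairs and the non-conventional G-U pair.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """Return True if RNA bases x and y can pair under the chosen rules."""
    pair = (x.upper(), y.upper())
    return pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE)

print(can_pair("G", "U"))                      # prints True
print(can_pair("G", "U", allow_wobble=False))  # prints False
```

A structure search restricted to Watson and Crick pairs would miss stems that are stabilized in part by G-U pairs.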
The tertiary structure of transfer RNA (tRNA) was one of the first to be studied. Scientists have determined the nucleotide sequences of several hundred tRNA molecules isolated from numerous organisms. The tRNA molecules are all between 73 and 93 nucleotides long. While the exact nucleotide sequences vary, certain nucleotides are conserved from one tRNA to another. These conserved nucleotides confer a three-dimensional L-shaped structure, which is often depicted as a two-dimensional cloverleaf-like structure. Most of the base pairs and base triples that form the tertiary structure are conventional.
Tertiary structures have also been observed for self-splicing RNA. Studies in Tetrahymena thermophila, for example, reveal that intron splicing of ribosomal RNA ("rRNA") does not require proteins, but rather is intrinsic to the rRNA molecule. In an elaborate set of reactions, a 413 nucleotide intron is excised through a series of phosphoester transfer reactions. The predicted tertiary structure of the Tetrahymena rRNA brings the 5' and 3' ends of the intron into close proximity. The tertiary structure is confirmed by experiments in which small deletions in the intron remote from the splice junction completely prevent splicing.
RNA tertiary structure may also be found in the coding regions of genes. Retroviruses use translational suppression to express the pol and gag genes as one large fusion protein. The pol genes encode the enzymatic proteins integrase, protease, and reverse transcriptase, while the gag genes encode structural proteins. Greater quantities of the structural gag gene products are required. To produce the enzymatic and structural proteins in different amounts, some retroviruses use ribosomal frameshifts and read-through. Thus, a single fusion protein is synthesized from two or more overlapping genes by altering the reading frame. Recently, it has been discovered that ribosomal frameshifting requires a heptanucleotide sequence that comprises the actual frameshift site, and an RNA tertiary structure downstream of the frameshift site. Specifically, the tertiary structure comprises a pseudoknot. A pseudoknot is formed when a region outside a hairpin loop base pairs with a complementary sequence on the bulge of the hairpin loop. Studies suggest that frameshifting is pseudoknot dependent.
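The effect of altering the reading frame can be illustrated with a short sketch. The function name `codons_with_minus1_shift`, the example sequence, and the shift position are hypothetical; the sketch models only the bookkeeping of a shift back by one nucleotide and does not model the heptanucleotide site or the pseudoknot.

```python
def codons_with_minus1_shift(mrna: str, shift_at: int) -> list:
    """Split an mRNA into codons, stepping back one nucleotide at shift_at.

    shift_at is an index on a codon boundary of the original frame
    (a multiple of 3, at least 3). Codons before it are read in the
    original frame; reading then resumes one nucleotide upstream.
    """
    if shift_at < 3 or shift_at % 3 != 0:
        raise ValueError("shift_at must be a multiple of 3 and at least 3")
    before = [mrna[i:i + 3] for i in range(0, shift_at, 3)]
    # The nucleotide at shift_at - 1 is read a second time after the slip.
    after = [mrna[i:i + 3] for i in range(shift_at - 1, len(mrna) - 2, 3)]
    return before + after

# Hypothetical sequence. In the original frame it reads AUG GCC AAA UGA.
print(codons_with_minus1_shift("AUGGCCAAAUGA", 6))
# prints ['AUG', 'GCC', 'CAA', 'AUG']
```

In this example the stop codon UGA of the original frame is never read after the shift, which is how a shifted ribosome can continue into an overlapping downstream gene.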
It will be appreciated that the function and regulation of a nucleic acid frequently depend on the formation of intrastrand structures. As more nucleic acid sequence information is accumulated, the ability to detect such structures becomes increasingly important. Existing methods for identifying such structures, however, are limited. While some existing methods search for palindromic sequences and stem-loop structures, these methods do not search for permutations of such structures.
It will also be appreciated that in some cases disrupting the secondary and tertiary structure of a nucleic acid may be desirable. For example, many viruses are harmful to humans and are responsible for numerous diseases. The ability to identify such structures may lead to faster development of antiviral therapies.