1. Field of the Invention
The present invention relates to a vector-based method for visualizing secondary structures of RNA molecules. More particularly, the present invention is concerned with an improvement in producing overlap-free polygonal displays of secondary structures with minimal distortion to structural elements, with minimal search for positioning them and with minimal user intervention.
2. Description of the Prior Art
To better understand the background of the invention, the basic concepts and technical terminology used herein will be illustrated with reference to FIG. 1, which shows the structural elements of an RNA molecule.
A structural element refers to either a double-stranded part (i.e., a helix) or a single-stranded part such as an internal loop, bulge loop, multiple loop, or dangling end, as shown in FIG. 1. A structural element consists of one or more structural units, each of which is a contiguous segment of a base sequence. The double-stranded part, called a helix or stem, is formed by two or more contiguous base pairs in an RNA molecule. An internal loop is a protruding part that results from unpaired bases in both strands, while a bulge loop is a protruding part that results from unpaired bases in one strand only. A multiple loop refers to a stretch or stretches of unpaired bases through which two or more helices are joined. A dangling end is an unpaired part at the start or end of the base sequence.
Helices adjacent to a loop ν are the helices directly connected to ν. Loops adjacent to a loop ν include all loops connected to ν via a single helix. A seed loop of a loop ν is a loop adjacent to ν that has already been positioned. A regular secondary structure is one having no bulge loop, dangling end, or helices directly adjacent to each other.
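By way of illustration only, the definition of a helix given above (two or more contiguous base pairs) can be sketched in Python as follows. The dot-bracket input notation and the function names pair_table and find_helices are assumptions made for this sketch and are not part of the method described herein.

```python
def pair_table(structure):
    """Map each base index to its pairing partner, or -1 if unpaired.

    Assumes dot-bracket notation: '(' and ')' mark paired bases,
    '.' marks an unpaired base (an assumption of this sketch).
    """
    stack = []
    pairs = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def find_helices(pairs, min_len=2):
    """Return (start, partner_of_start, length) for each run of stacked pairs.

    A run counts as a helix only if it has at least min_len contiguous
    base pairs, following the definition in the text.
    """
    helices = []
    i = 0
    n = len(pairs)
    while i < n:
        j = pairs[i]
        if j > i:  # i opens a base pair (i, j)
            length = 1
            # Extend while the next inner pair stacks directly on this one.
            while i + length < j - length and pairs[i + length] == j - length:
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
            i += length
        else:
            i += 1
    return helices


# Two helices separated by an internal loop (one unpaired base per strand).
example = "((.((...)).))"
helices = find_helices(pair_table(example))
```

In this sketch, the unpaired stretches lying between the reported helices correspond to the loops; classifying them as internal, bulge, or multiple loops would require counting the helices adjacent to each one.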
There are several representation methods for RNA secondary structure, including polygonal display, mountain, and circles and domes. They are exemplified by the drawings of FIGS. 2b to 2e with respect to the secondary structure of FIG. 2a. 
In essence, the secondary structure of RNA is a topological structure, which depends entirely on the connectivity relation of the constituent bases, rather than a geometric structure. One of the aims of representing the structure in graphical form is to facilitate the visual comparison and evaluation of RNA secondary structures. It is virtually impossible to evaluate the secondary structure of an RNA molecule consisting of a large number of bases unless it is properly visualized. Since evaluating and comparing RNA secondary structures is accomplished by validating the connectivity relation of the bases, a useful representation method is one that produces a clear and compact graphical form free of overlap between structural elements. For intuitive recognition of the overall topology of the secondary structure, the graphical form should be as free as possible of distortion (e.g., bending, contorting, or resizing of structural elements).
Most drawing programs for RNA secondary structures first produce graphical forms with overlapping structural elements, and then remove the overlap by deforming (bending, contorting and/or resizing) the structural elements with user intervention (Devereux, J., Haeberli, P., and Smithies, O. (1984) A comprehensive set of sequence analysis programs for the Vax. Nucleic Acids Res., 12, 387-395; Shapiro, B. A., Maizel, J., Lipkin, L. E., Currey, K., and Whitney, C. (1984) Generating non-overlapping displays of nucleic acid secondary structure. Nucleic Acids Res., 12, 75-88) or by an iterative process or backtracking of programs (Bruccoleri, R. E., and Heinrich, G. (1988) An improved algorithm for nucleic acid secondary structure display. CABIOS, 4, 167-173; Lapalme, G., Cedergren, R. J., and Sankoff, D. (1982) An algorithm for the display of nucleic acid secondary structure. Nucleic Acids Res., 10, 8351-8356; Stüber, K. (1985) Visualization of nucleic acid sequence structural information. CABIOS, 1, 35-42; Muller, G., Muller, G., Gaspin, C., Etienne, A., and Westhof, E. (1993) Automatic display of RNA secondary structures. CABIOS, 9, 551-561; Perochon-Dorisse, J., Chetouani, F., Aurel, S., Iscolo, N., and Michot, B. (1995) RNA_d2: a computer program for editing and display of RNA secondary structure. CABIOS, 11, 101-109).
Where the overlap of structural elements is removed with programs, the elements are deformed according to rules. The deforming rules introduced, however, are applied indiscriminately to all structural elements, so the resulting secondary structures are likely to be distorted (e.g., particular structural elements are too bent or contorted).
In addition, since the visualizing programs require high computational power, they often run on a mainframe or workstation level computer, which is not easily available to RNA researchers.
A recent algorithm (Nakaya et al., (1996) Visualization of RNA secondary structures using highly parallel computers. CABIOS, 12, 205-211) generates a polygonal display in O(NlogN) time by applying an O(NlogN) force-calculation algorithm, originally developed by Barnes and Hut (1986, A hierarchical O(NlogN) force-calculation algorithm. Nature, 324, 446-449). However, their algorithm has the disadvantage of being implemented in a parallel programming language on a parallel computer.
In brief, many methods for visualizing the secondary structures of RNA molecules have been reported, including, for example:
Chetouani, F., Monestié, P., Thébault, P., Gaspin, C., and Michot, B. (1997) ESSA: an integrated and interactive computer tool for analyzing RNA secondary structure. Nucleic Acids Res., 25, 3514-3522;
Hogeweg, P. and Hesper, B. (1984) Energy directed folding of RNA sequences. Nucleic Acids Res., 12, 67-74;
Matzura, O. and Wennborg, A. (1996) RNA draw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows. CABIOS, 12, 247-249;
Nussinov, R., Pieczenik, R., Griggs, G. and Kleitman, J. (1978) Algorithms for loop matching. SIAM J. Appl. Math., 35, 68-82; and
Osterburg, G. and Sommer, R. (1981) Computer support of DNA sequence analysis. Comput. Progr. Biomed., 13, 101-109.
These methods are found to show at least one of the following disadvantages: the visualizing process is not fully automated, so considerable user intervention is required; structural elements are frequently deformed, which makes it difficult to recognize the overall topology; high-performance computers such as parallel computers are needed; and automatically producing an overlap-free display takes exponential time due to backtracking.
Therefore, to avoid the above problems, active research has been and continues to be directed to the development of methods for visualizing secondary structures of RNA molecules by which clear and compact graphical displays can be obtained quickly and at low cost.
The objective of the present invention is to overcome the above problems encountered in the prior art and to provide a method for visualizing secondary structures of RNA molecules, by which the structures can be drawn as a polygonal display.
Another objective of the present invention is to provide the visualizing method by which the polygonal display can be produced with minimal overlap.
It is a further objective of the present invention to provide the visualizing method in which the distortion introduced to avoid overlap of structural elements is kept to a minimum.
It is still a further objective of the present invention to provide the visualizing method with minimal user intervention.
It is still another objective of the present invention to provide the visualizing method which can be implemented in the Microsoft Windows operating system on IBM-compatible personal computers.
In accordance with the present invention, the above objectives are accomplished by providing a method for visualizing an RNA secondary structure, which uses vectors and vector spaces to determine the position of a structural element and which comprises the steps of regularizing a secondary structure, building data structures, determining positioning priority, and positioning and drawing structural elements.
In the step of regularizing a secondary structure, the secondary structure is transformed into a regular one by introducing artificial bases so that it does not contain any bulge loop, dangling end, or helices directly adjacent to each other. A regularized secondary structure is stored in a data structure called an organization object.
The step of building data structures consists of identifying structural elements from the organization object and constructing the data structures of the secondary structure object and the draw list object for each of the identified structural elements.
The positioning priority is determined by first computing the sizes of all loops and then determining the positioning priorities of all structural elements, including helices. A data structure called a priority queue stores these priorities.
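By way of illustration only, a priority queue of loops keyed by size can be sketched with the Python standard library. The rule that a larger loop receives a higher priority, the tie-break by name, and the names build_priority_queue and pop_in_priority_order are assumptions made for this sketch; the actual priority rule is the one set forth in the detailed description.

```python
import heapq


def build_priority_queue(loop_sizes):
    """Build a heap of (-size, name) so the largest loop is popped first.

    loop_sizes is a hypothetical mapping from loop name to loop size.
    heapq is a min-heap, so sizes are negated to obtain max-first order.
    """
    heap = [(-size, name) for name, size in loop_sizes.items()]
    heapq.heapify(heap)
    return heap


def pop_in_priority_order(heap):
    """Drain the heap, returning (name, size) from highest to lowest priority."""
    order = []
    while heap:
        neg_size, name = heapq.heappop(heap)
        order.append((name, -neg_size))
    return order


# Hypothetical loop sizes, for illustration only.
sizes = {"loop_a": 4, "loop_b": 9, "loop_c": 6}
order = pop_in_priority_order(build_priority_queue(sizes))
```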
The step of positioning and drawing structural elements comprises computing the open and allowed vector spaces and a feasible vector, starting from the structural element with the highest drawing priority. Each structural element is positioned in the direction of its feasible vector. For each positioned structural element, the coordinates of its constituent bases are computed and the bases are displayed.
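By way of illustration only, once a direction has been chosen for a helix, the coordinates of its bases can be computed by stepping along that direction and placing the two bases of each pair on either side of the axis. The function name helix_coordinates and the parameters anchor, step, and width are assumptions made for this sketch; the computation of the open and allowed vector spaces and of the feasible vector itself is not shown.

```python
import math


def helix_coordinates(anchor, direction, n_pairs, step=1.0, width=1.0):
    """Place n_pairs base pairs along a direction starting at anchor.

    anchor    -- (x, y) of the midpoint of the first base pair (assumed)
    direction -- any nonzero (dx, dy); normalized to a unit vector here
    step      -- distance between consecutive base pairs (assumed)
    width     -- distance between the two bases of a pair (assumed)
    Returns a list of ((x1, y1), (x2, y2)), one entry per base pair.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("direction must be a nonzero vector")
    ux, uy = dx / norm, dy / norm  # unit vector along the helix axis
    px, py = -uy, ux               # unit vector perpendicular to the axis
    coords = []
    for k in range(n_pairs):
        cx = anchor[0] + ux * step * k
        cy = anchor[1] + uy * step * k
        first = (cx + px * width / 2, cy + py * width / 2)
        second = (cx - px * width / 2, cy - py * width / 2)
        coords.append((first, second))
    return coords


# A three-pair helix drawn straight up from the origin.
coords = helix_coordinates((0.0, 0.0), (0.0, 2.0), 3)
```

Because every base pair is placed along the same straight axis with constant spacing, the helix in this sketch is neither bent nor resized.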