1. Field of the Invention
The present invention relates to a method of visualizing protein interaction data as a three-dimensional graph.
2. Description of the Prior Art
With recent developments in proteomics, protein-protein interaction data are rapidly increasing in quantity. Because the data are large-scale, they are more easily understood when expressed as graphs than as a long list of interacting proteins. Accordingly, methods for visualizing protein interaction networks have been actively researched. However, it is not easy to visualize protein interaction data, for the following reasons: first, the data yield a complex non-planar graph with a large number of edge crossings; and second, when visualized as a graph, the data often yield a disconnected graph comprising many connected components.
Most graph-drawing tools use modified force-directed layout algorithms, which are flexible, easy to implement, and produce good drawings. Conventional force-directed layout algorithms first place nodes randomly and then rearrange their positions through an optimization method to find a layout with minimum energy. Force-directed layout algorithms differ mainly in their choice of energy function and minimization method. Examples include the algorithms of Kamada & Kawai (1989) and Fruchterman & Reingold (1991). The algorithm of Kamada & Kawai produces a two-dimensional drawing and cannot draw a disconnected graph. Many force-directed algorithms share the problem of being too slow for large-scale graphs, because they compute a force between every pair of nodes at each iteration step.
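The conventional procedure described above (random initial placement followed by iterative rearrangement) can be illustrated with a minimal sketch. The sketch assumes a simplified Fruchterman & Reingold-style force model in two dimensions; the function name, the parameters, and the cooling schedule are hypothetical and chosen only for illustration, and the sketch is not the exact algorithm of either cited work. The nested loop over node pairs shows why the cost per iteration grows quadratically with the number of nodes.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Simplified force-directed layout sketch (illustrative assumption only)."""
    rng = random.Random(seed)
    # Step 1: place every node at a random position.
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(len(nodes))  # assumed ideal distance between nodes
    temp = width / 10.0                # assumed cap on movement per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive force between EVERY pair of nodes: O(n^2) per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attractive force along each edge (each interaction).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Step 2: move each node along its net force, limited by temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.95  # cool down so the layout settles

    return pos


# Hypothetical protein names, used only as an example input.
layout = force_directed_layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

For n nodes, the repulsion loop evaluates n(n-1)/2 pairs at every iteration, which is the source of the slowness noted above for large-scale graphs.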
Based on a relaxation algorithm, a Java applet program was developed by Mrowka (2001) for visualization of protein interactions, and was tested on yeast two-hybrid (Y2H) data (Uetz et al., 2000). However, this program has several disadvantages. The program requires all protein interaction data to be provided as parameters of the applet in HTML sources. There is no way to save a visualized graph except by capturing the window. Also, images captured from the window are static and typically of low quality, and cannot be refined or changed later to reflect an update in the data. Further, a user can move a node, but cannot select or save a connected component containing a specific protein for further use.
Some tools for visualizing protein interactions rely on general-purpose drawing tools rather than on algorithms or programs developed specifically for protein interactions. For example, PSIMAP (Park et al., 2001; and Lappe et al., 2001) displays interactions between protein families by comparing Y2H data with DIP data using the Structural Classification of Proteins (Murzin et al., 1995). It was produced with Tom Sawyer software (www.tomsawyer.com) and then refined manually to remove edge crossings.
A research group at the University of Washington (Schwikowski et al., 2000; and Tucker et al., 2001) tried to visualize Y2H data using AGD (www.mpi-sb.mpg.de/AGD/), another general-purpose drawing tool. Although powerful, AGD is a general-purpose drawing tool and does not provide the functions required for studying protein-protein interactions. For example, most protein interaction data, including Y2H data, yield a disconnected graph consisting of many connected components, which is also a non-planar graph with a large number of edge crossings that cannot be removed in a two-dimensional drawing. Such a graph can be analyzed by working on individual connected components or on subgraphs containing a specific protein. Alternatively, the non-planar graph can be visualized as a three-dimensional graph with no edge crossings. However, because AGD does not provide these functionalities, it is difficult to analyze the graph with it.
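The operation of working on the connected component containing a specific protein can be illustrated with a minimal sketch. It assumes that the interaction data are available as a list of protein pairs and uses a breadth-first search; the function name and the protein identifiers are hypothetical and are not taken from any of the tools discussed above.

```python
from collections import defaultdict, deque


def component_containing(interactions, protein):
    """Return the set of proteins in the connected component of `protein`."""
    # Build an undirected adjacency map from the list of interacting pairs.
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)

    if protein not in adjacency:
        return set()  # the protein has no recorded interaction

    # Breadth-first search outward from the query protein.
    seen = {protein}
    queue = deque([protein])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical interaction list forming two separate components.
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
component = component_containing(pairs, "P1")
```

In this example, the disconnected graph has two connected components, and only the proteins reachable from P1 are selected for further analysis or drawing.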
The graph-drawing programs are also problematic in that they cannot query a database and directly visualize the query results. Since they require input data in a specific format, a user has to convert the data into that format. In addition, protein interaction data are generally updated over time, but the conventional programs cannot reflect the updated data in the visualization.
The conventional graph-drawing tools have the following problems in visualizing protein interactions. They draw either a complex graph with a large number of edge crossings or a static graph that is difficult to revise. Also, they are too slow for interactive work with a large volume of data. Further, they can visualize protein interaction data only when the data are input in a specific format, because they cannot read data directly from a protein interaction database.