As the number of species whose genome sequences have been determined has increased, so-called genome comparison has been performed extensively. Genome comparison aims at drawing conclusions from gene differences among species, for example, finding genes involved in evolution, finding a set of genes considered to be common to all species, or conversely studying the characteristics unique to specific species.
The recent development of infrastructure such as DNA chips and DNA microarrays has shifted the interest in the field of molecular biology from interspecies information to intraspecies information, namely coexpression analysis, and has broadened the scope of study from the extraction of information to the correlation of information, while still including the conventional comparison between species.
For example, if an unknown gene has an expression pattern identical to that of a known gene, the unknown gene can be assumed to have a function similar to that of the known gene. Such functional meanings of genes and proteins are studied as function units or function groups. The interactions between the function units or function groups are also analyzed by correlating them with known enzymatic reaction data or metabolism data, or more directly, by knocking out or overexpressing a specific gene to eliminate or enhance expression of the gene and studying the direct and indirect influences on the gene expression patterns of the whole collection of genes.
One successful case in this field is the expression analysis of yeast by the group of P. Brown at Stanford University (Michael B. Eisen et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. (1998), Dec 8; 95(25): 14863-8). Using a DNA microarray, they hybridized genes with genes extracted from a cell in a time series and quantified the expression levels (i.e., quantified the brightness of the hybridized fluorescent signals). By converting the values into colors, the expression pattern of each gene can be displayed in a visually comprehensible manner. Genes that have similar expression patterns over the cell cycle (genes having close expression levels at the same time points) are then clustered together.
FIG. 24 is a diagram showing an example of displaying expression status 2400 of genes according to the above-described method, where the horizontal and vertical axes indicate time and genes, respectively. In this display, genes belonging to a common cluster may be considered to have common functional characteristics. In FIG. 24, each of the blocks 2401 represents the expression status of a gene at one time point. In the figure, the expression status is schematically represented in grayscale.
FIG. 25 is a diagram showing an example of displaying expression status 2500 of genes according to the above-described method, where the horizontal and vertical axes indicate experiment cases and genes, respectively. The dendrogram shown on the left is made by repeatedly joining the two most similar clusters, one step at a time. The length of each branch corresponds to the distance between the two joined clusters. In FIG. 25, each of the blocks 2501 represents the expression status of a gene at one time point. In the figure, the expression status is schematically represented in grayscale.
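The stepwise joining that produces such a dendrogram can be sketched as follows. This is only a minimal illustration: the Euclidean distance, the average-linkage rule, the function names and the sample values are all assumptions introduced here, since the text above does not specify the distance measure or linkage rule.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def build_dendrogram(profiles):
    """Repeatedly join the two closest clusters and return the merge history.

    Each history entry is (members_a, members_b, distance); the distance
    plays the role of the branch length in the dendrogram.
    """
    clusters = [(i,) for i in range(len(profiles))]  # one cluster per gene
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(profiles, pair[0], pair[1]),
        )
        distance = average_linkage(profiles, a, b)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        history.append((a, b, distance))
    return history


# Hypothetical expression levels: 4 genes x 3 experiment cases.
profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 1.0, 0.5],
    [4.8, 1.2, 0.4],
]
history = build_dendrogram(profiles)
for a, b, distance in history:
    print(a, "+", b, "joined at distance", round(distance, 3))
```

With these sample values, genes 0 and 1 are joined first, genes 2 and 3 second, and the two resulting clusters last, at a much larger distance. The exhaustive pair search is kept for readability and would be too slow for data of realistic size.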
The above-described display method supports the supposition that genes belonging to the same cluster may share common functional characteristics.
With gene expression patterns, however, elucidating the relationships among all of the genes in a cell is not as simple as finding gene groups that have similar expression patterns over the entire cell cycle.
For example, different genes may exhibit similar expression at a certain time point because they have a similar function at that point. However, they may have different roles at other time points, at which their expression naturally differs. Under the conventional method, in which expression patterns are clustered by similarity over the entire cell cycle, these genes are classified into different clusters. It is therefore difficult to find genes that are related in this time-dependent way.
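The effect can be illustrated with a small numerical sketch. The Pearson correlation coefficient is used here as the similarity measure purely as an assumption, and the two profiles are hypothetical values chosen so that the genes agree during the first half of the time series and diverge during the second half.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical expression levels of two genes at 8 time points.
gene_x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
gene_y = [1.1, 2.1, 2.9, 4.2, 1.0, 2.0, 3.0, 4.0]

whole = pearson(gene_x, gene_y)          # similarity over the entire series
early = pearson(gene_x[:4], gene_y[:4])  # similarity over the first half only
late = pearson(gene_x[4:], gene_y[4:])   # similarity over the second half only

print("whole:", round(whole, 3))
print("early:", round(early, 3))
print("late:", round(late, 3))
```

Over the first four time points the two profiles are almost perfectly correlated, over the last four they are anti-correlated, and over the whole series the correlation is close to zero. A clustering based only on the whole-series value would therefore tend to separate the two genes despite their similar behavior in the early period.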
In an actual analysis of gene expression patterns, an enormous amount of data is subjected to clustering as shown in FIG. 25. The number of genes ranges from several thousand to tens of thousands, or more than a hundred thousand at maximum. The number of experiment cases (data) employed may be arbitrary, for example, on the order of ten to several tens or hundreds. Thus the dendrogram shown in FIG. 25 will be very complicated, containing vast numbers of small branches.
FIG. 26 shows such a complicated case. The left part of FIG. 26 shows the entire results of clustering a large volume of gene expression pattern data. The right part of FIG. 26, surrounded by a dotted line 2601, shows the results in a particular region enclosed in a window that the user has specified in order to view a narrowed part of the entire results in more detail.
The thus-obtained dendrogram 2602 represents the precise course of joining the most similar clusters. However, it is difficult for the user to tell from this display roughly how many clusters the genes fall into, and thus to judge and infer the groupings of the genes.
It would be useful for the user if the system could suggest candidate numbers of clustering groups so that the user can select the most suitable clustering level. Specifically, the data are automatically divided into groups at various levels of clustering (e.g., 7, 28, 105 and 372 clusters) so that the user can study the grouping of the genes by selecting, from a menu of clustering levels, the grouping results closest to the desired level of clustering.
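One way such a menu of clustering levels might be produced is sketched below, under stated assumptions. The agglomerative joining is run once and the grouping is recorded whenever the number of clusters equals a requested level. The function names, the average-linkage rule with Euclidean distance, and the sample data are hypothetical, and how suitable levels would be chosen automatically is not addressed here.

```python
from itertools import combinations
from math import sqrt


def linkage(profiles, c1, c2):
    """Average pairwise Euclidean distance between two clusters of gene indices."""
    total = sum(
        sqrt(sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j])))
        for i in c1
        for j in c2
    )
    return total / (len(c1) * len(c2))


def groupings_at_levels(profiles, levels):
    """Return {cluster_count: sorted list of clusters} for each requested level."""
    wanted = set(levels)
    clusters = [(i,) for i in range(len(profiles))]  # start with one gene each
    menu = {}
    while True:
        if len(clusters) in wanted:
            menu[len(clusters)] = sorted(clusters)  # snapshot at this level
        if len(clusters) == 1:
            break
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(profiles, *p))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return menu


# Hypothetical expression levels: 6 genes x 2 experiment cases.
profiles = [
    (0.0, 0.0), (0.1, 0.0),
    (1.0, 1.0), (1.2, 1.0),
    (5.0, 5.0), (5.3, 5.0),
]
menu = groupings_at_levels(profiles, levels=[2, 3])
for count in sorted(menu):
    print(count, "clusters:", menu[count])
```

With these sample values, the three-cluster level pairs the genes as (0, 1), (2, 3) and (4, 5), and the two-cluster level merges the first two pairs. The user would then pick whichever level best matches the granularity of interest.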
The present invention has an objective of solving such problems of the conventional art by providing a method and an apparatus for effectively displaying gene expression patterns by finding different genes that exhibit similar expression because they have the same function at one time point but have different roles at a different time point.
The present invention also has an objective of providing a method and an apparatus for displaying gene expression patterns by automatically extracting rough groupings of clusters from the results of clustering so that a user can select a desired level of grouping for a more comprehensible display when studying the groupings of the genes. In other words, the present invention has an objective of providing a method and an apparatus for effectively displaying gene expression patterns by providing multiple selectable clustering levels.