1. Field of the Invention
The present invention relates to a method of managing and displaying data, and in particular to a method of managing and displaying gene expression data obtained by gene expression analysis.
2. Background Art
DNA chips and DNA microarrays (hereinafter collectively referred to as “DNA chips”) are used as a means of comprehensively analyzing gene expression. The entire length of genome sequences, such as the human genome sequence and the mouse genome sequence, has been decoded, although the structure and function of genes remain largely unknown.
The probes mounted on a DNA chip are determined by information on a genome sequence. Because it is sometimes necessary to consider the influence of splice variants, and sometimes necessary to profile all of the genes existing in a single species, the number of probes mounted on DNA chips is steadily increasing.
Since the number of probes that can be mounted on a single DNA chip is limited, experiments are at present normally carried out in batch units. That is, the probes are divided into batch units (the unit for one experiment) and are placed on different DNA chips. When experiments are conducted in batch units, different DNA chips are used for the same samples, so that variations in quality occur between the batch units and add to the overall variation in the results.
Normally, when experiments are carried out on a single sample using different DNA chips, the gene expression data is normalized on a DNA chip basis, and it is necessary to determine whether variations in the data are caused by variations in quality, biological variations, or technical variations. Assessing and distinguishing the variations in experiment results is a major issue when interpreting gene expression data. When the variations between batch units are added to the types of variations mentioned above, it becomes even more difficult to identify their causes.
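As an illustration of what normalizing on a DNA chip basis can look like, the following is a minimal Python sketch. It assumes median centering of log2 expression ratios, which is one common choice and is not a method specified in this document. The function name `normalize_per_chip` and the chip identifiers are hypothetical.

```python
import math
from statistics import median


def normalize_per_chip(chips):
    """Median-center the log2 ratios of each chip independently.

    `chips` maps a hypothetical chip identifier to a list of positive
    expression ratios (for example Ch1 intensity / Ch2 intensity).
    The choice of median centering is an assumption for illustration.
    """
    normalized = {}
    for chip_id, ratios in chips.items():
        log_ratios = [math.log2(r) for r in ratios]
        center = median(log_ratios)
        # After centering, the median log ratio of every chip is zero,
        # so chip-wide intensity offsets no longer differ between chips.
        normalized[chip_id] = [x - center for x in log_ratios]
    return normalized
```

Centering each chip separately removes chip-wide offsets, but it does not by itself separate biological variations from technical variations or from variations between batch units.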
Non-Patent Document 1 states that to solve this kind of problem, it is important to draw up an experiment design that sets independent experiment units.
FIG. 1 shows one example of an experiment described in Non-Patent Document 1. In this experiment, the difference in effect of two treatments A and B on mice is investigated. As samples, four mice 101 are prepared, with two of the mice being subjected to treatment A and the other two mice being subjected to treatment B. As duplicated specimens, two mRNA extractions are taken from each mouse and are marked with respectively different fluorescent dyes 102, 103. The specimens marked with the fluorescent dye 102 are set as channel 1 (Ch1) and the specimens marked with the fluorescent dye 103 are set as channel 2 (Ch2). The four pairs of specimens prepared in this way are labeled with the codes A1, A2, B1, B2. Each code is composed of a letter showing the treatment and the number of the duplicated specimen subjected to that treatment. Four specimen pairs, each combining different treatments and different channels, are taken from the eight specimens, and hybridization is carried out on DNA chips 104.
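The specimen layout described above can be sketched in Python as follows. The eight specimens follow from the text (two treatments, two duplicated specimens each, two channels). The specific assignment of specimens to chips in `dye_swap_pairs` is an assumption made for illustration, since the text states only that each chip combines different treatments and different channels. All function and constant names are hypothetical.

```python
from itertools import product

TREATMENTS = ("A", "B")
REPLICATES = (1, 2)
CHANNELS = ("Ch1", "Ch2")


def build_specimens():
    """Return the eight (code, channel) specimens, e.g. ("A1", "Ch1")."""
    return [(f"{t}{r}", ch)
            for t, r, ch in product(TREATMENTS, REPLICATES, CHANNELS)]


def dye_swap_pairs():
    """Return four chip assignments, each mapping a channel to a code.

    Assumed pairing: replicate r of treatment A is hybridized with
    replicate r of treatment B, once in each dye orientation.
    """
    pairs = []
    for r in REPLICATES:
        pairs.append({"Ch1": f"A{r}", "Ch2": f"B{r}"})  # forward labeling
        pairs.append({"Ch1": f"B{r}", "Ch2": f"A{r}"})  # swapped labeling
    return pairs
```

Under this assumed pairing, every one of the eight specimens is used exactly once, and each treatment appears in each channel the same number of times.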
By contrast, in a simpler design for investigating the difference in effect of the two treatments A and B on mice, two mice are provided, and one mouse is subjected to treatment A and the other mouse is subjected to treatment B. mRNA extractions are taken from each mouse, the extractions are dyed with fluorescent dyes of respectively different colors, and hybridization is carried out on DNA chips. In this method, the experiment data is biased by the selection of the mice and by the properties of the two fluorescent dyes. Accordingly, the resulting experiment data cannot be said to be statistically valid.
In the example shown in FIG. 1, the duplicated specimens extracted from the respective mice are subjected to a dye-swap experiment, and hybridization is carried out for suitable combinations selected from the duplicated mRNA specimens, so that technical variations are reduced. That is, the statistical process disclosed in Non-Patent Document 1 is thought to be effective when analyzing data.
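The reason a dye-swap experiment reduces technical variation can be shown with a small Python sketch. It assumes a simple model in which the dye effect adds the same constant bias to the log ratio (Ch1 over Ch2) on both the forward and the swapped chip. This model and the function name `dye_swap_average` are assumptions for illustration and are not taken from this document.

```python
def dye_swap_average(forward_log_ratio, reverse_log_ratio):
    """Combine a forward chip and its dye-swapped counterpart.

    forward_log_ratio: log2(A / B) + dye_bias  (A in Ch1, B in Ch2)
    reverse_log_ratio: log2(B / A) + dye_bias  (B in Ch1, A in Ch2)

    Half the difference of the two cancels the additive dye bias and
    leaves an estimate of log2(A / B).
    """
    return (forward_log_ratio - reverse_log_ratio) / 2.0


# Worked example with an assumed true log ratio of 1.0 and dye bias of 0.25.
true_log_ratio = 1.0
dye_bias = 0.25
forward = true_log_ratio + dye_bias    # 1.25
reverse = -true_log_ratio + dye_bias   # -0.75
estimate = dye_swap_average(forward, reverse)
```

In this model the estimate equals the true log ratio whatever the size of the dye bias, which is why a single labeling orientation cannot separate the treatment effect from the dye effect.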
Next, an example experiment different from that shown in FIG. 1 will be described with reference to FIG. 2. In this example experiment, the difference in effect of a single treatment A on two different types of cells is investigated. First, as samples, two each of two types of cells, “Normal” cells 201 and “Disease” cells 202, are prepared. All of these cells are subjected to the treatment A, and two mRNA extractions are taken as duplicated specimens from each cell. The specimens from the “Normal” cells 201 and the specimens from the “Disease” cells 202 are marked with respectively different fluorescent dyes 203, 204. The specimens marked with the fluorescent dye 203 are set as channel 1 (Ch1) and the specimens marked with the fluorescent dye 204 are set as channel 2 (Ch2). The four pairs of specimens prepared in this way are labeled with the codes N1, N2, D1, D2. Each code is composed of a letter showing the cell type and the number of the duplicated specimen of that type. Four specimen pairs, each combining different cell types and different channels, are taken from the eight specimens, and hybridization is carried out on DNA chips 205.
This method is often used when comparing control samples (for example, Normal cells) with analyzed samples (for example, Disease cells). It does not use dye swapping, but it is also used when assessing reproducibility through repeated experiments in which the same samples and the same chips are used.
To comprehensively carry out various types of experiments while taking into account all of the statistical processes disclosed in Non-Patent Document 1, it is necessary to use a large number of DNA chips in each experiment, and the number of experiments also increases.
Meanwhile, the information relating to each experiment includes the experiment conditions, the types of samples, the number of independent samples, the treatment methods for the samples, the types of DNA chips used, and the combinations of marking methods used for hybridization, as well as information on the probes mounted on the DNA chips (such as gene information obtained from public databases), the dates and times of the experiments, and the persons conducting the experiments. To efficiently analyze gene expression data that includes such a large amount of information, an efficient gene expression data managing system is required.
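To make the kinds of information listed above concrete, the following Python sketch shows one possible record for a single hybridization, and a helper that groups chips by chip type and treatment. The class name, field names, and grouping key are all hypothetical and chosen for illustration. They do not describe the data model of any particular gene expression data managing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HybridizationRecord:
    """Hypothetical metadata for one hybridization on one DNA chip."""
    chip_id: str
    chip_type: str        # which probe set (batch unit) the chip carries
    sample_type: str      # e.g. "mouse", "Normal", "Disease"
    treatment: str        # e.g. "A" or "B"
    ch1_specimen: str     # specimen code labeled with the Ch1 dye
    ch2_specimen: str     # specimen code labeled with the Ch2 dye
    performed_at: datetime
    experimenter: str


def group_by_design(records):
    """Group chip identifiers by (chip_type, treatment)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.chip_type, rec.treatment)].append(rec.chip_id)
    return dict(groups)
```

Keeping the experiment conditions in the record itself, rather than only in a folder name, is what allows chips to be regrouped later by any combination of conditions.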
[Non-Patent Document 1]
Churchill, G. A. (2002) “Fundamentals of Experimental Design for cDNA Microarrays” Nat. Genet. 32 Suppl, p 490-495.
With a conventional gene expression data analyzing system, it is possible to input detailed information on the experiment conditions and the samples in addition to the gene expression data, but such information is presented in a tree format or a list format. In a system that displays information in a tree format, data that has been analyzed is managed as a folder, and the information is managed without considering the experiment conditions or the target specimens. In a system that displays information in a list format, individual sets of hybridization information are displayed, but the data is not managed with consideration for the experiment conditions of the various experiment designs.
Gene expression data obtained from experiments in batch units must ultimately be put on a common basis so that the expression values can be assessed and analyzed as data for a single species.
However, in a conventional gene expression data analyzing system, for experiments conducted in batch units, replicated experiments through hybridization using duplicated specimens are displayed, but the display does not take into account the dye-swap method applied to the target.
In a conventional gene expression data analyzing system, it is not possible to additionally display the chip orientation. Accordingly, researchers have to merge beforehand the gene expression data obtained from different experiments using the same specimens, and then newly generate folders or lists using the conventional gene expression data analyzing system.
When researchers draw up different experiment designs, a large amount of gene expression data is generated using many DNA chips. However, it has not been possible to display, visually and in an easily understood manner, a data management view for replicated experiments and dye-swap experiments that takes hybridization combinations into account. Accordingly, it has not been possible to smoothly carry out operations such as the verification, pre-processing, and analysis of such data.