This application claims priority to Japanese Application Serial No. 265933/2000, filed Sep. 1, 2000.
1. Field of the Invention
The present invention relates to the display and evaluation of gene expression data obtained by hybridizing genes to a particular gene of known identity. The present invention also relates to a method for displaying and evaluating failures, or errors, that occur in the experimental processes used to obtain such data, in a manner that is easy to interpret visually.
2. Description of the Related Art
As the number of biological species whose genomes have been sequenced increases, genome comparison analyses have become widely used to find genes that provide evidence of species evolution and to search for gene populations common to different species. Genome comparison is also employed to find, in the differences between species, clues to the characteristics specific to a particular species.
Owing to recent developments in technological infrastructure such as biochips or DNA chips (hereinafter referred to as "biochips"), the focus of interest in molecular biology has been shifting from interspecific information to intraspecific information, namely, simultaneous expression analyses. This type of information, together with conventional interspecific comparisons, extends the possibilities of the art from merely extracting information to associating pieces of information with one another.
For example, if an unknown gene is found to have an expression pattern identical to that of a known gene, it is inferred that the unknown gene has a function similar to that of the known gene. The functions of such genes and the resulting proteins are studied by treating them as a functional unit or group. Further, interactions between genes or proteins are analyzed by associating them with data for known enzyme reactions or metabolism, or, more directly, by disrupting a gene to terminate its expression or making the gene excessively active to cause overexpression, and then studying the direct or indirect influence of the gene on the expression patterns of all genes.
In studies of gene expression patterns using biochips, elements associated with the living tissue of interest are prepared. The term "elements" herein refers to fragments of any DNA related to the living tissue of interest. In a biochip, the elements are spotted and immobilized on a substrate such as a slide glass or a silicon wafer at a density of several hundred to several thousand elements per square centimeter. The term "sample" herein refers to fragments of any DNA or RNA that are extracted from the living tissue of interest to be reacted with the elements on a biochip. When a gene is expressed in cells, its DNA is transcribed into RNA. The RNA is extracted and labeled with a fluorescent marker to serve as a sample. When a sample is reacted with an element, single strands that are complementary to each other bind, or hybridize, to one another. Biochips thus permit quantitative or qualitative analyses of gene expression in living tissue by taking advantage of hybridization.
A successful example in the art is the drug-efficacy experiment conducted by the Institute of Medical Science, University of Tokyo (T. Tsunoda et al.: Discrimination of Drug Sensitivity of Cancer Using cDNA Microarray and Multivariate Statistical Analysis, Genome Informatics 1999 (December 1999), pp. 227-228, Universal Academy Press Inc.). In this experiment, RNA extracted from normal cells and RNA extracted from cancer cells were each labeled with a fluorescent dye of a different color. The two types of RNA were mixed and allowed to hybridize to the elements (i.e., genes) on a biochip, and the intensities of the fluorescent signals emitted from each of the two fluorescent dyes were measured.
FIG. 16 schematically shows how the state of expression of each gene obtained from the above-described experiment is displayed. In this display, the fluorescent-signal data resulting from hybridization with the genes immobilized on a biochip are plotted on a graph, with one axis representing the fluorescent signals for normal cells and the other representing the signals for cancer cells. Each point in the graph corresponds to one gene. In analyzing the data, among genes whose fluorescent signals are more intense than a predetermined value, those specific to disease conditions are distinguished from the other genes on the basis of the ratio of the signal intensity for the normal cells to the signal intensity for the cancer cells. Specifically, genes corresponding to the points in area A (i.e., genes that function in normal cells but not in cancer cells) and genes corresponding to the points in area B (i.e., genes that function in cancer cells but not in normal cells) in FIG. 16 are singled out. This manner of display thus allows genes that function specifically in a particular disease to be identified.
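The ratio-based discrimination described above can be sketched as follows. This is a minimal illustration only, not the procedure used in the cited experiment: the intensity threshold `min_intensity` and the fold-change cut-off `ratio_cutoff` are hypothetical parameters standing in for the unspecified "predetermined value", and a gene is assumed to pass the intensity filter if either of its two signals reaches the threshold.

```python
from typing import List, Tuple


def classify_genes(
    genes: List[Tuple[str, float, float]],
    min_intensity: float,  # hypothetical stand-in for the "predetermined value"
    ratio_cutoff: float,   # hypothetical fold-change cut-off, assumed > 1
) -> Tuple[List[str], List[str]]:
    """Split genes into area A (normal-specific) and area B (cancer-specific).

    Each gene is (name, normal_signal, cancer_signal).
    """
    area_a: List[str] = []
    area_b: List[str] = []
    for name, normal, cancer in genes:
        # Assumption: ignore genes whose stronger signal is below the threshold.
        if max(normal, cancer) < min_intensity:
            continue
        # Compare by multiplication to avoid dividing by a zero signal.
        if normal >= ratio_cutoff * cancer:
            area_a.append(name)   # expressed in normal cells, not cancer cells
        elif cancer >= ratio_cutoff * normal:
            area_b.append(name)   # expressed in cancer cells, not normal cells
    return area_a, area_b


genes = [("g1", 900.0, 100.0), ("g2", 120.0, 800.0),
         ("g3", 500.0, 450.0), ("g4", 20.0, 5.0)]
print(classify_genes(genes, 200.0, 2.0))  # (['g1'], ['g2'])
```

Here g3 is bright but its two signals are too similar to be classified, and g4 is dropped by the intensity filter.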
The data used in such an analysis must themselves be sufficiently reliable for the analysis to be meaningful; in other words, the results should be reproducible in experiments conducted under the same conditions. However, the technologies for manufacturing biochips, as well as the techniques required for conducting experiments with them, are not yet fully developed, and the reproducibility of experiments is not fully ensured. Underlying causes include the difficulty of spotting exactly equal amounts of elements on a biochip and the susceptibility of the technology to changes in environmental factors such as temperature and humidity. Furthermore, techniques have not been fully established for ensuring constant hybridization reaction rates and accurate readings of fluorescent light after hybridization. At present, there is considerable uncertainty concerning the reliability of the data obtained from these experiments.
FIG. 17 schematically shows image data obtained when the results of a biochip experiment are read by a scanner. Until now, researchers have had to examine such scanned images of biochips visually to determine whether the data are usable. For example, the data for a biochip are judged unusable when the scanned image is dark throughout (i.e., no expression is observed) or when the image is bright only in part (i.e., incomplete expression). These conditions appear to arise, for example, when hybridization is incomplete, when the substrate of the biochip is scratched, or when the spotted amounts are not uniform across the biochip, although the exact causes are not known.
At present, from the manufacturers' point of view, there is an increasing need for technologies that improve the accuracy of biochip manufacturing processes and enable mass production of reliable biochips with fewer errors. Proper evaluation methods or tools are therefore needed to determine accurately the precision of, and errors in, the manufacture of biochips. From the point of view of researchers who use biochips in their experiments, on the other hand, it would be convenient to have proper evaluation methods or tools for evaluating the results of biochip experiments, so that the user can determine whether the results are usable and, if not, find out the exact cause. Thus, a need exists for evaluation methods that tell the user what faulty events have taken place at what point in the manufacture of biochips and/or in experiments using biochips, so that the findings can be taken into account in later manufacturing or experiments.
The present invention addresses these needs of both biochip manufacturers and biochip users. Accordingly, it is an object of the present invention to provide effective methods for detecting, from the data obtained in experiments using biochips, any faulty events in the manufacturing process of the biochips or in the experiments themselves.
The present invention achieves the above object by displaying errors present in the data obtained from a biochip in a manner that is easy to interpret visually, and by quantifying such errors. Specifically, a plurality of sections is defined on a single biochip. The same type of control material is diluted to different concentrations and spotted in a plurality of spots in each of the sections to serve as controls. A mixed sample is prepared by mixing two types of samples, each labeled with a different fluorescent dye, and is used in a hybridization reaction on the biochip. Upon completion of the hybridization reaction, the measurement data for the two types of fluorescent signals emitted from the two fluorescent dyes are plotted on a graph for each section. The graphs are displayed on a single screen, for comparison, in the same arrangement as that of the sections on the biochip. To indicate how the measured data for the controls are dispersed, the experimental errors are quantified either by examining the linearity of the data points for the controls or by examining the slope angles of straight lines drawn from the data points to the origin, where in each case the data points are plotted on a graph whose vertical and horizontal axes represent the intensities of the fluorescent signals for the respective fluorescent dyes.
In experiments using biochips, a discrepancy may arise between the observed intensities of fluorescent signals and the actual expression levels. The discrepancy may vary from one biochip to another, or from one section of a biochip to another, owing to variations in the amounts of material spotted on the biochip, variations in the amounts of elements such as DNA, RNA or cDNA contained in a spot, or variations in the hybridization reaction. To correct such discrepancies, controls are arranged on the biochip. A control may be a so-called housekeeping gene, which is constantly expressed in various types of cells to provide the maintenance activities required by all cells. Other materials that can serve as controls include a gene that cannot be expressed in the cells under study, such as a gene expressed exclusively in plants and not in animals, or a fluorescent dye unrelated to any gene. These materials are spotted on a biochip to serve as a standard for fluorescent signals. Whereas controls are typically used as a standard for correcting fluorescent-signal data, in the present invention they are used to measure the extent of data dispersion.
In the present invention, the measured data for controls are used to detect the experimental errors in biochip experiments. The data are plotted on a graph for each section, and the resulting graphs are simultaneously displayed on a single screen in the same arrangement as that of the sections on the biochip.
Two approaches are employed in the present invention to quantify the dispersion of the measured data for controls. The first approach is based on the linearity of the measured data. On the assumption that the ratio of the signal intensity for one of the two fluorescent dyes to the signal intensity for the other remains substantially constant irrespective of the control concentration, a straight line is determined that best fits the multiple plots, or data points, obtained for controls diluted to different concentrations with different dilution factors. The linearity is then evaluated quantitatively using a standard measure known as the coefficient of determination, which indicates how close the plots lie to the line; errors are thus quantified by the coefficient of determination of the fitted line. The second approach is based on slopes defined for each data point on the graph: errors are quantified by determining the slopes of the lines drawn from the data points to the origin.
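The linearity measure of the first approach can be illustrated with a short computation of the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The source does not say whether the fitted line is forced through the origin; this sketch assumes an ordinary least-squares line with an intercept, and the function name and data values are hypothetical.

```python
from typing import Sequence


def coefficient_of_determination(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line y = a*x + b fitted to the points.

    xs: signal intensities for one dye (horizontal axis).
    ys: signal intensities for the other dye (vertical axis).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("data points show no spread along one axis")
    # Least-squares slope and intercept of the fitted line.
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual sum of squares about the fitted line.
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    # R^2 = 1 - SS_res / SS_tot, where SS_tot is syy.
    return 1.0 - ss_res / syy


# Hypothetical controls at four dilutions: dye-1 signal (x), dye-2 signal (y).
dilution_x = [100.0, 200.0, 400.0, 800.0]
print(coefficient_of_determination(dilution_x, [150.0, 300.0, 600.0, 1200.0]))  # close to 1
print(coefficient_of_determination(dilution_x, [150.0, 500.0, 300.0, 1200.0]))  # about 0.81
```

A value close to 1 indicates that the dye ratio is nearly constant across dilutions; lower values indicate greater dispersion of the control data.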
From these observations, it is possible to estimate at what stage of a biochip experiment faulty events have occurred, taking into account, for example, changes in the conditions under which the biochip was manufactured or the experiment was conducted. Possible causes of errors include variations in the amounts of spotted liquids due to environmental factors such as temperature and humidity; non-uniform hybridization reactions; insufficient rinsing of biochips after hybridization; improper scanning by the fluorescence detection device due to an inclined biochip substrate during detection of fluorescent light from the spots; distorted biochip substrates; scanning errors caused by dust in the ambient air or in solutions; fluorescence inherent to biochip substrates; noise from a photoelectron amplifier; and the like. By associating these potential causes with the error values quantified in accordance with the present invention, and by comparing the results with those of experiments conducted under the same conditions as the initial experiments, the causes of errors can be estimated more easily.
In one aspect, the present invention provides a method for displaying the results of hybridization experiments using a biochip. The method includes the steps of providing a biochip having a spot region divided into a plurality of sections, wherein the same type of control material, diluted to different concentrations, is spotted in multiple spots in each of the sections; performing a hybridization reaction using a mixed sample prepared by mixing two different types of samples, each labeled with a different one of two fluorescent dyes, so as to obtain, for each control, measurement data concerning the intensities of the two types of fluorescent signals emitted from the two fluorescent dyes; plotting the data on a graph for each section, wherein the vertical and horizontal axes represent the signal intensities of the two types of fluorescent signals, respectively; and simultaneously displaying all of the graphs, each representing the data for one of the sections, on a single screen in the same arrangement as that of the sections on the biochip.
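The per-section arrangement in this aspect can be sketched as a grouping step that places each section's control data at that section's grid position on the biochip. This is a hypothetical illustration, not the patented implementation: it assumes the sections form a rectangular grid numbered row by row from 0, and the names `arrange_sections`, `n_rows` and `n_cols` are invented here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def arrange_sections(
    spots: List[Tuple[int, float, float]], n_rows: int, n_cols: int
) -> List[List[List[Point]]]:
    """Group control spots by section and lay the groups out as an n_rows x n_cols grid.

    Each spot is (section_index, signal_dye1, signal_dye2). Sections are
    assumed (hypothetically) to be numbered row by row, starting from 0.
    """
    grid: List[List[List[Point]]] = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for section, s1, s2 in spots:
        # Row-major numbering: section index -> (row, column) on the biochip.
        row, col = divmod(section, n_cols)
        if not 0 <= row < n_rows:
            raise ValueError(f"section {section} lies outside a {n_rows}x{n_cols} layout")
        grid[row][col].append((s1, s2))
    return grid


spots = [(0, 100.0, 110.0), (0, 200.0, 190.0), (4, 50.0, 60.0)]
layout = arrange_sections(spots, 2, 3)
print(layout[0][0])  # [(100.0, 110.0), (200.0, 190.0)]
print(layout[1][1])  # [(50.0, 60.0)]
```

Each grid cell could then be drawn as one panel of a single figure, for example with `matplotlib.pyplot.subplots(n_rows, n_cols)`, so that the panels mirror the layout of the sections on the biochip.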
In another aspect, the present invention provides another method for displaying the results of hybridization experiments using a biochip. The method includes the steps of providing a biochip having a spot region divided into a plurality of sections, wherein the same type of control material, diluted to different concentrations, is spotted in multiple spots in each of the sections; performing a hybridization reaction using a mixed sample prepared by mixing two different types of samples, each labeled with a different one of two fluorescent dyes, so as to obtain, for each control, measurement data concerning the intensities of the two types of fluorescent signals emitted from the two fluorescent dyes; plotting the data on a graph for each section, wherein the vertical and horizontal axes represent the signal intensities of the two types of fluorescent signals, respectively; determining the coefficient of determination between the plots and a straight line fitted to the plots; and displaying the coefficient of determination for each section on the graph corresponding to that section.
In a further aspect, the present invention provides yet another method for displaying the results of hybridization experiments using a biochip. The method includes the steps of providing a biochip having a spot region divided into a plurality of sections, wherein the same type of control material, diluted to different concentrations, is spotted in multiple spots in each of the sections; performing a hybridization reaction using a mixed sample prepared by mixing two different types of samples, each labeled with a different one of two fluorescent dyes, so as to obtain, for each control, measurement data concerning the intensities of the two types of fluorescent signals emitted from the two fluorescent dyes; plotting the data on a graph for each section, wherein the vertical and horizontal axes represent the signal intensities of the two types of fluorescent signals, respectively; determining the maximum, minimum and average slope angles for a set of straight lines, each extending from one of the plots to the origin, the slope angle being the angle between each straight line and the horizontal axis; and displaying the maximum, minimum and average slope angles on a graph in such a manner that each set of angles corresponds to its section.
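The slope-angle statistics in this aspect can be computed per section as sketched below. This is a minimal illustration under stated assumptions: each control data point is an (x, y) pair of horizontal-axis and vertical-axis signal intensities, angles are reported in degrees, and `slope_angle_summary` and the sample section data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def slope_angle_summary(points: List[Point]) -> Tuple[float, float, float]:
    """Return (maximum, minimum, average) slope angles in degrees for one section.

    The angle of the line joining each point (x, y) to the origin is measured
    from the horizontal axis; for non-negative signals it lies in [0, 90].
    """
    if not points:
        raise ValueError("section has no control data points")
    angles = [math.degrees(math.atan2(y, x)) for x, y in points]
    return max(angles), min(angles), sum(angles) / len(angles)


# Hypothetical sections keyed by (row, column) position on the biochip.
sections: Dict[Tuple[int, int], List[Point]] = {
    (0, 0): [(100.0, 100.0), (200.0, 210.0), (400.0, 390.0)],
    (0, 1): [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)],
}
for position, pts in sections.items():
    max_a, min_a, avg_a = slope_angle_summary(pts)
    print(position, f"max={max_a:.1f} min={min_a:.1f} avg={avg_a:.1f}")
```

Using `math.atan2` rather than `math.atan(y / x)` keeps the computation defined when a horizontal-axis signal is zero.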
In a still further aspect, the present invention provides a method for evaluating errors in hybridization experiments using a biochip. The method includes the steps of providing a biochip having a spot region divided into a plurality of sections, wherein the same type of control material, diluted to different concentrations, is spotted in multiple spots in each of the sections; performing a hybridization reaction using a mixed sample prepared by mixing two different types of samples, each labeled with a different one of two fluorescent dyes, so as to obtain, for each control, measurement data concerning the intensities of the two types of fluorescent signals emitted from the two fluorescent dyes; plotting the data on a graph for each section, wherein the vertical and horizontal axes represent the signal intensities of the two types of fluorescent signals, respectively; determining the coefficient of determination between the plots and a straight line fitted to the plots; and evaluating experimental errors using the coefficient of determination.
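One possible way to use the coefficient of determination for error evaluation is to flag sections whose control data fall below a linearity threshold. The sketch below assumes an ordinary least-squares line with an intercept and a hypothetical threshold `min_r2`; the source specifies neither, and the function names and sample data are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def r_squared(points: List[Point]) -> float:
    """R^2 of an ordinary least-squares line fitted to (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        # Assumption: a section with no spread along an axis is treated as non-linear.
        return 0.0
    # For a least-squares line with intercept, R^2 equals sxy^2 / (sxx * syy).
    return sxy * sxy / (sxx * syy)


def flag_sections(sections: Dict[str, List[Point]], min_r2: float) -> List[str]:
    """Names of sections whose control data fall below the R^2 threshold."""
    return sorted(name for name, pts in sections.items() if r_squared(pts) < min_r2)


sections = {
    "S1": [(100.0, 150.0), (200.0, 300.0), (400.0, 600.0), (800.0, 1200.0)],
    "S2": [(100.0, 150.0), (200.0, 500.0), (400.0, 300.0), (800.0, 1200.0)],
}
print(flag_sections(sections, 0.95))  # ['S2']
```

A flagged section would then be a candidate for the cause analysis described earlier, for example non-uniform hybridization or uneven spotting in that region of the biochip.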
In yet another aspect, the present invention provides a further method for evaluating errors in hybridization experiments using a biochip. The method includes the steps of providing a biochip having a spot region divided into a plurality of sections, wherein the same type of control material, diluted to different concentrations, is spotted in multiple spots in each of the sections; performing a hybridization reaction using a mixed sample prepared by mixing two different types of samples, each labeled with a different one of two fluorescent dyes, so as to obtain, for each control, measurement data concerning the intensities of the two types of fluorescent signals emitted from the two fluorescent dyes; plotting the data on a graph for each section, wherein the vertical and horizontal axes represent the signal intensities of the two types of fluorescent signals, respectively; determining slope angles for a set of straight lines, each extending from one of the plots to the origin, the slope angle being the angle between each straight line and the horizontal axis; and evaluating experimental errors using the slope angles.
Preferably, the slope angles used are the maximum, minimum and average slope angles.
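Similarly, the slope angles can serve as an error measure by comparing, in each section, the spread between the maximum and minimum angles with a tolerance: if the dye ratio were truly constant, all lines to the origin would share one angle. This is a hypothetical sketch; the tolerance `max_spread_deg`, the function names and the data are assumptions, and the source does not state which statistic of the angles is used in the evaluation.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def angle_spread(points: List[Point]) -> float:
    """Maximum minus minimum slope angle (degrees) of the lines from the origin."""
    angles = [math.degrees(math.atan2(y, x)) for x, y in points]
    return max(angles) - min(angles)


def flag_by_angle_spread(sections: Dict[str, List[Point]], max_spread_deg: float) -> List[str]:
    """Names of sections whose slope angles vary by more than max_spread_deg."""
    return sorted(
        name for name, pts in sections.items() if angle_spread(pts) > max_spread_deg
    )


sections = {
    "S1": [(100.0, 100.0), (200.0, 210.0), (400.0, 390.0)],
    "S2": [(100.0, 100.0), (100.0, 300.0), (400.0, 100.0)],
}
print(flag_by_angle_spread(sections, 5.0))  # ['S2']
```

In S1 the angles stay within about 2 degrees of 45 degrees, whereas in S2 they range from roughly 14 to 72 degrees.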