1. Technical Field of the Invention
The present invention relates to corrective methods for processing results of transcriptome experiments obtained by differential analysis. It relates more particularly to the processing of such results, in the case of experiments conducted on DNA chips. The object of transcriptome experiments is to identify genes of interest or groups of genes of interest.
2. Description of Background and/or Related and/or Prior Art
Generally, the level of expression of these genes of interest or of these groups of genes of interest varies significantly, for example, in response to a signal. During the analysis of results of transcriptome experiments, for example by means of DNA chips, it is common practice to select the genes exhibiting the greatest modulation, i.e., the greatest variation in their level of expression. The level of this modulation, also called the modulation coefficient, is defined as the ratio of the level of expression observed in one experiment, for example under a “treatment” condition, to that observed in another experiment, for example under a “reference” condition.
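By way of illustration only, the ratio defined above can be sketched as follows. The function name, the gene names and the expression levels are hypothetical and are not taken from any experiment; the sketch merely assumes that each gene has one positive expression level per condition.

```python
def modulation_coefficient(treatment_level, reference_level):
    """Return the ratio of the treatment level to the reference level."""
    if reference_level <= 0:
        # A ratio is undefined for a zero or negative reference level.
        raise ValueError("reference level must be positive")
    return treatment_level / reference_level


# Hypothetical expression levels in arbitrary units: (treatment, reference).
levels = {
    "gene_a": (400.0, 100.0),  # expression increases under treatment
    "gene_b": (50.0, 100.0),   # expression decreases under treatment
}

coefficients = {
    gene: modulation_coefficient(treatment, reference)
    for gene, (treatment, reference) in levels.items()
}
```

In this sketch a coefficient greater than 1 indicates increased expression under the treatment condition, and a coefficient between 0 and 1 indicates decreased expression.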
Examination of the results shows that the higher the modulation threshold used to restrict the number of genes selected, the more the selection favors genes whose level of expression under the reference condition is close to the limit of detection. There is, however, no biological argument to explain why the genes most weakly expressed under the reference condition would be the genes most strongly modulated during a treatment. This selection therefore introduces a bias and results in genes which exhibit a lower level of modulation being ignored simply because they are more highly expressed under the reference condition.
If the expression-level modulation coefficient is estimated on the basis of several observations of the gene on several chips corresponding to the same condition, i.e., on the basis of replicates of the reference condition or of the treatment condition, it has been demonstrated that the modulation coefficient and the average level of expression of the genes vary inversely [R. Mansourian et al., The global error assessment (GEA) model for the selection of differentially expressed genes in microarray data, Bioinformatics Advance Access, 2004]. In other words, the lower the level of expression of a gene in several replicates of a reference condition, the higher its coefficient of modulation in response to the treatment, calculated on the basis of several replicates. This phenomenon is explained in part by the presence of measurement background noise, which becomes all the more predominant in the calculation of the modulation coefficient when the genes are weakly expressed.
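The effect of background noise on weakly expressed genes can be illustrated with a minimal sketch. It assumes, purely for illustration, that the noise is additive and of the same magnitude for every gene; the cited document does not necessarily use this noise model, and all names and values below are hypothetical.

```python
def observed_modulation(true_reference, true_modulation,
                        noise_treatment, noise_reference):
    """Modulation coefficient computed from noisy measurements.

    Assumption: the noise is simply added to the true expression level.
    """
    treatment = true_reference * true_modulation + noise_treatment
    reference = true_reference + noise_reference
    return treatment / reference


# Two hypothetical genes that are NOT modulated (true coefficient of 1.0),
# measured with the same noise of +5 (treatment) and -5 (reference) units.
weak = observed_modulation(10.0, 1.0, 5.0, -5.0)      # 15 / 5
strong = observed_modulation(1000.0, 1.0, 5.0, -5.0)  # 1005 / 995
```

With identical noise, the weakly expressed gene appears to be modulated by a factor of 3, whereas the strongly expressed gene remains close to 1, although neither gene is actually modulated.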
The differential analysis according to the “Global Error Assessment” (GEA) method disclosed in the document referenced above makes it possible to correct this bias. It consists in grouping the genes together according to a statistical criterion, called significance (or p-value), taking into account the variability in the modulation coefficient as a function of the level of expression under the reference condition for each gene. The variability in the modulation coefficient is, for a given gene, the standard deviation of the modulation coefficients relative to the mean modulation coefficient. This p-value reflects the significance of a modulation coefficient value. This makes it possible to obtain groups of genes corresponding to a given p-value and to balance, in the list of selected genes, the proportion of genes weakly expressed under the reference condition.
However, the p-value has no biological meaning. As a result, biologists, who reason in the world of modulations, cannot use this value as a basis for identifying the differentiated genes. Consequently, they cannot use the “GEA” method for finding the differentiated genes.
In practice, after one or more differential analyses, biologists most commonly use classification and visualization techniques in order to identify genes exhibiting expression modulation profiles that are similar across several conditions. These include, for example, the technique of hierarchical classification or classification by robust singular value decomposition disclosed in the document L. Liu et al., Robust singular value decomposition analysis of microarray data, PNAS, 2003.
However, in these techniques, owing to display-related limitations, or in order to concentrate on more complex analyses such as ontological analyses or analyses relating to metabolic pathways, biologists tend to limit the size of the lists of selected genes. Thus, they rely on the expression modulation levels measured under each condition, and therefore do not take into account the associated significance. The information relating to this significance is thus lost during the visualization of the modulation levels after classification. In other words, biologists simply consider the ratio of the levels of expression of genes under two conditions, classified in decreasing order according to their modulation coefficient. This approach is referred to as standard modulation.
Generally wishing to visualize the genes most highly modulated under the treatment condition, biologists then sort the genes in decreasing order of modulation level and retain only the top-ranked genes. In doing so, they do not take into account the significance and reintroduce the selection bias that had been removed by the calculation of the p-value.
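The selection by decreasing sorting described above can be sketched as follows. The gene names, expression levels and function name are hypothetical and serve only to illustrate how retaining the top-ranked genes favors genes that are weakly expressed under the reference condition.

```python
# Hypothetical data: (gene name, reference level, treatment level).
genes = [
    ("g1", 5.0, 40.0),       # weakly expressed, coefficient 8.0
    ("g2", 8.0, 48.0),       # weakly expressed, coefficient 6.0
    ("g3", 500.0, 1500.0),   # strongly expressed, coefficient 3.0
    ("g4", 1000.0, 2500.0),  # strongly expressed, coefficient 2.5
    ("g5", 6.0, 6.0),        # weakly expressed, coefficient 1.0
]


def top_modulated(genes, n):
    """Sort by decreasing modulation coefficient and keep the first n genes."""
    ranked = sorted(genes, key=lambda g: g[2] / g[1], reverse=True)
    return [name for name, _, _ in ranked[:n]]


selected = top_modulated(genes, 2)
```

In this sketch the two genes retained are both weakly expressed under the reference condition, while the strongly expressed genes g3 and g4 are discarded despite a threefold and a 2.5-fold modulation. No significance is consulted at any point in the selection.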