Each human cell contains as many as 50,000-100,000 genes, but only a subset of them is used in any given cell. A significant number of these genes are involved in basic functions and routine cellular metabolic processes required for the sustenance of the cell. Such genes are called housekeeping genes (hereinafter referred to as “HKG”). Various gene expression analysis methods quantify messenger RNA (hereinafter referred to as “mRNA”) to determine the expression levels of specific or multiple genes, with the aim of identifying the functions of specific genes, searching for genes directed to specific functions, profiling the gene expression of organisms under specific conditions, or serving other biological purposes. In these methods, endogenous reference genes are housekeeping genes that are useful for normalizing the mRNA level in the relative quantification of genes of interest.
Endogenous reference genes are most widely used to normalize mRNA levels for accurate comparison of gene expression between different samples (Vandesompele J et al., Genome Biol 3(7), p. RESEARCH0034, 2002). Endogenous reference genes are usually used in gene expression analysis techniques ranging from conventional reverse transcriptase polymerase chain reaction (hereinafter referred to as “RT-PCR”) to the more recently developed quantitative real time PCR (hereinafter referred to as “qRT-PCR”), serial analysis of gene expression (hereinafter referred to as “SAGE”) and microarray. Traditional reference genes such as glyceraldehyde-3-phosphate dehydrogenase (hereinafter referred to as “GAPDH”) and β-actin (hereinafter referred to as “ACTB”) have been used without proper validation, on the assumption that they are expressed at constant levels across different samples, irrespective of cell or tissue type, and are not regulated by experimental treatment.
However, it is well known that the expression of traditional reference genes may vary among different tissues and cell types and can be regulated by experimental conditions, including sample treatment, developmental stage and pathological states (Bereta J and Bereta M, Biochem Biophys Res Commun 217(1):363-369, 1995; Tricarico C et al., Anal Biochem 309(2):293-300, 2002; Thellin O et al., J Biotechnol 75(2-3):291-295, 1999; Rubie C et al., Mol Cell Probes 19(2):101-109, 2005; Schmittgen T D and Zakrajsek B A, J Biochem Biophys Methods 46(1-2):69-81, 2000; Zhong H and Simons J W, Biochem Biophys Res Commun 259(3):523-526, 1999; Selvey S et al., Mol Cell Probes 15(5):307-311, 2001; Wu Y Y and Rees J L, Acta Derm Venereol 80(1):2-3, 2000; Lee P D et al., Genome Res 12(2):292-297, 2002; Hamalainen H K et al., Anal Biochem 299(1):63-70, 2001). The use of inappropriate reference genes in the relative quantification of gene expression may result in biased expression profiles. This concern has already been raised by many researchers (Tricarico C et al., Anal Biochem 309(2):293-300, 2002; Dheda K et al., Anal Biochem 344(1):141-143, 2005; de Kok J B et al., Lab Invest 85(1):154-159, 2005; Brunner A M et al., BMC Plant Biol 4:14, 2004). In particular, the selection of proper endogenous reference genes is essential for accurate measurement in qRT-PCR, which is a reliable method for detecting gene expression with high sensitivity and accuracy through accurate normalization, whereas such normalization may not be required in qualitative analyses such as northern blot or conventional RT-PCR (Huggett J et al., Genes Immun 6(4):279-284, 2005).
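The bias described above can be illustrated with a minimal sketch of relative quantification by the widely used 2^(-ddCt) calculation. This calculation is not described in the present text; it is assumed here only for illustration, together with an amplification efficiency of about 100%. The function name and all threshold cycle (Ct) values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by 2^(-ddCt).

    Assumes roughly 100% PCR efficiency, so one cycle equals a 2-fold change.
    """
    # Normalize the target gene to the reference gene within each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two samples.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values. The target gene is measured identically in both
# cases; only the behaviour of the reference gene differs.
stable_ref = relative_expression(24.0, 18.0, 26.0, 18.0)
# Here the reference gene itself responds to the treatment (Ct drops by one
# cycle, i.e. about a 2-fold increase), which distorts the result.
regulated_ref = relative_expression(24.0, 17.0, 26.0, 18.0)

print(stable_ref, regulated_ref)
```

With these illustrative numbers, a reference gene that is itself regulated by the treatment halves the apparent fold change of the target gene, even though the target measurements are unchanged.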
As the importance of properly validating traditional reference genes and of identifying more suitable reference genes has come to be acknowledged, a number of studies have been undertaken either to select the most suitable genes among commonly used reference genes under specific experimental conditions, or to identify novel genes that are superior to the traditional genes universally used for mRNA quantification. However, most of the previous studies have focused on the selection (validation) of the most stable genes among commonly used reference genes in specific experimental systems or in a limited set of tissue samples (Goidin D et al., Anal Biochem 295(1):17-21, 2001; Haller F et al., Anal Biochem 335(1):1-9, 2004; Ohl F et al., J Mol Med 83(12):1014-1024, 2005; Radonic A et al., Biochem Biophys Res Commun 313(4):856-862, 2004). Some programs are now available for identifying the most appropriate genes among multiple reference genes using qRT-PCR results (Vandesompele J et al., Genome Biol 3(7), p. RESEARCH0034, 2002; Pfaffl M W et al., Biotechnol Lett 26(6):509-15, 2004; Andersen C L et al., Cancer Res 64(15):5245-5250, 2004).
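As a rough illustration of how candidate reference genes might be ranked by expression stability, the following sketch computes a pairwise-variation score in the spirit of the measure described by Vandesompele et al. (2002). It is not a reimplementation of any of the programs cited above; the function name, gene names and expression values are hypothetical.

```python
import math
from statistics import mean, stdev


def stability_scores(expression):
    """Return a stability score per gene (lower means more stable).

    `expression` maps a gene name to its relative quantities (linear scale)
    measured in the same ordered set of samples. For each gene, the score is
    the mean, over all other genes, of the standard deviation of the
    log2 expression ratio across samples.
    """
    scores = {}
    for gene, values in expression.items():
        pairwise_sd = []
        for other, other_values in expression.items():
            if other == gene:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values, other_values)]
            pairwise_sd.append(stdev(log_ratios))
        scores[gene] = mean(pairwise_sd)
    return scores


# Hypothetical relative quantities in four samples.
expression = {
    "gene_a": [1.0, 1.1, 0.9, 1.0],
    "gene_b": [2.0, 2.2, 1.8, 2.0],   # varies in step with gene_a
    "gene_c": [1.0, 4.0, 0.5, 2.0],   # varies strongly between samples
}

scores = stability_scores(expression)
ranked = sorted(scores, key=scores.get)  # most stable first
print(ranked)
```

A score of this kind only compares the candidates with one another, so its result depends on which genes and samples are supplied; this is consistent with the limitation noted above that such validation is typically carried out within a specific experimental system.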
In addition, novel endogenous reference genes have been found mostly on the basis of microarray data (Hamalainen H K et al., Anal Biochem 299(1):63-70, 2001; Hoerndli F J et al., Anal Biochem 335(1):30-41, 2004; Czechowski T et al., Plant Physiol 139(1):5-17, 2005; Jin P et al., BMC Genomics 5(1):55, 2004; Kobayashi M S et al., J Neurosci Res 76(4):512-518, 2004; Shulzhenko N et al., Biochem Biophys Res Commun 337(1):306-12, 2005). As is well known, the microarray technique has some problems and limitations (errors) due to the potential for inaccurate cross hybridization between probes and unintended transcripts, the potential for differences in hybridization efficiency between probe sets, and the potential for the incorrect annotation of transcripts (Haverty P M et al., Bioinformatics 20(18):3431-3441, 2004; van Ruissen F et al., BMC Genomics 6:91, 2005). The microarray technique also allows the detection of the expression of only those genes present on the chip, in contrast to expressed sequence tag (hereinafter referred to as “EST”) and SAGE, in which the expression profiles of whole transcripts in samples (cDNA libraries) can be measured (van Ruissen F et al., BMC Genomics 6:91, 2005). Using gene expression data from different platforms together is expected to compensate for the limitations of the individual platforms. For example, SAGE is far more sensitive than EST for detecting low-abundance transcripts (Sun M et al., BMC Genomics 5(1):1-4, 2004).
Even if an ideal endogenous reference gene does not exist, large and varied gene expression datasets make it possible to find an endogenous reference gene that is closer to ideal than traditional reference genes and is applicable to most experimental conditions.
Leading to the present invention, intensive and thorough research by the present inventors into the accurate comparison of gene expression among different samples resulted in the finding that gene expression datasets constructed from microarray data, in addition to EST and SAGE data, are useful in searching for endogenous reference genes, and that novel reference genes identified using these datasets are superior to previously used genes and show more stable expression across a wide range of samples. Such genes are thus universally useful for the normalization of gene expression, rather than being limited to use with specific tissue samples or in specific studies.