With progress in sequencing many genomes, including the human genome, there is increased interest in understanding the significance of changes in gene expression. The ability to correlate changes in gene expression with, for example, specific treatments and phenotypes in clinical and non-clinical biological systems allows scientists to understand the underlying cell biology and identify the roles of specific genes, receptors and signaling pathways. One objective, among many, of this research is to identify specific genes that may serve, for example, as biomarkers for disease progression or as diagnostic criteria, as well as to identify gene expression products (e.g., proteins) that can be targeted as or by new therapeutic compounds in order to study, diagnose, prevent or cure disease.
There have been significant advancements in the human genome-sequencing project and in similar sequencing efforts that involve organisms of interest to basic and preclinical research, genetics, and agronomics. This progress has generated and continues to generate deoxyribonucleic acid (“DNA”) and ribonucleic acid (“RNA”) sequence databases that serve as informational resources and support advancement in the methods by which genomic and proteomic research is carried out.
However, current genomic tools and techniques still require significant known genomic sequence information for the organism or tissue under investigation, or require that investigators derive libraries of clones from that organism or tissue. To build a DNA microarray that represents essentially all genes for a particular species under investigation (e.g., human), the investigating scientist must expend tremendous resources to identify all possible messenger RNAs (“mRNAs”) that may be present in the studied sample. For example, mRNA expression profiling in such samples requires high-density DNA microarrays built from large numbers of known genes. By comparison, low-density DNA microarrays carry a higher probability of “missing” genes (by omission from the array) that may be relevant to a given experimental paradigm.
Alternative methods, such as differential display and serial analysis of gene expression (“SAGE”), may permit detection of differences in mRNA species between or among RNA samples. However, these methods also require significant resources to identify expressed genes and related expression products, such as mRNAs. For example, to identify differences in specific genes using differential display, segregated bands must be excised from an electrophoresis gel, amplified using polymerase chain reaction (“PCR”) techniques, and then sequenced. Similarly, SAGE requires significant sequencing resources to identify any differences in known and unknown genes.
The present invention addresses these limitations in the prior art by providing compositions and systems that incorporate novel strategies in which molecular or biochemical assay compositions and systems are linked to DNA or RNA sequence databases for optimal resource efficiency in assaying gene expression.