The present invention relates generally to systems and databases for obtaining, storing and retrieving biomolecular information. More particularly, the invention relates to a system and method for generating, storing and providing information relating to biomolecular data in a relational database.
Gene expression data analysis serves to identify genes which may be employed as markers for a particular disease or may be selected as gene targets for the development of new pharmaceutical compounds. Additionally, gene expression analysis can provide insight into the interactions between a large number of genes, including whether two or more genes belong to a common regulatory pathway.
Microarray-based experiments are presently a preferred method to generate gene expression data. Microarrays consist of an ordered arrangement of known gene sequences, or array elements, immobilized on a substrate. To generate gene expression data, the array elements are probed with a sample. The sample may have been derived, for example, from tissue of an individual suffering from a disease, from tissue treated in a specified manner, or from a control tissue. Samples are typically prepared by isolating mRNA, or its equivalent, and then labeling the mRNA with a fluorescent reporter group. The labeled mRNA sample is then combined with the array elements of the microarray to form hybridization complexes between array elements and mRNA molecules that have complementary sequences. Those labeled mRNA molecules that do not have a sequence complementary to the array element sequences are removed by a series of washes. Any complexes that have formed are detected by using a scanner to measure fluorescent signals emitted from specific locations on the microarray. Since the position and sequence of each array element are known, microarrays are an effective way to determine which specific genes are expressed in a sample.
The microarray hybridization experiments may be performed using one of several formats. In one format, a microarray is probed using a single labeled mRNA sample, and what is detected after complex formation is an absolute measurement of the levels of particular mRNAs in the sample. In a second format, a microarray is probed at the same time using two mRNA samples, each labeled with a different fluorescent reporter group. In this case, the mRNAs from the two samples compete for hybridization to individual array elements, and a ratio is obtained that reflects the relative abundance of a gene transcript in the two samples. Typically, the competitive hybridization format is more reliable than the absolute hybridization format when comparisons of gene transcript levels have to be performed across more than one microarray.
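By way of illustration only, the ratio obtained in the competitive hybridization format may be sketched in Python as follows. The function name, the element names and the intensity values are hypothetical, and the use of a base-2 logarithm is an assumption reflecting a common convention rather than a requirement of the format described above.

```python
import math


def log2_ratio(signal_a: float, signal_b: float) -> float:
    """Return log2(signal_a / signal_b) for one array element.

    signal_a and signal_b stand for the fluorescent signals measured for the
    two differently labeled samples at the same array element (hypothetical
    inputs; no background correction is modeled here).
    """
    if signal_a <= 0 or signal_b <= 0:
        raise ValueError("both signals must be positive")
    return math.log2(signal_a / signal_b)


# Hypothetical signals for two array elements: (sample A, sample B).
signals = {
    "element_1": (1200.0, 300.0),  # transcript more abundant in sample A
    "element_2": (500.0, 500.0),   # transcript equally abundant in both
}

ratios = {name: log2_ratio(a, b) for name, (a, b) in signals.items()}
```

In this sketch a positive value indicates higher relative abundance in the first sample, a negative value indicates higher relative abundance in the second sample, and zero indicates equal abundance.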
Microarray-based experiments are generating increasing volumes of gene expression information that need to be stored and provided in an effective manner. The present invention provides the necessary software tools for the generation, storage and retrieval of such information. The software tools can be used to analyze data in both absolute and competitive hybridization formats.
In one embodiment, a biomolecular expression information processing system has procedures and tables that store hybridization data and abundance datasets. The hybridization data comprises information describing a sample and a microarray to which the sample is applied. The hybridization data also comprises information on expression data or levels from which the abundance dataset is generated. The tables also store information identifying a microarray technology type for each hybridization and microarray design information for each microarray technology type. The microarray design information includes technology data that specifies global characteristics of each microarray, and array element data that specifies characteristics, such as location and sequence information, of array elements in each microarray instance of the microarray technology type. The procedures process the abundance datasets in accordance with the microarray design information associated with each such abundance dataset. The system stores technology data for multiple distinct microarray technology types and stores array element data for multiple microarray designs of a single technology type.
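One possible arrangement of the tables described in this embodiment may be sketched as follows, using Python's standard sqlite3 module. This is a minimal sketch under stated assumptions, not the schema of the invention: every table name, column name and data value is hypothetical, and the technology type of a hybridization is identified here indirectly through the microarray design to which the sample was applied.

```python
import sqlite3

# Hypothetical schema: one row per technology type, several designs per
# technology type, array elements per design, and abundance values per
# (hybridization, array element) pair.
SCHEMA = """
CREATE TABLE technology_type (
    technology_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    format        TEXT NOT NULL          -- 'absolute' or 'competitive'
);
CREATE TABLE microarray_design (
    design_id     INTEGER PRIMARY KEY,
    technology_id INTEGER NOT NULL REFERENCES technology_type(technology_id),
    name          TEXT NOT NULL
);
CREATE TABLE array_element (
    element_id    INTEGER PRIMARY KEY,
    design_id     INTEGER NOT NULL REFERENCES microarray_design(design_id),
    location      TEXT NOT NULL,
    sequence      TEXT NOT NULL
);
CREATE TABLE hybridization (
    hybridization_id   INTEGER PRIMARY KEY,
    design_id          INTEGER NOT NULL REFERENCES microarray_design(design_id),
    sample_description TEXT NOT NULL
);
CREATE TABLE abundance (
    hybridization_id INTEGER NOT NULL REFERENCES hybridization(hybridization_id),
    element_id       INTEGER NOT NULL REFERENCES array_element(element_id),
    value            REAL NOT NULL,
    PRIMARY KEY (hybridization_id, element_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO technology_type VALUES (1, 'demo two-channel', 'competitive')"
    )
    # Two designs of a single technology type.
    conn.executemany(
        "INSERT INTO microarray_design VALUES (?, ?, ?)",
        [(1, 1, "design A"), (2, 1, "design B")],
    )
    conn.executemany(
        "INSERT INTO array_element VALUES (?, ?, ?, ?)",
        [(1, 1, "A1", "ACGT"), (2, 1, "A2", "TTGA")],
    )
    conn.execute("INSERT INTO hybridization VALUES (1, 1, 'treated vs control')")
    conn.executemany(
        "INSERT INTO abundance VALUES (?, ?, ?)", [(1, 1, 2.0), (1, 2, 0.0)]
    )
    conn.commit()
    return conn


def abundance_with_design(conn: sqlite3.Connection, hybridization_id: int):
    """Return abundance values joined with their technology and element data."""
    query = """
        SELECT t.name, t.format, e.location, e.sequence, a.value
        FROM abundance a
        JOIN hybridization h     ON h.hybridization_id = a.hybridization_id
        JOIN array_element e     ON e.element_id = a.element_id
        JOIN microarray_design d ON d.design_id = h.design_id
        JOIN technology_type t   ON t.technology_id = d.technology_id
        WHERE a.hybridization_id = ?
        ORDER BY e.element_id
    """
    return conn.execute(query, (hybridization_id,)).fetchall()
```

Calling `abundance_with_design(build_demo_db(), 1)` returns each abundance value together with the technology data and array element data needed to interpret it, which is the kind of lookup a procedure would perform before processing an abundance dataset in accordance with its microarray design information.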
When the biomolecules are genetic sequences, hybridizations are used to determine expression data or levels. When the biomolecules are polypeptide sequences, antibodies are used to determine expression data or levels.
In another embodiment, the biomolecular expression information processing system stores expression data for polypeptide sequences, the expression data having been generated on microarrays using antibodies.