The present invention relates generally to relational databases for storing and retrieving biological information. More particularly, the invention relates to systems and methods for providing sequences of biological molecules and associated reagents in a relational format allowing retrieval in a client-server environment.
Informatics is the study and application of computer and statistical techniques to the management of information. In genome projects, bioinformatics includes the development of methods to search databases quickly, to analyze nucleic acid sequence information, and to predict protein sequence and structure from DNA sequence data.
Increasingly, molecular biology is shifting from the laboratory bench to the computer desktop. Today's researchers require advanced quantitative analyses, database comparisons, and computational algorithms to explore the relationships between sequence and phenotype. Thus, by all accounts, researchers cannot and will not be able to avoid using computer resources to explore gene expression, gene sequencing, and molecular structure.
One use of bioinformatics involves studying genes differentially or commonly expressed in different tissues or cell lines (e.g., normal and cancerous tissue). Such expression information is of significant interest in pharmaceutical research. The sequence tag method involves generation of a large number (e.g., thousands) of Expressed Sequence Tags (“ESTs”) from cDNA libraries (each produced from a different tissue or sample). ESTs are partial transcript sequences that may cover different parts of the mRNA(s) of a gene, depending on cloning and sequencing strategy. Each EST includes about 100 to 300 nucleotides. If it is assumed that the number of tags is proportional to the abundance of transcripts in the tissue or cell type used to make the cDNA library, then any variation in the relative frequency of those tags, stored in computer databases, can be used to detect the differential expression of the corresponding genes.
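The tag-frequency reasoning above may be illustrated by the following minimal sketch. It assumes each EST has already been assigned a gene label and that each library contains at least one tag. The function names, the gene labels, and the pseudocount (added so that a gene absent from one library does not cause division by zero) are assumptions introduced for illustration and are not part of the described method.

```python
from collections import Counter


def relative_frequencies(tags):
    """Map each gene label to its share of all tags in one library."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def differential_expression(tags_a, tags_b, pseudocount=0.5):
    """Return {gene: relative frequency in library A / relative frequency in B}.

    The pseudocount is an illustrative assumption, not taken from the source.
    """
    counts_a, counts_b = Counter(tags_a), Counter(tags_b)
    total_a, total_b = len(tags_a), len(tags_b)
    genes = set(counts_a) | set(counts_b)
    return {
        g: ((counts_a[g] + pseudocount) / total_a)
        / ((counts_b[g] + pseudocount) / total_b)
        for g in genes
    }


# Hypothetical libraries: one gene label per sequenced tag.
normal = ["geneA"] * 8 + ["geneB"] * 2
tumor = ["geneA"] * 2 + ["geneB"] * 8
ratios = differential_expression(normal, tumor)
```

A ratio above 1 suggests the gene is more abundant in the first library, and a ratio below 1 suggests the opposite. With counts as small as these, such ratios are only indicative.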
To make EST information manipulation easy to perform and understand, sophisticated computer database systems have been developed. In one database system, developed by Incyte Pharmaceuticals, Inc. of Palo Alto, Calif., abundance levels of mRNA species expressed in a given sample are electronically recorded and annotated with information available from public sequence databases such as GenBank. The resulting information is stored in a relational database that may be employed to evaluate changes in gene expression caused by disease progression, pharmacological treatment, aging, etc.
While relational database systems such as those developed by Incyte Pharmaceuticals, Inc. provide great power and flexibility in analyzing gene expression information, this area of technology is still in its infancy and further improvements in relational database systems will help accelerate biological research for numerous applications.
The present invention provides relational database systems for storing biomolecular sequence information together with biological annotations detailing the source of the sequence information, and associated reagent information. The acquisition, storage and access of reagent information associated with databased biomolecular sequence information is a particular advantage of the present invention. Such reagent information identifies genetic information and materials which may be made available to a user of the relational database system of the present invention for further application in research, therapeutic pharmaceutical development or other fields. The reagent information aspect of the present invention is preferably used in conjunction with a biomolecular sequence relational database system.
The present invention provides a computer system including a relational database having records containing information identifying initial sequences of polynucleotide inserts of a plurality of clones, optionally, additional sequences of the polynucleotide inserts of a subset of the plurality of clones, and reagent specifications of the subset of clones. The system also includes a user interface allowing a user to selectively view information regarding the sequences and reagent specifications.
The present invention also provides a method, implemented on a computer system, for accessing information relating to one or more reagent clones. The method involves providing a relational database having records containing information identifying initial sequences of polynucleotide inserts of a plurality of clones, optionally, additional sequences of the polynucleotide inserts of a subset of the plurality of clones, and reagent specifications of the subset of clones. The method also involves entering, in a graphical user interface, a query relating to one or more of the sequences or reagent specifications, determining matches between the query entry and the information, and displaying the results of the determination.
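By way of illustration only, the record structure and query step described above might be sketched as follows, using an in-memory SQLite database. The table names, the column names (including `vector` and `insert_length` as example reagent specification fields), and the sample rows are hypothetical; they are assumptions made for this sketch and do not describe the actual schema of the invention.

```python
import sqlite3


def build_database():
    """Create a hypothetical schema: clones, their sequences, reagent specs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clone (
            clone_id TEXT PRIMARY KEY,
            library  TEXT NOT NULL
        );
        CREATE TABLE sequence (
            seq_id   INTEGER PRIMARY KEY,
            clone_id TEXT NOT NULL REFERENCES clone(clone_id),
            kind     TEXT NOT NULL CHECK (kind IN ('initial', 'additional')),
            residues TEXT NOT NULL
        );
        -- Only the subset of clones designated as reagents has a row here.
        CREATE TABLE reagent_spec (
            clone_id      TEXT PRIMARY KEY REFERENCES clone(clone_id),
            vector        TEXT,
            insert_length INTEGER
        );
    """)
    return conn


def find_reagent_clones(conn, min_insert_length):
    """Match a query against reagent specs; return one row per matching clone."""
    cursor = conn.execute(
        """
        SELECT c.clone_id, r.vector, r.insert_length, COUNT(s.seq_id)
        FROM clone c
        JOIN reagent_spec r ON r.clone_id = c.clone_id
        LEFT JOIN sequence s ON s.clone_id = c.clone_id
        WHERE r.insert_length >= ?
        GROUP BY c.clone_id, r.vector, r.insert_length
        ORDER BY c.clone_id
        """,
        (min_insert_length,),
    )
    return cursor.fetchall()


conn = build_database()
# Hypothetical sample data.
conn.executemany("INSERT INTO clone VALUES (?, ?)",
                 [("C1", "liver"), ("C2", "liver"), ("C3", "brain")])
conn.executemany(
    "INSERT INTO sequence (clone_id, kind, residues) VALUES (?, ?, ?)",
    [("C1", "initial", "ACGT"), ("C1", "additional", "ACGTTGCA"),
     ("C2", "initial", "GGCA"), ("C3", "initial", "TTAG")])
conn.executemany("INSERT INTO reagent_spec VALUES (?, ?, ?)",
                 [("C1", "vector-x", 1200), ("C2", "vector-x", 400)])

matches = find_reagent_clones(conn, 1000)
```

In this sketch every clone has an initial sequence, while additional sequences and reagent specifications exist only for a subset, mirroring the optional and subset relationships recited above. A graphical user interface would supply the query value and display the returned rows.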
In addition, the present invention provides a computer program product, comprising a computer-usable medium having computer-readable program code embodied thereon relating to a relational database having records containing information identifying initial sequences of polynucleotide inserts of a plurality of clones, optionally, additional sequences of the polynucleotide inserts of a subset of the plurality of clones, and reagent specifications of the subset of clones. The computer program product may also include computer-readable program code for effecting the following steps within a computing system: providing an interface for receiving a query relating to one or more reagent specifications, determining matches between the query entry and the information, and displaying the results of the determination.
The present invention further provides a reagent clone identified by a process, at least partially implemented on a computer system, for establishing a set of reagent clones. The process involves grouping initial sequences of polynucleotide inserts in a plurality of clones into a master cluster, assembling the initial sequences of the master cluster into one or more contiguous sequences, such that relationships of sequences to each other in the master cluster are elucidated, and nominating at least one clone represented by a master cluster as a reagent clone, according to specified priority criteria. A set of reagent clones may also be nominated according to such a method. The set of reagent clones may have a variety of uses including as hybridizable elements on a biological microarray.
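The grouping and nomination steps of the process described above may be approximated by the following simplified sketch. It substitutes a shared k-mer, single-linkage grouping for whatever clustering the invention actually employs, omits the assembly step entirely, and uses "longest initial sequence" as a stand-in for the specified priority criteria. All function names, the parameter `k`, and the sample sequences are assumptions introduced for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_clones(sequences, k=8):
    """Group clone ids whose sequences share at least one k-mer.

    Stand-in for master-cluster formation; uses union-find for single linkage.
    """
    parent = {cid: cid for cid in sequences}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}  # first clone seen for each k-mer
    for cid, seq in sequences.items():
        for km in kmers(seq, k):
            if km in owner:
                parent[find(cid)] = find(owner[km])
            else:
                owner[km] = cid

    clusters = {}
    for cid in sequences:
        clusters.setdefault(find(cid), []).append(cid)
    return sorted(sorted(members) for members in clusters.values())


def nominate(clusters, sequences):
    """Pick one clone per cluster: longest sequence, ties broken by clone id.

    The 'longest sequence' rule is a hypothetical priority criterion.
    """
    return [
        min(members, key=lambda cid: (-len(sequences[cid]), cid))
        for members in clusters
    ]


# Hypothetical initial sequences keyed by clone id.
sequences = {
    "C1": "ACGTACGTTTGACCA",
    "C2": "CGTTTGACCAGGCTAAT",
    "C3": "TTAGGCATCCGATAGC",
}
clusters = cluster_clones(sequences)
reagent_clones = nominate(clusters, sequences)
```

Here C1 and C2 overlap and fall into one cluster, from which the longer C2 is nominated, while C3 forms its own cluster. A practical system would assemble each cluster into contiguous sequences before nomination, so that the position of each clone within its contig could inform the priority criteria.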
These and other features and advantages of the invention will be described in more detail below with reference to the drawings.