The present invention relates to the collection and storage of information pertaining to chips for processing samples.
Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. Pat. No. 5,571,639, both incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.
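The lookup from scanned locations to probe identities described above can be illustrated with a minimal sketch. The layout, intensity values, threshold, and the function name `bound_probes` are all hypothetical and chosen only for illustration; they are not taken from the referenced techniques.

```python
# Hypothetical chip layout: (row, column) -> probe sequence fabricated there.
probe_layout = {
    (0, 0): "ACGT",
    (0, 1): "ACGA",
    (1, 0): "ACGC",
    (1, 1): "ACGG",
}


def bound_probes(layout, intensities, threshold):
    """Return the probe sequences at locations whose scanned intensity
    meets or exceeds the threshold (an assumed, simplistic binding test)."""
    return sorted(
        layout[location]
        for location, value in intensities.items()
        if location in layout and value >= threshold
    )


# Made-up fluorescence intensities standing in for a scanner image file.
scan = {(0, 0): 950.0, (0, 1): 40.0, (1, 0): 15.0, (1, 1): 30.0}

hits = bound_probes(probe_layout, scan, threshold=500.0)
print(hits)
```

Because the probe at each location is known in advance, the set of bright locations translates directly into a set of probe identities, from which sequence information may then be inferred.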
Computer-aided techniques for monitoring gene expression using such arrays of probes have also been developed as disclosed in U.S. patent application Ser. No. 08/828,952 and PCT publication No. WO 97/10365, the contents of which are herein incorporated by reference. Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.
As can be seen, the probe array chips are designed to answer questions about genomic items, herein defined to include genes, expressed sequence tags (ESTs), gene clusters, and EST clusters. Associated with information about genomic items is genetic sequence information concerning the base sequences of genomic items. Probes are designed and selected for inclusion on a chip based on: 1) the identity of the genomic items to be investigated by the chip, 2) the sequence information associated with those genomic items, and 3) the type of information sought, e.g., expression analysis, polymorphism analysis, etc. The interrelationships among probes, genomic items, and sequence information are, however, extremely complex, greatly complicating the tasks of designing chips, effectively exploiting chips that have already been designed, and efficiently interpreting the information generated by application of the chips.
Moreover, it is contemplated that the operations of chip design, construction, and application will occur on a very large scale. The quantity of chip design information that must be stored and correlated is vast. What is needed is a system and method suitable for storing and organizing large quantities of information used in conjunction with the design of probe array chips.
The present invention provides systems and methods for organizing information relating to the design of polymer probe array chips, including oligonucleotide array chips. A database model is provided which organizes information interrelating probes on a chip, genomic items investigated by the chip, and sequence information relating to the design of the chip. The model is readily translatable into database languages such as SQL. The database model scales to permit storage of information about large numbers of chips having complex designs.
According to one aspect of the present invention, a computer-readable storage medium is provided. A relational database is stored on this medium. The relational database includes: a probe table including a plurality of probe records, each of the probe records specifying a polymer probe for use in one or more polymer probe arrays; and a sequence item table including a plurality of sequence item records, each of the sequence item records specifying a nucleotide sequence to be investigated in the one or more polymer probe arrays, wherein there is a many-to-many relationship between the probe records and the sequence item records.
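By way of illustration only, the many-to-many relationship between probe records and sequence item records might be expressed in SQL as sketched below, here run through Python's standard `sqlite3` module. The table names, column names, the linking table `probe_sequence_item`, the helper functions, and the sample sequences are assumptions made for this sketch and are not the schema of the invention.

```python
import sqlite3

# In-memory database; all names and values below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE probe (
    probe_id       INTEGER PRIMARY KEY,
    probe_sequence TEXT NOT NULL
);
CREATE TABLE sequence_item (
    sequence_item_id    INTEGER PRIMARY KEY,
    nucleotide_sequence TEXT NOT NULL
);
-- Linking table: one row per (probe, sequence item) pairing, which is
-- what makes the relationship many-to-many.
CREATE TABLE probe_sequence_item (
    probe_id         INTEGER NOT NULL REFERENCES probe(probe_id),
    sequence_item_id INTEGER NOT NULL
        REFERENCES sequence_item(sequence_item_id),
    PRIMARY KEY (probe_id, sequence_item_id)
);
""")

# Made-up records: probe 1 relates to both sequence items,
# and sequence item 1 relates to both probes.
conn.executemany("INSERT INTO probe VALUES (?, ?)",
                 [(1, "ACGT"), (2, "GTCA")])
conn.executemany("INSERT INTO sequence_item VALUES (?, ?)",
                 [(1, "TTACGTGTCATT"), (2, "GGACGTGG")])
conn.executemany("INSERT INTO probe_sequence_item VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2)])
conn.commit()


def probes_for_item(connection, sequence_item_id):
    """Return the IDs of all probes linked to one sequence item."""
    rows = connection.execute(
        "SELECT probe_id FROM probe_sequence_item "
        "WHERE sequence_item_id = ? ORDER BY probe_id",
        (sequence_item_id,),
    )
    return [row[0] for row in rows]


def items_for_probe(connection, probe_id):
    """Return the IDs of all sequence items linked to one probe."""
    rows = connection.execute(
        "SELECT sequence_item_id FROM probe_sequence_item "
        "WHERE probe_id = ? ORDER BY sequence_item_id",
        (probe_id,),
    )
    return [row[0] for row in rows]


print(probes_for_item(conn, 1))
print(items_for_probe(conn, 1))
```

A separate linking table is one conventional way to represent a many-to-many relationship in a relational database; it allows a single probe record to be associated with several sequence item records and vice versa without duplicating either record.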
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.