A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for analyzing biological sequences such as nucleic acid sequences.
Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. patent application Ser. No. 08/249,188 (now U.S. Pat. No. 5,571,639), both incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a substrate or chip. A fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (which is processed into a cell file) indicating the locations where the labeled nucleic acids bound to the chip. Based upon the cell file and identities of the probes at specific locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.
Innovative computer-aided techniques for base calling are disclosed in U.S. patent application Ser. No. 08/531,137 (now U.S. Pat. No. 5,974,164), Ser. No. 08/528,656 (now U.S. Pat. No. 5,733,729), and Ser. No. 08/618,834, which are all hereby incorporated by reference for all purposes. However, improved computer systems and methods are still needed to evaluate, analyze, and process the vast amount of information now used and made available by these pioneering technologies.
Additionally, there is a need for improved computer-aided techniques for monitoring gene expression. Many disease states are characterized by differences in the expression levels of various genes either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.
Similarly, control of the cell cycle and cell development, as well as diseases, are characterized by variations in the transcription levels of particular genes. Thus, for example, a viral infection is often characterized by the elevated expression of genes of the particular virus. For example, outbreaks of herpes simplex, Epstein-Barr virus infections (e.g., infectious mononucleosis), cytomegalovirus, Varicella-zoster virus infections, parvovirus infections, human papillomavirus infections, etc. are all characterized by elevated expression of various genes present in the respective virus. Detection of elevated expression levels of characteristic viral genes provides an effective diagnostic of the disease state. In particular, viruses such as herpes simplex enter quiescent states for periods of time only to erupt in brief periods of rapid replication. Detection of expression levels of characteristic viral genes allows detection of such active proliferative (and presumably infective) states.
The present invention provides innovative systems and methods for analyzing biological sequences such as nucleic acid sequences. The computer system may analyze hybridization intensities indicating hybridization affinity between nucleic acid probes and a sample nucleic acid sequence in order to call bases in the sample sequence. Multiple base calls may be combined to form a single base call. Additionally, the computer system may analyze hybridization intensities in order to monitor gene expression or the change in gene expression as compared to a baseline.
According to one aspect of the invention, a computer-implemented method of calling an unknown base in a sample nucleic acid sequence comprises the steps of: receiving hybridization intensities for a plurality of sets of nucleic acid probes, each hybridization intensity indicating a hybridization affinity between a nucleic acid probe and the sample nucleic acid sequence; computing a base call for the unknown base for each set of probes; and computing a single base call for the plurality of sets of probes according to the base call for the unknown base which occurs most often for the plurality of sets of probes. Typically, the single base call is displayed on a screen display and a user is afforded the opportunity to display or not display the base calls from which the single base call is derived.
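By way of illustration only, combining per-set base calls into a single call by frequency might be sketched as follows. The function names, the rule of taking the highest intensity within a set, the use of "N" for a tie, and the intensity values are assumptions made for this example and are not taken from the disclosure.

```python
from collections import Counter

def call_base_for_set(intensities):
    """Per-set call: the base whose probe has the highest intensity.
    This simple rule is an assumption made for illustration."""
    return max(intensities, key=intensities.get)

def consensus_base_call(probe_sets):
    """Call every probe set, then keep the call that occurs most often.
    Returns the single call plus the per-set calls it was derived from."""
    calls = [call_base_for_set(s) for s in probe_sets]
    ranked = Counter(calls).most_common()
    best_count = ranked[0][1]
    winners = [base for base, count in ranked if count == best_count]
    # Tie handling is not described in the text; "N" (ambiguous) is assumed here.
    consensus = winners[0] if len(winners) == 1 else "N"
    return consensus, calls

# Hypothetical intensities for three probe sets interrogating one position.
probe_sets = [
    {"A": 120.0, "C": 35.0, "G": 900.0, "T": 40.0},
    {"A": 80.0, "C": 50.0, "G": 650.0, "T": 60.0},
    {"A": 700.0, "C": 45.0, "G": 300.0, "T": 55.0},
]
consensus, per_set_calls = consensus_base_call(probe_sets)
print(consensus, per_set_calls)
```

Returning the per-set calls alongside the single call reflects the option, noted above, of displaying or hiding the calls from which the single call is derived.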
According to another aspect of the invention, a method of dynamically changing parameters for a computer-implemented base calling procedure comprises the steps of: generating base calls for at least a portion of a sample nucleic acid sequence utilizing the base calling procedure, the base calling procedure including a parameter that is changeable by a user; displaying the base calls for the at least a portion of a sample nucleic acid sequence; displaying the parameter of the base calling procedure; receiving input from the user specifying a new value for the parameter of the base calling procedure; generating updated base calls for the at least a portion of a sample nucleic acid sequence utilizing the base calling procedure and the new value for the parameter; and displaying the updated base calls for the at least a portion of a sample nucleic acid sequence. Typically the user-changeable parameter is a constant, threshold, or range.
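As a minimal sketch of the regenerate-on-change cycle described above, the following example reruns a base calling procedure with a new value for a user-changeable threshold. The procedure itself (`call_bases`), the parameter `ratio_threshold`, and the intensity values are hypothetical and stand in for whatever base calling procedure and parameter are actually used.

```python
def call_bases(intensity_rows, ratio_threshold):
    """Illustrative procedure: call the brightest base at each position, or
    'N' when it is less than ratio_threshold times the runner-up.
    ratio_threshold plays the role of the user-changeable parameter."""
    calls = []
    for row in intensity_rows:
        ranked = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        if second > 0 and best / second < ratio_threshold:
            calls.append("N")
        else:
            calls.append(best_base)
    return "".join(calls)

# Hypothetical intensities for three positions of a sample sequence.
rows = [
    {"A": 900.0, "C": 100.0, "G": 90.0, "T": 80.0},
    {"A": 200.0, "C": 300.0, "G": 150.0, "T": 100.0},
    {"A": 50.0, "C": 60.0, "G": 55.0, "T": 500.0},
]

# Generate and display the calls together with the current parameter value.
initial_calls = call_bases(rows, ratio_threshold=1.2)
print("threshold=1.2", initial_calls)

# The user supplies a new value; the calls are regenerated and redisplayed.
updated_calls = call_bases(rows, ratio_threshold=2.0)
print("threshold=2.0", updated_calls)
```

In this sketch the second position changes from a definite call to "N" once the threshold is raised, which is the kind of immediate feedback the method is intended to give the user.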
According to another aspect of the invention, a computer-implemented method of monitoring expression of a gene in a sample nucleic acid sequence comprises the steps of: inputting a plurality of hybridization intensities of pairs of perfect match and mismatch probes, the perfect match probes being perfectly complementary to the gene and the mismatch probes having at least one base mismatch with the gene, and the hybridization intensities indicating hybridization affinity between the perfect match and mismatch probes and the sample nucleic acid sequence; comparing the hybridization intensities of each pair of perfect match and mismatch probes; and generating a gene expression call of the sample nucleic acid sequence. In preferred embodiments, the expression call is denoted as expressed, marginal, or absent.
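One possible way of turning pairwise comparisons into an expression call is sketched below, purely as an assumption-laden example. The decision rule (the fraction of pairs in which the perfect match intensity exceeds the mismatch intensity by a margin), the function name `expression_call`, and every threshold value are hypothetical and are not specified in this summary.

```python
def expression_call(pairs, diff_threshold=50.0,
                    expressed_fraction=0.6, marginal_fraction=0.4):
    """Return 'expressed', 'marginal', or 'absent' for one gene.

    pairs holds (perfect_match, mismatch) intensities. A pair counts as
    positive when the perfect match exceeds the mismatch by more than
    diff_threshold. All three thresholds are illustrative assumptions."""
    if not pairs:
        raise ValueError("at least one probe pair is required")
    positive = sum(1 for pm, mm in pairs if pm - mm > diff_threshold)
    fraction = positive / len(pairs)
    if fraction >= expressed_fraction:
        return "expressed"
    if fraction >= marginal_fraction:
        return "marginal"
    return "absent"

# Hypothetical (perfect match, mismatch) intensities for three genes.
strong = [(800.0, 200.0), (650.0, 300.0), (400.0, 390.0),
          (900.0, 150.0), (700.0, 100.0)]
borderline = [(800.0, 200.0), (650.0, 300.0), (400.0, 390.0), (120.0, 110.0)]
weak = [(210.0, 200.0), (300.0, 310.0), (150.0, 140.0)]

for name, pairs in [("strong", strong), ("borderline", borderline), ("weak", weak)]:
    print(name, expression_call(pairs))
```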
According to another aspect of the invention, a computer-implemented method of monitoring change in expression of a gene in a sample nucleic acid sequence comprises the steps of: inputting a plurality of hybridization intensities of pairs of perfect match and mismatch probes, the perfect match probes being perfectly complementary to the gene and the mismatch probes having at least one base mismatch with the gene, and the hybridization intensities indicating hybridization affinity between the perfect match and mismatch probes and the sample nucleic acid sequence; comparing the hybridization intensities of each pair of perfect match and mismatch probes in order to generate a gene expression level of the sample nucleic acid sequence; and determining a change in expression by comparing the gene expression level to a baseline gene expression level. The change in expression may be displayed as a graph on the display screen.
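The comparison against a baseline can be illustrated with a short sketch. Here the expression level is assumed, for the sake of the example, to be the average difference between perfect match and mismatch intensities; that definition, the function names, and the data are assumptions rather than the measure defined by the invention.

```python
def expression_level(pairs):
    """Average (perfect match - mismatch) difference over all probe pairs.
    This definition of 'expression level' is an illustrative assumption."""
    if not pairs:
        raise ValueError("at least one probe pair is required")
    diffs = [pm - mm for pm, mm in pairs]
    return sum(diffs) / len(diffs)

def expression_change(sample_pairs, baseline_pairs):
    """Change in expression: sample level minus baseline level."""
    return expression_level(sample_pairs) - expression_level(baseline_pairs)

# Hypothetical (perfect match, mismatch) intensities for the same gene.
baseline_pairs = [(300.0, 100.0), (400.0, 200.0)]
sample_pairs = [(700.0, 100.0), (600.0, 200.0)]

change = expression_change(sample_pairs, baseline_pairs)
print("baseline level:", expression_level(baseline_pairs))
print("sample level:", expression_level(sample_pairs))
print("change:", change)
```

A positive value in this sketch indicates increased expression relative to the baseline and a negative value indicates decreased expression; values of this kind, computed per gene, are what a graph of expression change would plot.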
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.