A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Microfiche Appendices A to E, comprising five (5) sheets totaling 272 frames, are included herewith.
The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for visualizing biological sequences, as well as for evaluating and comparing biological sequences.
Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT applications WO92/10588 and WO95/11995, incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. Nos. 5,445,934 and 5,384,261, and U.S. patent application Ser. No. 08/249,188, each incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (also called a cell file) indicating the locations where the labeled nucleic acids bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain cancers), HIV, and other genetic characteristics.
Improved computer systems and methods are needed to evaluate, analyze, and process the vast amount of information now used and made available by these pioneering technologies.
An improved computer-aided system for visualizing and determining the sequence of nucleic acids is disclosed. The computer system provides, among other things, improved methods of analyzing fluorescent image files of a chip containing hybridized nucleic acid probes in order to call bases in sample nucleic acid sequences.
According to one aspect of the invention, a computer system is used to identify an unknown base in a sample nucleic acid sequence by the steps of:
inputting multiple probe intensities, each of the probe intensities being associated with a nucleic acid probe;
the computer system comparing the multiple probe intensities where each of the probe intensities is substantially proportional to a nucleic acid probe hybridizing with at least one nucleic acid sequence; and
calling the unknown base according to the results of the comparison of the multiple probe intensities.
According to one specific aspect of the invention, a higher probe intensity is compared to a lower probe intensity to call the unknown base. According to another specific aspect of the invention, probe intensities of a sample sequence are compared to probe intensities of a reference sequence. According to yet another specific aspect of the invention, probe intensities of a sample sequence are compared to statistics about probe intensities of a reference sequence from multiple experiments.
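By way of illustration only, the comparison of a higher probe intensity to a lower probe intensity may be sketched as follows. This is a minimal sketch and not the implementation of the appendices: the function name `call_base`, the dictionary input, the ratio threshold of 1.2, and the use of "N" for an ambiguous call are all assumptions made for the example.

```python
def call_base(intensities, min_ratio=1.2):
    """Call a base from four probe intensities (hypothetical helper).

    intensities: dict mapping each candidate base ('A', 'C', 'G', 'T')
    to the intensity of the probe that interrogates that base.
    min_ratio: assumed threshold; the highest intensity must exceed the
    next highest by at least this factor for a call to be made.
    """
    # Rank candidate bases from highest to lowest probe intensity.
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (_, second) = ranked[0], ranked[1]

    # No usable signal at all: report an ambiguous call.
    if top <= 0:
        return "N"

    # Compare the higher intensity to the lower one.
    if second <= 0 or top / second >= min_ratio:
        return top_base
    return "N"


print(call_base({"A": 100.0, "C": 20.0, "G": 15.0, "T": 10.0}))
print(call_base({"A": 100.0, "C": 95.0, "G": 15.0, "T": 10.0}))
```

In this sketch the first call is unambiguous because one probe is clearly brighter than the rest, while the second is reported as "N" because the two brightest probes are too close to distinguish under the assumed threshold.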
According to another aspect of the invention, a method is disclosed of processing reference and sample nucleic acid sequences to reduce the variations between the experiments by the steps of:
providing a plurality of nucleic acid probes;
labeling the reference nucleic acid sequence with a first marker;
labeling the sample nucleic acid sequence with a second marker; and
hybridizing the labeled reference and sample nucleic acid sequences at the same time.
According to another aspect of the invention, a computer system is used to identify mutations in a sample nucleic acid sequence by the steps of:
inputting a first set of probe intensities, each of the probe intensities in said first set being associated with a nucleic acid probe and substantially proportional to the associated nucleic acid probe hybridizing with a reference nucleic acid sequence;
inputting a second set of probe intensities, each of the probe intensities in said second set being associated with a nucleic acid probe and substantially proportional to the associated nucleic acid probe hybridizing with said sample sequence;
the computer system comparing probe intensities in the first set to probe intensities in the second set to select hybridization regions where the probe intensities in the first and second sets differ; and
identifying mutations according to characteristics of the selected regions.
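The selection of regions where the two sets of probe intensities differ may be sketched, for illustration, as shown below. The sketch assumes that both sets have already been normalized and are ordered by probe position; the function name `differing_regions`, the relative-difference measure, and the threshold of 0.5 are hypothetical choices and are not taken from the disclosure.

```python
def differing_regions(reference, sample, min_rel_diff=0.5):
    """Return (start, end) index pairs of contiguous runs of positions
    where reference and sample intensities differ (hypothetical helper).

    reference, sample: equal-length sequences of probe intensities,
    assumed normalized and ordered by probe position.
    min_rel_diff: assumed threshold on |r - s| / max(r, s).
    """
    if len(reference) != len(sample):
        raise ValueError("intensity sets must have the same length")

    regions = []
    start = None  # start index of the run currently being tracked
    for i, (r, s) in enumerate(zip(reference, sample)):
        denom = max(r, s)
        differs = denom > 0 and abs(r - s) / denom >= min_rel_diff
        if differs and start is None:
            start = i                       # a new region begins here
        elif not differs and start is not None:
            regions.append((start, i - 1))  # region ended at previous index
            start = None
    if start is not None:                   # region runs to the last probe
        regions.append((start, len(reference) - 1))
    return regions


ref = [100, 100, 100, 100, 100, 100]
smp = [100, 95, 30, 20, 98, 100]
print(differing_regions(ref, smp))
```

Characteristics of each selected region, such as its width and position, could then be examined to identify a candidate mutation; that step is not shown here.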
According to yet another aspect of the invention, a computer system is used for comparative analysis and visualization of multiple sequences by the steps of:
displaying at least one reference sequence in a first area on a display device; and
displaying at least one sample sequence in a second area on said display device;
whereby a user is capable of visually comparing the multiple sequences.
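A simple text rendering conveys the idea of displaying the reference and sample sequences in separate areas for visual comparison. This is an illustrative sketch only: the function name `render_comparison`, the row labels, and the caret marker line are assumptions, and the sequences are assumed to be of equal length and already aligned.

```python
def render_comparison(reference, sample):
    """Render two aligned sequences one above the other, with a marker
    row flagging positions that differ (hypothetical helper)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    # A caret under each position where the sample differs from the reference.
    marks = "".join(" " if r == s else "^" for r, s in zip(reference, sample))
    return "\n".join([
        "ref    " + reference,  # first display area
        "sample " + sample,     # second display area
        "       " + marks,      # difference markers
    ])


print(render_comparison("ACGTTGCA", "ACGATGCA"))
```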
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.