A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Microfiche Appendices A to E, comprising five (5) sheets totaling 272 frames, are included herewith.
The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for visualizing biological sequences, as well as for evaluating and comparing biological sequences.
Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. patent application Ser. No. 08/249,188, both incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip, and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, it becomes possible to extract information such as the monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to cystic fibrosis, the p53 gene (relevant to certain cancers), HIV, and other genetic characteristics.
Improved computer systems and methods are needed to evaluate, analyze, and process the vast amount of information now used and made available by these pioneering technologies.
An improved computer-aided system for visualizing and determining the sequence of nucleic acids is disclosed. The computer system provides, among other things, improved methods of analyzing fluorescent image files of a chip containing hybridized nucleic acid probes in order to call bases in sample nucleic acid sequences.
According to one aspect of the invention, a computer system is used to identify an unknown base in a sample nucleic acid sequence by the steps of:
inputting multiple probe intensities, each of the probe intensities being associated with a probe;
the computer system comparing the multiple probe intensities where each of the probe intensities is substantially proportional to a probe hybridizing with at least one sequence; and
calling the unknown base according to the comparison of the multiple probe intensities.
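By way of illustration only, the steps above may be sketched in Python as follows. The sketch assumes one probe intensity per candidate base (A, C, G, T) at the position being interrogated. The function name `call_base`, the ratio cutoff `min_ratio` of 1.2, and the use of "N" for an ambiguous call are all assumptions made for this example and are not taken from the disclosure.

```python
def call_base(intensities, min_ratio=1.2):
    """Call a base from a mapping of candidate base -> probe intensity.

    Hypothetical sketch: the cutoff value and the "N" code are
    illustrative assumptions, not values taken from the disclosure.
    """
    # Rank the candidate bases from highest to lowest probe intensity.
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (_, second) = ranked[0], ranked[1]

    # Guard against division by zero when the runner-up has no signal.
    if second <= 0:
        return top_base if top > 0 else "N"

    # Call the strongest base only if it clearly exceeds the runner-up.
    return top_base if top / second >= min_ratio else "N"
```

For example, `call_base({"A": 100, "C": 20, "G": 15, "T": 10})` returns `"A"`, while two nearly equal top intensities yield the ambiguous code `"N"`.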
According to one specific aspect of the invention, a higher probe intensity is compared to a lower probe intensity to call the unknown base. According to another specific aspect of the invention, probe intensities of a sample sequence are compared to probe intensities of a reference sequence. According to yet another specific aspect of the invention, probe intensities of a sample sequence are compared to statistics about probe intensities of a reference sequence from multiple experiments.
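As a hedged illustration of the last of these aspects, the following Python sketch compares a sample probe intensity against the mean and standard deviation of the same probe's intensity over several reference experiments. The function name `differs_from_reference`, the choice of a z-score as the comparison, and the cutoff `max_z` of 3.0 are assumptions made for this example; the disclosure does not specify them here.

```python
from statistics import mean, stdev


def differs_from_reference(sample, reference_runs, max_z=3.0):
    """Return True if a sample intensity departs from the reference runs.

    sample         -- intensity of one probe hybridized with the sample
    reference_runs -- intensities of the same probe from at least two
                      reference experiments
    max_z          -- hypothetical cutoff, in standard deviations
    """
    mu = mean(reference_runs)
    sigma = stdev(reference_runs)  # sample standard deviation

    # With no spread in the reference, any difference counts as a departure.
    if sigma == 0:
        return sample != mu

    return abs(sample - mu) / sigma > max_z
```

A probe flagged in this way could then be examined further, for instance as a candidate mutation site.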
According to another aspect of the invention, a method is disclosed for processing reference and sample nucleic acid sequences so as to reduce variations between experiments, by the steps of:
providing a plurality of nucleic acid probes;
labeling the reference nucleic acid sequence with a first marker;
labeling the sample nucleic acid sequence with a second marker; and
hybridizing the labeled reference and sample nucleic acid sequences at the same time.
According to yet another aspect of the invention, a computer system is used for comparative analysis and visualization of multiple sequences by the steps of:
displaying at least one reference sequence in a first area on a display device; and
displaying at least one sample sequence in a second area on said display device;
whereby a user is capable of visually comparing the multiple sequences.
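The display steps above may be sketched, purely for illustration, as a plain-text rendering in Python in which the reference sequence occupies one row and the sample sequence the row beneath it. The function name `render_comparison`, the equal-length requirement, and the third row of `^` markers under mismatched positions are assumptions added for this example and are not part of the disclosure.

```python
def render_comparison(reference, sample):
    """Render a reference and a sample sequence in two stacked text rows.

    Hypothetical sketch: a third row marks positions where the two
    sequences differ, as an aid to visual comparison.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")

    # Put "^" under each position where the sample differs.
    markers = "".join(
        " " if r == s else "^" for r, s in zip(reference, sample)
    )

    # Each label column is 11 characters wide so the rows line up.
    return "\n".join(
        [
            f"reference  {reference}",
            f"sample     {sample}",
            f"           {markers}",
        ]
    )


if __name__ == "__main__":
    print(render_comparison("ACGTACGT", "ACGAACGT"))
```

A graphical implementation would presumably draw the two areas in separate window regions; the text form here serves only to show the layout.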