The present invention relates to computer systems and more particularly to systems and methods for analysis of hybridization of samples to oligonucleotide probes or other polymer probes.
Devices and computer systems for forming and using arrays of materials on a substrate are known. The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Pat. No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the “target” nucleic acid). It is also possible to employ other types of probes or probes that are not included in arrays or chips.
Such probes are used for, e.g., base calling, detection of mutations, and analysis of gene expression. For all of these objectives, a typical technique is to expose the probes to target nucleic acid samples that have been marked with fluorescent or radioactive labels. For each probe or group of probes, a hybridization intensity is determined based on observed fluorescence or radioactivity. The hybridization intensity may also be measured in some other way.
These hybridization intensities are the basis for further analysis including base calling, mutation detection, and evaluation of expression of genes or expressed sequence tags. See European Patent Office Publication No. 0717113A and European Patent Office Publication No. 0848067, the contents of both publications being incorporated herein by reference.
Expression evaluation makes use of hybridization intensities determined from pairs of probes where each pair includes a perfect match probe and a mismatch probe. The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular subsequence of a sequence of interest in a target nucleic acid. The term “mismatch control” or “mismatch probe” refers to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence.
For example, to determine the concentration of a particular mRNA sequence indicative of expression of a gene or EST of interest, a series of pairs of perfect match and mismatch probes may be provided. Each pair may include a perfect match probe perfectly complementary to a subsequence of interest. The mismatch probe may differ in one position from the perfect match probe. Each probe may include a series of, e.g., 25 bases. The mRNA sequence may be interrogated by a series of probe pairs having successive alignments to the mRNA sequence.
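The tiling of probe pairs described above can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated in the text: the target is written in the DNA alphabet (A, C, G, T), the mismatch is placed at the central position of the probe, and the mismatch base is the complement of the perfect match base. The function names `reverse_complement` and `probe_pairs` are hypothetical.

```python
# Hypothetical illustration of probe-pair tiling; not a prescribed design.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the sequence that is perfectly complementary to seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_pairs(target, probe_len=25, step=1):
    """Build (perfect match, mismatch) probe pairs at successive alignments.

    Assumptions: target uses the DNA alphabet, and the mismatch probe
    differs from the perfect match probe only at its central position.
    """
    pairs = []
    mid = probe_len // 2
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        pm = reverse_complement(window)
        # Swap the central base for its complement to create the mismatch.
        mm = pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]
        pairs.append((pm, mm))
    return pairs
```

With `probe_len=25` and `step=1`, a target of N bases yields N - 24 probe pairs, one per alignment.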
After hybridization intensities are obtained, the number of probe pairs in which the perfect match intensity is greater than the mismatch intensity is counted, and the average of the logarithm of the perfect match to mismatch ratio is computed over all the probe pairs. To determine the quantitative abundance of mRNA, the average of the difference between perfect match and mismatch hybridization intensity is also computed.
Further opportunities exist, however, to improve the accuracy of assessments of expression levels. High frequency noise can result from variations in probe alignment to mRNA sequences, causing hybridization intensity to exhibit spurious peaks rather than smooth variation. This high frequency noise is especially prevalent in array designs where there is a relatively small number of probes per gene and therefore less opportunity to average out the high frequency noise over results from a large number of probes.
What is needed are systems and methods for reducing the deleterious effects of the high frequency noise found in the hybridization intensity measurements.
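The three quantities described above may be sketched in Python as follows. This is an illustrative sketch only: the function name `pair_statistics` is hypothetical, the base-10 logarithm is an assumption (the text does not specify a base), and all intensities are assumed to be strictly positive so that the ratio and its logarithm are defined.

```python
import math


def pair_statistics(pm, mm):
    """Summarize perfect match (pm) and mismatch (mm) intensities.

    Returns a tuple of:
      - the number of pairs where pm exceeds mm,
      - the average log10 of the pm/mm ratio (base 10 is an assumption),
      - the average of the pm - mm difference.
    Assumes equal-length, non-empty inputs with strictly positive values.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    n = len(pm)
    positive_count = sum(1 for p, m in zip(pm, mm) if p > m)
    avg_log_ratio = sum(math.log10(p / m) for p, m in zip(pm, mm)) / n
    avg_difference = sum(p - m for p, m in zip(pm, mm)) / n
    return positive_count, avg_log_ratio, avg_difference
```

For instance, perfect match intensities of 100, 1000, and 10 paired with mismatch intensities of 10, 100, and 100 give two pairs in which the perfect match is greater, and an average difference of 300.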
Systems and methods for enhanced quantitative analysis of hybridization intensity measurements obtained from oligonucleotide probes and other probes exposed to target samples are provided by virtue of the present invention. One embodiment ameliorates the effects of high frequency noise superimposed on a hybridization intensity signal measured over successive probe alignments to a target sample sequence. Detection of expressed genes and ESTs and quantitative measurement of expression level may be improved. Mutation detection and base calling may be improved.
A nonlinear lowpass filter may be used to remove the effects of spurious peaks in this signal. Also, a hybridization spectrum including the hybridization intensities measured over a series of probes may be compared to a reference hybridization spectrum to obtain a measure of similarity. The measure of similarity may indicate expression or non-expression of a particular gene or EST, or a point mutation.
In accordance with a first aspect of the present invention, a method for analyzing a sample nucleic acid sequence includes: inputting a plurality of hybridization intensities of probes exposed to the sample nucleic acid sequence, and applying a nonlinear filter to the plurality of hybridization intensities.
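By way of illustration only, one well-known nonlinear lowpass filter is the running median, which suppresses isolated spikes while leaving smooth variation largely intact. The text above does not name a particular filter, so the choice of a median filter, the window size of three, the truncated window at the ends of the sequence, and the name `median_filter` are all assumptions of this sketch.

```python
from statistics import median


def median_filter(intensities, window=3):
    """Apply a running median to intensities ordered by probe alignment.

    The median filter is used here as one example of a nonlinear lowpass
    filter; it is an assumption, not necessarily the filter intended.
    Near the ends, the window is truncated to the available samples.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(intensities)
    filtered = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        filtered.append(median(intensities[lo:hi]))
    return filtered
```

Applied to the intensities 10, 12, 90, 13, 11, the spurious peak of 90 at the third alignment is replaced by 13, the median of its neighborhood, while the remaining values change only slightly.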
In accordance with a second aspect of the present invention, a method for analyzing a sample nucleic acid sequence includes: inputting a plurality of hybridization intensities of probes exposed to the sample nucleic acid sequence, the plurality of hybridization intensities forming a hybridization spectrum of the sample nucleic acid sequence, and comparing the hybridization spectrum of the sample nucleic acid sequence to a reference hybridization spectrum to obtain an indication of similarity.
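As a hedged illustration of such a comparison, the following Python sketch scores the similarity of a sample hybridization spectrum to a reference spectrum using the Pearson correlation coefficient. The text above does not specify the similarity measure, so the use of correlation and the name `spectrum_similarity` are assumptions of this sketch. Both spectra are assumed to list intensities for the same probes in the same order.

```python
import math


def spectrum_similarity(sample, reference):
    """Return the Pearson correlation between two hybridization spectra.

    Correlation is an assumed similarity measure chosen for illustration.
    A value near 1 indicates that the sample spectrum has the same shape
    as the reference; a value near 0 indicates little resemblance.
    """
    if len(sample) != len(reference) or len(sample) < 2:
        raise ValueError("spectra must have equal length of at least 2")
    n = len(sample)
    mean_s = sum(sample) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sample, reference))
    var_s = sum((s - mean_s) ** 2 for s in sample)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_s == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a flat spectrum")
    return cov / math.sqrt(var_s * var_r)
```

A measure of this kind is insensitive to overall scaling of the intensities, so a sample whose spectrum is a uniformly brighter or dimmer copy of the reference still scores close to 1.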
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.