The invention generally relates to techniques for analyzing biological samples such as DNA, RNA, or protein samples and in particular to techniques for analyzing the output patterns of hybridized biochip microarrays.
A variety of techniques have been developed to analyze DNA or other biological samples to identify diseases, mutations, or other conditions present within a patient providing the sample. Such techniques may determine, for example, whether the patient has a particular disease such as cancer or AIDS, has a predisposition toward the disease, or has other medical conditions. DNA-based analysis may be used either as an in-vitro or as an in-vivo control mechanism to monitor disease progression, to assess the effectiveness of therapy, or to design dosage formulations. DNA-based analysis is also used to verify the presence or absence of expressed genes and polymorphisms.
One particularly promising technique for analyzing biological samples uses a DNA-based microarray (or microelectronic biochip) which generates a hybridization pattern representative of the characteristics of the DNA within the sample. Briefly, a DNA microarray includes a rectangular array of immobilized single stranded DNA fragments. Each element within the array includes from a few tens to millions of copies of identical single stranded strips of DNA containing specific sequences of nucleotide bases. Identical or different fragments of DNA may be provided at each element of the array. For example, location (1,1) may contain a different single stranded fragment of DNA than location (1,2), which may also differ from location (1,3), and so on. Certain biochip designs may replicate the same nucleotide sequence in multiple cells.
DNA-based microarrays deploy chemiluminescence, fluorescence, or electrical phenomenology to achieve the analysis. In methods that exploit fluorescence imaging, a target DNA sample to be analyzed is first separated into individual single stranded sequences and fragmented. Each sequence is tagged with a fluorescent marker molecule. The fragments are applied to the microarray, where each fragment binds only with complementary DNA fragments already embedded on the microarray. Fragments which do not match any of the elements of the microarray simply do not bind at any of the sites of the microarray and are discarded during subsequent fluidic reactions. Thus, only those microarray locations containing fragments that bind complementary sequences within the target DNA sample will receive the fluorescent molecules. Typically, a fluorescent light source is then applied to the microarray to generate a fluorescent image identifying which elements of the microarray bind to the patient DNA sample and which do not. The image is then analyzed to determine which specific DNA fragments were contained within the original sample and to determine therefrom whether particular diseases, mutations, or other conditions are present in the patient sample.
For example, a particular element of the microarray may be exposed to fragments of DNA representative of a particular type of cancer. If that element of the array fluoresces under fluorescent illumination, then the DNA of the sample contains the DNA sequence representative of that particular type of cancer. Hence, a conclusion can be drawn that the patient providing the sample either already has that particular type of cancer or is perhaps predisposed towards that cancer. As can be appreciated, by providing a wide variety of known DNA fragments on the microarray, the resulting fluorescent image can be analyzed to identify a wide range of conditions.
Unfortunately, under conventional techniques, the step of analyzing the fluorescent pattern to determine the nature of any conditions characterized by the DNA is expensive, time consuming, and somewhat unreliable for all but a few particular conditions or diseases. One major problem with many conventional techniques is poor repeatability: if the same sample is analyzed twice using two different chips, different results are often obtained. The results may also vary from lab to lab. Consistent results are achieved only when the target sample has high concentrations of the oligonucleotides of interest. In addition, skilled technicians are required to prepare DNA samples, implement the hybridization protocol, and analyze the DNA microarray output, which can result in high costs. One reason that repeatability is poor is that the signatures within the digitized hybridization pattern (also known as a "dot spectrogram") that are representative of mutations of interest are typically very weak and are immersed in considerable noise. Conventional techniques are not particularly effective in extracting mutation signatures from dot spectrograms under low signal to noise conditions. Circumstances in which the signal to noise ratio is 0 dB or strongly negative (-2 to -30 dB) are particularly intractable.
Accordingly, it would be highly desirable to provide an improved method and apparatus for analyzing the output of the DNA microarray to more expediently, reliably, and inexpensively determine the presence of any medical conditions or concerns within the patient providing the DNA sample. It is particularly desirable to provide a technique that can identify mutation signatures within dot spectrograms even in circumstances wherein the signal to noise ratio is extremely low. It is to these ends that aspects of the invention are generally drawn.
Referring now to FIG. 1, conventional techniques for designing DNA microarray chips and for analyzing the output thereof will now be described in greater detail. Initially, at step 100, fluorescently labeled primers are prepared for flanking loci of genes of interest within the DNA sample. The primers are applied to the DNA sample such that the fluorescently labeled primers flank genes of interest. At step 102, the DNA sample is fragmented at the locations where the fluorescently labeled primers are attached to the genes of interest to thereby produce a set of DNA fragments, also called "oligonucleotides", for applying to the DNA microarray.
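For reference, a signal to noise ratio in decibels is 10 log10 of the ratio of signal power to noise power, so the -2 to -30 dB range corresponds to a mutation signature whose power is roughly 63% down to 0.1% of the noise power. The small helper below makes that conversion explicit; the function names are illustrative only.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal to noise ratio in decibels: 10 * log10(Ps / Pn)."""
    return 10.0 * math.log10(signal_power / noise_power)


def power_ratio(db: float) -> float:
    """Inverse conversion: the Ps / Pn power ratio for a given dB value."""
    return 10.0 ** (db / 10.0)


# Print the power ratio at the edges of the range discussed in the text.
for db in (0.0, -2.0, -30.0):
    print(f"{db:6.1f} dB -> Ps/Pn = {power_ratio(db):.4f}")
```

A ratio below 1 (negative dB) means the signature carries less power than the noise it is immersed in.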
In general, there are two types of DNA microarrays: passive hybridization microarrays and active hybridization microarrays. Under passive hybridization, oligonucleotides characterizing the DNA sample are simply applied to the DNA microarray, where they passively attach to complementary DNA fragments embedded on the array. With active hybridization, the DNA array is configured to externally enhance the interaction between the fragments of the DNA sample and the fragments embedded on the microarray using, for example, electronic techniques. Within FIG. 1, the passive hybridization and active hybridization steps are illustrated in parallel. It should be understood that, currently, either the passive hybridization steps or the active hybridization steps, but not both, are typically employed for any particular microarray. Referring first to passive hybridization, at step 104 a DNA microarray chip is prefabricated with oligonucleotides of interest embedded in or otherwise attached to particular elements within the microarray. At step 106, the oligonucleotides of the DNA sample generated at step 102 are applied to the microarray. Oligonucleotides within the sample that match any of the oligonucleotides embedded on the microarray passively bind with the oligonucleotides of the array while retaining their fluorescently labeled primers, such that only those locations in the microarray having corresponding oligonucleotides within the sample receive the primers. It should be noted that each individual nucleotide base within the oligonucleotide sequence (with lengths ranging from 5 to 25 base pairs) can bond with up to four different nucleotides within the microarray, but only one oligonucleotide represents an exact match. When illuminated with fluorescent light, the exact matches fluoresce most effectively and the non-exact matches fluoresce considerably less or not at all.
At step 108, the DNA microarray with the sample loaded thereon is placed within a fluidics station provided with chemicals to facilitate the hybridization reaction, i.e., the chemicals facilitate the bonding of the oligonucleotide sample with corresponding oligonucleotides within the microarray. At step 110, the microarray is illuminated under fluorescent light, perhaps generated using an argon-ion laser, and the resulting fluorescent pattern is digitized and recorded. Alternately, a photograph of the fluorescent pattern may be taken, developed, and then scanned into a computer to provide a digital representation of the fluorescent pattern. In either case, at step 112, the digitized pattern is processed using dedicated software programs which focus the digital pattern and subsequently quantize it to yield a fluorescent intensity value for each element of the microarray pattern. At step 114, the resulting focused array pattern is processed using additional software programs which compute an average intensity value at each array location and provide the necessary normalization, color compensation, and scaling. Hence, following step 114, a digitized fluorescent pattern has been produced identifying locations within the microarray at which oligonucleotides from the DNA sample have bonded. This fluorescent pattern is referred to herein as a "dot spectrogram".
In existing biochips that actively initiate, facilitate, or selectively block hybridization, a DNA microarray is prefabricated for active hybridization at step 116. At step 118, the DNA sample is applied to the active array and electronic signals are transmitted into the array to help facilitate bonding between the oligonucleotides of the sample and the oligonucleotides of the array. At step 120, the microarray is then placed within a fluidics station which further facilitates the bonding. Thereafter, at step 122, an electronic or fluorescent readout is generated from the microarray. When electrical output signals from the biochip array are used to quantify and classify the post-hybridization output, the output signal is indicative of the number of oligonucleotide fragments bonded to each site within the array. At step 124, the electronic output is processed to generate a dot spectrogram similar or identical to the dot spectrogram generated using the optical readout technique of steps 110-114. Hence, regardless of whether steps 104-114 or steps 116-124 are performed, the result is a dot spectrogram representative of the oligonucleotides present within the DNA sample. It should be noted here that some conventional passive hybridization DNA microarrays provide electronic output and some active hybridization microelectronic arrays provide optical readout. Thus, for at least some techniques, the output of step 108 is processed in accordance with steps 122 and 124. For other techniques, the output of step 120 is processed in accordance with steps 110-114. Again, the final result is substantially the same, i.e., a dot spectrogram.
At step 126, the dot spectrogram is analyzed using clustering software to generate a gene array amplitude readout pattern representative of mutations of interest within the target DNA sample. In essence, step 126 correlates oligonucleotides represented by the dot spectrogram with corresponding DNA mutations. Next, at step 128, the resulting digital representation of the mutations of interest is processed using mapping software to determine whether the mutations are representative of particular diagnostic conditions, such as certain diseases or conditions. Hence, step 128 performs a mutation-to-diagnostic analysis. Finally, at step 130, the diagnostic conditions detected at step 128 are evaluated to further determine whether or not the diagnosis, if any, can properly be based upon the DNA sample. Classical methods, such as probabilistic estimators (e.g., the maximum a posteriori (MAP) estimator or the maximum likelihood estimator (MLE)) or inferencing mechanisms, may be used to render a diagnostic assessment.
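The exact-match versus near-match behavior described above follows from base pairing: a perfectly matched target is the reverse complement of the embedded probe, and each mismatched position weakens hybridization. The sketch below counts mismatches against that ideal partner. It is a deliberate simplification that ignores hybridization thermodynamics and alignment offsets, and the function names and sequences are hypothetical.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatches(probe: str, target: str) -> int:
    """Count positions where the target fails to pair with the probe.

    Both sequences are assumed to be equal length and written 5' to 3';
    the perfect hybridization partner is the probe's reverse complement.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    expected = reverse_complement(probe)
    return sum(1 for a, b in zip(expected, target.upper()) if a != b)


probe = "ATGCGTAC"  # hypothetical 8-base probe on one array element
for target in ("GTACGCAT", "GTACGGAT", "CCCCCCCC"):
    print(target, mismatches(probe, target))
```

Only the zero-mismatch target corresponds to the exact match that would fluoresce most strongly.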
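Steps 112 and 114 reduce a focused image to one intensity value per array location. A minimal Python sketch might average each element's pixels and then apply a simple min-max normalization. It assumes the image is already focused and registered so that each array element occupies a fixed square block of pixels. The block size, the background estimate (the dimmest element), and the function names are illustrative assumptions, and color compensation is omitted.

```python
from statistics import mean


def dot_spectrogram(image, cell):
    """Average the pixels in each cell x cell block of a focused, registered image.

    Assumes array element (r, c) occupies pixel rows r*cell .. r*cell + cell - 1
    and pixel columns c*cell .. c*cell + cell - 1.
    """
    rows, cols = len(image) // cell, len(image[0]) // cell
    return [
        [
            mean(image[y][x]
                 for y in range(r * cell, (r + 1) * cell)
                 for x in range(c * cell, (c + 1) * cell))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def normalize(grid):
    """Subtract the dimmest element (a crude background estimate) and scale to [0, 1]."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat grid
    return [[(v - lo) / span for v in row] for row in grid]


# Hypothetical 4x4 image of a 2x2 array, two pixels per element side.
image = [
    [10, 12, 50, 52],
    [11, 13, 54, 56],
    [30, 30, 10, 10],
    [30, 30, 10, 10],
]
spots = dot_spectrogram(image, cell=2)  # [[11.5, 53], [30, 10]]
print(normalize(spots))
```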
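The two classical estimators named above differ in one factor. The MLE picks the condition under which the observed readout is most likely. The MAP estimator additionally weights each condition by its prior probability. The condition labels and numbers below are hypothetical, chosen only to show that the two can disagree.

```python
def mle(likelihoods):
    """Condition maximizing P(observation | condition)."""
    return max(likelihoods, key=likelihoods.get)


def map_estimate(likelihoods, priors):
    """Condition maximizing P(observation | condition) * P(condition)."""
    return max(likelihoods, key=lambda c: likelihoods[c] * priors[c])


# Hypothetical likelihoods of one readout and hypothetical prevalence priors.
likelihoods = {"condition_A": 0.30, "condition_B": 0.20, "no_condition": 0.05}
priors = {"condition_A": 0.01, "condition_B": 0.04, "no_condition": 0.95}

print(mle(likelihoods))                   # condition_A
print(map_estimate(likelihoods, priors))  # no_condition
```

Here the MAP products are 0.003, 0.008, and 0.0475, so a rare condition with the highest likelihood still loses to the common "no condition" hypothesis.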
As noted above, it would be desirable to provide improved techniques for analyzing the outputs of DNA microarrays to more quickly, reliably, and inexpensively yield a valid diagnostic assessment. To this end, the invention is directed primarily to providing a sequence of steps for replacing steps 114-130 of FIG. 1.
In accordance with a first aspect of the invention, a method is provided for analyzing an output pattern of a biochip to identify mutations, if any, present in a biological sample applied to the biochip. In accordance with the method, a resonance pattern is generated which is representative of resonances between a stimulus pattern associated with a set of known mutations and the output pattern of the biochip. The resonance pattern is interpreted to yield a set of confirmed mutations by comparing resonances found therein with predetermined resonances expected for the selected set of mutations.
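The text does not define here how a resonance is computed, so the sketch below illustrates only the final comparison step, using a plainly swapped-in stand-in. It scores each candidate mutation by the Pearson correlation between its stimulus pattern and the flattened biochip output pattern. It then confirms a mutation when that score lies within a tolerance of the expected score. Pearson correlation is not the resonance mechanism of the invention, and all names, patterns, and the tolerance are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equal-length flattened patterns (stand-in score)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def confirmed_mutations(output, stimuli, expected, tol=0.1):
    """Keep mutations whose observed score is within tol of the expected score.

    output:   flattened output pattern of the biochip
    stimuli:  mutation name -> flattened stimulus pattern
    expected: mutation name -> expected score for that mutation
    """
    confirmed = []
    for name, pattern in stimuli.items():
        score = correlation(output, pattern)
        if abs(score - expected[name]) <= tol:
            confirmed.append(name)
    return confirmed


output = [0, 1, 0, 1]
stimuli = {"mut1": [0, 1, 0, 1], "mut2": [1, 0, 1, 0]}
expected = {"mut1": 1.0, "mut2": 1.0}
print(confirmed_mutations(output, stimuli, expected))  # ['mut1']
```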
In an exemplary embodiment, the biological sample is a DNA sample and the output pattern being analyzed is a quantized dot spectrogram generated by a hybridized oligonucleotide microarray. The resonance pattern is generated by iteratively processing the dot spectrogram using a convergent reverberation, which yields a resonance pattern representative of resonances between a predetermined set of selected Quantum Expressor Functions and the dot spectrogram. The iteration continues until a predetermined degree of convergence is achieved between the resonances found in the resonance pattern and the resonances expected for the set of mutations. The resonance pattern is analyzed to yield a set of confirmed mutations. The confirmed mutations are then mapped to known diseases or diagnostic conditions of interest associated with the pre-selected set of known mutations to identify diseases, if any, indicated by the DNA sample. A diagnostic confirmation is then made by taking the identified diseases, solving in reverse for the associated Quantum Expressor Functions, and comparing those Quantum Expressor Functions with the ones expected for the mutations associated with the identified diseases to verify correspondence. If no correspondence is found, a new subset of known mutations is selected and the steps are repeated to determine whether any of the new set of mutations is present in the sample.
In the exemplary embodiment, the set of nonlinear Quantum Expressor Functions is generated as follows. A set of mutation signatures representative of the pre-selected set of known mutations is input. A representation of the oligonucleotide pattern layout of the microarray from which the dot spectrogram is generated is also input. Then a set of resonant interaction parameters is generated. These parameters are representative of mutation pattern interactions between elements of the microarray, including interactions from a group including element-to-element interactions, element-to-ensemble interactions, ensemble-to-element interactions, and ensemble-to-ensemble interactions. The set of nonlinear Quantum Expressor Functions is then generated from the set of resonant interaction patterns. This is done by matching selected harmonics of the power spectral density (PSD) amplitude of a coded mutation signature, corresponding to the pre-selected mutation set of interest, to that of a predetermined quantum-mechanical Hamiltonian system, so that the stochastic and deterministic time scales match and the time scales couple back to the noise statistics and the degree of asymmetry.
Also in the exemplary embodiment, the dot spectrogram is differentially enhanced prior to generation of the resonance pattern by: refocusing the dot spectrogram to yield a re-focused dot spectrogram; cross-correlating the re-focused dot spectrogram; applying a local maxima filter to the correlated re-focused dot spectrogram to yield a maximized dot spectrogram; re-scaling the maximized dot spectrogram to yield a uniformly re-scaled dot spectrogram; and then re-scaling the uniformly re-scaled dot spectrogram to amplify local boundaries therein, yielding a globally re-scaled dot spectrogram.
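The iterative procedure just described can be summarized as control flow. In the sketch below, every callable is a hypothetical placeholder for a step the text names but does not specify; the reverberation, convergence test, mapping, and reverse solution are not implemented. Abandoning a subset that never converges within `max_iter` iterations is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def analyze(spectrogram, candidate_sets: Iterable[set],
            step: Callable, converged: Callable, confirm: Callable,
            diagnose: Callable, verify: Callable,
            max_iter: int = 100) -> Optional[set]:
    """Control flow of the iterative analysis; every callable is a hypothetical placeholder.

    step(spectrogram, mutations, pattern) -> next resonance pattern
    converged(pattern, mutations)        -> True once resonances match expectations
    confirm(pattern, mutations)          -> set of confirmed mutations
    diagnose(confirmed)                  -> set of identified diseases
    verify(diseases, mutations)          -> True if the reverse solution corresponds
    """
    for mutations in candidate_sets:
        pattern = None
        for _ in range(max_iter):
            pattern = step(spectrogram, mutations, pattern)
            if converged(pattern, mutations):
                break
        else:
            # Assumption: a subset that never converges is abandoned.
            continue
        diseases = diagnose(confirm(pattern, mutations))
        if verify(diseases, mutations):
            return diseases
    # No subset produced a verified diagnosis.
    return None
```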
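The text does not say how a mutation signature is coded as a signal or how the Hamiltonian matching is performed. The sketch below therefore covers only the first, generic step: computing a power spectral density and reading off selected harmonics. It assumes the coded signature is a real-valued sequence whose fundamental falls on a known DFT bin. `psd` and `harmonic_powers` are hypothetical helper names, and a direct O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def psd(signal):
    """Periodogram |X[k]|^2 / N for frequency bins k = 0..N//2, via a direct DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum


def harmonic_powers(signal, fundamental_bin, orders=(1, 2, 3)):
    """PSD values at integer multiples of a fundamental bin; orders past Nyquist are skipped."""
    spectrum = psd(signal)
    return {m: spectrum[m * fundamental_bin] for m in orders
            if m * fundamental_bin < len(spectrum)}


# Hypothetical coded signature: a cosine at bin 1 plus a weaker one at bin 2 (N = 8).
signature = [math.cos(2 * math.pi * t / 8) + 0.5 * math.cos(2 * math.pi * 2 * t / 8)
             for t in range(8)]
print(harmonic_powers(signature, 1))  # approximately {1: 2.0, 2: 0.5, 3: 0.0}
```

For larger signatures, an FFT-based routine would replace the direct DFT.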
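Three of the enhancement stages listed above can be sketched in Python: cross-correlation, a 3x3 local maxima filter, and uniform min-max re-scaling. The text does not specify the correlation kernel, the refocusing method, or the boundary-amplifying global re-scale. This sketch therefore assumes a small hypothetical spot template as the kernel and omits the refocusing and global re-scaling stages; it is not presented as the method of the invention.

```python
# Hypothetical spot template used as the cross-correlation kernel.
SPOT_KERNEL = [[0, 1, 0],
               [1, 2, 1],
               [0, 1, 0]]


def correlate(grid, kernel):
    """2-D cross-correlation with zero padding; the kernel must be square with odd side."""
    h, w, k = len(grid), len(grid[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += grid[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = acc
    return out


def local_maxima(grid):
    """Keep a value only where it equals the maximum of its 3x3 neighborhood; zero elsewhere."""
    h, w = len(grid), len(grid[0])

    def neighborhood_max(y, x):
        return max(grid[yy][xx]
                   for yy in range(max(0, y - 1), min(h, y + 2))
                   for xx in range(max(0, x - 1), min(w, x + 2)))

    return [[v if v == neighborhood_max(y, x) else 0.0 for x, v in enumerate(row)]
            for y, row in enumerate(grid)]


def rescale(grid):
    """Uniform min-max re-scaling to [0, 1]."""
    lo, hi = min(map(min, grid)), max(map(max, grid))
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in grid]


def enhance(grid, kernel=SPOT_KERNEL):
    """Correlate, keep local maxima, then re-scale (refocusing and global re-scale omitted)."""
    return rescale(local_maxima(correlate(grid, kernel)))
```

For example, applying `enhance` to a 5x5 grid with a single bright element at its center returns 1.0 at that element and 0.0 elsewhere. The correlation spreads intensity into the four edge neighbors, but the local maxima filter suppresses them.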
By exploiting a resonant interaction, mutation signatures may be identified within a dot spectrogram even in circumstances involving low signal to noise ratios or, in some cases, negative signal to noise ratios. By permitting the mutation signatures to be identified in such circumstances, the reliability of dot spectrogram analysis is thereby greatly enhanced. With an increase in reliability, costs associated with performing the analysis are decreased, in part, because there is less of a requirement for skilled technicians. Other advantages of the invention arise as well.
In accordance with a second aspect of the invention, a method of generating a set of nonlinear Quantum Expressor Functions is provided. The method includes the steps of inputting a set of mutation signatures representative of the pre-selected set of known mutations and inputting a representation of a biochip layout. The method also includes the steps of generating a set of resonant interaction parameters representative of mutation pattern interactions between elements of the microarray including interactions from a group including element-to-element interactions, element-to-ensemble interactions, ensemble-to-element interactions, and ensemble-to-ensemble interactions and generating the set of nonlinear Quantum Expressor Functions from the set of resonant interaction patterns.
Among other applications, principles of the invention are applicable to the analysis of various arrayed biomolecular, ionic, bioelectronic, biochemical, optoelectronic, radio frequency (RF), and electronic microdevices. Principles of the invention are particularly applicable to mutation expression analysis at ultra-low concentrations using ultra-high density passive and/or active hybridization DNA-based microarrays. Techniques implemented in accordance with the invention are generally independent of the physical method employed to accumulate initial amplitude information from the biochip array, such as fluorescence labeling, charge clustering, phase shift integration, and tracer imaging. Also, principles of the invention are applicable to optical, optoelectronic, and electronic readout of hybridization amplitude patterns. Furthermore, principles of the invention are applicable to molecular expression analysis at all levels of abstraction, namely DNA expression analysis, RNA expression analysis, protein interactions, and protein-DNA interactions for medical diagnosis at the molecular level.
Apparatus embodiments are also provided.