The present invention is related to processing of data scanned from molecular arrays. Molecular array technologies have gained prominence in biological research and are likely to become important and widely used diagnostic tools in the healthcare industry. Currently, molecular-array techniques are most often used to determine the concentrations of particular nucleic-acid polymers in complex sample solutions. Molecular-array-based analytical techniques are not, however, restricted to analysis of nucleic acid solutions, but may be employed to analyze complex solutions of any type of molecule that can be optically or radiometrically scanned and that can bind with high specificity to complementary molecules synthesized within, or bound to, discrete features on the surface of a molecular array. Because molecular arrays are widely used for analysis of nucleic acid samples, the following background information on molecular arrays is introduced in the context of analysis of nucleic acid solutions following a brief background of nucleic acid chemistry.
Deoxyribonucleic acid (“DNA”) and ribonucleic acid (“RNA”) are linear polymers, each synthesized from four different types of subunit molecules. The subunit molecules for DNA include: (1) deoxy-adenosine, abbreviated “A,” a purine nucleoside; (2) deoxy-thymidine, abbreviated “T,” a pyrimidine nucleoside; (3) deoxy-cytidine, abbreviated “C,” a pyrimidine nucleoside; and (4) deoxy-guanosine, abbreviated “G,” a purine nucleoside. The subunit molecules for RNA include: (1) adenosine, abbreviated “A,” a purine nucleoside; (2) uridine, abbreviated “U,” a pyrimidine nucleoside; (3) cytidine, abbreviated “C,” a pyrimidine nucleoside; and (4) guanosine, abbreviated “G,” a purine nucleoside. FIG. 1 illustrates a short DNA polymer 100, called an oligomer, composed of the following subunits: (1) deoxy-adenosine 102; (2) deoxy-thymidine 104; (3) deoxy-cytidine 106; and (4) deoxy-guanosine 108. When phosphorylated, subunits of DNA and RNA molecules are called “nucleotides” and are linked together through phosphodiester bonds 110–115 to form DNA and RNA polymers. A linear DNA molecule, such as the oligomer shown in FIG. 1, has a 5′ end 118 and a 3′ end 120. A DNA polymer can be chemically characterized by writing, in sequence from the 5′ end to the 3′ end, the single letter abbreviations for the nucleotide subunits that together compose the DNA polymer. For example, the oligomer 100 shown in FIG. 1 can be chemically represented as “ATCG.” A DNA nucleotide comprises a purine or pyrimidine base (e.g. adenine 122 of the deoxy-adenylate nucleotide 102), a deoxy-ribose sugar (e.g. deoxy-ribose 124 of the deoxy-adenylate nucleotide 102), and a phosphate group (e.g. phosphate 126) that links one nucleotide to another nucleotide in the DNA polymer. In RNA polymers, the nucleotides contain ribose sugars rather than deoxy-ribose sugars. In ribose, a hydroxyl group takes the place of the 2′ hydrogen 128 in a DNA nucleotide.
RNA polymers contain uridine nucleosides rather than the deoxy-thymidine nucleosides contained in DNA. The pyrimidine base uracil lacks a methyl group (130 in FIG. 1) contained in the pyrimidine base thymine of deoxy-thymidine.
The DNA polymers that contain the organization information for living organisms occur in the nuclei of cells in pairs, forming double-stranded DNA helices. One polymer of the pair is laid out in a 5′ to 3′ direction, and the other polymer of the pair is laid out in a 3′ to 5′ direction. The two DNA polymers in a double-stranded DNA helix are therefore described as being anti-parallel. The two DNA polymers, or strands, within a double-stranded DNA helix are bound to each other through attractive forces including hydrophobic interactions between stacked purine and pyrimidine bases and hydrogen bonding between purine and pyrimidine bases, the attractive forces emphasized by conformational constraints of DNA polymers. Because of a number of chemical and topographic constraints, double-stranded DNA helices are most stable when deoxy-adenylate subunits of one strand hydrogen bond to deoxy-thymidylate subunits of the other strand, and deoxy-guanylate subunits of one strand hydrogen bond to corresponding deoxy-cytidylate subunits of the other strand.
FIGS. 2A–B illustrate the hydrogen bonding between the purine and pyrimidine bases of two anti-parallel DNA strands. FIG. 2A shows hydrogen bonding between adenine and thymine bases of corresponding adenosine and thymidine subunits, and FIG. 2B shows hydrogen bonding between guanine and cytosine bases of corresponding guanosine and cytidine subunits. Note that there are two hydrogen bonds 202 and 203 in the adenine/thymine base pair, and three hydrogen bonds 204–206 in the guanine/cytosine base pair, as a result of which GC base pairs contribute greater thermodynamic stability to DNA duplexes than AT base pairs. AT and GC base pairs, illustrated in FIGS. 2A–B, are known as Watson-Crick (“WC”) base pairs.
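The pairing rules described above can be illustrated with a short sketch. The function and table names below are hypothetical and introduced only for illustration; the sketch assumes an idealized duplex in which every base forms a WC pair, and it counts two hydrogen bonds per AT pair and three per GC pair, as stated above.

```python
# Illustrative sketch only; names are hypothetical, not from the text.
# Assumes an idealized duplex in which every base forms a WC pair.

WC_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the WC pair that each base forms.
WC_HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand):
    """Return the anti-parallel WC partner of `strand`, written 5' to 3'."""
    return "".join(WC_PARTNER[base] for base in reversed(strand))


def duplex_hydrogen_bonds(strand):
    """Count hydrogen bonds in a fully WC-paired duplex containing `strand`."""
    return sum(WC_HYDROGEN_BONDS[base] for base in strand)


if __name__ == "__main__":
    oligomer = "ATCG"  # the oligomer of FIG. 1, written 5' to 3'
    print(reverse_complement(oligomer))
    print(duplex_hydrogen_bonds(oligomer))
```

The hydrogen-bond count is only a rough indicator of relative stability; it does not account for base stacking or the other attractive forces mentioned above.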
Two DNA strands linked together by hydrogen bonds form the familiar helix structure of a double-stranded DNA helix. FIG. 3 illustrates a short section of a DNA double helix 300 comprising a first strand 302 and a second, anti-parallel strand 304. The ribbon-like strands in FIG. 3 represent the deoxyribose and phosphate backbones of the two anti-parallel strands, with hydrogen-bonding purine and pyrimidine base pairs, such as base pair 306, interconnecting the two strands. Deoxy-guanylate subunits of one strand are generally paired with deoxy-cytidylate subunits from the other strand, and deoxy-thymidylate subunits in one strand are generally paired with deoxy-adenylate subunits from the other strand. However, non-WC base pairings may occur within double-stranded DNA. Generally, purine/pyrimidine non-WC base pairings contribute little to the thermodynamic stability of a DNA duplex, but generally do not destabilize a duplex otherwise stabilized by WC base pairs. However, purine/purine base pairs may destabilize DNA duplexes.
Double-stranded DNA may be denatured, or converted into single-stranded DNA, by changing the ionic strength of the solution containing the double-stranded DNA or by raising the temperature of the solution. Single-stranded DNA polymers may be renatured, or converted back into DNA duplexes, by reversing the denaturing conditions, for example by lowering the temperature of the solution containing complementary single-stranded DNA polymers. During renaturing or hybridization, complementary bases of anti-parallel DNA strands form WC base pairs in a cooperative fashion, leading to regions of DNA duplex. Strict A-T and G-C complementarity between anti-parallel polymers leads to the greatest thermodynamic stability, but partial complementarity including non-WC base pairing may also occur to produce relatively stable associations between partially-complementary polymers. In general, the longer the regions of consecutive WC base pairing between two nucleic acid polymers, the greater the stability of hybridization between the two polymers under renaturing conditions.
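The relationship between consecutive WC base pairing and hybridization stability can be made concrete with a minimal sketch. The function name is hypothetical, and the sketch assumes two strands of equal length, both written 5′ to 3′ and aligned end to end in anti-parallel fashion with no offset or gaps. The run length it reports is only a crude proxy for stability, not a thermodynamic model.

```python
# Illustrative sketch only; the function name is hypothetical.
# Assumes equal-length strands, both written 5' to 3', aligned end to end.

WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def longest_wc_run(strand_a, strand_b):
    """Return the longest run of consecutive WC pairs between two strands.

    `strand_b` is reversed so that its 3' end faces the 5' end of
    `strand_a`, which models the anti-parallel orientation of a duplex.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only handles equal-length strands")
    longest = 0
    current = 0
    for base_a, base_b in zip(strand_a, reversed(strand_b)):
        if (base_a, base_b) in WC_PAIRS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a mismatch ends the current run
    return longest


if __name__ == "__main__":
    print(longest_wc_run("ATCG", "CGAT"))      # fully complementary
    print(longest_wc_run("ATCGAA", "TTAGAT"))  # one mismatch in the middle
```

A fully complementary pair of strands yields a run equal to the strand length, while a single internal mismatch splits the duplex into two shorter runs.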
The ability to denature and renature double-stranded DNA has led to development of many extremely powerful and discriminating assay technologies for identifying the presence of DNA and RNA polymers having particular base sequences or containing particular base subsequences within complex mixtures of different nucleic acid polymers, other biopolymers, and inorganic and organic chemical compounds. These methodologies include molecular-array-based hybridization assays. FIGS. 4–7 illustrate the principle of molecular-array-based hybridization assays. A molecular array (402 in FIG. 4) comprises a substrate upon which a regular pattern of features is prepared by various different types of manufacturing processes. The molecular array 402 in FIG. 4, and in subsequent FIGS. 5–7, has a grid-like two-dimensional array of square features, such as feature 404 shown in the upper left-hand corner of the molecular array. Each feature of the molecular array contains a large number of identical oligonucleotides covalently bound to the surface of the feature. In general, chemically distinct oligonucleotides are bound to the different features of a molecular array, so that each feature corresponds to a particular nucleotide sequence. In FIGS. 4–6, the principle of molecular-array-based hybridization assays is illustrated with respect to the single feature 404 to which a number of identical oligonucleotides 405–409 are bound. In practice, each feature of the molecular array contains an enormous number of oligonucleotide molecules, but, for the sake of clarity, FIGS. 4–6 only show a small number.
Once a molecular array has been prepared, the molecular array may be exposed to a sample solution of DNA molecules that includes DNA molecules (410–413 in FIG. 4) labeled with fluorophores, chemoluminescent compounds, or radioactive atoms 415–418. A labeled DNA molecule that contains a nucleotide sequence complementary to the base sequence of an oligonucleotide bound to the molecular array may hybridize through base pairing interactions to the oligonucleotide. FIG. 5 shows a number of labeled DNA molecules 502–504 hybridized to oligonucleotides 505–507 bound to the surface of the molecular array 402. DNA molecules that do not contain nucleotide sequences complementary to any of the oligonucleotides bound to the molecular array do not hybridize stably to oligonucleotides bound to the molecular array and generally remain in solution, such as labeled DNA molecules 508 and 509. The sample solution is then rinsed from the surface of the molecular array, washing away any unbound labeled DNA molecules. Finally, as shown in FIG. 6, the bound labeled DNA molecules are detected via optical or radiometric scanning. Optical scanning involves exciting labels of bound labeled DNA molecules with electromagnetic radiation of appropriate frequency and detecting fluorescent emissions from the labels, or detecting light emitted from chemoluminescent labels. When radioisotope labels are employed, radiometric scanning can be used to detect radiation emitted from labeled DNA molecules hybridized to oligonucleotides bound to the surface of the molecular array. Additional types of signals are also possible, including electrical signals generated by electrical properties of bound target molecules, magnetic properties of bound target molecules, and other such physical properties of bound target molecules that can produce a detectable signal. Optical, radiometric, or other types of scanning produce an analog or digital representation of the molecular array as shown in FIG. 7, with features to which labeled DNA molecules are hybridized, such as feature 706, optically or digitally differentiated from those features to which no labeled DNA molecules are bound. In other words, the analog or digital representation of a scanned molecular array displays positive signals for features to which labeled DNA molecules are hybridized and displays negative signals for features to which no, or an undetectably small number of, labeled DNA molecules are bound. Features displaying positive signals in the analog or digital representation indicate the presence of DNA molecules with complementary nucleotide sequences in the original sample solution. Moreover, the signal intensity produced by a feature is generally related to the amount of labeled DNA bound to the feature, in turn related to the concentration, in the sample to which the molecular array was exposed, of labeled DNA complementary to the oligonucleotide within the feature.
Molecular-array-based hybridization techniques allow extremely complex solutions of DNA molecules to be analyzed in a single experiment. Molecular arrays may contain hundreds, thousands, or tens of thousands of different oligonucleotides, allowing for the detection of hundreds, thousands, or tens of thousands of different DNA polymers containing complementary nucleotide sub-sequences in the complex DNA solutions to which the molecular array is exposed. In order to perform different sets of hybridization analyses, molecular arrays containing different sets of bound oligonucleotides are manufactured by any of a number of complex manufacturing techniques. These techniques generally involve synthesizing the oligonucleotides within corresponding features of the molecular array through complex iterative synthetic steps.
As pointed out above, molecular-array-based assays can involve other types of biopolymers, synthetic polymers, and other types of chemical entities. For example, one might attach protein antibodies to features of the molecular array that would bind to soluble labeled antigens in a sample solution. Many other types of chemical assays may be facilitated by molecular array technologies. For example, polysaccharides, glycoproteins, synthetic copolymers, including block copolymers, biopolymer-like polymers with synthetic or derivatized monomers or monomer linkages, and many other types of chemical entities may serve as probe and target molecules for molecular-array-based analysis. A fundamental principle upon which molecular arrays are based is that of specific recognition, by probe molecules affixed to the molecular array, of target molecules, whether by sequence-mediated binding affinities, binding affinities based on conformational or topological properties of probe and target molecules, or binding affinities based on spatial distribution of electrical charge on the surfaces of target and probe molecules.
DNA, and other biological polymers, may be labeled with different chemical chromophores, radioactive nuclides, or other signal-generating entities, and may be optically scanned at different wavelengths of light, radiometrically scanned for different types of radioactive emission within different energy ranges, or scanned by other techniques appropriate to detect signals produced by other signal-generating entities. In the case of optical scanning, each different wavelength at which a molecular array is scanned produces a different signal. Thus, in optical scanning, it is common to describe the signal produced by scanning in terms of the color of the wavelength of light employed for the scan. For example, a red signal is produced by scanning a molecular array with light having a wavelength corresponding to that of visible red light.
Scanning of a feature by an optical scanning device or radiometric scanning device generally produces a scanned image comprising a rectilinear grid of pixels, with each pixel having a corresponding signal intensity. FIG. 8A shows a portion of a scanned image of a molecular array that includes a pixel-based image of a disk-shaped feature of a molecular array. In FIG. 8A, the feature corresponds to a disk-shaped region 802 of pixels having relatively high signal intensities. Surrounding the feature 802 is a ring-like region 804 of pixels with relatively low measured intensities. The portion of the scanned image shown in FIG. 8A is thus conceptually equivalent to a digital, black-and-white photograph of the feature taken with light within a narrow range of wavelengths. Generally, the location of the disk-shaped region 802 corresponding to a feature is determined by various scanned image-to-scanned-molecular-array alignment techniques and procedures.
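The pixel-based representation described above can be sketched as follows. This is a minimal illustration that assumes the scanned image is available as a list of rows of numeric intensities and that the center and radius of the disk-shaped feature region have already been determined by an alignment procedure. All function and parameter names are hypothetical.

```python
# Illustrative sketch only; all names are hypothetical.
# Assumes `image` is a list of rows of numeric pixel intensities and that
# the feature center and radius are already known from image alignment.


def disk_pixels(center_row, center_col, radius, n_rows, n_cols):
    """Return (row, col) positions that lie within the given disk."""
    return [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
    ]


def mean_intensity(image, pixels):
    """Return the mean intensity over a non-empty list of pixel positions."""
    return sum(image[r][c] for r, c in pixels) / len(pixels)


def feature_and_background_means(image, center_row, center_col,
                                 radius, ring_width):
    """Return mean intensity of the feature disk and of the ring around it."""
    n_rows, n_cols = len(image), len(image[0])
    feature = disk_pixels(center_row, center_col, radius, n_rows, n_cols)
    outer = disk_pixels(center_row, center_col, radius + ring_width,
                        n_rows, n_cols)
    feature_set = set(feature)
    ring = [p for p in outer if p not in feature_set]
    return mean_intensity(image, feature), mean_intensity(image, ring)


if __name__ == "__main__":
    # Synthetic 7x7 image: bright disk of radius 2 on a dim background.
    image = [
        [100 if (r - 3) ** 2 + (c - 3) ** 2 <= 4 else 5 for c in range(7)]
        for r in range(7)
    ]
    print(feature_and_background_means(image, 3, 3, 2, 1))
```

The sketch does not guard against an empty ring, which would occur if `ring_width` were zero; a practical implementation would need to handle that case.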
It is desirable for the signal intensities, or counts, of pixels within the area of a pixel-based scanned image corresponding to a feature to be relatively uniform. Similarly, it is also desirable for the signal intensities within background regions surrounding features to be relatively uniform. Non-uniform signal intensity distributions generally indicate the occurrence of one or more error or noise conditions that may prevent meaningful data from being collected from the feature.
FIGS. 8B–D illustrate various non-uniform signal intensity distributions within a scanned image of a molecular array feature. In FIG. 8B, for example, relatively large signal intensities are seen in regions 806 and 808 at the upper right, and lower left, of the scanned image as well as within the disk-shaped area 810 corresponding to a feature. Such non-uniform distribution of signal intensities may indicate defects in the preparation of the molecular array, including defects in the synthesis of probe molecules bound to the molecular array, contamination of the surface of the molecular array with a chromophore that responds to impinging light in a similar fashion to the response by the chromophore with which target molecules are labeled, flaws in the scanning device, or other such defects. In FIG. 8C, the signal intensities within the feature 812 are relatively uniform, with the exception of a number of extremely high, outlying signal intensities in individual pixels, such as pixels 814, 816, and 818. Such outlying pixel intensities may represent scanner measurement errors or defects in digital processing and digital representation of the scanned data. In FIG. 8D, a relatively large area 820 within a feature 822 has produced no signal, and therefore represents a significant spatial non-uniformity of pixel intensities. A condition such as that shown in FIG. 8D may arise when probe molecules are not uniformly bound to the surface of the molecular array within a feature, because of overlying contamination that masks the signal, or for other reasons. In the situations illustrated in FIGS. 8B–D, the sum of the pixel intensities within the disk-shaped region of the optical image corresponding to a feature may produce a total signal intensity, or count, for the feature that does not reflect the theoretical count that would be produced by scanning the feature were the one or more error conditions or noise conditions not present. 
Such scanned features suffering from non-uniform pixel intensities need to be recognized during processing of data scanned from a molecular array and flagged as outlier features, to prevent reporting of flawed and erroneous experimental results.
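To illustrate how a few outlying pixels, such as those of FIG. 8C, can distort the total count for a feature, the following sketch sums pixel intensities and flags pixels that lie far from the median. It uses a generic median-absolute-deviation rule chosen purely for illustration; it is an assumption of this sketch and is not the technique of the present invention or any established molecular-array procedure. The function names and the threshold `k` are hypothetical.

```python
# Illustrative sketch only; names and the threshold `k` are hypothetical.
# The median-absolute-deviation rule is a generic statistical heuristic
# chosen for illustration, not a method described in the text.
import statistics


def feature_count(intensities):
    """Return the total signal intensity, or count, for a feature."""
    return sum(intensities)


def outlier_pixel_indices(intensities, k=5.0):
    """Return indices of pixels farther than k MADs from the median."""
    median = statistics.median(intensities)
    deviations = [abs(x - median) for x in intensities]
    mad = statistics.median(deviations)
    if mad == 0:
        # Most pixels are identical: flag any pixel that differs at all.
        return [i for i, d in enumerate(deviations) if d > 0]
    return [i for i, d in enumerate(deviations) if d > k * mad]


if __name__ == "__main__":
    # Six uniform pixels and one extremely high outlying pixel.
    pixels = [100, 102, 98, 101, 99, 100, 5000]
    print(feature_count(pixels))
    print(outlier_pixel_indices(pixels))
```

In this example the single outlying pixel contributes most of the total count, which shows why a feature with such pixels may not reflect the count that an error-free scan would produce.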
Currently, outlier features, or feature backgrounds, are commonly identified by using negative control features manufactured into molecular arrays and by manual inspection of scanned images. However, control-feature-based outlier detection may be insensitive to various types of non-uniformities and significantly adds to the cost of molecular array manufacture and molecular array scanning and data processing. Manual outlier detection suffers from the inaccuracies and deficiencies well-known to occur in most human-dependent tasks, and is also quite slow and economically inefficient. Thus, designers, manufacturers, and users of molecular arrays have recognized the need for a more accurate, automated technique for recognizing outlier features and outlier feature backgrounds in scanned images of molecular arrays.