The present invention relates to the analysis of molecular arrays, or biochips, and, in particular, to a method and system for processing a scanned image of a molecular array in order to index the regions of the image that correspond to features of the molecular array and to extract data from indexed positions within the scanned image that correspond to optical or radiometric signals emanating from features of the molecular array.
Molecular arrays are widely used and increasingly important tools for rapid hybridization analysis of sample solutions against hundreds or thousands of precisely ordered and positioned features containing different types of molecules within the molecular arrays. Molecular arrays are normally prepared by synthesizing or attaching a large number of molecular species to a chemically prepared substrate such as silicon, glass, or plastic. Each feature, or element, within the molecular array is defined to be a small, regularly-shaped region on the surface of the substrate. The features are arranged in a regular pattern. Each feature within the molecular array may contain a different molecular species, and the molecular species within a given feature may differ from the molecular species within the remaining features of the molecular array. In one type of hybridization experiment, a sample solution containing radioactively, fluorescently, or chemoluminescently labeled molecules is applied to the surface of the molecular array. Certain of the labeled molecules in the sample solution may specifically bind to, or hybridize with, one or more of the different molecular species that together comprise the molecular array. Following hybridization, the sample solution is removed by washing the surface of the molecular array with a buffer solution, and the molecular array is then analyzed by radiometric or optical methods to determine to which specific features of the molecular array the labeled molecules are bound. Thus, in a single experiment, a solution of labeled molecules can be screened for binding to hundreds or thousands of different molecular species that together comprise the molecular array. Molecular arrays commonly contain oligonucleotides or complementary deoxyribonucleic acid ("cDNA") molecules to which labeled deoxyribonucleic acid ("DNA") and ribonucleic acid ("RNA") molecules bind via sequence-specific hybridization.
Generally, radiometric or optical analysis of the molecular array produces a scanned image consisting of a two-dimensional matrix, or grid, of pixels, each pixel having one or more intensity values corresponding to one or more signals. Scanned images are commonly produced electronically by optical or radiometric scanners and the resulting two-dimensional matrix of pixels is stored in computer memory or on a non-volatile storage device. Alternatively, analog methods of analysis, such as photography, can be used to produce continuous images of a molecular array that can be then digitized by a scanning device and stored in computer memory or in a computer storage device.
FIG. 1 shows a generalized representation of a molecular array. Disk-shaped features of the molecular array, such as feature 101, are arranged on the surface of the molecular array in rows and columns that together comprise a two-dimensional matrix, or grid. Features in alternative types of molecular arrays may be arranged to cover the surface of the molecular array at higher densities, as, for example, by offsetting the features in adjacent rows to produce a more closely packed arrangement of features. Radiometric or optical analysis of a molecular array, following a hybridization experiment, results in a two-dimensional matrix, or grid, of pixels. FIG. 2 illustrates the two-dimensional grid of pixels in a square area of a scanned image encompassing feature 101 of FIG. 1. In FIG. 2, pixels have intensity values ranging from 0 to 9. Intensity values of all non-zero pixels are shown in FIG. 2 as single digits within the pixel. The non-zero pixels of this scanned image representing feature 101 of FIG. 1 inhabit a roughly disk-shaped region corresponding to the shape of the feature. The pixels in a region surrounding a feature generally have low or 0 intensity values due to an absence of bound signal-producing radioactive, fluorescent, or chemoluminescent label molecules. However, background signals, such as the background signal represented by non-zero pixel 202, may arise from non-specific binding of labeled molecules due to imprecision in preparation of molecular arrays and/or imprecision in the hybridization and washing of molecular arrays, and may also arise from imprecision in optical or radiometric scanning and various other sources of error that may depend on the type of analysis used to produce the scanned image. Additional background signal may be attributed to contaminants in the surface of the molecular array or in the sample solutions to which the molecular array is exposed.
In addition, pixels within the disk-shaped image of a feature, such as pixel 204, may have 0 values or may have intensity values outside the range of expected intensity values for a feature. Thus, scanned images of molecular array features may often show noise and variation and may depart significantly from the idealized scanned image shown in FIG. 1.
FIG. 3 illustrates indexing of a scanned image produced from a molecular array. A set of imaginary horizontal and vertical grid lines, such as horizontal grid line 301, are arranged so that the intersections of vertical and horizontal grid lines correspond with the centers of features. The imaginary grid lines establish a two-dimensional index grid for indexing the features. Thus, for example, feature 302 can be specified by the indices (0,0). For alternative arrangements of features, such as the more closely packed arrangements mentioned above, a slightly more complicated indexing system may be used. For example, feature locations in odd-indexed rows having a particular column index may be understood to be physically offset horizontally from feature locations having the same column index in even-indexed rows. Such horizontal offsets occur, for example, in hexagonal, closest-packed arrays of features.
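The index-to-position mapping described above can be illustrated with a minimal Python sketch. The function name `feature_center` and the parameters `origin`, `pitch`, and `hex_packed` are hypothetical and do not come from the specification; the half-pitch horizontal offset and the sqrt(3)/2 row spacing are the usual geometry of an ideal hexagonal packing and are assumed here only for illustration.

```python
import math


def feature_center(row, col, origin=(0.0, 0.0), pitch=10.0, hex_packed=False):
    """Return the (x, y) pixel position of the feature indexed (row, col).

    origin, pitch, and hex_packed are hypothetical parameters: origin is the
    pixel position of feature (0, 0), and pitch is the center-to-center
    spacing of features within a row.
    """
    x0, y0 = origin
    x = x0 + col * pitch
    y = y0 + row * pitch
    if hex_packed:
        # Assumed ideal hexagonal packing: rows sit closer together, and
        # odd-indexed rows are shifted horizontally by half a pitch.
        y = y0 + row * pitch * math.sqrt(3) / 2.0
        if row % 2 == 1:
            x += pitch / 2.0
    return (x, y)
```

With the rectilinear default, indices (0,0) map to the origin itself; with `hex_packed=True`, the feature at row 1, column 0 is offset horizontally relative to the feature at row 0, column 0, as the paragraph above describes.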
In order to interpret the scanned image resulting from optical or radiometric analysis of a molecular array, the scanned image needs to be processed to: (1) index the positions of features within the scanned image; (2) extract data from the features and determine the magnitudes of background signals; (3) compute, for each signal, background-subtracted magnitudes for each feature; (4) normalize signals produced from different types of analysis, as, for example, dye normalization of optical scans conducted at different light wavelengths to normalize different response curves produced by chromophores at different wavelengths; and (5) determine the ratios of background-subtracted and normalized signals for each feature while also determining a statistical measure of the variability of the ratios or confidence intervals related to the distribution of the signal ratios about a mean signal ratio value. These various steps in the processing of scanned images produced as a result of optical or radiometric analysis of molecular arrays together comprise an overall process called feature extraction.
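Steps (3) through (5) above can be illustrated, for a single feature and two signals, with the following Python sketch. It is an illustration under stated assumptions, not the method of the specification: the use of a simple mean as the signal estimator, the single multiplicative factor `green_scale` standing in for dye normalization, and the function names are all hypothetical.

```python
from statistics import mean


def background_subtracted(feature_pixels, background_pixels):
    """Mean feature intensity minus mean local background intensity.

    Using the mean as the estimator is an assumption for illustration.
    """
    return mean(feature_pixels) - mean(background_pixels)


def signal_ratio(red_feature, red_background,
                 green_feature, green_background, green_scale=1.0):
    """Ratio of background-subtracted red signal to normalized green signal.

    green_scale is a hypothetical normalization factor applied to the green
    channel. Returns None when the normalized green signal is not positive,
    since the ratio would be undefined or meaningless.
    """
    red = background_subtracted(red_feature, red_background)
    green = green_scale * background_subtracted(green_feature, green_background)
    if green <= 0:
        return None
    return red / green
```

A caller would typically collect such ratios over replicate features and then compute a variability measure, for example with `statistics.stdev`, to address the statistical part of step (5).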
Designers, manufacturers, and users of molecular arrays have recognized a need for automated feature extraction. Automated feature extraction, like any other automated technique, can produce enormous savings in the time and cost of using molecular arrays for chemical and biological analysis. Automated feature extraction can also eliminate inconsistencies caused by user error and can greatly increase the reproducibility and objectivity of feature extraction.
In one aspect, a method is provided for evaluating an actual orientation of a molecular array having features which may be arranged in a pattern (for example, on a rectilinear grid). In this method, one or more images of the molecular array are obtained. Such images may be produced by scanning the molecular array to determine data signals emanating from discrete positions on a surface of the molecular array, or by other means. An actual result of a function is calculated on positions (for example, pixels) of an image, which positions lie in a second pattern (for example, along one or more paths such as along the expected positions of array features). This actual result is compared with an expected result which would be obtained if the second pattern had a predetermined orientation on the array (for example, superimposed over at least part of the array). An actual array orientation may be evaluated based upon the results of the comparison. For example, if there is more than a predetermined difference (a "tolerance") between the actual and expected results, this could be taken as an indication that the array does not have the expected orientation in a scanner. Before trying to interpret the image data further, the orientation of the second pattern on the array can be altered and the comparison step repeated, and these comparison and second pattern re-orientation steps repeated in further iterations as often as necessary until the actual and expected results are within the predetermined difference (at which point the actual and expected orientations should be the same, within the predetermined tolerance). The greater the number of points on the second pattern, the greater the accuracy of the orientation information that can generally be obtained from the comparison.
In a particular implementation where the features of the molecular array are arranged on a rectilinear grid, the second pattern may be a rectilinear grid of rows and columns which would lie on the rows and columns of the rectilinear grid of the array when the second pattern and array are superimposed. In this case then, the calculation may be a function executed along the rows and columns of the second pattern (for example, to obtain row and column vectors). The actual result may be compared to the expected result quantitatively or only qualitatively. For example, in the case of row and column vectors the comparison may be a comparison of the actual vector shapes with the expected vector shapes if the second pattern had the predetermined orientation on the array (the predetermined orientation, for example, rows and columns of the second pattern being aligned over rows and columns of features).
Typically, the orientation referenced is a rotational orientation (that is, a rotational orientation about an axis normal to the array), although the orientation could be one or more positions in space (for example, sideways or up and down displacement). In any aspect of the method, the calculating and comparing steps may optionally be repeated by changing the orientation of the pattern on the image until a match (either exactly or within a predetermined tolerance) between the actual and expected results is obtained. The difference between the orientation of the second pattern on the image and the predetermined orientation on the array when the match is obtained may be used in the evaluation step as a measure of the orientation of the array. Alternatively (or even additionally), a given actual result may be compared with different expected results based on different predetermined orientations of the second pattern on the array, until the match is obtained. Information on the orientation of the array can then be used in the extraction of data from the array.
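The compare-and-re-orient loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function is taken to be a plain summation, the second pattern is a rectilinear grid rotated about its first point, intensities are sampled at the nearest pixel, and candidate angles are tried in a fixed list. All function and parameter names are hypothetical.

```python
import math


def pattern_points(rows, cols, pitch, origin, angle_deg):
    """Points of a rows-by-cols grid rotated by angle_deg about origin."""
    a = math.radians(angle_deg)
    ca, sa = math.cos(a), math.sin(a)
    ox, oy = origin
    points = []
    for r in range(rows):
        for c in range(cols):
            dx, dy = c * pitch, r * pitch
            points.append((ox + dx * ca - dy * sa, oy + dx * sa + dy * ca))
    return points


def pattern_sum(image, points):
    """Sum nearest-pixel intensities at the given points.

    image is a list of rows of numbers; points outside the image add 0.
    """
    height, width = len(image), len(image[0])
    total = 0
    for x, y in points:
        xi, yi = round(x), round(y)
        if 0 <= yi < height and 0 <= xi < width:
            total += image[yi][xi]
    return total


def find_orientation(image, rows, cols, pitch, origin,
                     expected, tolerance, candidate_angles):
    """Return the first candidate angle whose actual result matches expected.

    A match means |actual - expected| <= tolerance. Returns None if no
    candidate orientation matches.
    """
    for angle in candidate_angles:
        points = pattern_points(rows, cols, pitch, origin, angle)
        actual = pattern_sum(image, points)
        if abs(actual - expected) <= tolerance:
            return angle
    return None
```

The returned angle plays the role of the difference between the orientation of the second pattern on the image and the predetermined orientation; a finer or adaptive set of candidate angles could be substituted for the fixed list.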
One embodiment of the present invention comprises a method and system for automated feature extraction from scanned images produced by optical, radiometric, or other types of analysis of molecular arrays. First, horizontal and vertical projections of pixel values, called row and column vectors, are computationally produced from the scanned image. The row and column vectors are analyzed to determine the positions of peaks, and the positions of the first and last peaks in the row and column vectors are used to estimate the positions of the corner features within the scanned image. Typically, bright control features, i.e. features designed to hybridize to labeled sample molecules of any sample solution to which a molecular array is exposed, are placed on the border of the molecular array to facilitate this process. When necessary, row and column vectors can be calculated over a range of rotations of a two-dimensional, orthogonal coordinate system in order to select the most favorable rotation angle at which to fix the coordinate system. Analysis of regions of the scanned image representing the corner features can be used to more exactly locate the positions of the corner features. Then, using the established positions of the corner features, an initial coordinate system is computationally established for the scanned image. Using the initial coordinate system, the centroids of features producing strong signals, or, in other words, pixels having high signal-to-noise ratios and located close to expected positions in the scanned image, are determined, and a regression analysis is used to refine the coordinate system to best correspond to the determined positions of the strong features. The refined coordinate system is employed to locate the positions of weak features and the positions of the background regions local to each feature. 
Next, a process is used to analyze various different signals generated by different analytical methods in order to select the most reliable portions of each feature and the local background surrounding the feature for subsequent signal extraction and signal variability determinations. For example, the fluorescence of hybridized labeled molecules may be measured at green light wavelengths and at red light wavelengths, with the intensities produced at each position of the surface of the molecular array at red and green wavelengths corresponding to two different signals. Finally, signal data and variability of signal data are extracted from the reliable regions of each feature and each local background region of the scanned image.
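The projection and peak-finding steps described above, which estimate corner feature positions from the first and last peaks, can be sketched in Python as follows. The names `projections`, `peak_positions`, and `estimate_corners`, the fixed `threshold`, and the simple local-maximum test are assumptions made for illustration; the specification does not prescribe a particular peak detector. Here `row_sums[y]` is the total intensity of image row y and `col_sums[x]` is the total intensity of image column x.

```python
def projections(image):
    """Return (row_sums, col_sums) for an image given as a list of rows."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def peak_positions(vector, threshold=0):
    """Indices of local maxima in vector that exceed threshold.

    On a flat-topped peak, the last index of the plateau is reported.
    """
    peaks = []
    last = len(vector) - 1
    for i, value in enumerate(vector):
        left = vector[i - 1] if i > 0 else float("-inf")
        right = vector[i + 1] if i < last else float("-inf")
        if value > threshold and value >= left and value > right:
            peaks.append(i)
    return peaks


def estimate_corners(image, threshold=0):
    """Estimate corner feature positions from first and last peaks.

    Returns a dict of (x, y) pixel positions, or None if either projection
    has no peak above threshold.
    """
    row_sums, col_sums = projections(image)
    ys = peak_positions(row_sums, threshold)
    xs = peak_positions(col_sums, threshold)
    if not xs or not ys:
        return None
    return {
        "top_left": (xs[0], ys[0]),
        "top_right": (xs[-1], ys[0]),
        "bottom_left": (xs[0], ys[-1]),
        "bottom_right": (xs[-1], ys[-1]),
    }
```

These estimates correspond only to the initial coordinate system; refinement by centroid determination and regression over strong features, as described above, would follow as separate steps.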
It will also be appreciated throughout the present application that where steps are referenced as being implemented by a computer program, any such steps can also be implemented by hardware or hardware/software combinations which can perform the steps. Also, an "image" in relation to an array is a term which includes data on the position of the features regardless of how such data was obtained (for example, by scanning with a laser beam, or by some other means). Furthermore, wherever a function is referenced, this can for example be a summation. Likewise, wherever summation is referenced, suitable functions other than a simple summation might be used. For example, a weighted summation could be used at those locations on the second pattern which will be superimposed on reference marks or features with expected stronger signals (such as control features) relative to other features, when the second pattern and array are superimposed. When reference marks are present, it will be appreciated that they can be detected and an orientation of the array (such as rotational orientation) can be evaluated based on the detected positions of the reference marks. For example, an approximate indication of array orientation can be based on the detected positions of the reference marks followed by further evaluating a refined array orientation using the above described comparison and re-orientation procedure, such that fewer iterations of the comparison and second pattern re-orientation procedure may be required.
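The weighted summation mentioned above can be illustrated with a short Python sketch. The function name, the boolean `is_control` flags, and the default `control_weight` of 2.0 are hypothetical choices made for illustration only; the specification does not state particular weights.

```python
def weighted_pattern_sum(intensities, is_control, control_weight=2.0):
    """Sum intensities, weighting control-feature locations more heavily.

    intensities and is_control are parallel sequences: one sampled value and
    one flag per location on the second pattern. Non-control locations get
    weight 1.0; control locations get the hypothetical control_weight.
    """
    return sum(
        value * (control_weight if control else 1.0)
        for value, control in zip(intensities, is_control)
    )
```

Setting `control_weight` to 1.0 reduces this to the simple summation.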