1. Field of the Invention
The field of this invention is biopolymeric arrays, and particularly image analysis of biopolymeric arrays.
2. Background of the Invention
Biopolymeric arrays, e.g., nucleic acid arrays, are increasingly important tools in life science research and related fields, both in industry and academia. While significant advances in array design have been made over the last decade, processing of array images continues to be a challenge.
A variety of software tools and protocols have been developed for use in processing array images. The basic goal of such protocols is to reduce an image of spots of varying intensities to a table with a measure of the intensity (or the ratio of intensities for multi-colored fluorescence images) for each spot. While this goal is straightforward, there is no common method for achieving it. Furthermore, currently available scanning and image processing protocols are resource intensive and often require human intervention to properly grid the images and flag features that should be excluded from subsequent analysis, e.g., features that exceed a heterogeneity threshold.
With respect to flagging of features for analysis exclusion, one reason to exclude such features is feature heterogeneity. The problem of feature heterogeneity affects all analytical methods that are based upon detecting and reporting the signal of a region of interest, such as signals from a feature from a nucleic acid array, e.g., an oligonucleotide or cDNA array. Bright pixels in an otherwise low signal feature lead to overestimation of the signal. Dark pixels (e.g., from scratches) in an otherwise high signal feature lead to underestimation of the signal. Features that have a high degree of heterogeneity also yield signals that have a low degree of confidence, where the intra-feature or feature inter-pixel standard deviation of the signal is very high.
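The effect described above can be illustrated numerically. The following is a minimal sketch, assuming the feature signal is reported as the mean of its pixel intensities; the pixel values and the helper name `feature_stats` are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def feature_stats(pixels):
    """Return (mean, standard deviation) of a feature's pixel intensities."""
    return mean(pixels), pstdev(pixels)

# Hypothetical low-signal feature: 100 pixels, all at intensity 50.
low_clean = [50.0] * 100
# Same feature with a single bright pixel (e.g., a dust speck).
low_bright = [50.0] * 99 + [5000.0]

# Hypothetical high-signal feature: 100 pixels, all at intensity 2000.
high_clean = [2000.0] * 100
# Same feature with ten dark pixels (e.g., a scratch).
high_dark = [2000.0] * 90 + [0.0] * 10

low_clean_mean, low_clean_sd = feature_stats(low_clean)
low_bright_mean, low_bright_sd = feature_stats(low_bright)
high_clean_mean, high_clean_sd = feature_stats(high_clean)
high_dark_mean, high_dark_sd = feature_stats(high_dark)

print(low_clean_mean, low_bright_mean)   # bright pixel raises the mean
print(high_clean_mean, high_dark_mean)   # dark pixels lower the mean
print(low_bright_sd, high_dark_sd)       # both show a large inter-pixel spread
```

In this sketch a single bright pixel nearly doubles the reported signal of the low signal feature, and the scratch lowers the reported signal of the high signal feature by a tenth. In both cases the inter-pixel standard deviation rises from zero to several hundred.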
Many currently employed image analysis protocols use local background regions for background subtraction of the features on the array. The use of a local background region that is contaminated with high signal pixels leads to overestimating the background and underestimating the net signal of features. These problems can occur whether a 1:1 local background:feature region or a global statistical value is employed.
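The arithmetic behind this problem is simple. The following is a minimal sketch, assuming the net signal is computed as the mean feature intensity minus the mean intensity of its local background region; the pixel values and the function name `net_signal` are hypothetical.

```python
from statistics import mean

def net_signal(feature_pixels, background_pixels):
    """Background-subtracted signal: mean feature minus mean local background."""
    return mean(feature_pixels) - mean(background_pixels)

# Hypothetical feature with a uniform intensity of 500.
feature = [500.0] * 100

# Clean local background at intensity 20.
bg_clean = [20.0] * 50
# Same background contaminated with five high signal pixels.
bg_contaminated = [20.0] * 45 + [1000.0] * 5

net_clean = net_signal(feature, bg_clean)
net_contaminated = net_signal(feature, bg_contaminated)

print(net_clean)         # net signal with a clean background
print(net_contaminated)  # lower net signal: the background was overestimated
```

Here five contaminated pixels out of fifty raise the background estimate from 20 to 118, so the net signal of an unchanged feature drops from 480 to 382.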
An approach currently employed to identify heterogeneous features is manual curation of the image. In manual curation of an image, a user views the scanned image of an array and either notes individual feature numbers or positions or uses customized software tools to mark the features as “bad” so that down-stream data analysis will see the features as flagged and adjust its use accordingly. Manual curation suffers from the fact that it is highly subjective and unwieldy for arrays of high feature counts.
As such, there is continued interest in, and need for, the development of new methods for identifying features in an image of an array as heterogeneous. Of particular interest would be the development of such a method that can be performed automatically, without human intervention, to consistently identify heterogeneous features in an array image, and that is suitable for processing images obtained for nucleic acid and other biopolymeric arrays.
Relevant Literature
Bassett et al., Nature Genetics Supp. (January 1999) 21: 51-55, provides a review of the problems of array image processing. Patents of interest include: U.S. Pat. Nos. 5,143,854; 5,631,734 and 5,981,956. See also WO 92/10092.
Methods are provided for identifying heterogeneous features, including heterogeneous background features, in an image of an array, e.g., in an image of a biopolymeric array, such as a nucleic acid array. The subject methods employ an algorithm that applies a different dispersity measure depending on whether the feature signals are weaker or stronger. In the subject methods, a toggle parameter, e.g., a single value (i.e., toggle point) or range of values (i.e., toggle range, smooth function), for the array of features is first determined. The toggle parameter is determined using statistics obtained from low signal features on the array. Following determination of the toggle parameter, those features that have a signal intensity that is either: (a) equal to or less than the toggle parameter and have an intra-feature noise metric 1 level, e.g., standard deviation, that exceeds the intra-feature noise limit for metric 1; or (b) greater than the toggle parameter and have an intra-feature noise metric 2 level, e.g., coefficient of variation, that exceeds the intra-feature noise limit for metric 2; are identified as heterogeneous. Also provided are computer readable storage media that include an algorithm capable of performing the steps of the subject methods. The subject methods find use in the processing of images obtained from a variety of different types of arrays, including nucleic acid arrays.
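The two-branch test summarized above can be sketched as follows. This is a minimal illustration and not the claimed implementation: it assumes a single toggle point rather than a toggle range, takes the feature signal to be the mean pixel intensity, and treats the toggle point and both noise limits as given inputs. The function name `is_heterogeneous`, the parameter names, and all numeric values are hypothetical; how the toggle point is derived from the low signal features is not shown.

```python
from statistics import mean, pstdev

def is_heterogeneous(pixels, toggle_point, sd_limit, cv_limit):
    """Flag a feature using SD at low signal and CV at high signal.

    Assumptions (not from the source): signal = mean pixel intensity,
    and toggle_point, sd_limit, cv_limit are supplied by the caller.
    """
    signal = mean(pixels)
    sd = pstdev(pixels)
    if signal <= toggle_point:
        # Branch (a): metric 1, standard deviation.
        return sd > sd_limit
    # Branch (b): metric 2, coefficient of variation (signal > toggle_point).
    return sd / signal > cv_limit

# Hypothetical thresholds chosen only for this example.
TOGGLE, SD_LIMIT, CV_LIMIT = 200.0, 30.0, 0.2

low_clean = [50.0, 52.0, 48.0, 51.0, 49.0]
low_bright = [50.0] * 99 + [5000.0]
high_clean = [2000.0, 2010.0, 1990.0, 2005.0, 1995.0]
high_scratched = [2000.0] * 90 + [0.0] * 10

flags = {
    name: is_heterogeneous(px, TOGGLE, SD_LIMIT, CV_LIMIT)
    for name, px in [
        ("low_clean", low_clean),
        ("low_bright", low_bright),
        ("high_clean", high_clean),
        ("high_scratched", high_scratched),
    ]
}
print(flags)
```

One plausible reading of this design is that the coefficient of variation becomes unstable as the mean approaches zero, while an absolute standard deviation limit would be too strict for strong features, so each metric is applied only in the signal range where it behaves well.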