High content, quantitative analysis of microscopy images is an increasingly important tool for applications in drug discovery, basic research and medical diagnosis. We define image-based, high content analysis to mean the measurement of multiple image parameters per cell, subcellular compartment or object, across multiple cells in an image, or across multiple images. This can be done automatically in a high volume, high throughput manner, or semi-automatically in a research setting that involves few cells or images. High content analysis of these assays has become practical in drug discovery and medical diagnosis only in recent years, and is currently being adopted in basic research.
Before the advent of high content screening systems, prior art approaches to cell based screening analyzed only a single average fluorescent response of many hundreds of cells in a biological sample, usually contained in a microtiter well. A popular assay instrument that uses this approach is the Molecular Devices FLIPR (www.moleculardevices.com). High content screening tools have been deployed in drug discovery since the late 1990s. These individual cell based assays provide researchers with large amounts of biological and chemical information, and they offer important enhancements to the information obtained through traditional high throughput screens (HTS). To date, high content assays have mostly been deployed to screen chemical compounds against biological targets (usually receptors) genetically over-expressed in cell culture. More recently, high content assays have been increasingly adopted in target discovery; an important and popular application is RNA interference (RNAi) assays. The same imaging equipment and image informatics can be used in either case. High content analysis enables the measurement of complex and biologically important phenotypes that cannot be measured in HTS, such as morphology changes, cellular differentiation, cytoskeletal changes, cell to cell interactions, chemotaxis and motility, and spatial distribution changes like receptor trafficking or complex formation.
Recently, high content analysis has become vital to cell culture automation, which has been identified as a critical bottleneck in both high content and high throughput screening. Here, cell image analysis can be adapted to measure cells in microplates, count the cells, and measure both cell confluence and the purity of the cell culture (single or multiple clones). An example is a recently announced collaboration between MAIA Scientific and The Automation Partnership (“TAP Taps MAIA Scientific's Imaging System to Enable Automated Cell Culture for Well Plates” in Inside Bioassays Vol. 1(4) pg 1-5) to add MAIA's image analysis software to the Cello automated cell culture system.
Chemical compound screening and RNAi based protein screening are accelerating the adoption of high content image based analysis in academic and basic research settings. Of course, microscopy has long been a benchtop tool for biologists, but until recently, acquiring images with a camera and analyzing them has typically been a low volume, low throughput, semi-automatic process, relying on manual Region Of Interest (ROI) drawing and the simple measurement tools included with standard digital microscopy software packages such as Universal Imaging's Metamorph, NIH Image, and MediaCybernetics' ImagePro. This appears to be changing as the NIH makes a strong push into chemical compound screening for academics. The Molecular Libraries and Molecular Imaging (MLMI) initiative (http://nihroadmap.nih.gov/molecularlibraries/index.asp) is a key component of the new NIH Roadmap (Zerhouni in Science Vol. 302(3) pg. 63-64 and 72, October 2003) and will offer public sector biomedical researchers access to small organic molecules that can be used as chemical probes to study cellular pathways in greater depth. These assays are intended to make use of high content and high throughput screening approaches, and NIH funding will likely favor researchers who adopt these types of tools. The NCI funded Harvard Institute for Chemistry and Cell Biology Initiative for Chemical Genetics (Stuart Schreiber: biology from a chemist's perspective in DDT Vol. 9(7) April 2004, pg. 299-303), probably a guiding case for the MLMI initiative, has been using high content analysis of chemical compound screens for some time. They use chemicals in a way analogous to mutations, to dissect cellular pathways and identify previously unknown pathway components.
Very recently, RNAi has been validated as a platform technology for the analysis of protein function, and these assays benefit immensely from high content analysis to interpret the phenotypic changes of a sample subjected to genetic perturbation (Carpenter, Sabatini, SYSTEMATIC GENOME-WIDE SCREENS OF GENE FUNCTION, in Genetics Vol. 5 pg. 11-22, January 2004). In the near future, genome wide screens will be commonplace. Several consortia (Netherlands Cancer Institute/Cancer Research UK, Vienna's Research Institute of Molecular Pathology/EMBL/Sanger Institute, Cold Spring Harbor Laboratories, and the RNAi consortium) have announced plans to make RNAi collections for the entire human genome. The Sloan-Kettering Institute and GE Healthcare have recently begun a collaboration to develop a technology capable of scanning the entire human genome in one day to analyze the function of each of the body's 35,000 genes in a cellular process (see www.amersham.co.uk/investors/IR03/rep-4.html). This gene scanning technology will depend heavily on the high content analysis software disclosed in “Harris et al. US Patent Application no. 2003/0036855 Method and Apparatus for Screening Chemical Compounds”. Gene scanning will be made available to the broad academic community via a low-end hardware and optics platform that uses the same high content analysis software, a trend that indicates the growing importance of analytical software relative to hardware and optics platforms, which are becoming commoditized.
There are many prior art approaches to cell analysis. “Lee, Shih-Jong J. U.S. Pat. No. 5,867,610 Method for Identifying Objects Using Data Processing Techniques, February 1999” discloses a method for the analysis of images of cervical Pap smear slides that enabled the first fully automated, FDA approved Pap smear screening device. In drug discovery, high content screening systems use advanced fluorescence light microscopy and molecule specific fluorescent protein tags to directly examine the physiology of fixed and living cells. Leading examples of state of the art devices are disclosed in “Harris et al. US Patent Application no. 2003/0036855 Method and Apparatus for Screening Chemical Compounds” and “Dunlay et al. U.S. Pat. No. 5,989,835 System for Cell Based Screening, November 1999”.
The de facto standard for measuring assay quality in high throughput and high content screens is the Z factor, disclosed in “Zhang et al, A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays, in Journal of Biomolecular Screening Vol. 4(2) pg. 67-73, 1999”. Recently, it has been proposed that the Z factor also be used as a measure of quality for the new screens of RNAi induced phenotypes (Carpenter, Sabatini, SYSTEMATIC GENOME-WIDE SCREENS OF GENE FUNCTION, in Genetics Vol. 5 pg. 11-22, January 2004). It is reasonable to assume that the Z factor will see widespread use in academia as high throughput, high content assays are adopted.
The Z factor measures the assay signal window with a dimensionless parameter. The signal window can be thought of as the separation band between the distribution of test samples and that of control samples. A wide window is important for reducing false positive and false negative results. The Z factor is defined as:
$$Z = 1 - \frac{3\sigma_s + 3\sigma_c}{\left|\mu_s - \mu_c\right|}$$

where σs and σc indicate the standard deviation of the sample and control populations respectively, and μs and μc indicate the mean of the sample and control populations respectively. As discussed in Zhang et al., the Z factor is sensitive to both data variability and the signal dynamic range. For example, as (3σs+3σc) approaches zero (very small standard deviations), or as |μs−μc| approaches infinity (large signal dynamic range), the Z factor approaches 1, and the HTS assay approaches an ideal assay. Typically, an excellent assay is one that has a Z factor score greater than 0.5.
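The definition of the Z factor translates directly into a few lines of code. The sketch below computes it from raw sample and control measurements; the function name `z_factor` is hypothetical, and using the sample (n − 1) standard deviation rather than the population estimator is an assumption, since the formula itself does not specify the estimator.

```python
from statistics import mean, stdev


def z_factor(samples, controls):
    """Z factor (Zhang et al., 1999) from raw sample and control measurements.

    Z = 1 - (3*sigma_s + 3*sigma_c) / |mu_s - mu_c|
    The sample (n - 1) standard deviation is used; this choice of
    estimator is an assumption of the sketch.
    """
    if len(samples) < 2 or len(controls) < 2:
        raise ValueError("need at least two measurements per population")
    mu_s, mu_c = mean(samples), mean(controls)
    sigma_s, sigma_c = stdev(samples), stdev(controls)
    window = abs(mu_s - mu_c)  # signal dynamic range
    if window == 0:
        raise ValueError("sample and control means coincide; Z factor is undefined")
    return 1 - (3 * sigma_s + 3 * sigma_c) / window


# Tight, well separated populations give an excellent assay (Z > 0.5).
print(round(z_factor([98, 100, 102], [18, 20, 22]), 3))  # 0.85
```

Here both populations have a standard deviation of 2 and the means are 80 apart, so Z = 1 − 12/80 ≈ 0.85. Widening either population to a standard deviation of 10 drops Z to 0.25, below the 0.5 threshold.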
Assay development can be thought of as an exercise in optimizing many assay inputs to deliver the highest possible Z factor, either by increasing the signal range or by reducing variation. There are many potential sources of variation, though scientists tend to focus on biological variation rather than instrument variation because that is what they can directly control. Sources of biological variation include subtle differences in cells resulting from cell culture variation, differences in DNA transfection across cells, variation in imaging probe titer and probe characteristics (such as rate of dissipation) across cells, errors in liquid handling, and poor cell adhesion. Furthermore, high content measurements can be confounded by compound related artifacts that cause false positives and false negatives, such as fluorescent compounds, toxic compounds and rare morphological changes that affect the biological signal on which the assay is based.
Indeed, the evaluation of high content assay quality is fundamentally different from that of HTS assay quality because the sample unit is different. In HTS the sample is a single fluorescent measurement corresponding to a microtiter well. In a high content assay, the sample is a biological object upon which a measurement or set of measurements, including combined and higher order measurements, is made using high content image analysis. There can be hundreds of objects in a field of view (FOV), and many FOVs per well, slide or cell array. Thus, high content analysis introduces a new source of variation into the measurement of assay quality: image analysis.
To date there has been no discussion in the literature or marketplace about how robust methods can be applied to high content analysis to both reduce measurement variation and increase the signal strength. It would be greatly beneficial to the field if robust methods could be deployed that yield a high quality assay while allowing the same or even more variation in assay inputs. This is possible in high volume, high throughput, microscopy image based assays because the high content image analysis plays a direct role in establishing both the signal dynamic range and the population variation.
Fundamentally, high content image analysis techniques can be used to reduce measurement variation at the sample level. Current state of the art approaches have in common the production of a binary mask. A binary mask image is a 1 bit image composed of ones (foreground) and zeros (background). The binary mask image corresponds to an input image of a high content assay to which image segmentation has been applied. Image segmentation is the association of pixels with biological objects (e.g. cells or subcellular components). In the binary mask image, the white areas (filled with ones) correspond to objects, and the black areas (filled with zeros) correspond to the background. Object based measurements are carried out on the original input image, within the regions defined by the binary masks or their surrounding regions, often after adjustments such as a correction for the non-uniform response of the imaging system across the field of view or a transformation from intensity value to optical density. Common object based measurements include total intensity, average intensity, and standard deviation of intensity within the object region. Many other features, such as shape, texture and color measurements, can also be made.
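A minimal sketch of the mask-then-measure pipeline described above, assuming a simple global threshold for segmentation, 4-connectivity for grouping foreground pixels into objects, and images stored as nested Python lists. The function names are hypothetical, and the field of view and optical density adjustments mentioned above are deliberately omitted.

```python
from collections import deque
from statistics import mean, pstdev


def threshold_mask(image, threshold):
    """Binary mask: 1 (foreground) where pixel > threshold, else 0 (background)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def label_objects(mask):
    """Group foreground pixels into objects using 4-connectivity.

    Returns one list of (row, col) coordinates per object, in scan order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def measure_objects(image, objects):
    """Area, total, average and standard deviation of intensity per object."""
    results = []
    for pixels in objects:
        values = [image[y][x] for y, x in pixels]
        results.append({"area": len(values), "total": sum(values),
                        "average": mean(values), "std": pstdev(values)})
    return results


image = [
    [0, 0, 0, 0, 0],
    [0, 50, 60, 0, 0],
    [0, 40, 50, 0, 0],
    [0, 0, 0, 0, 90],
    [0, 0, 0, 0, 80],
]
mask = threshold_mask(image, threshold=10)
for stats in measure_objects(image, label_objects(mask)):
    print(stats)
```

The synthetic image yields two objects, with total intensities of 200 and 170. Every measurement is taken only inside the mask, so any error in the mask propagates directly into these numbers.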
As described in “Harris et al. US Patent Application no. 2003/0036855 Method and Apparatus for Screening Chemical Compounds”, the basic cell mask can be used to take measurements of nuclear and cytoplasmic activity. One example is a two image fluorescent assay in which one image corresponds to an emission filter channel that displays a Hoechst nuclear marker and the second image corresponds to a fluorescent reporter molecule describing some biological activity located in the cytoplasm. Object masks can be created by a simple threshold based segmentation algorithm applied to the Hoechst image; each object then corresponds to a cell nuclear region, because the Hoechst image displays intensity only in the cell nucleus. An erosion image processing operation can be applied to these masks to create the nuclear mask. These masks can be used to measure the nuclear intensity in the corresponding regions of the Hoechst image. Next, to measure cytoplasmic activity in the second image, a mask representing the cytoplasm area must be created. To do this, a dilation operation using preset parameters is applied to the original binary mask image, and areas that were one (1) in the original mask are set to zero (0). The result is a donut shaped mask; these masks are used to measure cytoplasmic intensity in the corresponding regions of the second fluorescent image.
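The erosion and donut construction can be sketched with masks represented as sets of (row, col) pixel coordinates. The 3x3 square structuring element and the hypothetical `iterations` parameter standing in for the "preset parameters" of the dilation are assumptions of this sketch, not details taken from the cited application.

```python
def neighborhood(y, x):
    """3x3 square structuring element centred on pixel (y, x)."""
    return {(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}


def erode(mask):
    """Keep only pixels whose whole 3x3 neighbourhood lies inside the mask."""
    return {p for p in mask if neighborhood(*p) <= mask}


def dilate(mask, shape):
    """Add every pixel within the 3x3 neighbourhood of the mask, clipped to the image."""
    rows, cols = shape
    return {(y, x) for p in mask for (y, x) in neighborhood(*p)
            if 0 <= y < rows and 0 <= x < cols}


def donut_mask(mask, shape, iterations=1):
    """Dilate the original mask, then set the original mask pixels back to zero."""
    grown = set(mask)
    for _ in range(iterations):
        grown = dilate(grown, shape)
    return grown - mask


def mean_intensity(image, region):
    """Average pixel value of `image` over a set of (row, col) pixels."""
    if not region:
        raise ValueError("empty measurement region")
    return sum(image[y][x] for y, x in region) / len(region)


# A 3x3 nucleus in a 7x7 field of view (synthetic data).
shape = (7, 7)
nucleus = {(y, x) for y in range(2, 5) for x in range(2, 5)}
hoechst = [[100 if (y, x) in nucleus else 0 for x in range(7)] for y in range(7)]
reporter = [[30 if 1 <= y <= 5 and 1 <= x <= 5 and (y, x) not in nucleus else 0
             for x in range(7)] for y in range(7)]

nuclear_mask = erode(nucleus)           # shrinks the nucleus to its centre pixel
cyto_mask = donut_mask(nucleus, shape)  # one pixel wide ring around the nucleus
print(mean_intensity(hoechst, nuclear_mask), mean_intensity(reporter, cyto_mask))
```

On this synthetic cell the ring has 16 pixels and the two printed means are 100.0 and 30.0. Sets make the "set original mask pixels to zero" step a single set difference, at the cost of speed on real images.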
A similar method is disclosed in “Dunlay et al. U.S. Pat. No. 5,989,835 System for Cell Based Screening”, where two examples of determining nuclear translocation of a DNA transcription factor are discussed. Firstly, an unstimulated cell has its nucleus labeled with a blue fluorophore and a transcription factor in the cytoplasm labeled with a green fluorophore. Secondly, nuclear binary masks are created by performing cell segmentation on the fluorescent image corresponding to the blue fluorophore, and the cytoplasm of the unstimulated cell is imaged at a green wavelength. The nuclear mask is eroded (reduced) once to define a nuclear sampling region with minimal cytoplasmic distribution. The nucleus boundary is dilated (expanded) several times to form a ring 2-3 pixels wide that defines the cytoplasmic sampling region for the same cell. Using the nuclear sampling region and the cytoplasmic sampling region, nuclear translocation data can be analyzed automatically by high content analysis on a cell by cell basis.
Binary mask based high content measurements introduce error into the assay at an early stage, in addition to instrument error such as focusing errors and variation in illumination. Types of measurement error are shown in FIG. 1A-4H. FIG. 1A-1D show errors in measurement on the nuclear image. The dark regions 104, 106 are the binary masks resulting from segmentation. The true nuclear regions 100, 102 are highlighted in checker patterns. Measurement errors result from segmentation errors that include over-segmentation (FIG. 1A), under-segmentation (FIG. 1B), missed segmentation (FIG. 1C) and overlapped segmentation (FIG. 1D). As described above, the nuclear masks 104, 106, 108 are used to derive cytoplasm rings 112, 114, 116 within which measurements are made on the cytoplasm regions 110, 118. FIGS. 1E-1H show how errors in measurements on the cytoplasm image accumulate from the initial segmentation errors made when creating the nuclear masks 104, 106, 108. The cytoplasmic rings 112, 114, 116 are shown in dark black overlaid on the representation of the true nuclear regions 100, 102 (checker patterns) and cytoplasm regions (dotted patterns). As disclosed above, cytoplasmic region measurements are meant to measure the fluorescent activity of fluorophores in the cytoplasm; however, common measurement errors include measuring both true cytoplasm and true background intensities within the cytoplasm ring region 112 (FIG. 1E), measuring intensities corresponding to true cytoplasm, true background and true nuclear regions within the cytoplasm ring region 114 (FIG. 1F), missing the object altogether (FIG. 1G), and measuring the cytoplasm intensity of two cells within the cytoplasm ring region 116 and treating it as one (FIG. 1H). This error accumulates further and undermines derived measurements such as the standard deviation of intensity, the ratio of cytoplasmic to nuclear intensity, etc.
Similar error accumulates in time lapse images when objects are not perfectly aligned from frame to frame. Error is introduced when the true nuclear object shifts over time relative to the nuclear reference mask. As the nucleus shifts from image frame to image frame, the measurement region corresponding to the initial binary mask increasingly includes background fluorescence.
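A small simulation makes the drift effect concrete. It assumes a synthetic square nucleus of uniform intensity on a uniform background that moves one pixel per frame while the measurement mask stays fixed at its frame 0 position; all function names and values are hypothetical.

```python
def nucleus_frame(shape, top, left, size, signal, background):
    """Synthetic frame: a uniform square nucleus on a uniform background."""
    rows, cols = shape
    return [[signal if top <= y < top + size and left <= x < left + size else background
             for x in range(cols)] for y in range(rows)]


def mean_in_mask(frame, mask):
    """Average intensity of the frame over a set of (row, col) pixels."""
    return sum(frame[y][x] for y, x in mask) / len(mask)


def drift_series(n_frames=4, shift_per_frame=1, size=4, shape=(12, 12),
                 signal=100.0, background=10.0):
    """Mean intensity inside a mask fixed at frame 0 while the nucleus drifts right."""
    top, left = 4, 2
    reference_mask = {(y, x) for y in range(top, top + size)
                      for x in range(left, left + size)}
    means = []
    for t in range(n_frames):
        frame = nucleus_frame(shape, top, left + t * shift_per_frame,
                              size, signal, background)
        means.append(mean_in_mask(frame, reference_mask))
    return means


print(drift_series())  # [100.0, 77.5, 55.0, 32.5]
```

Each one pixel shift swaps a quarter of the 4x4 mask from nucleus (100) to background (10), so the measured mean falls by 22.5 per frame even though the nucleus itself never changes.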
These fundamental errors in object segmentation and measurement propagate throughout the assay's statistics, resulting in higher assay variability and reduced signal dynamic range. Instrument and biological variation introduce additional variation. There is therefore a clear need for robust methods of high content analysis that allow for more accurate segmentation results and more specific and sensitive measurements with high repeatability. These robust measurements are needed not only at the individual object level, but also at the FOV level, the sample level (usually corresponding, but not limited, to a microtiter plate well, slide bound tissue specimen or micro tissue array) and the assay level.
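The effect on assay quality can be illustrated with the Z factor. The sketch below makes two simplifying assumptions: segmentation-induced measurement error is independent of biological variation, so the two standard deviations add in quadrature; and a measurement region that covers only a fraction f of the true object (the rest being background) mixes object and background intensity linearly. All numbers are hypothetical.

```python
from math import sqrt


def z_factor_from_stats(mu_s, sigma_s, mu_c, sigma_c):
    """Z factor from summary statistics: 1 - (3*sigma_s + 3*sigma_c) / |mu_s - mu_c|."""
    return 1 - (3 * sigma_s + 3 * sigma_c) / abs(mu_s - mu_c)


def combined_sigma(sigma_bio, sigma_meas):
    """Total spread when biological and measurement errors are independent."""
    return sqrt(sigma_bio ** 2 + sigma_meas ** 2)


def diluted_mean(mu_object, mu_background, fraction_inside):
    """Mean measured by a region covering only `fraction_inside` of the true object."""
    return fraction_inside * mu_object + (1 - fraction_inside) * mu_background


mu_s, mu_c, sigma_bio, background = 100.0, 20.0, 5.0, 10.0

ideal = z_factor_from_stats(mu_s, sigma_bio, mu_c, sigma_bio)       # 0.625
sigma = combined_sigma(sigma_bio, 5.0)                               # about 7.07
noisy = z_factor_from_stats(mu_s, sigma, mu_c, sigma)                # about 0.470
f = 0.875
diluted = z_factor_from_stats(diluted_mean(mu_s, background, f), sigma,
                              diluted_mean(mu_c, background, f), sigma)  # about 0.394
print(f"ideal={ideal:.3f} noisy={noisy:.3f} noisy+diluted={diluted:.3f}")
```

Under these assumptions, adding measurement error equal in size to the biological spread pushes an assay from 0.625 to about 0.470, below the 0.5 threshold for an excellent assay. Diluting every measurement with 12.5% background then shrinks the window from 80 to 70 and lowers Z further to about 0.394.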