1. Field of the Invention
The present invention relates generally to methods for locating clusters in multidimensional data. The present invention is particularly applicable to identifying clusters corresponding to populations of cells or particles in data generated by cytometers, more particularly, flow cytometers.
2. Description of the Related Art
Particle analyzers, such as flow and scanning cytometers, are well known analytical tools that enable the characterization of particles on the basis of optical parameters such as light scatter and fluorescence. In a flow cytometer, for example, particles, such as molecules, analyte-bound beads, or individual cells, in a fluid suspension are passed by a detection region in which the particles are exposed to an excitation light, typically from one or more lasers, and the light scattering and fluorescence properties of the particles are measured. Particles or components thereof typically are labeled with fluorescent dyes to facilitate detection, and a multiplicity of different particles or components may be simultaneously detected by using spectrally distinct fluorescent dyes to label the different particles or components. Typically, a multiplicity of photodetectors is used, one for each of the scatter parameters to be measured and one for each of the distinct dyes to be detected. The data obtained comprise the signals measured for each of the light scatter parameters and the fluorescence emissions.
Cytometers further comprise means for recording the measured data and analyzing the data. Typically, data storage and analysis are carried out using a computer connected to the detection electronics. The data typically are stored in tabular form, wherein each row corresponds to data for one particle, and the columns correspond to each of the measured parameters. The use of standard file formats, such as an "FCS" file format, for storing data from a flow cytometer facilitates analyzing data using separate programs and machines. Using current analysis methods, the data typically are displayed in 2-dimensional (2D) plots for ease of visualization, but other methods may be used to visualize multidimensional data.
The parameters measured using a flow cytometer typically include the excitation light that is scattered by the particle along a mostly forward direction, referred to as forward scatter (FSC), the excitation light that is scattered by the particle in a mostly sideways direction, referred to as side scatter (SSC), and the light emitted from fluorescent molecules in one or more channels (ranges of frequencies) of the spectrum, referred to as FL1, FL2, etc., or by the name of the fluorescent dye that is primarily detected in that channel. Different cell types can be identified by the scatter parameters and the fluorescence emissions resulting from labeling various cell proteins with dye-labeled antibodies.
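As an illustration of the tabular layout described above, the following minimal Python sketch stores events as rows and the measured parameters (FSC, SSC, FL1, FL2) as columns, and extracts the two-parameter projection that a 2D dot plot would display. The event values and the helper names `column` and `projection` are hypothetical; reading an actual FCS file is not shown.

```python
# Hypothetical event table: one row per particle, one column per measured parameter.
# The parameter names follow the FSC/SSC/FL1/FL2 convention; the values are made up.
PARAMETERS = ["FSC", "SSC", "FL1", "FL2"]

events = [
    [520.0, 130.0, 15.0, 800.0],
    [505.0, 125.0, 900.0, 20.0],
    [210.0, 600.0, 12.0, 10.0],
]


def column(table, names, name):
    """Return every value of one parameter, i.e. one column of the table."""
    idx = names.index(name)
    return [row[idx] for row in table]


def projection(table, names, x_name, y_name):
    """Return the (x, y) pairs that a 2D dot plot of two parameters would show."""
    xi, yi = names.index(x_name), names.index(y_name)
    return [(row[xi], row[yi]) for row in table]


# Each event is a point in a 4-dimensional space; a dot plot shows one 2D projection.
print(projection(events, PARAMETERS, "FSC", "SSC"))
```

A multidimensional analysis works on the full rows, whereas a dot plot discards all but two columns, which is why populations that overlap in one projection may still be separable in another.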
Both flow and scanning cytometers are commercially available from, for example, BD Biosciences (San Jose, Calif.). Flow cytometry is described at length in the extensive literature in this field, including, for example, Landy et al. (eds.), Clinical Flow Cytometry, Annals of the New York Academy of Sciences Volume 677 (1993); Bauer et al. (eds.), Clinical Flow Cytometry: Principles and Applications, Williams & Wilkins (1993); Ormerod (ed.), Flow Cytometry: A Practical Approach, Oxford Univ. Press (1997); Jaroszeski et al. (eds.), Flow Cytometry Protocols, Methods in Molecular Biology No. 91, Humana Press (1997); and Shapiro, Practical Flow Cytometry, 4th ed., Wiley-Liss (2003); all incorporated herein by reference. Fluorescence imaging microscopy is described in, for example, Pawley (ed.), Handbook of Biological Confocal Microscopy, 2nd Edition, Plenum Press (1989), incorporated herein by reference.
The data obtained from an analysis of cells (or other particles) by multi-color flow cytometry are multidimensional, wherein each cell corresponds to a point in a multidimensional space defined by the parameters measured. Populations of cells or particles are identified as clusters of points in the data space. The identification of clusters and, thereby, populations can be carried out manually by drawing a gate around a population displayed in one or more 2-dimensional plots of the data, referred to as "scatter plots" or "dot plots". Alternatively, clusters can be identified, and gates that define the limits of the populations can be determined, automatically. A number of methods for automated gating have been described in the literature. See, for example, U.S. Pat. Nos. 4,845,653; 5,627,040; 5,739,000; 5,795,727; 5,962,238; 6,014,904; 6,944,338, each incorporated herein by reference.
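A manually drawn gate is commonly represented as a polygon in the coordinates of a 2D plot, and an event belongs to the gated population if its projected point falls inside that polygon. The sketch below assumes this representation and uses the standard ray-casting point-in-polygon test; the function names, gate vertices, and point coordinates are hypothetical and do not reflect any particular software package or any of the cited patents.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from p to the right."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line (y1 != y2 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing toggles inside/outside
    return inside


def apply_gate(points: List[Point], polygon: List[Point]) -> List[int]:
    """Return the indices of events whose 2D projection falls inside the gate."""
    return [i for i, p in enumerate(points) if in_polygon(p, polygon)]


# Hypothetical FSC/SSC coordinates and a hand-drawn rectangular gate.
points = [(520.0, 130.0), (505.0, 125.0), (210.0, 600.0)]
gate = [(400.0, 50.0), (650.0, 50.0), (650.0, 300.0), (400.0, 300.0)]
print(apply_gate(points, gate))  # [0, 1]
```

Because the gate is drawn in a single 2D projection, gating in several dimensions requires combining gates from several plots, which is part of what motivates the automated, multidimensional approaches cited above.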
Mixture model approaches to identifying clusters in data that correspond to populations have been described. A mixture model approach to clustering is based on modeling the data as a finite mixture of distributions, in which each component distribution is taken to correspond to a different population. Most commonly, the component distributions are assumed to be multivariate Gaussian (normal) distributions or multivariate t distributions. One method for fitting the mixture of distributions to the data consists of using an expectation-maximization (EM) algorithm to estimate the parameters of the distributions corresponding to the clusters. Each event (data from a single cell or particle) is then classified as a member of the population cluster to which it is most likely to belong. The use of multivariate mixture modeling for the gating of data generated by flow cytometry has been described in, for example, Boedgheimer et al., Cytometry 73A: 421-429, 2008; Chan et al., Cytometry 73A: 693-701, 2008; and Lo et al., Cytometry 73A: 321-332, 2008, each incorporated herein by reference. More generally, the use of pattern recognition to identify populations is described in Boddy et al., Cytometry 44:195-209, 2001, incorporated herein by reference.
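The mixture-model approach can be illustrated with a plain EM fit of a multivariate Gaussian mixture, followed by assignment of each event to its most probable component. This is a minimal NumPy sketch under stated assumptions, not the method of any cited reference: the number of components `k` is fixed in advance, the means are initialized by a farthest-first heuristic, a small ridge `reg` is added to each covariance for numerical stability, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def log_gaussian_pdf(X, mean, cov):
    """Log of the multivariate normal density at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (mahal + logdet + d * np.log(2 * np.pi))


def init_means(X, k, rng):
    """Farthest-first initialization (an assumption; random starts are also common)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    return np.array(centers)


def e_step(X, weights, means, covs):
    """Responsibilities: posterior probability of each component for each event."""
    log_p = np.column_stack([np.log(w) + log_gaussian_pdf(X, m, c)
                             for w, m, c in zip(weights, means, covs)])
    log_p -= log_p.max(axis=1, keepdims=True)  # guard against underflow
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_gmm(X, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture to X (events x parameters) by EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = init_means(X, k, rng)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = e_step(X, weights, means, covs)
        # M-step: re-estimate weights, means and covariances from responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs, e_step(X, weights, means, covs)


def classify(resp):
    """Assign each event to the component it most likely belongs to."""
    return resp.argmax(axis=1)


# Synthetic 3-parameter data with two well-separated populations.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, size=(200, 3)),
               rng.normal([8.0, 8.0, 2.0], 1.0, size=(150, 3))])
weights, means, covs, resp = fit_gmm(X, k=2)
labels = classify(resp)
print(np.round(weights, 2), np.bincount(labels))
```

The E-step works in log space and subtracts each row's maximum before exponentiating, which avoids dividing by zero when an event lies far from every component. Substituting a multivariate t density for the Gaussian would give the t-mixture variant mentioned above, at the cost of an additional latent scale variable per event in the M-step.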