Presently there are a number of methods for analyzing multidimensional data. Typically, these methods rely on some form of mathematical modeling that is used to present, rearrange, or transform the data in such a way that observations may be made and conclusions drawn. Such methods include, for example, multivariate analysis, regression analysis, and logarithmic transformation. Although there are a number of methods to analyze multidimensional data, these methods have certain limitations and shortcomings. For example, when using visualization to analyze such data, viewing multidimensional data as combinations of lower dimensional views (e.g., histograms and bivariate displays) results in an uncontrolled loss of information. Such loss frequently includes the pertinent information necessary to accomplish desired goals, e.g., understanding the differentiation and maturation of cells when analyzing cytometry data. While such methods generally are not dependent upon the manner of data collection, in many instances those who collect data by a particular method routinely use the same form(s) of data analysis and presentation. This is especially true in the field of cytometry.
Cytometry is a technique for the simultaneous analysis of multiple physical and chemical characteristics of individual particles either suspended in a liquid medium (flow cytometry) or coated on slides (image cytometry). Typically, cytometry is used for the analysis of the physical and chemical characteristics of individual cells, although any type of particle can be measured by this technology. As used herein, the term “cell” refers to any particle that is measured by a cytometer. Cytometry uses the principles of light scattering, light excitation, cell volume, and emission of fluorochrome molecules to generate specific multi-parameter data from a single cell. The process of collecting correlated data from a population of cells using a cytometer is summarized in FIG. 1. The cells are normally collected as a tissue specimen and then divided into one or more samples. The samples can be contained either in a fluid volume (FIG. 1a, 12a) or on a slide (FIG. 1b, 12b).
The sample is normally reacted with reporter molecules or stains that confer information about the quantity of specific types of cellular structures or molecules either on or in the cell. The cocktail of stains, along with the cell's intrinsic properties, allows different cell types to be partially or completely delineated from each other in subsequent analyses. Stains can include one or more fluorescently labeled monoclonal antibodies, nucleic acid specific fluorescent components, fluorescent lipophilic molecules, viable cell dyes, etc. The intrinsic properties of the cells include, but are not limited to, forward angle light scattering (FALS), side scatter (SS), and cell volume.
In FIG. 1, the differently filled circles 10a, 10b, 10c represent three different cell types that are present in this example population. Cell types are known types of cells that are defined by a priori information from previous studies and/or the data collected by the cytometer. These cell types include, but are not limited to, tissue culture cell lines, lineage-specific blood or bone-marrow cells, bacteria, and fungi.
After staining, the cells are processed by a cytometer as illustrated in FIG. 1c. Depending on the staining cocktail, the intrinsic cellular properties, and the instrument's capability, various parameters are detected, digitized, and stored. This stored data structure is referred to as a listmode file. Typically, listmode files organize stored data such that measured parameters are in columns and events are in rows. Commonly, the number of events generated during cytometry analysis ranges from about 10 to about 20,000. However, in certain instances the number of events can be in the millions.
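By way of illustration only, the row-and-column organization of a listmode file described above can be sketched as follows. The parameter names, the numeric values, and the helper `column` are hypothetical and are chosen solely for this example; actual listmode formats define their own layout and metadata.

```python
# Hypothetical parameter names (columns); illustrative only.
PARAMETERS = ("FALS", "SS", "FL1", "FL2")

# Each row is one event (one measured cell); each column is one parameter.
# The values below are made up for illustration.
listmode = [
    (512, 130, 45, 880),
    (498, 122, 51, 60),
    (230, 610, 700, 35),
]

def column(rows, name):
    """Return the values of one parameter across all events."""
    idx = PARAMETERS.index(name)
    return [row[idx] for row in rows]

side_scatter = column(listmode, "SS")  # one value per event
```

In this sketch, adding an event appends a row, while adding a measured parameter appends a column to every row.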
The listmode correlated parameters are normally displayed as one-parameter (1P), two-parameter (2P), or three-parameter (3P) plots. (See FIG. 2.) Using techniques such as principal components, a higher number of correlated parameters (3+P) can also be displayed.
Analysis of multidimensional data, such as cytometer data, generally involves classifying the data into relevant populations by using some set of hierarchical or refinement gates. A gate is either a one-dimensional range or a two-dimensional parameter boundary, and each event falls either inside or outside that boundary. Gates can also be combined into Boolean algebraic expressions. Events that satisfy these gates can then be displayed in other histograms defined by other parameters. This gate refinement process is carried out until the populations of interest are displayed on some set of histograms. Statistical analyses of these gated events, along with all or some of the graphics, are generally the ultimate output of conventional multidimensional data analysis, including cytometry analysis.
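By way of illustration only, conventional gating as described above can be sketched as predicates over events. The function names, parameter names, and thresholds are hypothetical. The two-dimensional boundary is simplified here to an axis-aligned rectangle, whereas gates drawn in practice are often arbitrary polygons.

```python
def range_gate(param, lo, hi):
    """One-dimensional gate: the event is inside if lo <= value <= hi."""
    return lambda event: lo <= event[param] <= hi

def rect_gate(px, py, x_lo, x_hi, y_lo, y_hi):
    """Two-dimensional gate, simplified to a rectangle (an assumption)."""
    return lambda event: (x_lo <= event[px] <= x_hi
                          and y_lo <= event[py] <= y_hi)

# Boolean combinations of gates.
def all_of(*gates):
    return lambda event: all(g(event) for g in gates)

def any_of(*gates):
    return lambda event: any(g(event) for g in gates)

def negate(gate):
    return lambda event: not gate(event)

# Made-up events, one dict per event.
events = [
    {"FALS": 512, "SS": 130, "FL1": 45, "FL2": 880},
    {"FALS": 498, "SS": 122, "FL1": 51, "FL2": 60},
    {"FALS": 230, "SS": 610, "FL1": 700, "FL2": 35},
]

scatter = range_gate("FALS", 400, 600)            # first gate
dim = rect_gate("FL1", "FL2", 0, 100, 0, 100)     # low-signal region
refined = all_of(scatter, negate(dim))            # refinement gate

selected = [e for e in events if refined(e)]
```

Only events passing `refined` would be carried forward to the next histogram, which is the hierarchical refinement the paragraph describes.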
This prior art approach to analysis of multidimensional data, however, has several limitations, including parameter scalability, gating errors, and multiple sample data integration and visualization. The number of two-parameter histograms necessary to represent m-dimensional data encoded in a listmode file is m*(m−1)/2. For example, examining a sample with a cytometer that generates ten parameters requires 45 separate two-parameter histograms, and a visual understanding of the implicit relationships between the parameters is very difficult. Thus, prior art approaches do not scale well with the number of parameters.
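By way of illustration only, the quadratic growth in the number of two-parameter histograms can be checked with a short sketch. The function names are hypothetical and are introduced only for this example.

```python
from itertools import combinations

def pairwise_plot_count(m):
    """Number of distinct two-parameter histograms for m parameters."""
    return m * (m - 1) // 2

def pairwise_plots(parameters):
    """Enumerate every unordered pair of parameters."""
    return list(combinations(parameters, 2))

ten_param_count = pairwise_plot_count(10)  # 45, as stated in the text
```

Doubling the number of parameters from ten to twenty raises the count from 45 to 190, which is why the pairwise approach becomes unwieldy.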
The prior art method of analyzing multidimensional data is also limited by compounding gating errors. A gate is a boundary that attempts to contain one or more cellular populations; however, populations are actually m-dimensional probability distributions. Attempting to separate these distributions with simple boundaries can result in false positives contained in the gate and false negatives excluded by the gate. Since the output of one gate is used for the definition of another, these errors are compounded as the number of parameters increases. Compounding gating errors are further exacerbated by subjective placement of the gates.
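By way of illustration only, the compounding effect can be sketched under a strong simplifying assumption that does not appear in the description above: each gate in a chain independently retains the same fraction of the true population. The function name and the 95% figure are hypothetical.

```python
def retained_fraction(per_gate_retention, n_gates):
    """Fraction of the true population surviving n sequential gates,
    assuming each gate independently retains the same fraction
    (a simplifying assumption made only for this sketch)."""
    return per_gate_retention ** n_gates

# With a hypothetical 95% retention per gate, five chained gates
# keep roughly 77% of the population of interest.
after_five = retained_fraction(0.95, 5)
```

Real gating errors are neither independent nor uniform, so this sketch indicates only the direction of the effect, not its magnitude.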
The prior art method of analyzing multidimensional data is further limited in its ability to analyze data from multiple samples from a single specimen. Different cocktails of stains are often used to look at a single specimen in different ways. It is difficult to correlate the information from the separate analyses so that a single coherent picture of the specimen emerges.
Accordingly, there is a need in the art for improved methods of analyzing multidimensional data and related systems for analyzing and/or displaying data, including multidimensional data generated during cytometry analysis.