Flow cytometry is a technology used to analyze the physical and chemical properties of particles in a sheath fluid as they pass through one or more laser beams. Most commonly, cell particles are fluorescently labelled and then excited by the laser to emit light at varying wavelengths.
A flow cytometer has five main components. First, a fluid sheath that carries and aligns cells so that they pass single file through a light beam for sensing. Second, a measurement system, usually a laser whose beam is aligned with the fluid sheath. Third, a detector and an analog-to-digital conversion system, which converts the analog measurements of forward-scattered light and side-scattered light from the laser beam into digital signals that can be processed by a computing system. Fourth, an amplification system, either linear or logarithmic. Lastly, a computer system for analysis of the signals.
Flow cytometry is used in the diagnosis of health disorders, in particular blood cancers, but also has other applications in chemical research and analysis of properties of particles. Flow cytometers are capable of analyzing thousands of particles every second, and can actively separate and isolate particles having specific physical or chemical properties. The process of collecting data from samples using flow cytometry is called acquisition. Acquisition is mediated by a computer physically connected to the flow cytometer and receiving digital or analog signals from the flow cytometer. Modern flow cytometers typically have multiple lasers and detectors incorporated to detect multiple antibodies or markers on the surface of the particles.
Multi-parameter flow cytometry immunophenotyping, a technique used to study the proteins expressed by cells, has become a method of choice for the differential diagnosis of reactive and neoplastic blood disorders. As known in the art, samples are tagged with monoclonal antibodies coupled to fluorescent labels in order to characterize the different cellular populations in a wide range of biological specimens, including blood, bone marrow tissue, bodily fluids, and lymph nodes. Other flow cytometry phenotyping systems are also known and disclosed, such as quantum dot or isotope labeling; still others are available and will occur to those of skill in the art.
Typically, several tens of thousands of cells are measured within a few seconds and stored as digital data. This data set represents an empirical look at the contents of the specimen based upon the biological markers encoded for.
Genetic Algorithms, examples of which are generally disclosed in U.S. Pat. Nos. 4,935,877, 5,255,345, 5,390,283, among others, which are incorporated herein by reference, are utilized to analyze flow cytometry data provided by conventional flow cytometry data collection methods.
GENIE™ is a Genetic Algorithm software system that accounts for rapidly evolving feature extraction algorithms for image analysis. Historically, GENIE™ has been utilized for feature extraction on aerial topographical and satellite imagery. GENIE™ is an evolutionary computation software system that uses a genetic algorithm to assemble image-processing tools from a collection of low-level image operators, such as edge detectors, texture measures, spectral operations, and other morphological features. Each image-processing tool generates a number of feature planes, which are then combined using a supervised classifier (Fisher linear discriminant) to generate a final boolean feature mask. A population of image-processing tools is generated and ranked according to a fitness metric measuring their performance on user-provided training data, and fit members of the population are permitted to reproduce.
GENIE™ uses several standard fitness metrics, including Euclidean distance and Hamming distance. The process cycles until the population converges to a solution or the user decides to accept the current best solution. GENIE™ may choose to ignore the spatial information in the image and rely wholly on spectral operations and the supervised classifier module, but in practice GENIE™ will construct integrated spatio-spectral algorithms.
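By way of illustration only, a Hamming-distance fitness score of the kind referred to above may be sketched as follows. The function names and the normalization to the range 0 to 1 are assumptions made for this example and are not taken from the GENIE™ software.

```python
def hamming_distance(mask_a, mask_b):
    """Count the positions at which two equal-length boolean masks disagree."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(1 for a, b in zip(mask_a, mask_b) if bool(a) != bool(b))


def hamming_fitness(predicted_mask, training_mask):
    """Hypothetical fitness: 1.0 for perfect agreement, 0.0 for total disagreement."""
    if not training_mask:
        raise ValueError("training mask must not be empty")
    return 1.0 - hamming_distance(predicted_mask, training_mask) / len(training_mask)


# Illustrative masks: 1 marks a feature of interest, 0 marks background.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 0, 1, 1, 0]
score = hamming_fitness(predicted, truth)
```

In this example the two masks disagree at two of eight positions, so the candidate would receive a score of 0.75 under the assumed normalization.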
The GENIE™ system architecture is designed to provide a flexible and powerful computing paradigm. GENIE™ can search a rich and complex feature space using its gene pool of standard primitive image-processing operators and the results of additional analyst-selected algorithms.
Machine learning teaches computers to learn from experience. Machine learning algorithms use computational methods to learn information directly from data without relying on a predetermined equation as a model. The algorithms associated with machine learning improve their performance as the number of samples available for learning increases.
Typically, machine learning algorithms find natural patterns in data that generate insight and help make better decisions and predictions. Machine learning is used to make critical decisions in medical diagnosis, stock trading, energy load forecasting, and many more applications in which prediction from samples is possible.
Machine learning uses two types of techniques. Supervised learning is one type that trains a model on known input and output data so that it can predict future outputs. Unsupervised learning, the second type, finds hidden patterns or intrinsic structure within input data.
The goal of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data and trains a model to generate reasonable predictions for the response to new input data.
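As a minimal sketch of supervised learning, the following trains a nearest-centroid classifier on labeled two-dimensional measurements and then predicts the label of a new measurement. The classifier choice, the function names, and the data values are assumptions made for illustration and do not represent the method of the invention.

```python
from collections import defaultdict


def train_nearest_centroid(samples, labels):
    """Learn one mean vector (centroid) per label from known inputs and responses."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_label(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=squared_distance)


# Hypothetical training data: two measured parameters per event, with known labels.
train_x = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (4.0, 4.2), (4.1, 3.8), (3.9, 4.0)]
train_y = ["small", "small", "small", "large", "large", "large"]

model = train_nearest_centroid(train_x, train_y)
new_event_label = predict_label(model, (3.8, 4.1))
```

Here the known inputs and responses are `train_x` and `train_y`, and `predict_label` generates the response for input data not seen during training.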
The goal of unsupervised learning is to find hidden patterns or intrinsic structure in data. It is used to draw inferences from datasets consisting of input data without labeled responses. Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data.
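Clustering may be illustrated with a short k-means sketch, which groups unlabeled points by alternating an assignment step and a centroid update step. K-means is one clustering technique among many; the function name, the fixed initial centroids, and the data values are assumptions chosen for this example.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Group unlabeled points around centroids; returns (assignments, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical unlabeled measurements forming two visually distinct groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assignments, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

No labels are supplied; the grouping of the first three points into one cluster and the last three into another emerges from the data alone.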
A genetic algorithm is a search heuristic that mimics the process of natural selection. Genetic algorithms use methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. More recently, machine learning techniques have been used to improve the accuracy and performance of genetic algorithms.
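The selection, crossover, and mutation cycle described above may be sketched as follows. The bit-string genotype, the parameter values, the elitism step, and the toy fitness function (the count of 1 bits) are all assumptions made for illustration; they are not drawn from the referenced patents or from GENIE™.

```python
import random


def evolve(fitness, genome_length=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Evolve bit-string genotypes; returns (best genotype, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[:population_size // 2]
        # Elitism (an assumption of this sketch): carry the best genotype over unchanged.
        children = [population[0][:]]
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Single-point crossover: splice the two parents at a random cut.
            cut = rng.randint(1, genome_length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    best_per_generation.append(fitness(best))
    return best, best_per_generation


# Toy problem: maximize the number of 1 bits in the genotype.
best, history = evolve(fitness=sum)
```

Because the best genotype of each generation is copied forward unchanged, the best fitness recorded in `history` never decreases from one generation to the next.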
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
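A two-category linear SVM of the kind described above may be sketched as follows. This example trains by subgradient descent on the regularized hinge loss rather than by the quadratic programming solvers used in typical SVM libraries, and it assumes features that have been centered about zero. The function names, parameter values, and data are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move the boundary away from x.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Correct with sufficient margin: apply only regularization shrinkage.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def classify(w, b, x):
    """Return +1 or -1 according to the side of the learned boundary x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical centered training examples, each marked as category +1 or -1.
samples = [(2.0, 2.0), (-2.0, -2.0), (2.5, 1.5), (-2.5, -1.5), (1.5, 2.5), (-1.5, -2.5)]
labels = [1, -1, 1, -1, 1, -1]

w, b = train_linear_svm(samples, labels)
new_example_category = classify(w, b, (3.0, 3.0))
```

The trained model assigns each new example to one of the two categories according to the sign of the decision function.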
An advancement of the current invention over traditional SVM analysis of flow cytometry data is the ability to measure multiple features or markers in one step without the need for pre-analytical feature selection operations to reduce dimensionality. It is well known that SVMs experience decreasing performance in supervised learning of data sets as the number of vectors or features increases. That problem often requires the addition of feature selection methods such as leave-one-out error and kernel alignment. Genetic algorithms provide the ability to readily analyze multiple features or markers in a single training session, which also allows for better understanding and prediction of multi-parameter data sets.