A. Field of the Invention
This invention relates to the field of methods of analysis of multi-dimensional data and more particularly to methods for identifying and classifying discrete populations or clusters within such data. This invention has applications in a variety of disciplines, including the fields of biology, drug discovery and medicine, such as analysis of blood. One specific application described herein is analysis of multi-dimensional data obtained from a flow cytometer in order to identify and classify the data into discrete populations of different types of white blood cells.
B. Related Art
Mammalian peripheral blood usually contains three major classifications of blood cells—red blood cells (“RBCs”), white blood cells (“WBCs”), and platelets (“PLTs”). These cells are suspended in a solution referred to as plasma, which contains many different proteins, enzymes, and ions. The functions of the plasma components include blood coagulation, osmolality maintenance, immune surveillance, and a multitude of other functions.
Mammals usually have anywhere from 2-10×10¹² RBCs per liter. RBCs are responsible for oxygen and carbon dioxide transport within the circulatory system. In many mammals, including humans, normal mature red cells have a bi-concave cross-sectional shape and lack nuclei. RBCs can range in diameter between 4 and 9 microns, depending on the species, and have a thickness that is generally less than 2 microns. The RBCs contain high concentrations of hemoglobin, a heme-containing protein which performs the dual roles of oxygen and carbon dioxide transport. Hemoglobin is responsible for the overall red color of blood, due to the presence of iron in the heme molecule. In the present application, the terms “erythrocytes”, “red blood cells”, “red cells”, and “RBCs” are used interchangeably to refer to the hemoglobin-containing blood cells present in the circulation as described above.
In addition to mature RBCs, immature forms of red blood cells can often be found in peripheral blood samples. A slightly immature RBC is referred to as a reticulocyte, and the very immature forms of RBCs are broadly classified as nucleated red blood cells (NRBCs). Higher level non-mammalian animals, such as birds, reptiles, and amphibians, have exclusively nucleated RBCs in their blood.
Reticulocytes are red blood cell precursors that have completed most of the normal red cell development stages in bone marrow, and have expelled their nuclei. The last portion remaining to leave the reticulocyte before it becomes a truly mature RBC is transfer RNA. Detection of reticulocytes is important in clinical evaluation of a patient's ability to produce new red blood cells. The reticulocyte count also can be used to distinguish among different types of anemia. In anemia, red cell production may be diminished to the point where it can no longer keep up with red cell removal, and as a result the overall red blood cell count and hematocrit are low. The presence of an increased number of reticulocytes in anemic patients provides evidence that their bone marrow is functioning, and attempting to make up for the red blood cell deficit. If few or no reticulocytes are detectable in these patients, the bone marrow is not adequately responding to the red blood cell deficit.
White blood cells (also called “leukocytes”) are the blood-borne immune system cells that destroy foreign agents, such as bacteria, viruses, and other pathogens that cause infection. WBCs exist in peripheral blood in very low concentrations as compared to red blood cells. Normal concentrations of these cells range from 5-15×10⁹ per liter, which is about three orders of magnitude less than red blood cells. These cells are generally larger than RBCs, having diameters between 6 and 13 microns, depending on the type of white cell and the species. Unlike RBCs, there are a variety of white blood cell types that perform different functions within the body. In this application, the terms “white blood cells”, “white cells”, “leukocytes”, and “WBCs” are used interchangeably to refer to the non-hemoglobin-containing nucleated blood cells present in the circulation as described above.
Measurement of the number of white cells in blood is important in the detection and monitoring of a variety of physiological disorders. For example, elevated numbers of abnormal white blood cells may indicate leukemia, which is an uncontrolled proliferation of a myelogenous or a lymphogenous cell. Neutrophilia, or an abnormally high concentration of neutrophils, is an indication of inflammation or tissue destruction in the body, by whatever cause.
White blood cells may be broadly classified as either granular or agranular. Granular cells, or granulocytes, are further subdivided into neutrophils, eosinophils, and basophils. Agranular white cells are sometimes referred to as mononuclear cells, and are further sub-classified as either lymphocytes or monocytes. Measurements of the percentages in the blood of the two major WBC classifications (granulocytes and mononuclear cells) comprise a two-part WBC differential count (or two-part differential). Measurements of the components of these subclassifications (neutrophils, eosinophils, basophils, lymphocytes, and monocytes), produce a five-part WBC differential count (or five-part differential).
Neutrophils are the most prevalent of the granulocytes and of the five major subclasses of white cells, usually making up a little over half of the total number of white blood cells. Neutrophils are so named because they contain granules within their cytoplasm which can be stained at a neutral pH. These cells have a relatively short life span, on the order of a day or less. Neutrophils attack and destroy invading bacteria and other foreign agents in the tissues or circulating blood as part of the body's immune response mechanisms.
Eosinophils are the second most prevalent of the granulocytes, behind the neutrophils, but generally account for less than five percent of the total number of white blood cells. Eosinophils also contain granules within their cytoplasm which can be stained with an eosin stain. Like neutrophils, these cells are short-lived in the peripheral blood. Eosinophils play a part in the body's immune response mechanisms that are usually associated with allergies or parasitic infections.
Basophils are the least common of the granulocytes, and the least common of all the five classifications of WBCs. As they are granulocytes, they contain granules within their cytoplasm which can be stained, in this case using a basic (high pH) stain. These cells also are known to play a role in the body's immune response mechanisms, but the specifics are not certain.
Lymphocytes are the most prevalent of the mononuclear cell types, and generally make up between 20 and 30 percent of the total number of white blood cells. Lymphocytes specifically recognize foreign antigens and in response divide and differentiate to form effector cells. The effector cells may be B lymphocytes or T lymphocytes. B lymphocytes secrete large amounts of antibodies in response to foreign antigens. T lymphocytes exist in two main forms—cytotoxic T cells, which destroy host cells infected by infectious agents, such as viruses, and helper T cells, which stimulate antibody synthesis and macrophage activation by releasing cytokines. Lymphocytes have no granules in their cytoplasm, and their nucleus occupies a large majority of the cell volume. The thin area of cytoplasm outside the nucleus of lymphocytes can be stained with a nucleic acid stain, since it contains RNA. Many lymphocytes differentiate into memory B or T cells, which are relatively long-lived and respond more quickly to foreign antigen than naive B or T cells.
Monocytes are immature forms of macrophages that, in themselves, have little ability to fight infectious agents in the circulating blood. However, when there is an infection in the tissues surrounding a blood vessel, these cells leave the circulating blood and enter the surrounding tissues. The monocytes then undergo a dramatic morphological transformation to form macrophages, increasing their diameter as much as fivefold and developing large numbers of mitochondria and lysosomes in their cytoplasm. The macrophages then attack the invading foreign objects by phagocytosis and activation of other immune system cells, such as T cells. Increased numbers of macrophages are a signal that inflammation is occurring in the body.
Platelets are found in all mammalian species, and are involved in blood clotting. Normal animals will generally have between 1-5×10¹¹ platelets per liter. These cellular particles are usually much smaller than RBCs, having a diameter between 1 and 3 μm. Platelets are formed as buds from the surfaces of megakaryocytes, which are very large cells found in the bone marrow. The megakaryocytes do not themselves leave the marrow to enter the blood circulation; rather, buds form on the surface, pinch off and enter the circulation as platelets. Like RBCs, platelets lack nuclei and thus cannot reproduce. Functionally, platelets aggregate so as to plug or repair small holes in blood vessels. In the case of larger holes, platelet aggregation acts as an early step in clot formation. As a result, platelet count and function are clinically very important. For example, abnormally low platelet counts may be the cause of a clotting disorder.
The counting and sizing of RBCs, the counting of WBCs, and the counting of platelets are collectively referred to as a complete blood count (“CBC”). The separation of white blood cells into the five major classifications (i.e., neutrophils, eosinophils, basophils, lymphocytes, and monocytes) and their quantification on a percent basis is referred to as a five-part differential. The separation of white blood cells into two major classifications, granular and agranular leukocytes, and their quantification on a percent basis is referred to as a two-part differential. The categorizing of red blood cells into two classifications, mature red blood cells and reticulated red blood cells, on a percent basis is referred to as a reticulocyte count.
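By way of illustration only, the percent-basis arithmetic behind the five-part and two-part differentials may be sketched in Python as follows. The function names and the example counts are hypothetical and are not taken from any instrument or from this disclosure; the sketch assumes that each white cell has already been assigned to one of the five classifications.

```python
# Illustrative sketch only: names and counts below are hypothetical.
GRANULOCYTES = {"neutrophil", "eosinophil", "basophil"}
MONONUCLEAR = {"lymphocyte", "monocyte"}


def five_part_differential(counts):
    """Return each WBC classification as a percentage of all counted WBCs."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no white blood cells were counted")
    return {cell: 100.0 * n / total for cell, n in counts.items()}


def two_part_differential(counts):
    """Return granulocyte and mononuclear percentages of all counted WBCs."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no white blood cells were counted")
    granular = sum(n for cell, n in counts.items() if cell in GRANULOCYTES)
    mononuclear = sum(n for cell, n in counts.items() if cell in MONONUCLEAR)
    return {
        "granulocytes": 100.0 * granular / total,
        "mononuclear": 100.0 * mononuclear / total,
    }


# Made-up counts for a single sample, chosen only to make the arithmetic easy.
example_counts = {
    "neutrophil": 58,
    "lymphocyte": 28,
    "monocyte": 9,
    "eosinophil": 4,
    "basophil": 1,
}

five_part = five_part_differential(example_counts)
two_part = two_part_differential(example_counts)
```

In this sketch the two-part differential is simply a regrouping of the five-part counts, so the two results are consistent with each other whenever every counted cell falls into one of the five classifications.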
The determination of a CBC, with a five-part differential and a reticulocyte count, is a common diagnostic procedure performed to diagnose, track and treat an abundance of ailments. These tests make up the great majority of hematology analyses that are performed in medical and veterinary clinical laboratories around the world. These three tests have for many years been performed using a microscope, centrifuge, counting chamber, slide, and appropriate reagents. However, the skills necessary to perform these tests manually are rare and require years of training. Furthermore, the time required to perform each of these tests manually is very high. As a result, significant automation via instrumentation has been pursued in this field since the early 1950's.
Flow cytometry is a powerful method of analysis that is able to determine the cellular content of various types of samples, and in particular samples that contain living cells. In clinical applications, flow cytometers are useful for lymphocyte counting and classification, for immunological characterization of leukemias and lymphomas, and for cross-matching tissues for transplants. In most flow cytometry techniques, cells in a fluid solution are caused to flow individually through a light beam, usually produced by a laser light source. As light strikes each cell, the light is scattered and the resulting scattered light is analyzed to determine the type of cell. Different types of cells produce different types of scattered light. The type of scattered light produced may depend on the degree of granularity, the size of the cell, etc. Cells in a fluid solution may also be labeled with a marker linked to a fluorescent molecule, which fluoresces when light strikes it and thereby reveals the presence of the marker on the cell. In this fashion, information about the surface components of the cell can be obtained. Examples of such fluorescent molecules include FITC (fluorescein isothiocyanate), TRITC (tetramethyl rhodamine isothiocyanate), Cy3, Texas Red (sulforhodamine 101), and PE (phycoerythrin). In addition, intracellular components of the cell, such as nucleic acids, may be stained by fluorescent compounds, and subsequently detected by fluorescence. Examples of such compounds include ethidium bromide, propidium iodide, YOYO-1, YOYO-3, TOTO-1, TOTO-3, BO-PRO-1, YO-PRO-1, and TO-PRO-1. Cells may also be stained with dyes that label particular cellular components, and the absorbance of the dye bound to the cells measured.
Blood cell measurements made using flow cytometry often require two separate measurements—one to measure the RBCs and platelets, and the other to measure WBCs. The reason for separate measurements is that the RBCs are present in the blood at a much higher concentration than other blood cell types, and thus detection of other cell types in the presence of RBCs requires that the RBCs either be removed or large volumes of sample be measured. Alternatively, these cells may be distinguished on the basis of immunochemical staining of particular cell surface antigens and/or differential cell type staining.
Light scattering measurements are widely used in flow cytometry to measure cell sizes and to distinguish among several different types of cells. It is known that incident light is scattered by cells at small angles (approximately 0.5-20 degrees) from the line traveled by the incident light that interrogates the cells, and that the intensity of the scattered light is proportional to the cell volume. The light scattered at small angles is referred to as forward scattered light. Forward scattered light (also called forward light scatter, or small-angle scatter for angles of scatter between 0.5 and 2.0 degrees) is useful in determining cell size. The ability to measure cell size depends on the wavelength employed and the precise range of angles over which light is collected. For example, material within cells having a strong absorption at the illuminating wavelength may interfere with size determination because cells containing this material produce smaller forward scatter signals than would otherwise be expected, leading to an underestimate of cell size. In addition, differences in refractive index between the cells and the surrounding medium may also influence the small-angle scatter measurements.
In addition to forward scattered light, cells having a high degree of granularity, such as granulocytes, scatter incident light at high angles to a much greater degree than cells with low granularity, such as lymphocytes. Different cell types may be distinguished on the basis of the amount of orthogonal light scatter (also referred to herein as right angle side scatter) they produce. As a result, forward and right angle side scatter measurements are commonly used to distinguish among different types of blood cells, such as red blood cells, lymphocytes, monocytes, and granulocytes.
Additionally, eosinophils may be distinguished from other granulocytes and lymphocytes on the basis of polarization measurements of right angle side scatter. Normally, incident polarized light is scattered orthogonally and remains polarized. However, eosinophils cause incident polarized light scattered orthogonally to become depolarized to a greater degree than other cells. This higher degree of depolarization permits the specific identification of eosinophil populations in blood samples.
Flow cytometers have been commercialized and are known in the art. IDEXX Laboratories, the assignee of this invention, has developed a commercial flow cytometer for analysis of blood which is marketed under the trademark LASERCYTE. Flow cytometers are also described in the patent literature, see for example U.S. Pat. Nos. 6,784,981 and 6,618,143, both assigned to IDEXX Laboratories, the contents of which are incorporated by reference herein. Other patents of interest include U.S. Pat. Nos. 5,380,663; 5,451,525; and 5,627,037.
In conventional hematology instruments, the hemoglobin concentration is generally measured in an otherwise clear solution, and is referenced to a clear fluid. Lysis of red cells allows the hemoglobin to be measured in the same fluidic channel as the white blood cells. Alternatively, on some systems, the hemoglobin content may be measured in a separate channel.
To obtain meaningful information about the numbers and types of cells in a biological sample, or of the concentration of markers on cell surfaces, the samples must be standardized with respect to the amount of light scatter, fluorescence or impedance associated with standardized populations of the cells. In addition, the flow cytometry instrument itself must be calibrated to ensure proper performance. Calibration of the instrument is typically accomplished by passing standard particles through the instrument, and measuring the resulting scatter, fluorescence, or impedance. Flow cytometers may be calibrated with either synthetic standard materials (e.g., polystyrene latex beads) or with cells or other biological material (e.g., pollen, fixed cells, or stained nuclei). These standardization materials are desirably extremely uniform in size, and contain precise amounts of fluorescent molecules to serve in calibrating the photomultiplier tubes used in detection of fluorescent probes. However, the calibration procedures are lengthy and complicated, and require extensive training to perform properly. Consequently, these calibration procedures are typically performed only once at the beginning of the analysis. Changes in the instrument or in the sample may alter the performance of the instrument.
Flow cytometry techniques that took advantage of the light scattering characteristics of cells were applied beginning in the early 1970's to perform white cell differential analysis, in combination with CBC determination. Automated reticulocyte analysis was developed in the 1980's. However, these early reticulocyte analyzers did not perform a CBC or white blood cell differential. Eventually, manufacturers like Technicon (Bayer), Coulter (Beckman-Coulter) and Abbott incorporated reticulocyte counting with their automated CBC/white cell differential systems, in such high-end hematology systems as the Technicon (Bayer) H*3, Bayer Advia 120™, Coulter STKS™, Coulter GenS™, and Abbott CellDyn 3500 and CellDyn 4000. These high-end instrument systems are capable of measuring all of the parameters for a complete hematology analysis that are clinically important for patient assessment, namely, CBC, five-part WBC differential and reticulocyte count.
The WBC data generated by passing a single blood sample through a flow cytometer consists of N data points, each measured in M separate channels. Each “channel” is associated with a discrete detector built into the instrument, or, alternatively, an integration of a detector signal over some time period. Thus, the flow cytometer produces N data points in M channels for a data set totalling N×M values, where M may be 2, 3, 4 or another integer and depends on the number of detectors in the instrument and on whether integration or other processing is used to create more channels than detectors. In the LaserCyte instrument, the instrument captures N seven-dimensional data points (M=7). The dimensions are Extinction (EXT), Extinction Integrated (EXT_Int), Right Angle Scatter (RAS), Right Angle Scatter Integrated (RAS_Int), Forward Scatter Low (FSL), Forward Scatter High (FSH), and Time of Flight (TOF). See U.S. Pat. Nos. 6,784,981 and 6,618,143 for details on the geometry of these data collectors and their meanings. The terms “dimensions” and “channels” are used interchangeably in this document. A single seven-dimensional data point is referred to as an “event”.
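For illustration, the N×M layout described above may be represented in Python as a list of seven-channel events. This is a minimal sketch under stated assumptions: the type name Event, the helper functions, and the numeric values are hypothetical, and the sketch does not reflect the LaserCyte instrument's actual data format or units.

```python
from typing import List, NamedTuple

# Channel names as listed in the text; everything else here is hypothetical.
CHANNELS = ("EXT", "EXT_Int", "RAS", "RAS_Int", "FSL", "FSH", "TOF")


class Event(NamedTuple):
    """One seven-dimensional data point (M = 7 channels)."""
    EXT: float
    EXT_Int: float
    RAS: float
    RAS_Int: float
    FSL: float
    FSH: float
    TOF: float


def total_values(events: List[Event]) -> int:
    """Return N x M, the number of individual channel values in the data set."""
    return len(events) * len(CHANNELS)


def channel_values(events: List[Event], name: str) -> List[float]:
    """Return the values of one named channel across all N events."""
    index = CHANNELS.index(name)
    return [event[index] for event in events]


# Three made-up events (N = 3); the numbers carry no physical meaning.
sample_events = [
    Event(10.0, 55.0, 5.0, 21.0, 3.0, 1.5, 8.0),
    Event(12.0, 60.0, 6.0, 25.0, 3.5, 1.7, 8.4),
    Event(30.0, 140.0, 40.0, 160.0, 9.0, 6.0, 11.0),
]

n_values = total_values(sample_events)
ext_column = channel_values(sample_events, "EXT")
```

Storing each event as a fixed-length tuple keeps the M channels in a consistent order, so that selecting any pair of channels for a two-dimensional projection reduces to picking two indices.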
The physical properties of the different white blood cells cause light passing through them to scatter differently. For example, larger cells generally have greater EXT and EXT_Int values due to their greater light occlusion, while cells with greater internal complexity tend to produce greater light scatter and this is observed at the FSH detector.
The human eye can distinguish data clumps or clusters (“populations”) amongst some two-dimensional projections of the seven-dimensional event data, e.g., a conventional 2D plot of the N events with the EXT value on the positive Y axis and the RAS value on the positive X axis. Moreover, it has been shown that on clean, well-handled samples, the percentage of observed events within each cluster typically corresponds to the relative percentages of the five different white blood cell types (neutrophils, monocytes, lymphocytes, eosinophils, and basophils). However, there is a need for quantifying such populations with some precision, preferably in an automated manner, as quantitative measurements provide a more meaningful way to measure and compare the populations and therefore use them for diagnostic or other analytical purposes.
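To make the idea of a two-dimensional projection and per-population percentages concrete, the following Python sketch projects events onto the RAS (X) and EXT (Y) channels and counts the events falling inside fixed rectangular regions. It is offered only as an illustration of the quantity to be measured. The rectangular regions, their names, and all numeric values are hypothetical, and fixed-region counting is a simple stand-in, not the method of this disclosure.

```python
# Hypothetical channel positions within each event tuple.
EXT_INDEX = 0  # plotted on the Y axis
RAS_INDEX = 2  # plotted on the X axis


def project(events, x_index, y_index):
    """Reduce each multi-channel event to an (x, y) pair."""
    return [(event[x_index], event[y_index]) for event in events]


def region_percentages(points, regions):
    """Percentage of all points falling inside each rectangular region.

    Each region is (x_min, x_max, y_min, y_max); a point is assigned to the
    first region that contains it. Points in no region are left uncounted,
    so the percentages may sum to less than 100.
    """
    if not points:
        raise ValueError("no events to classify")
    counts = {name: 0 for name in regions}
    for x, y in points:
        for name, (x_min, x_max, y_min, y_max) in regions.items():
            if x_min <= x < x_max and y_min <= y < y_max:
                counts[name] += 1
                break
    return {name: 100.0 * n / len(points) for name, n in counts.items()}


# Four made-up seven-channel events; the last lies outside both regions.
demo_events = [
    (10.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0),
    (12.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0),
    (30.0, 0.0, 40.0, 0.0, 0.0, 0.0, 0.0),
    (50.0, 0.0, 90.0, 0.0, 0.0, 0.0, 0.0),
]

# Hypothetical regions in (RAS, EXT) space.
demo_regions = {
    "population_A": (0.0, 20.0, 0.0, 20.0),
    "population_B": (20.0, 60.0, 20.0, 40.0),
}

demo_points = project(demo_events, RAS_INDEX, EXT_INDEX)
demo_percentages = region_percentages(demo_points, demo_regions)
```

Fixed regions of this kind do not adapt to the sample-to-sample and machine-to-machine variability noted in this section, which is one reason a more robust automated approach is desirable.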
The solution provided by this disclosure is a method and apparatus for finding and classifying, in an automated fashion, event data in the midst of noise and for giving estimates, in quantitative terms, of the relative frequencies of populations in a multidimensional data set, such as the frequencies of WBC types in a given sample of human or animal blood. This is no small feat. The sample-to-sample and machine-to-machine variability, combined with varying degrees of noise resulting from unknown cellular events, greatly complicates this classification problem. The art has lacked a robust analysis method that offers the ability to combine expert knowledge with stable unsupervised clustering and classifying algorithms for identifying discrete populations (clusters) of data within a large multidimensional data set, e.g., as obtained by a flow cytometer.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the drawings.