The ability to determine the composition of chemical constituents in a complex mixture has a broad range of highly useful applications, including answering questions posed by traditional chemical analysis, such as “What is this substance made of?”, and enabling more sophisticated analysis of biological processes, such as “How is a healthy cell different from a diseased cell?”, “How does this medicine affect the cellular process?”, “How can the growth of cells in culture be optimized?”, and “What is the limiting factor for this bioprocess?”.
The techniques traditionally used in analysis of complex mixtures include chromatography and mass spectrometry. Chromatography is a technique whereby a complex mixture is separated into parts. Mass spectrometry is a technique in which a sample containing many different chemical constituents is ionized, and the ionized chemical constituents are subjected to an electromagnetic field, which separates the chemical constituents according to their mass-to-charge (m/z) ratios. Although both chromatography and mass spectrometry separate a complex mixture into constituent parts, neither technique provides direct identification of the chemical constituents; the identity of a chemical constituent must be determined based on an analysis of the measured characteristics of the chemical constituent.
As used herein, the term “separation” refers to the process of separating a complex mixture into its component molecules or metabolites. Common laboratory separation techniques include electrophoresis and chromatography.
As used herein, the term “chromatography” refers to a physical method of separation in which the components (i.e., chemical constituents) to be separated are distributed between two phases, one of which is stationary (stationary phase) while the other (the mobile phase) moves in a definite direction. Chromatographic output data may be used for manipulation by embodiments of the subject matter described herein.
As used herein, the term “retention time” refers to the elapsed time in a chromatography process since the introduction of the sample into the separation device. The retention time of a constituent of a sample refers to the elapsed time between the injection of the sample into the separation device and the time that the constituent elutes from (e.g., exits) the portion of the separation device that contains the stationary phase.
As used herein, the term “retention index” of a sample component refers to a number, obtained by interpolation (usually logarithmic), relating the retention time or the retention factor of the sample component to the retention times of standards eluted before and after the peak of the sample component, a mechanism that uses the separation characteristics of known standards to remove systematic error.
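The logarithmic interpolation described in this definition can be sketched as follows. This is a minimal illustration only, assuming a Kovats-style index with n-alkane standards and a scale of 100 units per carbon number; the function name, parameter names, and example retention times are hypothetical and are not taken from the subject matter described herein. A strict Kovats index uses adjusted retention times (hold-up time subtracted); the sketch simply uses whatever times it is given.

```python
import math


def retention_index(t_x, t_before, t_after, n_before, n_after):
    """Kovats-style retention index by logarithmic interpolation.

    t_x                retention time of the sample component
    t_before           retention time of the standard eluting before it
    t_after            retention time of the standard eluting after it
    n_before, n_after  carbon numbers of the two n-alkane standards
    (all names and the 100-per-carbon scale are assumptions of this sketch)
    """
    if not (0 < t_before <= t_x <= t_after) or t_before == t_after:
        raise ValueError("need 0 < t_before <= t_x <= t_after and t_before < t_after")
    # Fractional position of the component between the two standards, on a log scale.
    frac = (math.log(t_x) - math.log(t_before)) / (
        math.log(t_after) - math.log(t_before)
    )
    return 100.0 * (n_before + (n_after - n_before) * frac)


# Hypothetical retention times in minutes, bracketed by C10 and C11 standards.
# 6.0 is the geometric mean of 4.0 and 9.0, so the index lands near 1050.
ri = retention_index(6.0, 4.0, 9.0, 10, 11)
```

Because the index is expressed relative to bracketing standards run on the same system, a uniform drift in retention times shifts the component and the standards together and largely cancels out.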
As used herein, the term “separation index” refers to a metric associated with chemical constituents separated by a separation technique. For chromatographic separation techniques, the separation index may be retention time or retention index. For non-chromatographic separation techniques, the separation index may be physical distance traveled by the chemical constituent.
As used herein, the terms “separation information” and “separation data” refer to data that indicates the presence or absence of chemical constituents with respect to the separation index. For example, separation data may indicate the presence of a chemical constituent having a particular mass eluting at a particular time. The separation data may indicate that the amount of the chemical constituent eluting over time rises, peaks, and then falls. A graph of the presence of the chemical constituent plotted over the separation index (e.g., time) may display a graphical peak. Thus, within the context of separation data, the terms “peak information” and “peak data” are synonymous with the terms “separation information” and “separation data”.
As used herein, the term “Mass Spectrometry” (MS) refers to a technique for measuring and analyzing molecules that involves ionizing, or ionizing and fragmenting, a target molecule, then analyzing the ions based on their mass-to-charge ratios to produce a mass spectrum that serves as a “molecular fingerprint”. Determining the mass-to-charge ratio of an object may be done by determining the wavelengths at which electromagnetic energy is absorbed by that object. There are several commonly used methods to determine the mass-to-charge ratio of an ion, some measuring the interaction of the ion trajectory with electromagnetic waves, others measuring the time an ion takes to travel a given distance, and others combining both. The data from these fragment mass measurements can be searched against databases to obtain identifications of target molecules. Mass spectrometry is also widely used in other areas of chemistry, such as petrochemistry and pharmaceutical quality control, among many others.
As used herein, the term “mass analyzer” refers to a device in a mass spectrometer that separates a mixture of ions by their mass-to-charge ratios.
As used herein, the term “source” refers to a device in a mass spectrometer that ionizes a sample to be analyzed.
As used herein, the term “detector” refers to a device in a mass spectrometer that detects ions.
As used herein, the term “ion” refers to any object containing a charge, which can be formed, for example, by adding electrons to or removing electrons from the object.
As used herein, the term “mass spectrum” refers to a plot of data produced by a mass spectrometer, typically containing m/z values on the x-axis and intensity values on the y-axis.
As used herein, the term “m/z” refers to the dimensionless quantity formed by dividing the mass number of an ion by its charge number. It has long been called the “mass-to-charge” ratio.
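As a small worked illustration of this ratio, the sketch below computes the m/z at which a molecule would be observed when it is ionized by gaining protons. The protonation model, the function name, and the example neutral mass are assumptions made for illustration; other ionization mechanisms give different adducts and different values.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mz_protonated(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion, assuming ionization by protonation only."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each added proton contributes both mass and one unit of charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


# A hypothetical molecule of neutral mass 1000.0 Da in two charge states.
mz_1 = mz_protonated(1000.0, 1)  # singly charged
mz_2 = mz_protonated(1000.0, 2)  # doubly charged, observed near half the mass
```

The same molecule therefore appears at different positions on the m/z axis depending on its charge state, which is why the mass spectrum reports m/z rather than mass directly.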
As used herein, the term “scan” refers to a mass spectrum that is associated with a particular separation index. For example, systems that use a chromatographic separation technique may generate multiple scans, each scan at a different retention time.
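One way to picture the relationship between scans and separation data is the sketch below, which stores each scan as a mass spectrum tagged with its retention time and then follows one m/z value across scans so that the rise, peak, and fall of a constituent become visible. The `Scan` class, the function names, the tolerance, and the data values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Scan:
    retention_time: float   # separation index of this scan
    mz: List[float]         # m/z values of the mass spectrum
    intensity: List[float]  # intensity recorded at each m/z value


def extracted_ion_chromatogram(
    scans: List[Scan], target_mz: float, tol: float = 0.5
) -> List[Tuple[float, float]]:
    """Summed intensity near target_mz in each scan, ordered by retention time."""
    trace = []
    for scan in sorted(scans, key=lambda s: s.retention_time):
        total = sum(
            i for m, i in zip(scan.mz, scan.intensity) if abs(m - target_mz) <= tol
        )
        trace.append((scan.retention_time, total))
    return trace


def apex(trace: List[Tuple[float, float]]) -> Tuple[float, float]:
    """(retention time, intensity) of the highest point in the trace."""
    return max(trace, key=lambda point: point[1])


# Three hypothetical scans taken at successive retention times (minutes).
scans = [
    Scan(1.0, [100.0, 150.2], [5.0, 10.0]),
    Scan(1.1, [100.1, 150.1], [4.0, 80.0]),
    Scan(1.2, [150.0], [20.0]),
]
trace = extracted_ion_chromatogram(scans, target_mz=150.0)
peak = apex(trace)
```

Here the intensity near m/z 150 rises and then falls across the three scans, so the trace shows a single peak whose apex is at the middle scan.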
As used herein, the term “sample” is used in its broadest sense, and may include a specimen or culture, of natural or synthetic origin.
As used herein, the term “biological sample” refers to fluid, solid (e.g., stool), or tissue of a plant, fungus, or animal (including a human), as well as cell cultures, culture and fermentation media, liquid and solid food and feed products and ingredients such as dairy items, grains, vegetables, meat and meat by-products, and waste. Biological samples may be obtained from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bear, fish, lagomorphs, rodents, etc. A biological sample may contain any biological material, and may comprise cellular and/or non-cellular material from a subject. The sample can be isolated from any suitable biological tissue or fluid such as, for example, prostate tissue, blood, blood plasma, urine, or cerebrospinal fluid (CSF).
As used herein, the term “environmental sample” refers to environmental material such as surface matter, soil, water and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the subject matter described herein.
Systems exist that couple the output of a liquid or gas chromatograph to the input of a mass spectrometer, such that the chromatograph separates the sample into chemical constituents, which are fed into the ion source of the mass spectrometer. Conventional systems analyze the resulting mass spectrum by performing a best-fit analysis of the recorded mass spectrum against libraries of mass spectrum data. However, this approach suffers from several deficiencies.
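A best-fit comparison of a recorded spectrum against a library can be sketched as follows. The text above does not specify the scoring function that conventional systems use, so this sketch assumes cosine similarity over unit-width m/z bins, which is one common choice; the function names, bin width, library entries, and spectra are all hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) pairs into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = math.floor(mz / bin_width + 0.5)  # nearest bin
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(query, reference, bin_width=1.0):
    """Cosine similarity of two spectra: 1.0 for same shape, 0.0 for no overlap."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(reference, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, library):
    """Name and score of the library entry that best fits the query spectrum."""
    scored = ((name, cosine_score(query, spec)) for name, spec in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical library of two reference spectra and one recorded spectrum.
library = {
    "entry_A": [(43.0, 100.0), (58.0, 30.0)],
    "entry_B": [(77.0, 100.0), (105.0, 60.0)],
}
query = [(43.0, 50.0), (58.0, 15.0)]
name, score = best_match(query, library)
```

Note that `best_match` scores the query against every entry in the library and uses the spectrum alone; nothing in this comparison takes the retention time or retention index of the constituent into account.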
First, compound library matching usually does not consider separation data, such as retention time or retention index. As a result, the system typically must attempt to identify a compound observed in the mass spectrum by comparing it to every compound in the library, regardless of whether the library chemical entity would have had the same separation characteristics as the compound being analyzed. In some cases, two different chemical constituents have the same mass, and are thus indistinguishable without chromatography data. The problem is further compounded when the separation technique used does not adequately separate the two chemical constituents having the same mass. In this situation, even if the system did consider separation data, the two constituents would appear together as a single peak rather than two peaks, and would again be indistinguishable from each other.
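The effect of ignoring or using separation data during library matching can be illustrated with the sketch below, which narrows the list of library candidates first by mass alone and then by mass together with retention index. The library layout, field names, tolerances, and values are hypothetical and serve only to illustrate the point made in the paragraph above.

```python
def candidates(library, observed_mz, mz_tol, observed_ri=None, ri_tol=None):
    """Library entries consistent with the observation.

    With observed_ri left as None, only m/z is checked, mimicking a match
    that ignores separation data.
    """
    hits = []
    for entry in library:
        if abs(entry["mz"] - observed_mz) > mz_tol:
            continue
        if observed_ri is not None and abs(entry["ri"] - observed_ri) > ri_tol:
            continue
        hits.append(entry["name"])
    return hits


# Hypothetical library: entry_A and entry_B share a mass but elute differently.
library = [
    {"name": "entry_A", "mz": 132.1, "ri": 1050.0},
    {"name": "entry_B", "mz": 132.1, "ri": 1120.0},
    {"name": "entry_C", "mz": 180.1, "ri": 1055.0},
]

by_mass_only = candidates(library, observed_mz=132.1, mz_tol=0.5)
by_mass_and_ri = candidates(
    library, observed_mz=132.1, mz_tol=0.5, observed_ri=1052.0, ri_tol=10.0
)
```

By mass alone, both same-mass entries remain candidates; adding the retention index leaves only one. If the two entries had nearly the same retention index as well, the second filter would keep both, which corresponds to the co-elution case described above.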
Second, the libraries of mass spectrum data may be synthetic. As used herein, the term “synthetic library” refers to a library that was generated on another system or was generated in silico, i.e., based on hypothetical or calculated results, rather than on empirical results. Because synthetic libraries do not reflect the particular characteristics of the method and instrument that are used to actually perform the analysis, synthetic libraries may introduce error.
Third, conventional systems that have high accuracy, such as high accuracy mass spectrometers, commonly referred to as “accurate mass” systems, are expensive, and many have a lower duty cycle than their standard counterparts. Thus, in conventional systems, there may be a tradeoff between accuracy and throughput. Furthermore, accurate mass alone is insufficient for high confidence identification of a chemical constituent. For example, the amino acids leucine and isoleucine have identical mass because they consist of the same combination of atoms, arranged in slightly different locations on the respective molecules. Accurate mass alone cannot differentiate between them. Accurate mass is neither a prerequisite nor a guarantee of accurate identification of chemical constituents.
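The leucine and isoleucine example can be checked with a short calculation. Both amino acids have the molecular formula C6H13NO2, so a mass computed from the formula is necessarily identical for the two. The sketch below uses standard monoisotopic atomic masses; the function and variable names are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}


def monoisotopic_mass(formula):
    """Mass of a molecule given as a mapping of element symbol to atom count."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


# Leucine and isoleucine are structural isomers with the same formula, C6H13NO2.
leucine = {"C": 6, "H": 13, "N": 1, "O": 2}
isoleucine = {"C": 6, "H": 13, "N": 1, "O": 2}

mass_leu = monoisotopic_mass(leucine)
mass_ile = monoisotopic_mass(isoleucine)
same_mass = mass_leu == mass_ile  # True regardless of mass accuracy
```

Both masses come out at about 131.0946 Da. Because the formula is the only input to the calculation, no improvement in mass accuracy can separate the two compounds; some other measured characteristic is needed.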
Fourth, some conventional systems perform “targeted” analysis, meaning that they are configured to look for and identify specific chemical constituents. Such systems cannot perform “non-targeted” analysis, which attempts to detect and identify all chemical constituents of a sample, including hitherto unknown entities. Non-targeted analysis is an approach that has enormous potential application and benefits. For example, metabolomic analysis, which analyzes the metabolites or by-products of cellular processes, is useful for monitoring, in a non-targeted manner (i.e., globally), changes in metabolic profiles related to age, gender, or other factors (e.g., health or disease status), and can be extended to detect dietary metabolites as well as drugs, medications, and other xenobiotics (chemical substances that are found in an organism but which are not normally produced or expected to be present in the organism) that are present in the sample matrix. The ability to determine the composition of chemical constituents in a complex mixture in a non-targeted manner can be useful in a variety of other contexts. One such context is bioprocessing, which is the growth of cells to produce drugs, enzymes, chemicals, additives, and other useful products. Other contexts include analysis of biological and environmental samples.
Accordingly, there exists a need to provide systems and methods for more accurately determining, in a non-targeted manner, the composition of chemical constituents in a complex mixture.