Field of the Disclosure
The present disclosure relates to the field of analyzing small molecule components in a complex mixture and, more particularly, to a method and associated apparatus and computer program product for analyzing small molecule components of a complex mixture in a multi-sample process, with such small molecule analysis including metabolomics, which is the study of small molecules produced by an organism's metabolic processes, or other analysis of small molecules produced through metabolism.
Description of Related Art
Metabolomics is the study of the small molecules, or metabolites, contained in a cell, tissue or organ (including fluids) and involved in primary and intermediary metabolism. The term “metabolome” refers to the collection of metabolites present in an organism. The human metabolome encompasses native small molecules (natively biosynthesizable, non-polymeric compounds) that are participants in general metabolic reactions and that are required for the maintenance, growth and normal function of a cell. Thus, metabolomics is a direct observation of the status of cellular physiology, and may be predictive of disease in a given organism. Subtle biochemical changes (including the presence of selected metabolites) are inherent in a given disease. Therefore, the accurate mapping of these changes to known pathways may allow researchers to build a biochemical hypothesis for a disease. Based on this hypothesis, the enzymes and proteins critical to the disease can be uncovered such that disease targets may be identified for treatment with targeted pharmaceutical compounds or other therapy.
Molecular biology techniques for uncovering the biochemical processes underlying disease have been centered on the genome, which consists of the genes that make up DNA, which is transcribed into RNA and then translated to proteins, which then make up the small molecules of the human metabolome. While genomics (study of the DNA-level biochemistry), transcript profiling (study of the RNA-level biochemistry), and proteomics (study of the protein-level biochemistry) are useful for identification of disease pathways, these methods are complicated by the fact that there exist over 25,000 genes, 100,000 to 200,000 RNA transcripts and up to 1,000,000 proteins in human cells. However, it is estimated that there may be as few as 2,500 small molecules in the human metabolome.
Thus, metabolomic technology provides a significant leap beyond genomics, transcript profiling, and/or proteomics. With metabolomics, metabolites and their role in metabolism may be readily identified. In this context, the identification of disease targets may be expedited with greater accuracy relative to other known methods. The collection of metabolomic data for use in identifying disease pathways is generally known in the art, as described generally, for example, in U.S. Pat. Nos. 7,005,255 and 7,329,489 to Metabolon, Inc., each entitled Methods for Drug Discovery, Disease Treatment, and Diagnosis Using Metabolomics. Additional uses for metabolomics data are described therein and include, for example, determining response to a therapeutic agent (i.e., a drug) or other xenobiotics, monitoring drug response, determining drug safety, and drug discovery. However, the collection and sorting of metabolomic data taken from a variety of samples (e.g., from a patient population) consumes large amounts of time and computational power. For example, according to some known metabolomic techniques, spectrometry data for certain samples is collected and plotted in three (or more) dimensions (i.e., sample properties that can be represented along an axis with respect to other sample properties) and stored in an individual file corresponding to each sample. This data is then, by individual file, compared to data corresponding to a plurality of known metabolites in order to identify known metabolites that may be disease targets. The data may also be used for identification of toxic agents and/or drug metabolites. Furthermore, such data may also be used to monitor the effects of xenobiotics and/or to monitor, measure, or identify the xenobiotics and associated metabolites produced by processing (metabolizing) the xenobiotics.
However, such conventional “file-based” methods (referring to the individual data file generated for each sample) require the use of large amounts of computing power and memory capacity to handle the screening of large numbers of known metabolites. Furthermore, “file-based” data handling may not lend itself to the compilation of sample population data across a number of samples because, according to known metabolomic data handling techniques, each sample is analyzed independently, without taking into account subtle changes in metabolite composition that may be more readily detectable across a sample population. Furthermore, existing “file-based” methods may have other limitations, including limited security and auditability and poor data set consistency across multiple file copies. In addition, individual files may not support multiple indices (e.g., day collected, sample ID, control vs. treated, drug dose, etc.) such that all files must be scanned when only a particular subset is desired.
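By way of illustration only, the indexing limitation noted above may be contrasted with a single table of peak records carrying several indices, from which a subset can be selected without opening every per-sample file. The following is a minimal sketch and not a description of any particular system; the table name, column names, index names, and data values are all hypothetical and chosen solely for the example.

```python
import sqlite3

# Hypothetical schema: one row per detected peak, tagged with sample metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peaks ("
    " sample_id TEXT, day_collected TEXT, treatment TEXT, dose_mg REAL,"
    " ion_mass REAL, retention_time REAL, intensity REAL)"
)

# Several indices on the same data, which a set of flat per-sample files lacks.
conn.execute("CREATE INDEX idx_day ON peaks (day_collected)")
conn.execute("CREATE INDEX idx_treatment ON peaks (treatment)")
conn.execute("CREATE INDEX idx_dose ON peaks (dose_mg)")

# Made-up records for four samples (illustrative values only).
rows = [
    ("S1", "2020-01-05", "control", 0.0, 180.06, 1.30, 1200.0),
    ("S2", "2020-01-05", "treated", 5.0, 180.06, 1.31, 1350.0),
    ("S3", "2020-01-06", "control", 0.0, 180.06, 1.29, 1180.0),
    ("S4", "2020-01-06", "treated", 10.0, 180.06, 1.32, 1420.0),
]
conn.executemany("INSERT INTO peaks VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Select only the treated subset rather than reading every sample's data.
treated = conn.execute(
    "SELECT sample_id, intensity FROM peaks WHERE treatment = ? ORDER BY sample_id",
    ("treated",),
).fetchall()
print(treated)
```

In this sketch the same records can be selected by day collected, by treatment group, or by dose, which is the kind of subset selection that the per-sample file arrangement described above may not readily support.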
These limitations in current metabolomic data analysis techniques may lead to the discarding of potentially relevant and/or valuable metabolomic data that may be used to identify and classify particular metabolites as disease targets. Specifically, spectrometry data corresponding to a number of samples (such as tissue samples from individual human subjects) generally results in a large data file corresponding to each sample, wherein each data file must then be subjected to an individual screening process with respect to a library of known metabolites. However, conventional systems do not readily allow for the consolidation of spectrometry data from a number of samples for the subjective evaluation of the data generated by the spectrometry processes. Thus, while a single file corresponding to an individual sample may be inconclusive, such data may be more telling if viewed subjectively in a succinct format with respect to other samples within a sample population.
One particular example of a limitation in current metabolomic data analysis techniques involves the identification and quantification of a metabolite in each of a plurality of samples. In some instances, the identification of the metabolite involves analyzing the data file of each sample to determine whether an indication (i.e., an intensity peak for a particular sample ion mass or sample component mass, observed at a particular retention time or range of retention times) of that metabolite exists within the respective data files. If such an indication is determined, quantification of that metabolite may then involve the integration (mathematical calculation of area) of the area represented by that indication (i.e., the area under the intensity peak). However, as previously noted, it may be difficult in “file-based” data handling methods to verify whether the determined indication is consistent across samples. For example, it may be difficult to determine whether the identified intensity peaks are aligned with respect to retention time across the samples. Further, there may be instances where the indication (i.e., the intensity peak) is not clearly defined within the data file of one or more samples. In those instances, the integration procedure used to calculate the area represented by the indication may vary, for instance, based on the assumptions used or estimates performed in connection with the calculation, particularly where the origin and the terminus of a particular intensity peak are not clearly evident. There may also be instances where the indication (i.e., the intensity peak) may actually reflect the presence of more than one sample component and, as such, any analysis of those intensity peaks as a whole may be significantly inaccurate.
As such, the various assumptions and estimates, which may be difficult to analyze for individual samples when using a file-based data handling method, may result in an inaccurate indication of the quantity of that metabolite (or a plurality of metabolites) present over the plurality of samples. In this regard, such a quantitative inaccuracy introduced into a metabolomics analysis at such an early stage may lead to larger inaccuracies in subsequent steps or analyses.
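The sensitivity of the integrated area to the assumed origin and terminus of an intensity peak may be illustrated, by way of example only, with a simple trapezoidal integration. The following is a minimal sketch under stated assumptions: the function name `peak_area`, the retention times, and the intensity values are hypothetical, and trapezoidal integration is used here merely as one common way of calculating the area under a sampled curve, not as the integration procedure of any particular known system.

```python
def peak_area(times, intensities, start, end):
    """Trapezoidal area under the intensity trace for start <= time <= end."""
    # Keep only the sampled points that fall inside the assumed peak boundaries.
    pts = [(t, y) for t, y in zip(times, intensities) if start <= t <= end]
    area = 0.0
    # Sum the trapezoid between each pair of adjacent retained points.
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Made-up chromatogram slice: retention time (minutes) vs. intensity (arbitrary units).
times = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
intensities = [2.0, 5.0, 40.0, 90.0, 45.0, 6.0, 2.0]

# Two plausible choices for the origin and terminus of the same peak.
narrow = peak_area(times, intensities, 1.2, 1.4)
wide = peak_area(times, intensities, 1.0, 1.6)

print(f"narrow boundaries: {narrow:.2f}")
print(f"wide boundaries:   {wide:.2f}")
print(f"relative difference: {(wide - narrow) / wide:.1%}")
```

In this sketch, the two boundary choices give noticeably different areas for the same peak, which is the kind of quantitative discrepancy that may propagate into subsequent steps when the boundaries are chosen independently for each sample.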
Therefore, there exists a need for an improved apparatus and method for solving the technical issues outlined above that are associated with conventional metabolomic data analysis systems. More particularly, there exists a need for an apparatus and method capable of analyzing spectrometry data across samples, with the option of, but not the need for, generating a separate data file for each sample. There also exists a need for an apparatus and method capable of allowing a user to subjectively evaluate spectrometry data across a plurality of samples to identify selected metabolites, for allowing the user to verify or otherwise determine the confidence in the identification of the selected metabolites, for allowing the user to examine the data associated with the identification of the selected metabolites, for example, for sorting, grouping, and/or aligning purposes, and for allowing the user to determine additional information related to the identified selected metabolites, for instance, for quality control and consistency verification purposes. There also exists a need for an improved apparatus and method capable of more accurately identifying and quantifying sample components across samples from the acquired spectrometry data.