1. Field of the Invention
The present invention relates to the use of non-negative factorization functions and/or correlation functions to determine a characteristic value corresponding to one or more components (such as, for example, metabolites) or other compounds present in a plurality of samples, and to the use of the characteristic value to identify and/or quantify the individual components or other compounds that may be present in the samples.
2. Description of Related Art
The detection of subtle chemical cues in a sample to reveal the presence and corresponding relative quantity of selected components (such as certain small molecules, therapeutic agents, xenobiotics, metabolites, and other substances) has long been a goal of researchers and clinicians. For example, in the field of metabolomics, the small molecules, or metabolites, contained in a human cell, tissue, or organ (including fluids) and involved in primary and intermediary metabolism are scrutinized in an attempt to determine the presence and/or identity of such small molecules. The term “metabolome” refers to the collection of metabolites present in an organism. The human metabolome encompasses native small molecules (natively biosynthesizable, non-polymeric compounds) that participate in general metabolic reactions and that are required for the maintenance, growth, and normal function of a cell. Metabolomics thus provides a direct observation of the status of cellular physiology and may be predictive of disease in a given organism. Subtle biochemical changes (including the presence of selected metabolites) are inherent in a given disease. Therefore, the accurate mapping of these changes to known pathways may allow researchers to build a biochemical hypothesis for a disease. Based on this hypothesis, the enzymes and proteins critical to the disease can be uncovered such that disease targets may be identified for treatment with targeted pharmaceutical compounds.
Molecular biology techniques for uncovering the biochemical processes underlying disease in humans have been centered on the human genome, which consists of the genes that make up human DNA. DNA is transcribed into RNA, which is then translated into proteins, which in turn act on the small molecules of the human metabolome. While genomics (study of the DNA-level biochemistry), transcript profiling (study of the RNA-level biochemistry), and proteomics (study of the protein-level biochemistry) are useful for identification of disease pathways, these methods are complicated by the fact that there exist over 25,000 genes, 100,000 to 200,000 RNA transcripts, and up to 1,000,000 proteins in human cells. However, it is estimated that there may be as few as 2,500 small molecules in the human metabolome.
Thus, metabolomic technology provides a significant leap beyond genomics, transcript profiling, and/or proteomics. With metabolomics, metabolites and their role in human metabolism may be readily identified. In this context, the identification of disease targets may be expedited and performed with greater accuracy than with any other known method. The collection of metabolomic data for use in identifying disease pathways is generally known in the art, as described generally in U.S. Pat. No. 7,005,255, entitled Methods for Drug Discovery, Disease Treatment, and Diagnosis Using Metabolomics. However, the collection and sorting of metabolomic data taken from a variety of biological samples (e.g., from a patient population) consumes large amounts of time and computational power. For example, according to some metabolomic techniques, spectrometry data for biological samples is collected and plotted in three dimensions and stored in an individual file corresponding to each biological sample. Such spectrometry data consists of known spectra corresponding to the detection of certain ions that may be present in a given sample. While individual ions may be detectable in such spectra, the combinations and interplay of such ions that indicate specific individual metabolite compounds may not be immediately discernible, especially in only a single biological sample.
If the sample subjected to spectrometry contains substantially pure components (such as small molecule metabolites, for example), the spectrum of the component can be easily matched with the spectra of known components in order to identify the component. Furthermore, if there is an ion unique to a specific component, then the intensity (as discernible in the spectral plot) of the ion can be used for the relative quantification of the component in the sample. However, in many cases, the fractionation of a particular biological sample (in a liquid or gas chromatograph, for example) is incomplete. For example, two or more component compounds or small molecule components may “co-elute” from the physical separation process, giving rise to an impure mixture of components going into the spectrometer. In such cases, subtle spectral trends viewed over many individual biological samples of the same type may be indicative of the presence of one or more otherwise-obscured components.
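The unique-ion quantification and co-elution described above can be illustrated with a minimal sketch. It assumes that the observed spectrum of a co-eluting mixture is a simple weighted sum of the pure component spectra, which is an idealization and is not stated in this description. The spectra, channel indices, and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical pure spectra: intensity at each of four ion (m/z) channels.
# All values are illustrative and are not measured data.
SPECTRUM_A = [10.0, 0.0, 5.0, 2.0]  # channel 0 is unique to component A
SPECTRUM_B = [0.0, 8.0, 5.0, 1.0]   # channel 1 is unique to component B


def mix(amount_a, amount_b):
    """Observed spectrum when A and B co-elute (assumed linear and additive)."""
    return [amount_a * a + amount_b * b for a, b in zip(SPECTRUM_A, SPECTRUM_B)]


def relative_amount(observed, pure, unique_channel):
    """Relative quantity of a component, read from an ion unique to it."""
    return observed[unique_channel] / pure[unique_channel]


observed = mix(2.0, 0.5)
amount_a = relative_amount(observed, SPECTRUM_A, 0)
amount_b = relative_amount(observed, SPECTRUM_B, 1)

# Channel 2 is shared by both components, so its intensity alone cannot
# be attributed to either A or B.
shared_intensity = observed[2]
```

In this sketch the unique channels recover the mixing amounts directly, while the shared channel carries contributions from both components. When no unique ion exists, a single spectrum of this kind is not sufficient to separate the components.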
The assignee of the present application, Metabolon, Inc., has developed a system and method for manipulating three-dimensional spectrometry data sets to produce plots that are more directly comparable to a plurality of characteristic plots corresponding to a plurality of selected metabolites, as disclosed in U.S. patent application Ser. No. 11/462,838 entitled A System, Method, and Computer Program Product Using an Automated Relational Database in a Computing System to Compile and Compare Metabolomic Data Obtained from a Plurality of Samples, which is incorporated herein by reference in its entirety. Such characteristic plots may enable a user to subjectively analyze a series of complex data sets in a visual display that may indicate the presence of selected sample components across the group of samples even in cases where the selected components have co-eluted from the physical separation processes prior to spectral analysis. While subjectively comparing deconstructed spectral plots to spectral characteristic plots may be useful for identifying the potential presence of more complex mixtures of components in a given type of biological sample, such subjective comparisons still do not provide quantitative information related to the relative amounts of particular components (such as metabolites, small molecule therapeutic agents, metabolized drugs, and xenobiotics, for example) that may be present in a particular sample.
Furthermore, some analytical methods have been proposed for quantitatively analyzing spectrometry data sets across a group of samples. For example, factor analysis (FA), principal component analysis (PCA), and singular value decomposition (SVD) have been applied to a matrix of spectrometry data from a group of biological samples to generate a small number of basic spectral profiles (corresponding to individual component compounds in the samples), and to calculate the weights with which each of these basic components is present in each individual sample. However, FA, PCA, and SVD analytical methods provide results that are often ambiguous and/or difficult to interpret because the basic spectral profiles may include a number of negative values, which have no physical meaning for ion intensities. Thus, post-analysis transformations, requiring additional computing power, time, and skill, are required to glean physically meaningful analytical results from the process. In addition, FA, PCA, and SVD analytical methods do not necessarily yield results that point to independent groups of ions indicative of particular metabolite compounds or other components present in the samples, as described for example by Juvela et al. See Juvela, M., Lehtinen, K. and Paatero, P., “The Use of Positive Matrix Factorization in the Analysis of Molecular Line Spectra from the Thumbprint Nebula,” Clouds, Cores, and Low Mass Stars, ASP Conference Series, Vol. 65, pp. 176-180 (1994); D. P. Clemens and R. Barvainis, eds.
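The appearance of negative values in such profiles can be illustrated with a small sketch. It builds a hypothetical non-negative data matrix from two illustrative spectra and computes the two leading right singular vectors by power iteration with deflation. The data values and function names are assumptions made for this example, and power iteration is used only to keep the sketch free of external libraries.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, start, iters=500):
    """Power iteration: leading eigenvalue and eigenvector of a symmetric matrix."""
    v = normalize(start)
    for _ in range(iters):
        v = normalize(matvec(m, v))
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


# Hypothetical data: rows are samples, columns are ion channels. Each row is a
# non-negative mixture of the spectra [10, 0, 5, 2] and [0, 8, 5, 1].
X = [
    [20.0, 4.0, 12.5, 4.5],
    [10.0, 16.0, 15.0, 4.0],
    [5.0, 24.0, 17.5, 4.0],
    [30.0, 8.0, 20.0, 7.0],
]
n = len(X[0])

# The right singular vectors of X are the eigenvectors of X^T X.
gram = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
lam1, profile_1 = dominant_eigenpair(gram, [1.0] * n)

# Deflate the first component, then extract the second one.
deflated = [
    [gram[i][j] - lam1 * profile_1[i] * profile_1[j] for j in range(n)]
    for i in range(n)
]
lam2, profile_2 = dominant_eigenpair(deflated, [1.0, 0.0, 0.0, 0.0])

has_negative = min(profile_2) < 0.0
has_positive = max(profile_2) > 0.0
```

Because the second profile must be orthogonal to an all-positive first profile, it necessarily contains entries of both signs, even though every entry of the input data is non-negative. Such a profile cannot be read directly as a spectrum of ion intensities.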
Therefore, there exists a need for an improved system to solve the technical problems outlined above that are associated with existing metabolomic data analysis systems. More particularly, there exists a need for a system and method capable of analyzing spectrometry data across a group of biological samples to easily and accurately determine: physically-relevant non-negative amounts of each metabolite compound present in the samples, regardless of the co-elution of some metabolite compounds in a particular sample; spectra of the metabolite compounds present in the samples; and a number of metabolite compounds that may be present in the samples. There is also a need for a system and method for de-convoluting mass spectrometry data from a plurality of samples, and/or parent compounds included therein, into the spectra of the pure metabolite compounds present in the samples and determining the relative concentration of the metabolite compounds in the samples.
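For orientation only, the kind of non-negative decomposition called for above can be sketched with a generic non-negative matrix factorization using multiplicative updates of the Lee and Seung type. This is a textbook technique offered as an illustration and is not asserted to be the method of the present invention. The data matrix, the fixed number of components, the random initialization, and all function names are assumptions made for this example.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual X - W H."""
    wh = matmul(w, h)
    return sum(
        (a - b) ** 2 for xr, whr in zip(x, wh) for a, b in zip(xr, whr)
    ) ** 0.5


def nmf(x, rank, iters=2000, seed=0, eps=1e-12):
    """Factor x (samples x channels) into non-negative W (amounts) and H (spectra)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical data: rows are samples, columns are ion channels (illustrative values).
X = [
    [20.0, 4.0, 12.5, 4.5],
    [10.0, 16.0, 15.0, 4.0],
    [5.0, 24.0, 17.5, 4.0],
    [30.0, 8.0, 20.0, 7.0],
]

w_init, h_init = nmf(X, rank=2, iters=0)  # the random starting point
amounts, spectra = nmf(X, rank=2)         # after the multiplicative updates
```

Each row of `spectra` plays the role of a component spectrum and each row of `amounts` gives the corresponding per-sample weights, all non-negative by construction. The number of components is fixed at two here by assumption; choosing it from the data is a separate problem that this sketch does not address.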