Living organisms are autonomous chemical systems involving numerous molecular entities and chemical processes, such as water, sugars, amino acids, and the processes that convert one into another. The complexity of life arises from numerous biochemical processes that control the flow of information through biochemical signaling and the flow of chemical energy through metabolism. For example, sugars can be broken down through a series of oxidative reactions into smaller sugar derivatives and ultimately into carbon dioxide and water, providing chemical energy for cells and other basic biological activities. The intermediates and products of metabolism are called metabolites. Metabolites have various functions, including serving as fuel, structure, and signals, exerting stimulatory and inhibitory effects on enzymes, exhibiting catalytic activity of their own, and contributing to defense. The concentration levels of various metabolites may be related to, or directly contribute to, various phenotypes of living organisms, such as disease status and drug response. For example, high glucose levels are associated with diabetes, and high levels of low-density lipoprotein (LDL) and triglycerides are associated with various cardiovascular diseases.
Over the past decades, major advances in analytical chemistry have led to the emergence of the discipline of metabolomics. Metabolomics involves using analytical devices to simultaneously identify and quantify hundreds to thousands of metabolites present in one or a plurality of biological samples, e.g., plasma, urine, and cerebrospinal fluid (CSF). U.S. patents describing systems and methods for processing signals from such analytical devices to identify and quantify metabolites include: U.S. Pat. No. 7,561,975, entitled System, Method, And Computer Program Product For Analyzing Spectrometry Data to Identify and Quantify Individual Components in a Sample; U.S. Pat. No. 7,949,475, entitled System and Method for Analyzing Metabolomic Data; U.S. Pat. No. 8,175,816, entitled System And Method for Analyzing Metabolomic Data; and U.S. Pat. No. 7,433,787, entitled System, Method, and Computer Program Product Using a Database in a Computing System to Compile and Compare Metabolomic Data Obtained from a Plurality of Samples.
The identities and concentration levels of the metabolites, sometimes called the "metabotype", usually reflect the net interactions between genes and environment, providing information that may help bridge the gap between genotype and phenotype. Consistent with this view, metabolomics has been widely used to understand disease pathogenesis and drug effects, and to predict variations in drug response, including both efficacy and safety, among many other applications, some of which are described in U.S. Pat. No. 7,947,453, entitled Methods for Drug Discovery, Disease Treatment, and Diagnosis Using Metabolomics. These applications of metabolomics typically involve identifying "metabolomics signatures" among the metabolites. Examples of such metabolomics signatures are 1) metabolites that are influenced by a stimulus, e.g., a drug treatment, and 2) metabolites that are associated with a phenotype of interest, e.g., a disease status or a drug response. These metabolomics signatures can help elucidate the pathologies of different kinds of diseases and identify better targets for drug development, among many other applications.
To achieve the aforementioned goals, metabolomics data must be analyzed using statistical or other analytical methods executed on computer processors in communication with a database that stores the metabolomics data along with other necessary data. Common practice in metabolomics data analysis uses routine statistical tools, such as Student's t-tests and regression techniques, to identify metabolomics signatures. These methods, as well as many multivariate chemometric and statistical tools, including those reviewed in Korman et al. (Methods Mol Biol, 856: 381-413, 2012) and Lindon et al. (The Handbook of Metabonomics and Metabolomics. Elsevier, Amsterdam and Oxford, 2007), essentially treat metabolites as individual variables rather than as biological entities, even though prior knowledge about those entities may have accumulated and be accessible from the literature and/or databases.
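As a minimal sketch of the per-metabolite practice described above, assuming two groups of samples (e.g., cases and controls) with measured levels stored per metabolite, each metabolite can be tested on its own. The sketch uses Welch's variant of the t statistic, which does not assume equal group variances, in place of Student's t-test. The function names `welch_t` and `per_metabolite_t` and the toy values are hypothetical illustrations, not data from any study. Note that nothing in the loop uses knowledge of how the metabolites relate to one another biologically.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def per_metabolite_t(case, control):
    """Test each metabolite independently of all others.

    case, control: dict mapping metabolite name -> list of measured levels.
    Returns a dict mapping metabolite name -> t statistic.
    No pathway or other prior biological knowledge is used.
    """
    return {m: welch_t(case[m], control[m]) for m in case if m in control}


# Hypothetical toy measurements (arbitrary units), four samples per group.
case = {"glucose": [6.8, 7.2, 7.9, 6.5], "lactate": [1.1, 1.3, 0.9, 1.2]}
control = {"glucose": [5.0, 5.3, 4.8, 5.1], "lactate": [1.0, 1.2, 1.1, 1.0]}

t_stats = per_metabolite_t(case, control)
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

Each statistic would then be compared with a significance threshold metabolite by metabolite, which is the "individual variables" treatment the paragraph describes.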
Because metabolomics data is usually noisy, and the number of samples in metabolomics studies is often limited by budget or other constraints, metabolomics studies often face the so-called "lack-of-power" issue. That is, even when true metabolomics signatures exist, a metabolomics data analysis may fail to identify some of them. Therefore, there is a keen need for methods that improve the performance of metabolomics data analysis, so that metabolomics can better contribute to improving human health and/or better facilitate other research.
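The lack-of-power issue can be made concrete with a rough calculation. The sketch below assumes a normal approximation to a two-sided, two-sample test, a standardized effect size d (mean difference divided by a common standard deviation), equal group sizes, and a Bonferroni correction across a hypothetical number of metabolites tested. None of these parameter values come from the text above; they are illustrative only. Under these assumptions the approximate power is Φ(d·√(n/2) − z₁₋α/₂), where n is the number of samples per group.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Normal-approximation power of a two-sided, two-sample test.

    d: standardized effect size (mean difference / common SD).
    n_per_group: number of samples in each of the two groups.
    alpha: per-test significance level.
    The negligible opposite-tail rejection probability is ignored.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * sqrt(n_per_group / 2) - z_crit)


m = 500  # hypothetical number of metabolites tested
for n in (10, 20, 50):
    unadjusted = approx_power(0.8, n, 0.05)
    adjusted = approx_power(0.8, n, 0.05 / m)  # Bonferroni-adjusted alpha
    print(f"n={n}: power {unadjusted:.2f} unadjusted, {adjusted:.2f} Bonferroni")
```

Under these assumptions, a fairly large effect (d = 0.8) measured in a small study has a low chance of being detected once the significance threshold is tightened to account for testing hundreds of metabolites. A true signature can therefore be missed, which is the situation the paragraph describes.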