This invention pertains generally to the fields of chemometrics and metabonomics and, more particularly, to methods for the analysis of chemical, biochemical, and biological data, for example, spectra such as nuclear magnetic resonance (NMR) and other types of spectra.
Significant progress has been made in developing methods to determine and quantify the biochemical processes occurring in living systems. Such methods are valuable in the diagnosis, prognosis and treatment of disease, the development of drugs, as well as for improving therapeutic regimes for current drugs.
Diseases of the human or animal body (such as cancers, degenerative diseases, autoimmune diseases and the like) have an underlying basis in alterations in the expression of certain genes. The expressed gene products, proteins, mediate effects such as abnormal cell growth, cell death or inflammation. Some of these effects are caused directly by protein-protein interactions; others are caused by proteins acting on small molecules (e.g. “second messengers”) which trigger effects including further gene expression.
Likewise, disease states caused by external agents such as viruses and bacteria provoke a multitude of complex responses in the infected host.
In a similar manner, the treatment of disease through the administration of drugs can result in a wide range of desired effects and unwanted side effects in a patient.
At the genetic level, methods for examining gene expression in response to these types of events are often referred to as “genomic methods,” and are concerned with the detection and quantification of the expression of an organism's genes, collectively referred to as its “genome,” usually by detecting and/or quantifying genetic molecules, such as DNA and RNA. Genomic studies often exploit a new generation of proprietary “gene chips,” which are small disposable devices encoded with an array of genes that respond to extracted mRNAs produced by cells (see, for example, Klenk et al., 1997). Many genes can be placed on a chip array and patterns of gene expression, or changes therein, can be monitored rapidly, although at some considerable cost.
However, the biological consequences of gene expression, or altered gene expression following perturbation, are extremely complex. This has led to the development of “proteomic methods” which are concerned with the semi-quantitative measurement of the production of cellular proteins of an organism, collectively referred to as its “proteome” (see, for example, Geisow, 1998). Proteomic measurements utilise a variety of technologies, but all involve a protein separation method, e.g., 2D gel-electrophoresis, allied to a chemical characterisation method, usually some form of mass spectrometry.
In recent years, it has been appreciated that the reaction of human and animal subjects to diseases and their treatments can vary according to the genomic makeup of an individual. This has led to the development of the field of “pharmacogenomics.” A fuller understanding of how an individual's own genome reacts to a particular disease will allow the development of new therapies, as well as the refinement of existing ones.
At present, genomic and proteomic methods, which are both expensive and labour intensive, have the potential to be powerful tools for studying biological response. The choice of method is still uncertain since careful studies have sometimes shown a low correlation between the pattern of gene expression and the pattern of protein expression, probably due to sampling for the two technologies at inappropriate time points (see, e.g., Gygi et al., 1999). Even in combination, genomic and proteomic methods still do not provide the range of information needed for understanding integrated cellular function in a living system, since they do not take account of the dynamic metabolic status of the whole organism.
For example, genomic and proteomic studies may implicate a particular gene or protein in a disease or a xenobiotic response because the level of expression is altered, but the change in gene or protein level may be transitory or may be counteracted downstream, and as a result there may be no effect at the cellular and/or biochemical level. Conversely, sampling tissue for genomic and proteomic studies at inappropriate time points may result in a relevant gene or protein being overlooked.
Nonetheless, recent advances in genomics and proteomics now permit the rapid identification of new potential targets for drug development. With a new target in hand, and with the aid of combinatorial chemistry and high throughput screening, the pharmaceutical industry is capable of rapidly generating and screening thousands of new candidate compounds each week.
However, in practice, only a few of these candidate compounds will be taken further, for example, into pre-clinical and clinical development. It is therefore critical to identify those candidate compounds with the most promise, and this is usually judged by efficacy and toxicology, before selection for clinical studies. However, these selection processes are imperfect and many drugs fail in clinical trials due to lack of efficacy and/or toxicological effects. It is also possible that other drugs may fail overall because they are only effective in a subgroup of patients who have an unrecognised pharmacogenomic response. There is a great need to find new ways of reducing this compound “attrition” or losses of drugs late in the development process, for example, through the development and application of analytical technologies designed to maximise efficiency of compound selection and to minimise attrition rates.
While genomic and proteomic methods may be useful aids in compound selection, they do suffer from substantial limitations. For example, while genomic and proteomic methods may ultimately give profound insights into toxicological mechanisms and provide new surrogate biomarkers of disease, at present it is very difficult to relate genomic and proteomic findings to classical cellular or biochemical indices or endpoints. One simple reason for this is that with current technology and approach, the correlation of the time-response to drug exposure is difficult. Further difficulties arise with in vitro cell-based studies. These difficulties are particularly important for the many known cases where the metabolism of the compound is a prerequisite for a toxic effect, and especially where the target organ is not the site of primary metabolism. This is particularly true for pro-drugs, where some aspect of in situ chemical (e.g., enzymatic) modification is required for activity.
A new “metabonomic” approach has been proposed which is aimed at augmenting and complementing the information provided by genomics and proteomics. “Metabonomics” is conventionally defined as “the quantitative measurement of the multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification” (see, for example, Nicholson et al., 1999). This concept has arisen primarily from the application of 1H NMR spectroscopy to study the metabolic composition of biofluids, cells, and tissues and from studies utilising pattern recognition (PR), expert systems and other chemoinformatic tools to interpret and classify complex NMR-generated metabolic data sets. Metabonomic methods have the potential, ultimately, to determine the entire dynamic metabolic make-up of an organism.
A pathological condition or a xenobiotic may act at the pharmacological level only and hence may not affect gene regulation or expression directly. Alternatively significant disease or toxicological effects may be completely unrelated to gene switching. For example, exposure to ethanol in vivo may switch on many genes but none of these gene expression events explains drunkenness. In cases such as these, genomic and proteomic methods are likely to be ineffective. However, all disease or drug-induced pathophysiological perturbations result in disturbances in the ratios and concentrations, binding or fluxes of endogenous biochemicals, either by direct chemical reaction or by binding to key enzymes or nucleic acids that control metabolism. If these disturbances are of sufficient magnitude, effects will result which will affect the efficient functioning of the whole organism. In body fluids, metabolites are in dynamic equilibrium with those inside cells and tissues and, consequently, abnormal cellular processes in tissues of the whole organism following a toxic insult or as a consequence of disease will be reflected in altered biofluid compositions.
Fluids secreted, excreted, or otherwise derived from an organism (“biofluids”) provide a unique window into its biochemical status since the composition of a given biofluid is a consequence of the function of the cells that are intimately concerned with the fluid's manufacture and secretion. For example, the composition of a particular fluid can carry biochemical information on details of organ function (or dysfunction), for example, as a result of xenobiotics, disease, and/or genetic modification. Similarly, the composition and condition of an organism's tissues are also indicators of the organism's biochemical status. Examples of biofluids include urine, blood plasma, and milk.
Biofluids often exhibit very subtle changes in metabolite profile in response to external stimuli. This is because the body's cellular systems attempt to maintain homeostasis (constancy of internal environment), for example, in the face of cytotoxic challenge. One means of achieving this is to modulate the composition of biofluids. Hence, even when cellular homeostasis is maintained, subtle responses to disease or toxicity are expressed in altered biofluid composition. However, dietary, diurnal, and hormonal variations may also influence biofluid compositions, and it is clearly important to differentiate these effects if correct biochemical inferences are to be drawn from their analysis.
One of the most successful approaches to biofluid analysis has been the use of NMR spectroscopy (see, for example, Nicholson et al., 1989); similarly, intact tissues have been successfully analysed using magic-angle-spinning 1H NMR spectroscopy (see, for example, Moka et al., 1998; Tomlins et al., 1998).
The NMR spectrum of a biofluid provides a metabolic fingerprint or profile of the organism from which the biofluid was obtained, and this metabolic fingerprint or profile is characteristically changed by a disease, toxic process, or genetic modification. For example, NMR spectra may be collected for various states of an organism, e.g., pre-dose and various times post-dose, for one or more xenobiotics, separately or in combination; healthy (control) and diseased animal; unmodified (control) and genetically modified animal.
For example, in the evaluation of undesired toxic side-effects of drugs, each compound or class of compound produces characteristic changes in the concentrations and patterns of endogenous metabolites in biofluids that provide information on the sites and basic mechanisms of the toxic process. 1H NMR analysis of biofluids has successfully uncovered novel metabolic markers of organ-specific toxicity in the laboratory rat, and it is in this “exploratory” role that NMR as an analytical biochemistry technique excels. However, the biomarker information in NMR spectra of biofluids is very subtle, as hundreds of compounds representing many pathways can often be measured simultaneously, and it is this overall metabonomic response to toxic insult that so well characterises the lesion.
All biological fluids and tissues have their own characteristic physico-chemical properties, and these affect the types of NMR experiment that may be usefully employed. One major advantage of using NMR spectroscopy to study complex biomixtures is that measurements can often be made with minimal sample preparation (usually with only the addition of 5-10% D2O) and a detailed analytical profile can be obtained on the whole biological sample. Sample volumes are small, typically 0.3 to 0.5 mL for standard probes, and as low as 3 μL for microprobes. Acquisition of simple NMR spectra is rapid and efficient using flow-injection technology. It is usually necessary to suppress the water NMR resonance.
Many biofluids are not chemically stable and for this reason care should be taken in their collection and storage. For example, cell lysis in erythrocytes can easily occur. If a substantial amount of D2O has been added, then it is possible that certain 1H NMR resonances will be lost by H/D exchange. Freeze-drying of biofluid samples also causes the loss of volatile components such as acetone. Biofluids are also very prone to microbiological contamination, especially fluids, such as urine, which are difficult to collect under sterile conditions. Many biofluids contain significant amounts of active enzymes, either normally or due to a disease state or organ damage, and these enzymes may alter the composition of the biofluid following sampling. Samples should be stored deep frozen to minimise the effects of such contamination. Sodium azide is usually added to urine at the collection point to act as an antimicrobial agent. Metal ions and/or chelating agents (e.g., EDTA) may be added to bind to endogenous metal ions (e.g., Ca2+, Mg2+ and Zn2+) and chelating agents (e.g., free amino acids, especially glutamate, cysteine, histidine and aspartate; citrate) to alter and/or enhance the NMR spectrum.
In all cases the analytical problem usually involves the detection of “trace” amounts of analytes in a very complex matrix of potential interferences. It is, therefore, critical to choose a suitable analytical technique for the particular class of analyte of interest in the particular biomatrix, which could be a biofluid or a tissue. High resolution NMR spectroscopy (in particular 1H NMR) appears to be particularly appropriate. The main advantages of using 1H NMR spectroscopy in this area are the speed of the method (with spectra being obtained in 5 to 10 minutes), the requirement for minimal sample preparation, and the fact that it provides a non-selective detector for all the abnormal metabolites in the biofluid regardless of their structural type, provided only that they are present above the detection limit of the NMR experiment and that they contain non-exchangeable hydrogen atoms. The speed advantage is of crucial importance in this area of work, as the clinical condition of a patient may require rapid diagnosis and can change very rapidly, so that correspondingly rapid changes must be made to the therapy provided.
NMR studies of body fluids should ideally be performed at the highest magnetic field available to obtain maximal dispersion and sensitivity and most 1H NMR studies have been performed at 400 MHz or greater. With every new increase in available spectrometer frequency the number of resonances that can be resolved in a biofluid increases and although this has the effect of solving some assignment problems, it also poses new ones. Furthermore, there are still important problems of spectral interpretation that arise due to compartmentation and binding of small molecules in the organised macromolecular domains that exist in some biofluids such as blood plasma and bile. All this complexity need not reduce the diagnostic capabilities and potential of the technique, but demonstrates the problems of biological variation and the influence of variation on diagnostic certainty.
The information content of biofluid spectra is very high and the complete assignment of the 1H NMR spectrum of most biofluids is usually not possible (even using 900 MHz NMR spectroscopy, the highest frequency commercially available). However, the assignment problems vary considerably between biofluid types. Some fluids have near constant composition and concentrations and in these the majority of the NMR signals have been assigned. In contrast, urine composition can be very variable and there is enormous variation in the concentration range of NMR-detectable metabolites; consequently, complete analysis is much more difficult. Those metabolites present close to the limits of detection for 1-dimensional (1D) NMR spectroscopy (ca. 100 nM for many metabolites at 800 MHz) pose severe NMR spectral assignment problems. (In absolute terms, the detection limit may be ca. 4 nmol, e.g., 1 μg of a 250 g/mol compound in a 0.5 mL sample volume.) Even at the present level of technology in NMR, it is not yet possible to detect many important biochemical substances, e.g. hormones, proteins or nucleic acids, in body fluids because of problems with sensitivity, line widths, dispersion and dynamic range, and this area of research will continue to be technology-limited. In addition, the collection of NMR spectra of biofluids may be complicated by the relative water intensity, sample viscosity, protein content, lipid content, and low molecular weight peak overlap.
Usually in order to assign 1H NMR spectra, comparison is made with spectra of authentic materials and/or by standard addition of an authentic reference standard to the sample. Additional confirmation of assignments is usually sought from the application of other NMR methods, including, for example, 2-dimensional (2D) NMR methods, particularly COSY (correlation spectroscopy), TOCSY (total correlation spectroscopy), inverse-detected heteronuclear correlation methods such as HMBC (heteronuclear multiple bond correlation), HSQC (heteronuclear single quantum coherence), and HMQC (heteronuclear multiple quantum coherence), 2D J-resolved (JRES) methods, spin-echo methods, relaxation editing, diffusion editing (including both 1D NMR and 2D NMR such as diffusion-edited TOCSY), and multiple quantum filtering. Detailed 1H NMR spectroscopic data for a wide range of metabolites and biomolecules found in biofluids have been published (see, for example, Lindon et al., 1999) and supplementary information is available in several literature compilations of data (see, for example, Fan, 1996; Sze et al., 1994).
The successful application of 1H NMR spectroscopy of biofluids to study a variety of metabolic diseases and toxic processes has now been well established and many novel metabolic markers of organ-specific toxicity have been discovered (see, for example, Nicholson et al., 1989; Lindon et al., 1999). For example, NMR spectra of urine are identifiably altered in situations where damage has occurred to the kidney or liver. It has been shown that specific and identifiable changes can be observed which distinguish the organ that is the site of a toxic lesion. Also it is possible to focus in on particular parts of an organ such as the cortex of the kidney and even in favourable cases to very localised parts of the cortex. Finally it is possible to deduce the biochemical mechanism of the xenobiotic toxicity, based on a biochemical interpretation of the changes in the urine. A wide range of toxins has now been investigated including mostly kidney toxins and liver toxins, but also testicular toxins, mitochondrial toxins and muscle toxins.
However, a limiting factor in understanding the biochemical information from both 1D and 2D NMR spectra of tissues and biofluids is their complexity. The most efficient way to investigate these complex multiparametric data is to employ the 1D and 2D NMR metabonomic approach in combination with computer-based “pattern recognition” (PR) methods and expert systems. These statistical tools are similar to those currently being explored by workers in the fields of genomics and proteomics.
Pattern recognition (PR) is a general term applied to methods of data analysis which can be used to generate scientific hypotheses, as well as to test hypotheses, by mathematically reducing the many parameters.
PR methods may be conveniently classified as “supervised” or “unsupervised.” Unsupervised methods are used to analyse data without reference to any other independent knowledge, for example, without regard to the identity or nature of a xenobiotic or its mode of action.
Examples of unsupervised pattern recognition methods include principal component analysis (PCA), hierarchical cluster analysis (HCA), and non-linear mapping (NLM).
One of the most useful and easily applied unsupervised PR techniques is principal components analysis (PCA) (see, for example, Sharaf, 1986). Principal components (PCs) are new variables created from linear combinations of the starting variables with appropriate weighting coefficients. The properties of these PCs are such that: (i) each PC is orthogonal to (uncorrelated with) all other PCs, and (ii) the first PC contains the largest part of the variance of the data set (information content) with subsequent PCs containing correspondingly smaller amounts of variance.
A data matrix, X, made up of rows, where each row defines a sample, and columns, where each column defines a particular spectral descriptor, can be regarded as composed of a scores matrix, T, and a loadings matrix, L, such that X = TL^t, where the superscript t denotes the transpose. The covariance matrix, C, is calculated from the data matrix, X. The eigenvalues and eigenvectors of the covariance matrix are determined by diagonalisation. The coordinates in eigenvector plots (the principal components, PCs) are denoted “scores” and comprise the scores matrix T. The eigenvector coefficients are denoted “loadings” and comprise the loadings matrix L, and give the contributions of the descriptors to the PCs.
Thus a plot of the first two or three PC scores gives the “best” representation, in terms of information content, of the data set in two or three dimensions, respectively. A plot of the first two principal component scores, PC1 and PC2, is often called a “scores plot”, and provides the maximum information content of the data in two dimensions. Such PC maps can be used to visualise inherent clustering behaviour for drugs and toxins acting on each organ according to toxic mechanism. Of course, the clustering information might be in lower PCs and these also have to be examined.
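By way of illustration only, the decomposition described above can be sketched in a few lines of Python. The sketch assumes a small data matrix held as a list of rows, and it obtains the eigenvectors of the covariance matrix by power iteration with deflation rather than by full diagonalisation. That substitution, the function name pca, and its parameters are assumptions made for this example and are not taken from the cited literature.

```python
import math


def pca(X, n_components=2, iterations=1000):
    """Return (scores, loadings, eigenvalues) for the data matrix X.

    X is a list of rows (samples); each column is a spectral descriptor.
    loadings[k] is the k-th eigenvector of the covariance matrix,
    i.e. row k of L^t in the notation X = T L^t.
    """
    n, p = len(X), len(X[0])
    # Mean-centre each column so that C describes variation about the mean.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Covariance matrix C (p x p).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
         for i in range(p)]
    loadings, eigenvalues = [], []
    for _pc in range(n_components):
        # Power iteration from a deliberately asymmetric, unit-length start.
        v = [1.0 / (j + 1) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _it in range(iterations):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along this PC).
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p))
                  for i in range(p))
        loadings.append(v)
        eigenvalues.append(lam)
        # Deflate: remove the variance already explained by this PC.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)]
             for i in range(p)]
    # Scores: projection of each centred sample onto each loading vector.
    scores = [[sum(r[j] * l[j] for j in range(p)) for l in loadings]
              for r in Xc]
    return scores, loadings, eigenvalues
```

Plotting the first column of the returned scores against the second would give a scores plot of PC1 against PC2. For real spectra with many descriptors, a library eigensolver would normally be preferred to this hand-written iteration.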
In this simple metabonomic approach, a sample from an animal treated with a compound of unknown toxicity is compared with a database of NMR-generated metabolic data from control and toxin-treated animals. By observing its position on the PR map relative to samples of known effect, the unknown toxin can often be classified. However, toxicological data are often more complex, with time-related development of lesions and associated shifts in NMR-detected biochemistry. Also, it is more rigorous to compare effects of xenobiotics in the original n-dimensional NMR metabonomic space.
Hierarchical Cluster Analysis, another unsupervised pattern recognition method, permits the grouping of data points which are similar by virtue of being “near” to one another in some multi-dimensional space whose coordinates are defined by the NMR descriptors which may be, for example, the signal intensities for particular assigned peaks in an NMR spectrum. A “similarity matrix,” S, is constructed with elements s_ij = 1 − r_ij/r_ij^max, where r_ij is the interpoint distance between points i and j (e.g., Euclidean interpoint distance), and r_ij^max is the largest interpoint distance for all points. The most distant pair of points will have s_ij equal to 0, since r_ij then equals r_ij^max. Conversely, the closest pair of points will have the largest s_ij, approaching 1.
The similarity matrix is scanned for the closest pair of points. The pair of points are reported with their separation distance, and then the two points are deleted and replaced with a single combined point. The process is then repeated iteratively until only one point remains. A number of different methods may be used to determine how two clusters will be joined, including the nearest neighbour method (also known as the single link method), the furthest neighbour method, and the centroid method (including centroid link, incremental link, median link, group average link, and flexible link variations).
The reported connectivities are then plotted as a dendrogram (a tree-like chart which allows visualisation of clustering), showing sample-sample connectivities versus increasing separation distance (or equivalently, versus decreasing similarity). The dendrogram has the property that the branch lengths are proportional to the distances between the various clusters, and hence the length of the branches linking one sample to the next is a measure of their similarity. In this way, similar data points may be identified algorithmically.
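The clustering procedure described above may be sketched as follows, purely as an illustration. The sketch assumes Euclidean interpoint distances and the nearest neighbour (single link) joining rule, and it works directly on distances, which is equivalent to scanning the similarity matrix for its largest element. The function names similarity_matrix and single_link_merges are hypothetical.

```python
import math


def similarity_matrix(points):
    """Build S with s_ij = 1 - r_ij / r_max from Euclidean distances."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    r_max = max(max(row) for row in dist)
    if r_max == 0.0:
        # All points coincide; treat every pair as maximally similar.
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - dist[i][j] / r_max for j in range(n)] for i in range(n)]


def single_link_merges(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    Clusters are lists of sample indices. The distance between two
    clusters is the smallest distance between any pair of their members.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the two joined clusters with a single combined cluster.
        combined = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(combined)
    return merges
```

The merge history, read in order, supplies the connectivities and separation distances from which a dendrogram could be drawn.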
Non-linear mapping (NLM) involves calculation of the distances between all of the points in the original multi-dimensional space. This is followed by construction of a map of points in 2 or 3 dimensions where the sample points are placed in random positions or at values determined by a prior principal components analysis. The least squares criterion is used to move the sample points in the lower dimension map to fit the inter-point distances in the lower dimension space to those in the higher dimensional space. Non-linear mapping is therefore an approximation to the true inter-point distances, but points close in the original multi-dimensional space should also be close in 2 or 3 dimensional space (see, for example, Brown et al., 1996; Farrant et al., 1992).
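A minimal sketch of such a mapping is given below, under stated assumptions: the least squares criterion is taken to be the plain sum of squared differences between the original and mapped interpoint distances, the map is initialised at seeded random positions, and the points are moved by fixed-step gradient descent. The function name nonlinear_map, the step size and the number of steps are illustrative choices only.

```python
import math
import random


def stress(D, Y):
    """Sum of squared differences between original and mapped distances."""
    n = len(Y)
    return sum((D[i][j] - math.dist(Y[i], Y[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def nonlinear_map(points, dims=2, steps=2000, rate=0.01, seed=0):
    """Return (map_points, initial_stress, final_stress)."""
    rng = random.Random(seed)
    n = len(points)
    # Interpoint distances in the original multi-dimensional space.
    D = [[math.dist(points[i], points[j]) for j in range(n)]
         for i in range(n)]
    # Start from random positions in the lower-dimensional map.
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    initial = stress(D, Y)
    for _step in range(steps):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(Y[i], Y[j])
                if d == 0.0:
                    continue  # coincident map points give no direction
                # Gradient of (D_ij - d_ij)^2 with respect to Y[i].
                coeff = 2.0 * (d - D[i][j]) / d
                for k in range(dims):
                    grads[i][k] += coeff * (Y[i][k] - Y[j][k])
        for i in range(n):
            for k in range(dims):
                Y[i][k] -= rate * grads[i][k]
    return Y, initial, stress(D, Y)
```

Comparing the initial and final stress values gives a rough indication of how well the map reproduces the original interpoint distances. Starting from the scores of a prior principal components analysis instead of random positions would be a straightforward variation.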
Alternatively, and in order to develop automatic classification methods, it has proved efficient to use a “supervised” approach to NMR data analysis. Here, a “training set” of NMR metabonomic data is used to construct a statistical model that predicts correctly the “class” of each sample. The model built from this training set is then tested with independent data (“test set”) to determine the robustness of the computer-based model. These models are sometimes termed “Expert Systems,” but may be based on a range of different mathematical procedures. Supervised methods can use a data set with reduced dimensionality (for example, the first few principal components), but typically use unreduced data, with full dimensionality. In all cases the methods allow the quantitative description of the multivariate boundaries that characterise and separate each class, for example, each class of xenobiotic in terms of its metabolic effects. It is also possible to obtain confidence limits on any predictions, for example, a level of probability to be placed on the goodness of fit (see, for example, Sharaf, 1986). The robustness of the predictive models can also be checked using cross-validation, by leaving out selected samples from the analysis.
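The training and cross-validation idea may be illustrated with a short sketch. A K-nearest neighbour classifier is used here only as a simple stand-in for any supervised method, and validation is by leaving out one sample at a time. The function names, the choice of k = 3, and the two-descriptor example data in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the class of x by majority vote among its k nearest samples."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Cross-validate by leaving out each sample in turn and predicting it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

For instance, with four hypothetical "control" samples clustered near (0, 0) and four hypothetical "toxin" samples clustered near (3, 3), leave_one_out_accuracy estimates how often a held-out sample is assigned to its own class, and knn_predict assigns a class to a new sample of unknown effect.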
Expert systems may operate to generate a variety of useful outputs, for example, (i) classification of the sample as “normal” or “abnormal” (this is a useful tool in the control of spectrometer automation using sequential flow injection NMR spectroscopy); (ii) classification of the target organ for toxicity and site of action within the tissue where in certain cases, mechanism of toxic action may also be classified; and, (iii) identification of the biomarkers of a pathological disease condition or toxic effect for the particular compound under study. For example, a sample can be classified as belonging to a single class of toxicity, to multiple classes of toxicity (more than one target organ), or to no class. The latter case would indicate deviation from normality (control) based on the training set model but having a dissimilar metabolic effect to any toxicity class modelled in the training set (unknown toxicity type). Under (ii), a system could also be generated to support decisions in clinical medicine (e.g., for efficacy of drugs) rather than toxicity.
Examples of supervised pattern recognition methods include the following, which are briefly described below: soft independent modelling of class analysis (SIMCA) (see, for example, Wold, 1976); partial least squares analysis (PLS) (see, for example, Wold, 1966; Joreskog, 1982; Frank, 1984); linear discriminant analysis (LDA) (see, for example, Nillson, 1965); K-nearest neighbour analysis (KNN) (see, for example, Brown et al., 1996); artificial neural networks (ANN) (see, for example, Wasserman, 1989; Anker et al., 1992; Hare, 1994); probabilistic neural networks (PNNs) (see, for example, Parzen, 1962; Bishop, 1995; Speckt, 1990; Broomhead et al., 1988; Patterson, 1996); rule induction (RI) (see, for example, Quinlan, 1986); and, Bayesian methods (see, for example, Bretthorst, 1990).
As the size of metabonomic databases increases together with improvements in rapid throughput of NMR samples (greater than 300 samples per day per spectrometer is now possible with the first generation of flow injection systems), more subtle expert systems may be necessary, for example, using techniques such as “fuzzy logic” which permit greater flexibility in decision boundaries.
Pattern recognition methods have been applied to the analysis of metabonomic data, including, for example, complex NMR data, with some success (see, for example, Anthony et al., 1994; Anthony et al., 1995; Beckwith-Hall et al., 1998; Gartland et al., 1990a; Gartland et al., 1990b; Gartland et al., 1991; Holmes et al., 1998a; Holmes et al., 1998b; Holmes et al., 1992; Holmes et al., 1994; Spraul et al., 1994; Tranter et al., 1999).
Although the utility of the metabonomic approach is well established, there remains a great need for improved methods of analysis. The metabolic variation is often subtle, and powerful analysis methods are required for detection of particular analytes, especially when the data (e.g., NMR spectra) are so complex.
One aim of the present invention is to provide data analysis methods for the detection of such metabolic variations, as part of a metabonomic approach.
One aspect of the present invention pertains to improved methods for the analysis of chemical, biochemical, and biological data, for example spectra, for example, nuclear magnetic resonance (NMR) and other types of spectra.
One aspect of the invention pertains to a method for processing a sample spectrum comprising:
replacing each of one or more target regions in said sample spectrum with a corresponding replacement region of a master control spectrum to give a target-replaced sample spectrum,
wherein said replacement region has been scaled so as to have the same fraction of the total integrated intensity in said target-replaced sample spectrum as it did in said master control spectrum.
One embodiment of the present invention pertains to a method for processing a sample spectrum for a test sample, said method comprising the steps of:
(a) identifying, in said sample spectrum, one or more target regions for replacement;
(b) providing a master control spectrum which comprises one replacement region corresponding to each of said target regions; and,
(c) replacing each of said target regions with the corresponding replacement region to give a target-replaced sample spectrum,
wherein said replacement region has been scaled so as to have the same fraction of the total integrated intensity in said target-replaced sample spectrum as it did in said master control spectrum.
In one embodiment of the present invention, the method further comprises the subsequent step of:
(d) normalising said target-replaced sample spectrum to give a normalised target-replaced sample spectrum.
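Steps (a) to (d) may be illustrated by the following sketch, which is offered as one possible reading and not as a definitive implementation. It assumes that both spectra are held as equal-length lists of intensities on a common axis, that the total integrated intensity is approximated by the sum of the points, that target regions are given as non-overlapping index ranges, and that normalisation means scaling to unit total intensity. The scale factor is obtained by requiring the replacement region to have the same fraction of the total in the target-replaced spectrum as in the master control spectrum, which gives the ratio of the intensities lying outside the regions in the two spectra. All names are hypothetical.

```python
def replace_target_regions(sample, master, regions):
    """Replace each target region of `sample` with the scaled region of `master`.

    regions is a list of (start, stop) index pairs, stop exclusive,
    assumed not to overlap. Returns (target_replaced_spectrum, scale_factor).
    """
    if len(sample) != len(master):
        raise ValueError("spectra must share the same axis")
    idx = [i for start, stop in regions for i in range(start, stop)]
    # Intensity outside the target regions of the sample spectrum.
    sample_rest = sum(sample) - sum(sample[i] for i in idx)
    # Intensity outside the replacement regions of the master spectrum.
    master_rest = sum(master) - sum(master[i] for i in idx)
    if master_rest == 0:
        raise ValueError("master spectrum has no intensity outside the regions")
    f = sample_rest / master_rest
    replaced = list(sample)
    for i in idx:
        replaced[i] = f * master[i]
    return replaced, f


def normalise(spectrum):
    """Scale a spectrum so that its total intensity is 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]
```

As a check on the scaling, the fraction of the total intensity found in each replaced region of the output should equal the fraction found in the same region of the master control spectrum.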
One embodiment of the present invention pertains to a method for processing a sample NMR spectrum for a test sample, said method comprising the steps of:
(a) identifying, in said sample NMR spectrum, one or more target regions for replacement, wherein each of said target regions is defined by a chemical shift range;
(b) providing a master control NMR spectrum which comprises one replacement region corresponding to each of said target regions, wherein a target region and its corresponding replacement region are defined by the same chemical shift range; and,
(c) replacing each of said target regions with the corresponding replacement region to give a target-replaced sample NMR spectrum,
wherein said replacement region has been scaled so as to have the same fraction of the total integrated intensity in said target-replaced sample NMR spectrum as it did in said master control NMR spectrum.
In one embodiment of the present invention, the method further comprises the subsequent step of:
(d) normalising said target-replaced sample NMR spectrum to give a normalised target-replaced sample NMR spectrum.
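Because a target region and its replacement region are defined by the same chemical shift range, one range can select the matching points in both spectra. The Python sketch below illustrates this under the assumption that the sample and master control NMR spectra share a single chemical shift axis; the helper names `region_indices` and `region_intensity` are hypothetical.

```python
from typing import List, Sequence, Tuple


def region_indices(ppm_axis: Sequence[float], shift_range: Tuple[float, float]) -> List[int]:
    """Indices of the points whose chemical shift lies within the given range."""
    low, high = min(shift_range), max(shift_range)
    return [i for i, ppm in enumerate(ppm_axis) if low <= ppm <= high]


def region_intensity(spectrum: Sequence[float], indices: Sequence[int]) -> float:
    """Integrated intensity of a region, approximated as the sum of its points."""
    return sum(spectrum[i] for i in indices)


# NMR axes conventionally run from high to low chemical shift.
ppm_axis = [6.0, 5.9, 5.8, 5.7, 5.6]
indices = region_indices(ppm_axis, (5.85, 5.65))  # same indices for both spectra
```

The same `indices` list can then be used to read the target region from the sample spectrum and the replacement region from the master control spectrum.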
In one embodiment of the present invention, in said replacing step (c), each of said target regions is replaced with the corresponding replacement region to give a target-replaced sample spectrum,
wherein said replacement region has been scaled by a factor, f, given by the formula:

$$ f = \frac{I_{Y} - \sum_{k} I_{Y,Tk}}{I_{CM} - \sum_{k} I_{CM,Rk}} $$
wherein:
$I_{Y}$ is the total integrated intensity of the sample spectrum;
$I_{Y,Tk}$ is the integrated intensity of the k-th target region;
$I_{CM}$ is the total integrated intensity of the master control spectrum;
$I_{CM,Rk}$ is the integrated intensity of the k-th replacement region;
k ranges from 1 to $n_t$; and,
$n_t$ is the number of target regions.
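The factor f follows from the scaling condition stated earlier, namely that each scaled replacement region has the same fraction of the total integrated intensity in the target-replaced sample spectrum as it had in the master control spectrum. A short derivation, using the symbols defined above and assuming a non-zero replacement region intensity, might run as follows (the index j is introduced here only to keep the sums distinct from the region k under consideration):

```latex
% Total intensity of the target-replaced sample spectrum:
%   (sample intensity outside the target regions) + (scaled replacement regions)
\[
  I_{Y'} = \Bigl(I_{Y} - \sum_{j} I_{Y,Tj}\Bigr) + f \sum_{j} I_{CM,Rj}
\]
% Condition: region k has the same fraction of the total as in the master control.
\[
  \frac{f \, I_{CM,Rk}}{I_{Y'}} = \frac{I_{CM,Rk}}{I_{CM}}
  \quad\Longrightarrow\quad
  f \, I_{CM} = \Bigl(I_{Y} - \sum_{j} I_{Y,Tj}\Bigr) + f \sum_{j} I_{CM,Rj}
\]
% Collecting the terms in f:
\[
  f \Bigl(I_{CM} - \sum_{j} I_{CM,Rj}\Bigr) = I_{Y} - \sum_{j} I_{Y,Tj}
  \quad\Longrightarrow\quad
  f = \frac{I_{Y} - \sum_{j} I_{Y,Tj}}{I_{CM} - \sum_{j} I_{CM,Rj}}
\]
```

In words, f is the ratio of the sample intensity outside the target regions to the master control intensity outside the replacement regions, and the same factor applies to every replacement region.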
Another aspect of the invention pertains to a sample spectrum which has been processed by a method according to the present invention.
Another aspect of the invention pertains to a method for processing a plurality of sample spectra, comprising processing each of said sample spectra by a method according to the present invention.
Another aspect of the invention pertains to a method of analysis of an applied stimulus, comprising the steps of:
(a) providing one or more sample spectra for each of one or more samples from each of one or more organisms which have been subjected to said applied stimulus;
(b) providing a master control spectrum derived from one or more control spectra for each of one or more samples from each of one or more organisms which have not been subjected to said applied stimulus; and,
(c) processing each of said sample spectra using a method according to the present invention.
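The text here does not state how the master control spectrum of step (b) is derived from the control spectra. Purely as an illustration, the Python sketch below assumes a point-by-point mean of control spectra that share one axis; both this choice and the name `mean_spectrum` are assumptions rather than part of the described method.

```python
from typing import List, Sequence


def mean_spectrum(control_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of control spectra (an assumed way to build a master control)."""
    if not control_spectra:
        raise ValueError("at least one control spectrum is required")
    n_points = len(control_spectra[0])
    if any(len(spectrum) != n_points for spectrum in control_spectra):
        raise ValueError("all control spectra must have the same number of points")
    count = len(control_spectra)
    return [
        sum(spectrum[i] for spectrum in control_spectra) / count
        for i in range(n_points)
    ]
```

Each sample spectrum from a treated organism would then be processed against this single master control spectrum in step (c).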
In one preferred embodiment, the applied stimulus is a xenobiotic. In one preferred embodiment, the applied stimulus is a disease state. In one preferred embodiment, the applied stimulus is a genetic modification.
Another aspect of the invention pertains to a method for identifying a biomarker or biomarker combination for an applied stimulus, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a biomarker or biomarker combination identified by such a method.
Another aspect of the invention pertains to a method of diagnosis of an applied stimulus employing a biomarker identified by such a method.
Another aspect of the invention pertains to an assay, which employs a biomarker identified by a method as described herein.
Another aspect of the invention pertains to a method of classifying an applied stimulus, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a method of diagnosis of an applied stimulus, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a method of therapeutic monitoring of a subject undergoing therapy, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a method of evaluating drug therapy and/or drug efficacy, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a method of detecting toxic side-effects of a drug, comprising a method of analysis of an applied stimulus as described herein.
Another aspect of the invention pertains to a method of characterising and/or identifying a drug in overdose, comprising a method of analysis of an applied stimulus as described herein.
In one preferred embodiment, the spectrum or spectra is an NMR spectrum or NMR spectra.
Another aspect of the invention pertains to a computer system operatively configured to implement a method according to the present invention.
Another aspect of the invention pertains to computer code suitable for implementing a method according to the present invention.
Another aspect of the invention pertains to a data carrier which carries computer code suitable for implementing a method according to the present invention on a suitable computer system.
As will be appreciated by one of skill in the art, features and preferred embodiments of one aspect of the invention will also pertain to other aspects of the invention.