This invention relates to a method for analyzing complex mixtures of compounds. More particularly, the present invention is directed to a high efficiency method for screening a complex mixture of compounds for a predetermined characteristic, such as a chemical or biological property, and identifying the compounds in the mixture exhibiting said characteristic.
The emergence of automated chemical synthesis platforms coupled with combinatorial techniques as a routine tool in the pharmaceutical industry has enabled the synthesis of large numbers of compounds in a relatively short time. Millions of potential new drug candidates are synthesized every year, and both the pharmaceutical and biotechnology industries have embraced the challenge in recent years of developing new, faster and more efficient ways to screen pharmaceutical compounds in order to rapidly identify “hits” and develop them into promising lead candidates.
The development of a chemical lead into an ideal marketable medicine requires a balance in potency, safety, and pharmacokinetics. The overall cost of bringing a new medicine to the market place is very high: recent surveys indicate that the average new chemical entity taken to market in the United States requires 10 to 15 years of research and costs more than $300 million. The major reasons for failure in development often involve clinically unacceptable kinetics, bioavailability or toxicity, and the need for uncovering this information at earlier (less costly) stages of the drug discovery process is evident. Thus, there is a significant need to develop methods for rapidly evaluating these properties, as well as the bioactivity potential, for the ever-increasing number of compounds generated by combinatorial chemistry. Analytical researchers have turned their attention to the development of high-throughput analytical approaches. Chromatographic systems have been developed and specifically designed for automated high-throughput identification, purity assessment, purification and biological screening of combinatorial libraries. In parallel to this, a great deal of research effort has been devoted to the optimization of screening assays to meet the requirements of high throughput screening. In theory, any assay performed on the bench top can be applied in HTS (High Throughput Screening), but the adaptation of such assays to an automated format may not be straightforward and often requires modifications of the assay to circumvent constraints imposed by HTS. The ideal assay is one that can be performed in a single well with no manipulation other than the addition of the sample to be tested. A number of assay formats have been developed or modified over the past few years to conform to the constraints imposed by HTS, particularly emphasizing automation and miniaturization.
The technological advances directed toward the implementation of fast and high volume chromatographic systems have rapidly converged toward automated systems for accommodating the large number of compounds typically produced in parallel syntheses. Despite recent developments of rapid and efficient methods for high throughput screening, intense efforts are ongoing to create new, more economical and more efficient methods for carrying out screening processes.
A few methods addressing the need for more efficient screening techniques have emerged over the past few years. Recently, a chemical-screening scouting technique for preliminary chemical characterization of natural product extracts was used for the dereplication and prioritization of HIV-inhibitory aqueous natural product extracts (Boyd M. R. et al., “A Chemical Screening Strategy for the Dereplication (elimination from further consideration) and Prioritization of HIV-Inhibitory Aqueous Natural Products Extracts”, J. Nat. Prod., 1993, 56, No 7, pp 1123-1129). The method is based on a preliminary chemical characterization, or “chemical screening”, of natural product extracts by performing a series of chromatographic separations on different columns addressing distinct chemical/physicochemical properties of the solutes. For example, Sephadex G-25 cartridges were utilized first to provide information about molecular size and weight. Bonded-phase cartridges, C4 wide pore (300 Å) and C18 narrow pore (60 Å), were utilized to determine the relative polarity of active constituents. Four fractions were collected from each cartridge; they were tested side by side with the parent supernatant for anti-HIV activity. Thus, a distinctive or characteristic chromatographic profile of the active constituents was obtained, along with information about the recovery of activity and, by inference, the stability of the active compounds. This chemical-screening approach was first validated with a number of HIV-inhibitory standards, e.g., AZT, Dextrin sulfate, Cyclosporin, and Oxathiin carboxanilide. As illustrated in FIG. 1, different patterns of elution were observed for each compound tested. The recurring patterns of bioactivity elution could be readily discerned from the matrix form shown. Similarly, several sponge extracts were evaluated for the elution pattern in the chemical screen (FIG. 2).
The chemical-screening approach was used to gain insight into the general chemical nature of potential new anti-HIV lead compounds, to identify and dereplicate additional recurring classes of antiviral compounds, and to select chromatographic procedures for initial fractionation of natural product extracts. FIG. 3 illustrates the overall dereplication and chemical screening strategy.
Julian et al. at Eli Lilly and Company have developed a system that delivers data with reasonable throughput (Julian, R. K. Jr.; Higgis, R. E.; Gygi, J. D.; Hilton, M. D., “A Method for Quantitatively Differentiating Crude Natural Extracts Using High-Performance Liquid Chromatography-Electrospray Mass Spectrometry”, Anal. Chem., 1998, 70, pp 3249-3254). The system comprises three components: (a) HPLC separation of the crude extract using a standard reversed-phase C18 gradient, (b) ESI-MS detection of effluent analytes, and (c) computational image analysis techniques by which the data are reduced to a list containing the m/z value and retention time of each ion. The ion lists are then compared in a pairwise fashion to compute a sample similarity index between two samples, allowing effective comparison of the large data sets that are generated by the analysis.
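The pairwise comparison of ion lists can be illustrated with a minimal Python sketch. The source does not state how the similarity index is computed, so the Jaccard-style ratio, the function name similarity_index, and the tolerances mz_tol and rt_tol below are assumptions chosen only for illustration.

```python
def similarity_index(ions_a, ions_b, mz_tol=0.5, rt_tol=0.2):
    """Share of ions common to two (m/z, retention time) lists.

    Assumed index: matched ions divided by all distinct ions.
    The tolerances are illustrative, not values from the cited paper.
    """
    unmatched_b = list(ions_b)
    matched = 0
    for mz_a, rt_a in ions_a:
        for i, (mz_b, rt_b) in enumerate(unmatched_b):
            if abs(mz_a - mz_b) <= mz_tol and abs(rt_a - rt_b) <= rt_tol:
                matched += 1
                del unmatched_b[i]  # each ion may be matched only once
                break
    total = len(ions_a) + len(ions_b) - matched
    return matched / total if total else 1.0


# Hypothetical ion lists for two crude extracts: (m/z, retention time in min).
extract_1 = [(301.2, 4.1), (455.3, 7.8), (612.4, 10.2)]
extract_2 = [(301.3, 4.0), (455.1, 7.9), (720.5, 12.6)]
index = similarity_index(extract_1, extract_2)
print(index)  # 0.5: two of the four distinct ions are shared
```

An index near 1 would suggest two extracts with largely the same detectable constituents, while an index near 0 would suggest chemically distinct samples.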
Identifying the activity of compounds in complex mixtures, and isolating the active compounds, has been a method in the field of drug discovery for more than a century. Interestingly, many preparations for use in humans consist of only partially characterized complex mixtures, as found for example in herbal medicines. The chromatography of complex mixtures, particularly those mixtures which contain compounds with similar physical chemical properties, routinely generates chromatograms of co-eluting compounds. Examples include the chromatography of natural and synthetic chemical libraries using mass spectrometry to detect the compounds. Overlapping peaks are an unavoidable experimental problem in modern drug discovery, in which chromatography of complex mixtures is used as a chemical source for identifying active compounds.
Methods to identify and isolate the activity of compounds comprising complex mixtures frequently will involve an initial test for activity in a crude sample preparation. If there is no activity, then there is usually no reason for further testing. However, if there is one or more activities in the crude sample preparation then the task of identifying the compounds containing the one or more activities is performed. The crude mixture may in fact be not only a natural product extract, but also pooled fractions from any chemical or biological source. The conventional methods routinely used to identify the active compounds in complex mixtures include the following steps.
Step 1) Prepare the crude sample, or aliquot of the crude sample, for chromatography.
Step 2) Load the sample on a chromatography column.
Step 3) Elute compounds in the complex mixture using a mobile phase system.
Step 4) Detect compounds eluting from the column and collect fractions.
Step 5) Analyze the fractions for activity, but pre-treat the samples for activity analysis if necessary (for example, concentrate the sample by precipitation or lyophilization).
Step 6) Identify the compounds in the fractions containing the desired activity.
It is virtually impossible to unambiguously identify the active compound in active fractions containing more than one compound. Typically the active fraction(s) is(are) re-processed using Step 1) through Step 6) with the hope that a pure compound elutes in one fraction so that unambiguous identification of the active compound has been achieved. It is not uncommon for an investigator to utilize multiple columns to fractionate the activity of compounds in complex mixtures. The adage describing this problem is that chromatography is nothing more than ‘experimental fractionation’.
Note that if there are 2 or more active fractions, the number of chromatography steps significantly increases. More than 10-20 chromatography steps can result when only the first chromatography column demonstrates activity in 2 or more fractions. Numerous activity assays, sample preparation, and analysis are routine for the ‘experimental fractionation’ method of profiling the activity of compounds in complex mixtures.
The present invention increases the efficiency of ‘experimental fractionation’ in activity profiling. In one embodiment efficiency is increased by merely performing ‘experimental fractionation’ of the complex mixture on 2 or more columns, but instead of analyzing fractions from each column, synchronized pooling of the fractions from each column is carried out specifically for activity testing. Synchronized sample-pooling captures the time dependent flow among columns into a common fraction. In other words, if 2 chromatography columns are used, the eluent from both columns that elutes from 0-1 minute, for example, is pooled, then 1-2 minutes, etc. until the chromatography runs are completed. The reason for synchronized sample-pooling is to generate a set of pooled chromatography fractions from multiple columns that have a compound-profile and an activity profile. Mass spectrometry is optimum for obtaining a compound-profile among the synchronized sample-pool, whereas numerous assays can be performed to obtain the activity profile of the synchronized sample-pool. Compound profiles are typically obtained, for example, by spectral analysis on the fractions before they are pooled, more typically as each fraction or portion thereof is analyzed by the detector as it is eluted from the chromatography unit, thereby providing spectral data on each compound component of each fraction. Activity profiles can be assessed on the uncombined or the combined fractions. The active compounds are indicated by comparing/correlating the compound profile of the fractions with the activity profile of the optionally synchronously pooled fractions.
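Synchronized sample-pooling as described above can be sketched in Python as follows. This is a minimal illustration, assuming each run is recorded as a list of (elution time, compound label) events; the function name synchronized_pools, the one-minute window, and the compound labels are hypothetical.

```python
from collections import defaultdict


def synchronized_pools(runs, window=1.0):
    """Pool eluent from several columns by shared elution-time window.

    runs: one list per column of (elution_time_min, compound_label) events.
    Returns {window_index: set of labels}, where window q covers
    [q * window, (q + 1) * window) minutes on every column.
    """
    pools = defaultdict(set)
    for events in runs:
        for time_min, label in events:
            pools[int(time_min // window)].add(label)
    return dict(pools)


# Hypothetical elution events on two columns run with different mobile phases.
column_1 = [(0.4, "A"), (0.7, "B"), (2.3, "C")]
column_2 = [(1.5, "A"), (2.6, "B"), (0.2, "C")]
pools = synchronized_pools([column_1, column_2])
print(pools)  # each compound appears in two pooled fractions
```

In this toy data every compound lands in two different pooled fractions, one per column, which is the property that the later comparison of compound profile and activity profile relies on.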
One or more active compounds in the complex mixture can be identified in one experiment by comparing the compound-profile to the activity profile. Note that the goal of using multiple columns with different mobile phases is to have each compound in the complex mixture elute at a different time on each column. This causes each compound in the complex mixture to be present in multiple samples of the synchronized sample-pool. Each compound in the complex mixture is thus present in multiple fractions of the ‘set of fractions’ comprising the synchronized sample-pool. Implementing the invention involves detecting (1) the pattern of compound-peaks and (2) the activity-peak of each compound in the ‘set of fractions’ prepared by synchronized sample-pooling. Compounds eluting as a distribution from columns are referred to as compound-peaks, and synchronized sample-pooling results in a distribution of the compound among a few continuous fractions. An activity-peak in the synchronized sample-pools merely indicates that a few continuous fractions contain the compound and therefore the activity. It is recognized that, theoretically, if two columns were used for the present invention, each active compound in the complex mixture should elicit 2 activity-peaks, for 3 columns 3 peaks, etc.
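The relationship between the number of columns and the number of activity-peaks can be illustrated with a short Python sketch. The elution windows and compound labels are invented for the example, and each compound is assumed to occupy a single pooled fraction per column, which is a simplification of the few continuous fractions described above.

```python
def activity_peaks(elution_windows, active_compounds):
    """Return the sorted pooled-fraction indices expected to test active."""
    return sorted({w for c in active_compounds for w in elution_windows[c]})


def consistent_compounds(elution_windows, active_fractions):
    """Compounds whose every elution window falls in an active fraction."""
    active = set(active_fractions)
    return [c for c, windows in elution_windows.items() if set(windows) <= active]


# Hypothetical pooled-fraction index of each compound on (column 1, column 2).
elution_windows = {"A": (0, 1), "B": (0, 2), "C": (2, 1)}

peaks = activity_peaks(elution_windows, {"B"})
print(peaks)  # [0, 2]: two columns give two activity-peaks for compound B

candidates = consistent_compounds(elution_windows, peaks)
print(candidates)  # ['B']
```

If a compound happened to elute in the same window on both columns, only one activity-peak would be seen, which is why the separation conditions are chosen so that elution times differ between columns.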
Accordingly, the present invention is directed to a method of identifying compounds having a predetermined characteristic in a complex compound mixture. The compound mixture is processed using a first ‘set of compound separation parameters’ to at least partially separate the compounds in the mixture into a series of separation variable-dependent fractions (Fa)n wherein n is the number of fractions collected using said ‘first set of separation variables’. That step is repeated using a second ‘set of separation parameters’ to produce a second series of separation parameter dependent fractions (Fb)n wherein n is the number of fractions collected using the second ‘set of separation parameters’.
In one embodiment, the qth fractions obtained using each ‘set of separation parameters’, wherein q is the respective order number of the fractions obtained using each set of separation parameters, are combined to provide a set of combined qth fractions. Spectral data characteristic of the compound(s) in the combined fractions are obtained on a sample of each of the uncombined or combined fractions, and each uncombined or combined fraction is analyzed to detect the presence of a predetermined characteristic, e.g., a chemical, physical or biological characteristic, to identify those combined fractions that exhibit the characteristic. Preferably the spectral data (the compound profile) are obtained on the fractions before they are combined, for example, as they are eluted from a chromatographic column. The spectral data for each of the fractions exhibiting the predetermined characteristic are compared, typically using a computer implemented algorithm, to identify the spectral data (compound profiles) common to each of said fractions exhibiting the characteristic, and the compound or compounds indicated by the spectral data common to the combined fractions are identified. Mass-spectral data, such as that collected in electronic form by a MS detector on a separation device, is one example of the data used to identify compounds having the targeted characteristic. Any other analytical technique capable of providing compound-unique characterizing data can be used as a substitute for mass spectral analysis in combination with the characteristic testing procedure in performance of the present invention. Such analytical techniques include, for example, ultraviolet absorption analysis, Fourier transform IR and Fourier transform nuclear magnetic resonance.
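The comparison step, in which spectral data common to all fractions exhibiting the characteristic are identified, can be sketched as a set intersection. This is only one plausible reading of the computer implemented algorithm, which the text does not specify; the function name common_ions and the nominal integer m/z values are assumptions, and real data would need a mass tolerance rather than exact matching.

```python
def common_ions(fraction_ions, active_fractions):
    """Intersect the m/z sets of every combined fraction that tested positive."""
    active_sets = [fraction_ions[q] for q in active_fractions]
    return set.intersection(*active_sets) if active_sets else set()


# Hypothetical nominal m/z values detected in combined fractions 1 to 4.
fraction_ions = {
    1: {301, 455},
    2: {455, 612},
    3: {301, 720},
    4: {612, 720},
}
active = [1, 2]  # combined fractions that exhibited the characteristic
shared = common_ions(fraction_ions, active)
print(shared)  # {455}
```

Here the ion at m/z 455 is the only signal present in both active fractions, so it would be the candidate for the compound exhibiting the characteristic.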
In summary, the compound mixture is subjected to at least two separation processes, each using a unique set of separation parameters, each process producing a series of separation parameter-dependent fractions. The fractions or a portion of the fractions from each separation process are combined on, e.g., an order number basis, and the combined fractions are evaluated or assayed for a predetermined property or characteristic using art-recognized assay techniques. Spectral data obtained on each combined fraction (or on the component fractions of each combined fraction) are then correlated with the assay results, and deconvoluted, preferably using an algorithm to identify the spectral data (and thus the compound(s)) common to each of the “assay positive” fractions. Alternatively, and typically with less efficiency, the fractions from each separation process are analyzed individually to produce a compound profile and activity profile for each fraction.
In one preferred aspect of the present invention the separation is carried out using a chromatographic separation methodology. One of the mechanisms associated with chromatographic processes is that of a reversible equilibrium of solutes between the mobile phases and the stationary phases. Separation occurs by differential migration, or preferential retention, of the various compounds in the stationary phase comprising the chromatographic unit. The equilibrium distribution of the different solutes between the stationary phase and the mobile phase is the basis of separation in chromatography. The magnitude of solute retention is a direct result of this equilibrium and is typically expressed by a parameter, the capacity factor, k′ = (tr − to)/to, where to is the dead time and tr is the retention time of the solutes. The capacity factor therefore reflects a stoichiometric mass distribution equilibrium of solutes between the mobile phases and the stationary phases, and its determination allows the calculation of various physicochemical values according to pre-determined algorithms. The scope of the present invention is not restricted to equilibrium-based chromatography. Several other types of non-equilibrium based separation systems can be utilized in the present invention. Examples include size exclusion chromatography and gradients that cause compounds to convert from complete affinity (k′ approaches infinity) to no affinity (compound elutes from the column immediately). Mobile phase conditions that cause this include pH gradients, salt gradients, and even compound gradients whereby the compound causes the displacement of some adsorbed molecules. Other examples include ion exchange chromatography or separation systems where the stationary phase contains an immobilized protein or macromolecule that exhibits known binding properties (for instance immobilized enzyme, antibody or receptor).
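As a worked illustration of the capacity factor defined above, the following Python sketch evaluates k′ = (tr − to)/to for a single solute. The function name and the numeric retention and dead times are hypothetical values chosen for the example.

```python
def capacity_factor(retention_time, dead_time):
    """Capacity factor k' = (tr - to) / to, with both times in the same unit."""
    if dead_time <= 0:
        raise ValueError("dead time must be positive")
    if retention_time < dead_time:
        raise ValueError("retention time cannot be shorter than the dead time")
    return (retention_time - dead_time) / dead_time


# Hypothetical solute: retention time 6.0 min on a column with 1.5 min dead time.
k_prime = capacity_factor(6.0, 1.5)
print(k_prime)  # 3.0
```

A solute that is not retained at all elutes at the dead time and has k′ = 0, while stronger retention gives larger values.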
The separation process depends on different parameters (herein referred to as separation parameters or separation variables) including stationary phase, mobile phase composition, mobile phase flow rate, gradients of such parameters, temperature, and column size, among others. Typically, if the same mixture of compounds is subjected to different chromatographic separation conditions (each separation being carried out using a different set of separation parameters), each set of separation parameters provides a unique chromatographic profile for the mixture. In addition, no two compounds in the mixture (provided that they are not related in any special way, like enantiomers) would be affected the same way for each chromatographic condition. In other words, a particular compound will exhibit a unique and distinct “response” (chromatographic profile or signature) under any one set of chromatographic conditions. Thus, for any one compound, a set of chromatographic profiles carried out using different sets of separation parameters constitutes a “chromatographic fingerprint” unique to this compound, and the probability for any two compounds to have the same chromatographic fingerprint decreases dramatically (to become zero) as the number of chromatographic conditions used to establish said chromatographic fingerprint increases.
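The idea that a chromatographic fingerprint becomes more discriminating as conditions are added can be illustrated with a toy Python example. The fraction numbers below are invented, and the fingerprint is simplified to the tuple of fraction indices in which a compound elutes under each condition.

```python
def fingerprints(elution, n_conditions):
    """Fingerprint of each compound over the first n_conditions separations."""
    return {name: tuple(fracs[:n_conditions]) for name, fracs in elution.items()}


def unresolved_pairs(elution, n_conditions):
    """Pairs of compounds that share an identical fingerprint."""
    fps = fingerprints(elution, n_conditions)
    names = sorted(fps)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if fps[a] == fps[b]
    ]


# Hypothetical fraction index of four compounds under three sets of conditions.
elution = {"A": (3, 5, 2), "B": (3, 5, 7), "C": (3, 1, 4), "D": (6, 5, 2)}
counts = [len(unresolved_pairs(elution, n)) for n in (1, 2, 3)]
print(counts)  # [3, 1, 0]
```

With one condition three pairs of compounds co-elute, with two conditions one pair remains, and with three conditions every compound in this toy set has a distinct fingerprint.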
The solutes in the fractions from each chromatographic separation can be analyzed by mass spectrometry (MS) or another detector capable of providing spectral data that identify characteristics of the various compound solutes. The pattern of the molecular ions detected by MS reflects the elution profiles of each compound detected for each separation and thus provides a molecular weight profile of the compounds eluting from the chromatographic unit. This is the basis of the LC/MS technology: the LC unit allows partial separation of the mixture of compounds injected onto the system (typically collected in a series of fractions), and the MS provides structural information on the compounds in each fraction.
Currently, automated, semi-quantitative assessment of combinatorial libraries is most readily accomplished by coupling HPLC with UV detection and mass spectrometry. Rapid HPLC methods with columns capable of delivering high-resolution separations have been developed in recent years, and have been well received by the drug discovery industry as a powerful tool particularly suited to handle the expanding analytical needs of combinatorial chemistry. The ability to characterize chemical libraries derived from combinatorial synthesis has in turn revealed that the purity of the compounds generated by this method is not necessarily high enough for biological evaluation of these compounds. Consequently, the scope of the high-throughput HPLC techniques initially designed and developed for structure confirmation purposes has expanded to include purity assessment and purification of the compound libraries to make them suitable for biological screening. One of the limitations of the LC/MS methods currently used in the drug discovery industry is that, unless all the compounds injected on the column are nearly perfectly resolved, bioassay evaluation of the collected fractions is usually not relevant since one or more “active” fractions may contain more than one compound. The task is then to determine which compound(s) in the fraction of interest is (are) responsible for the “positive” assay results.
In one embodiment of this invention there is provided a method of using HPLC coupled with a mass spectrometer to produce at least two sets of eluent fractions (or sets of data reflecting k′ and peak widths deriving from the separation process) of a complex compound mixture, each obtained using a distinct set of chromatographic separation parameters. The data can be recorded with concomitant collection of a series of fractions (one series of fractions per set of separation variables), or the mass spectral data can be obtained on each fraction or combined fractions. Thus for the first set of separation parameters (run (a)), fractions Fa1, Fa2, . . . , Fan are collected, and the solution eluting from the chromatographic system is either simultaneously analyzed by MS, or analyzed on a fraction-by-fraction basis. Using a second unique set of separation parameters (run (b)), fractions Fb1, Fb2, . . . , Fbm are collected with concomitant determination of the chromatographic elution profile by MS detection. After the fractions for each chromatographic run are collected, all the qth fractions (where q is the order number of the fraction) obtained from each separation parameter-dependent chromatographic separation are combined and the respective combined fractions are assayed/analyzed for the presence of a predetermined characteristic, e.g., a physical, chemical or biological property. Optionally, but with less efficiency, the fractions can be assayed individually. Any art-recognized assay can be used to analyze the combined fractions, including antibody-, receptor-, or enzyme-based specific binding-based assays, electrochemical assays, photometric/fluorescence assays, disc assays, calorimetric assays, cytotoxicity assays and the like. The fractions that exhibit the targeted property are identified and a computer pattern-matching algorithm (e.g., fraction activity vs. mass spectral data or other compound characterizing data) for the “active” fractions is used to identify the chemical entities in the initial complex compound mixture that exhibit the chemical, biological, or physical property of interest. The mass spectral data can be collected in electronic form and used as input for a computer algorithm for processing the data and correlating it with those fractions which are found to exhibit the target activity or characteristic. The mass spectral data (or spectral data from another detection device) can be collected continuously during fraction collection, they can be collected on each fraction from each run, or they can be collected on each of the respective combined fractions.
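One way to picture the pattern-matching of fraction activity against mass spectral data is to rank each ion by how closely its intensity across the combined fractions follows the measured activity. The Python sketch below uses a Pearson correlation for this purpose; the choice of correlation measure, the function names, and all numeric values are assumptions made for illustration and are not specified in the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_ions(ion_profiles, activity):
    """Sort ions by correlation of their intensity profile with the activity."""
    scores = {mz: pearson(profile, activity) for mz, profile in ion_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical assay readout for five combined fractions.
activity = [0.0, 0.9, 0.1, 0.8, 0.0]
# Hypothetical ion intensities (arbitrary units) in the same five fractions.
ion_profiles = {
    301: [5.0, 0.0, 4.0, 0.0, 0.0],
    455: [0.0, 7.0, 1.0, 6.0, 0.0],
    612: [0.0, 0.0, 3.0, 0.0, 3.0],
}
ranking = rank_ions(ion_profiles, activity)
best_mz = ranking[0][0]
print(best_mz)  # 455
```

In this toy data the ion at m/z 455 is abundant in exactly the fractions that test active, so it ranks first, while the other two ions are most abundant in inactive fractions and score negatively.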