The field of this invention relates to methods of identifying common elements or patterns in biological profiles, such as common elements in gene expression profiles, in response to different drug treatments. The invention also relates to the application of these methods to identify ideal drug profiles as well as undesired drug profiles. Further, the invention also relates to the application of these methods to compare profiles from existing drugs to these ideals.
Within the past decade, several technologies have made it possible to monitor the expression level of a large number of genetic transcripts (see, e.g., Schena et al., 1995, Science 270:467-470; Lockhart et al., 1996, Nature Biotechnology 14:1675-1680; Blanchard et al., 1996, Nature Biotechnology 14:1649; Ashby et al., U.S. Pat. No. 5,569,588, issued Oct. 29, 1996) and proteins (see, e.g., McCormack et al., 1997, Analytical Chemistry 69:767-776; Chait-BT, 1996, Nature Biotechnology 14:1544) within a cell at any one time. In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms such as human, for which there is an increasing knowledge of the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.
Applications of this technology have included, for example, identification of genes which are up-regulated or down-regulated in various physiological states, particularly diseased states. Additional uses for transcript arrays have included the analyses of members of signaling pathways, and the identification of targets for various drugs. See, e.g., Friend and Hartwell, U.S. Provisional Patent Application Serial No. 60/039,134, filed Feb. 28, 1997; Stoughton, U.S. patent application Ser. No. 09/099,722, filed Jun. 19, 1998; Stoughton and Friend, U.S. patent application Ser. No. 09/074,983, filed May 8, 1998; Friend and Hartwell, U.S. Provisional Application Serial No. 60/056,109, filed Aug. 20, 1997; Friend and Hartwell, U.S. application Ser. No. 09/031,216, filed Feb. 26, 1998; Friend and Stoughton, U.S. Provisional Application Serial Nos. 60/084,742 (filed May 8, 1998), 60/090,004 (filed Jun. 19, 1998), and 60/090,046 (filed Jun. 19, 1998). Such applications are based upon the knowledge that abundances and/or activity levels of cellular constituents (e.g., mRNA species, proteins, and other molecular species within a cell) change in response to perturbations in a cell's biological state, including drug treatment or changes in a protein's activity. Thus, a measurement of such cellular constituents, referred to herein as a "biological profile," or "profile," contains a wealth of information about the action of the perturbing agent.
The ability to measure and compare such biological profiles has the potential to be of great human and commercial benefit. For example, it would be of great benefit if an "ideal" or "consensus" response profile could be identified across a large set of cellular constituents, for example all or substantially all of the genetic transcripts of a cell or organism, which characterizes a desired drug activity (e.g., a desired clinical effect). Likewise, it would also be of great benefit, e.g., during the process of drug discovery and design, to provide and compare response profiles of known or existing drugs to such a consensus profile, e.g., to identify promising drug candidates with a particular, desired, therapeutic effect, or to develop theories of why particular individual compounds have clinically superior toxicity profiles. Indeed, the basic concept of generating and comparing response profiles to known profiles for the purpose of predicting drug effectiveness and toxicity has been proposed (see, in particular, Fodor, U.S. Pat. No. 5,800,992; Rine and Ashby, 1998, U.S. Pat. No. 5,777,888).
However, the biological profile of any real cell or organism is of tremendously high complexity. Any one perturbing agent may cause a small or large number of cellular constituents to change their abundances and/or activities. Thus, to completely or even mostly characterize the biological response to a particular perturbation, it is generally necessary to measure independently the responses of all, or at least most, of the cellular constituents in a cell. Yet the number of cellular constituents, e.g., for a mammalian cell, is typically on the order of 10^5. Further, current techniques for quantifying changes in cellular constituents suffer from high rates of measurement error, including false detections, failures to detect, and inaccurate quantitative determinations. Thus, in practice, such analyses of biological profiles are too cumbersome and fraught with technical problems to be practical.
Accordingly, there is a need for methods of analyzing biological profile data which overcome the above limitations in the prior art, and, in particular, which reduce error rates and simplify the structure of changes in the profile data. In particular, there is a need for methods of analyzing biological profile data to derive a simplified "consensus profile," e.g., for a drug, drug family, or group of related compounds, which characterizes a desired (i.e., ideal) biological effect. Further, there is a need for methods to compare such consensus profiles to the biological profiles of individual drugs or drug candidates.
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.
The present invention provides methods for determining a "consensus" profile for a biological response, such as the response of an organism to a group or family of drugs and/or drug candidates. The consensus profile obtained by the methods of this invention represents an ideal, desired activity profile across some standard measurement set such as the cellular constituents of a cell or model organism, or of an organism destined for treatment, e.g., by drug therapy. As such, the consensus profiles of this invention indicate those elements or patterns in a biological profile which the individual compounds have in common. Preferably, such elements or patterns are associated with a particular biological effect, most preferably a particular, desired, therapeutic effect, or "ideal" effect. Accordingly, the present invention also provides methods for obtaining a response profile for a particular compound, such as for a particular drug or drug candidate, and for comparing the response profile of the particular compound to the consensus profile to determine the extent to which the particular compound exhibits a particular, i.e., "ideal," effect as opposed to "non-ideal" or toxic effects.
Such methods are useful, e.g., in the process of drug discovery or design, for identifying compounds which best meet or satisfy a desired activity profile, as well as for identifying compounds which fall short of a desired activity profile. The methods of the present invention are also useful for analyzing further chemical modifications to lead compounds, or for developing theories of why certain individual compounds have superior toxicity profiles. Finally, because the biological response to a particular compound or compounds will frequently vary between individual organisms, the methods of the present invention are also useful during treatment of an individual, e.g., in a clinical setting, to determine the best compound or combination of compounds to produce a desired therapeutic effect.
The invention is based, at least in part, on the discovery that for any finite set of conditions, including, for example, treatments with different concentrations of related compounds, individual cellular constituents will not vary independently from one another. Rather, sets of cellular constituents will tend to change together, or "co-vary," under a given set of conditions. Accordingly, the structure of biological profiles can be greatly reduced, without losing accuracy or completeness, by grouping cellular constituents into sets, referred to herein as co-varying sets, which co-vary under some set of conditions. Preferably, the set of conditions includes the conditions or perturbations under investigation (i.e., graded exposure to the individual drugs or compounds being studied). In fact, because grouping constituents into co-varying sets actually averages experimental errors, error rates are reduced, thereby enabling better detection, classification, and comparison of changes in cell profiles.
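The error-averaging effect described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the disclosed methods; it assumes that every member of a co-varying set shares one true response and that measurement errors are independent and Gaussian with equal variance. The function name, set size, and noise level are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_errors(n_constituents=25, n_trials=2000, noise_sd=1.0, seed=0):
    """Compare the error of one constituent with the error of a set average.

    Assumption (illustrative only): all constituents in the co-varying set
    share the same true response, and errors are independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_response = 2.0
    single_errors = []
    averaged_errors = []
    for _ in range(n_trials):
        measured = [true_response + rng.gauss(0.0, noise_sd)
                    for _ in range(n_constituents)]
        # Error when relying on a single constituent.
        single_errors.append(measured[0] - true_response)
        # Error when averaging over the whole co-varying set.
        averaged_errors.append(statistics.mean(measured) - true_response)
    return statistics.pstdev(single_errors), statistics.pstdev(averaged_errors)


single_sd, averaged_sd = simulate_errors()
print(f"single constituent error SD: {single_sd:.3f}")
print(f"set-averaged error SD:       {averaged_sd:.3f}")
```

Under these assumptions the spread of the averaged error shrinks roughly as one over the square root of the set size. Real measurement errors may be correlated, in which case the reduction would be smaller.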
The methods of the present invention include: (i) obtaining or providing response profiles for the biological response (or responses) of interest; (ii) defining sets of co-regulated cellular constituents (i.e., genesets) in the response profiles; and (iii) identifying common response motifs among the defined sets of co-regulated cellular constituents which are associated with particular biological responses such as drug effectiveness or toxicity. The common response motifs thereby identified comprise the consensus profiles of the invention. In preferred embodiments, the methods of the invention further include the step (iv) of "projecting" the original response profiles onto the genesets identified in step (ii) above. Simplified, reduced-dimension response profiles are thereby produced which are more simply and robustly related to biological properties such as drug effectiveness and toxicity.
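The projection of step (iv) can be sketched as follows. This is an illustrative example only: it assumes that projecting a profile onto a geneset means taking the unweighted mean response of the geneset's members, which is one possible choice and is not stated in the text above. The function name `project_profile` and the gene and geneset labels are hypothetical.

```python
def project_profile(profile, genesets):
    """Reduce a full response profile to one value per geneset.

    profile  -- dict mapping constituent name to measured response
    genesets -- dict mapping geneset name to a list of constituent names
    Assumption: the projected value is the unweighted mean over members.
    """
    return {
        name: sum(profile[g] for g in members) / len(members)
        for name, members in genesets.items()
    }


# Hypothetical five-constituent profile and two genesets.
profile = {"g1": 1.0, "g2": 3.0, "g3": -2.0, "g4": -4.0, "g5": 0.5}
genesets = {"A": ["g1", "g2"], "B": ["g3", "g4"]}

projected = project_profile(profile, genesets)
print(projected)
```

The projected profile has one entry per geneset rather than one per constituent, which is the dimension reduction the paragraph above describes.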
In various embodiments, the response profiles may be obtained, e.g., by measuring gene expression, protein abundances, protein activities, or a combination of such measurements. In various embodiments, the methods of the invention further comprise a step of selecting only those cellular constituents that show significant response in some fraction of the response profiles. In various embodiments, the methods of the invention may further comprise the implementation of a clustering algorithm or other pattern recognition procedure to group the cellular constituents into co-regulated sets. In various embodiments, the methods of the invention may further comprise the implementation of a clustering algorithm or other pattern recognition procedure to group the response profiles according to similarity. In various preferred embodiments, the grouped cellular constituents and response profiles are displayed, e.g., in a false color plot, to facilitate the identification of major sets of cellular constituents and common response motifs in steps (ii) and (iii) above.
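The selection and grouping steps mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the clustering procedure of the invention: the significance test is a plain magnitude threshold, and the grouping is a simple greedy pass on Pearson correlation that stands in for whatever clustering algorithm is actually used. All names, thresholds, and data values are hypothetical.

```python
import math


def select_responsive(data, threshold=1.0, min_fraction=0.25):
    """Keep constituents with |response| >= threshold in enough profiles."""
    kept = {}
    for name, values in data.items():
        hits = sum(1 for v in values if abs(v) >= threshold)
        if hits / len(values) >= min_fraction:
            kept[name] = values
    return kept


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def group_covarying(data, min_corr=0.9):
    """Greedy grouping: join the first set whose first member correlates."""
    clusters = []
    for name, values in data.items():
        for cluster in clusters:
            if pearson(values, data[cluster[0]]) >= min_corr:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Rows are constituents; columns are six hypothetical response profiles.
data = {
    "g1": [2.0, 1.5, 0.1, 0.0, 1.8, 0.0],
    "g2": [1.8, 1.6, 0.0, 0.1, 2.0, 0.1],
    "g3": [0.0, 0.1, -2.0, -1.7, -1.9, 0.0],
    "g4": [0.1, 0.0, -1.9, -1.8, -2.0, 0.1],
    "g5": [0.1, -0.1, 0.0, 0.1, 0.0, -0.1],  # never responds significantly
}

responsive = select_responsive(data)
sets = group_covarying(responsive)
print(sets)
```

The same grouping function could be applied to the columns instead of the rows to group response profiles by similarity. A hierarchical or other established clustering method would normally replace the greedy pass in practice.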
In more detail, the present invention provides, in a first embodiment, methods for determining a consensus profile for a particular biological response. Such methods involve identifying common response motifs among sets of co-varying cellular constituents in a plurality of perturbation response profiles, wherein the common response motifs are associated with the particular biological response. The biological response is typically associated with a particular biological effect, such as the effect of a particular class or type of drug, a therapeutic effect, or a toxic effect. In various aspects of this first embodiment, the sets of co-varying cellular constituents comprise sets of cellular constituents that are co-regulated, and/or cellular constituents which are co-varying in the plurality of perturbation response profiles. Such co-varying cellular constituent sets are identified, e.g., by cluster analysis of cellular constituents in the plurality of perturbation response profiles. In still other aspects of this first embodiment, the perturbation response profiles are re-ordered into sets associated with similar biological effect, e.g., by cluster analysis. In other aspects of the first embodiment, the co-varying cellular constituents comprise basis cellular constituent sets, and the perturbation response profiles are projected onto the basis cellular constituent sets to provide projected response profiles.
The consensus profile determined in the first embodiment of this invention is, in particular, the intersection of the sets of co-varying cellular constituents activated or de-activated in the common response motifs. The intersection may be identified, e.g., by visual inspection of the plurality of response profiles, by thresholding the projected response profiles, or arithmetically.
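Identifying the intersection by thresholding can be sketched as below. This is an illustrative example under assumptions: a geneset is treated as activated when its projected value is at or above a threshold, as de-activated when it is at or below the negative threshold, and it enters the consensus only when every profile agrees. The function name, the threshold value, and the sample data are hypothetical.

```python
def consensus_by_threshold(projected_profiles, threshold=1.0):
    """Return genesets activated (+1) or de-activated (-1) in every profile.

    projected_profiles -- list of dicts mapping geneset name to projected value
    Assumption: a fixed symmetric threshold defines activation.
    """
    shared = set.intersection(*(set(p) for p in projected_profiles))
    consensus = {}
    for geneset in sorted(shared):
        values = [p[geneset] for p in projected_profiles]
        if all(v >= threshold for v in values):
            consensus[geneset] = 1
        elif all(v <= -threshold for v in values):
            consensus[geneset] = -1
    return consensus


# Hypothetical projected profiles for three related compounds.
profiles = [
    {"A": 2.1, "B": -1.5, "C": 0.2, "D": 1.4},
    {"A": 1.8, "B": -2.0, "C": 1.6, "D": -1.2},
    {"A": 2.5, "B": -1.1, "C": 0.1, "D": 1.3},
]

consensus = consensus_by_threshold(profiles)
print(consensus)
```

Geneset C is excluded because only one compound activates it, and geneset D is excluded because the compounds disagree on the direction of the change.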
In a second embodiment, the present invention also provides methods for comparing a biological response profile to a consensus profile. The methods comprise (a) converting the biological response profile into a projected response profile according to a definition of basis cellular constituent sets, and (b) determining the value of a similarity metric between the projected response profile and the consensus profile. Preferably, the basis cellular constituent sets comprise cellular constituent sets which co-vary. The similarity metric may be, in certain aspects of the second embodiment, the generalized cosine angle between the projected response profile and the consensus profile.
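The similarity metric of step (b) can be sketched as a cosine between two vectors indexed by geneset. This is a minimal sketch that assumes the plain, unweighted cosine; the "generalized" form named above may include weighting that is not described in this passage. The function name and the sample profiles are hypothetical.

```python
import math


def cosine_similarity(projected, consensus):
    """Cosine of the angle between two profiles given as geneset -> value dicts.

    Genesets missing from one profile are treated as zero (an assumption).
    """
    genesets = sorted(set(projected) | set(consensus))
    p = [projected.get(g, 0.0) for g in genesets]
    c = [consensus.get(g, 0.0) for g in genesets]
    dot = sum(a * b for a, b in zip(p, c))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_c = math.sqrt(sum(b * b for b in c))
    if norm_p == 0 or norm_c == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return dot / (norm_p * norm_c)


consensus = {"A": 1.0, "B": -1.0, "C": 0.0}
drug_x = {"A": 2.0, "B": -2.0, "C": 0.0}   # same pattern, larger amplitude
drug_y = {"A": 0.0, "B": 0.0, "C": 3.0}    # acts only outside the consensus
drug_z = {"A": -1.0, "B": 1.0, "C": 0.0}   # opposite pattern

sim_x = cosine_similarity(drug_x, consensus)
sim_y = cosine_similarity(drug_y, consensus)
sim_z = cosine_similarity(drug_z, consensus)
print(sim_x, sim_y, sim_z)
```

Because the cosine ignores overall amplitude, a compound that reproduces the consensus pattern at a different strength still scores near 1, while a compound acting only on other genesets scores near 0.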
In a third embodiment, the present invention provides methods for analyzing a biological sample. In particular, the methods of this embodiment comprise (a) grouping cellular constituents from the biological sample into sets of cellular constituents that co-vary in biological profiles obtained from the biological sample, and (b) grouping the biological profiles obtained from the biological sample into sets of biological profiles that affect similar cellular constituents. In a preferred aspect of this embodiment, one or more cellular constituents and/or one or more response profiles associated with a particular biological effect are identified from such sets of cellular constituents and/or biological profiles. For example, in some aspects the cellular constituents comprise genes or gene transcripts so that one or more genes associated with a particular biological effect are identified. The genes identified by the methods of this embodiment may be known or previously unknown genes.
Finally, the methods of this invention are preferably executed on automated systems, e.g., a computer system, capable of performing the above methods. Accordingly, this invention also provides, in a further embodiment, computer systems comprising a computer-usable medium having computer readable program code embodied thereon for effecting the methods of this invention.