A concerted international effort has led to the deciphering of the human genome. Originally, it was estimated that more than 100,000 genes would be encoded within the genome. However, research indicated that the >3.2 billion DNA units comprising the genome “only” coded for 30,000–50,000 genes (Human Genome Sequencing Consortium and Celera Genomics). Despite rapid progress in identifying genes, progress in identifying the activity and function of the gene products lags significantly behind. There is additional evidence that the actual number of proteins may reach 100,000–200,000 when one considers splice variants and post-translational modifications. This indicates that the actual number of possible protein targets for drug discovery is much greater than would be anticipated from the sequencing of the human genome. Also, there are potential non-protein targets for drug discovery, including nucleic acids (e.g., mRNA and promoter regions of disease-related genes) and lipids (e.g., members of the phosphoinositol family of second messengers). Thus, while sequencing of the human genome was a great leap forward for modern science, it is only the first step in determining the relationships between the majority of these genes and disease and their subsequent use as validated targets for drug discovery.
Many gene products function by binding to one or more other peptides or proteins. Presently, there are few approaches for identifying a protein's binder, i.e., the protein with which the target gene product directly interacts. This information is critical because protein:ligand interactions are involved in important cell processes such as signaling (e.g., information transfer in signaling cascades) and molecular processing. The identification of protein binders can be used to determine, for example, the ligand for a receptor, the substrate for an enzyme, and the regulatory protein for an enzyme complex.
Binder information is critical to developing a target-binding assay for the identification of drug leads. A large number of binding assays exist, including in vitro and in vivo formats. Most in vitro formats require input of both the target and its ligand or target binder; only a few formats require the target alone. However, formats that require only a protein target generate a high frequency of false positives, i.e., compounds that bind but do not cause a change in target activity. Such formats would require extensive screening to identify ligands for previously uncharacterized protein targets.
At present, there are several approaches to finding and validating new drug discovery targets. For example, functional genomics involves the use of differential techniques (such as microarrays) to discern differences between normal and disease-related genes. A subset of this approach uses computational techniques to “mine” public and private databases for differentially expressed genes. For example, tyrosine kinases, such as the epidermal growth factor receptor (EGFR), have been shown to be involved in a wide variety of cancers, such as breast, prostate, and colorectal carcinoma. In addition, others have used “bait and prey” techniques to identify natural partners and validate targets for drug discovery.
In the classical approach to binder identification, the protein target and its natural partner or target binder are isolated in a complex. In a modern approach, the target and natural partner or target binder are constructed as two fusion proteins that generate a signal upon interaction. As examples of the latter, yeast two-hybrid systems have been developed. The original intent of the yeast two-hybrid system was to define the interactions between two proteins in a simple, high-throughput manner. The system utilizes two fusion elements: a DNA-binding domain fusion (the bait) and a transcriptional activation domain fusion (the prey). These two chimeras are then introduced into yeast cells with a reporter. Binding of bait to prey leads to activation of the reporter, with an appropriate readout such as growth in defined medium. Other two-hybrid-like approaches include the bacterial two-hybrid system (U.S. Pat. No. 5,925,523 to Dove et al.). However, two-hybrid systems have several disadvantages, including high levels of false positives, incompatibility with certain targets (e.g., RNAs and membrane-bound proteins cannot be used), and problems with post-translational modifications. Moreover, approaches based on two-hybrid systems are not easily applied to a large number of genes of unknown function.
Phage display is a useful selection technique in which a peptide or protein is expressed as a fusion with a coat protein of a bacteriophage (phage), resulting in display of the fused protein on the exterior surface of the phage virion. Briefly, phage display has been used to create a physical linkage between a large library of random peptide sequences and the DNA encoding each sequence, allowing for rapid identification of peptide ligands for a variety of target molecules, such as antibodies, enzymes, and cell-surface receptors, by an in vitro selection process called biopanning (Parmley, S. F. and Smith, G. P. (1988) Gene 73, 305–318; Reviewed in Cortese, R. et al. (1995) Curr. Opin. Biotechnol. 6, 73–80; Noren, C. J. (1996) NEB Transcript 8 (1), 1–5).
Briefly, biopanning is performed by incubating a library of phage-displayed peptides with a target, removing unbound phage, and eluting the specifically bound phage. However, a purified target is necessary in the preferred use of this methodology. Purifying active target is a cumbersome step, and it becomes much more difficult when the protein under investigation has no known function that could be used to monitor the production of active protein. As such, this methodology does not afford the investigator a means for examining, at one time, large populations of unknown proteins such as those found to be differentially expressed in one cell versus another or under some disease-related condition. Furthermore, when using one target at a time, the present approach requires that the eluted phage be amplified and that several cycles of biopanning and amplification, usually 3 to 4 rounds, be carried out to successfully enrich the phage pool for tightly binding sequences.
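The need for repeated rounds can be illustrated with a simple enrichment model. The following is a minimal sketch in Python, not a description of any actual protocol: the function name, the recovery rates, and the starting binder frequency are all hypothetical values chosen for illustration. It assumes that each round retains a fixed fraction of binding phage and a much smaller fraction of non-binding phage, and that amplification multiplies both classes equally so that it does not change their ratio.

```python
def biopanning_enrichment(binder_fraction, binder_recovery, background_recovery, rounds):
    """Return the binder fraction of the phage pool after each panning round.

    Assumptions (illustrative only): every round keeps `binder_recovery` of the
    binding phage and `background_recovery` of the non-binding phage, and
    amplification scales both classes equally, leaving the fraction unchanged.
    """
    fractions = []
    fraction = binder_fraction
    for _ in range(rounds):
        kept_binders = fraction * binder_recovery
        kept_background = (1.0 - fraction) * background_recovery
        # Fraction of the eluted (and then amplified) pool that binds the target
        fraction = kept_binders / (kept_binders + kept_background)
        fractions.append(fraction)
    return fractions


# Hypothetical inputs: 1 binder per 10^7 phage, 10% of binders recovered,
# 0.01% of non-binders carried over as background in each round.
history = biopanning_enrichment(1e-7, 0.1, 1e-4, rounds=4)
for round_number, fraction in enumerate(history, start=1):
    print(f"round {round_number}: binder fraction = {fraction:.3g}")
```

With these made-up parameters, binders remain a minority of the pool after two rounds and become the majority only in the third, which is in line with the 3 to 4 rounds mentioned above. Real recovery rates vary with the target and the library, so the numbers should not be read as measurements.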
Identifying novel disease-related targets and disease-related populations of proteins, and using them in high-throughput drug discovery, is highly desirable for pharmaceutical and biotechnology companies in the post-genomic era. The traditional process of drug discovery relied on only a limited number of targets that could be screened using small chemical or natural product libraries. With the advent of biotechnology, however, recombinant proteins and monoclonal antibodies became available as drugs to treat various diseases. In general, the use of these reagents had a solid experimental base prior to their use. For example, erythropoietin (EPO) is a growth factor involved in the regeneration of red blood cells by activating its cognate receptor. Knowledge of the EPO/EPO receptor nexus allowed the search for the ligand to proceed and eventually succeed. Similarly, monoclonal antibodies such as Herceptin (anti-erbB2) and C225 (anti-EGFR) are based on a body of experimental data dating to the identification of these receptors and their relationship to cancer.
Thus, there is a need for a screening system that ameliorates or overcomes one or more of the above or other encountered problems. An ideal system would allow the sampling of very large numbers of specificities of entire populations. These populations, i.e., proteomes, could, for example, contain protein members that are differentially expressed on one cell versus another. An ideal system would allow for the sampling of populations of targets having >10^6 members and populations of potential target binders having >10^11 members. An ideal system would also allow for rapid sorting during a cloning round and rapid transfer of the genetic material coding for the binding molecule from one stage of the production process to the next stage. Therefore, a rapid and unencumbered selection method for identifying and isolating novel disease-related targets for use in high-throughput drug discovery is highly desirable.
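The scale implied by these population sizes can be made concrete with a short calculation. The sketch below is illustrative only: it takes the lower bounds stated above (10^6 targets, 10^11 potential binders) and the 3 to 4 rounds of biopanning per target mentioned earlier, and the function names are hypothetical.

```python
TARGET_POPULATION = 10**6    # lower bound on target population size, from the text
BINDER_POPULATION = 10**11   # lower bound on potential target binders, from the text
ROUNDS_PER_TARGET = (3, 4)   # usual number of biopanning rounds per target, from the text


def pairwise_combinations(targets, binders):
    """Number of distinct target/binder pairs in the two populations."""
    return targets * binders


def sequential_panning_rounds(targets, rounds_per_target):
    """Total panning rounds if every target is screened one at a time."""
    return targets * rounds_per_target


pairs = pairwise_combinations(TARGET_POPULATION, BINDER_POPULATION)
low, high = (sequential_panning_rounds(TARGET_POPULATION, r) for r in ROUNDS_PER_TARGET)

print(f"target/binder pairs: {pairs:.0e}")
print(f"one-at-a-time panning rounds: {low:,} to {high:,}")
```

Under these assumptions there are on the order of 10^17 possible target/binder pairs, and screening each target separately would take several million panning rounds, each of which would also require purified target. This is a back-of-the-envelope estimate rather than a statement about any particular system.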