1. Field of the Invention
The present invention relates to novel methods of producing proteins in which one or more domains are full length and correctly folded and which are each tagged at either the N- or C-terminus with one or more marker moieties, to arrays containing such proteins, and to the use of such arrays in rapid screening.
2. Related Art
The genome mapping projects are revolutionizing the therapeutic target discovery process and with it the drug discovery process. As new therapeutic targets are identified, high throughput screening of existing and combinatorial chemical libraries will suggest many potential lead compounds which are active against these targets. It will clearly be uneconomic to pursue all lead compounds through even early phase clinical trials; currently, however, no rapid method exists for evaluating such lead compounds in terms of their likely activity profiles against all proteins in an organism. If available, such a method would allow the potential toxicology profiles of all the lead compounds to be assessed at an early stage, and this information would significantly enhance the process of deciding which lead compounds to pursue and which to set aside.
There is a complementary need in the pharmaceutical industry to identify all the targets of existing drugs (either on the market already or still in development) and hence to define their mechanism of action. The availability of such information will greatly facilitate the process of gaining regulatory approval for new drugs, since it is increasingly clear that the regulatory bodies now regard a knowledge of the mechanism of action as being of paramount importance. In addition, this type of information would enable the design of improved second generation drugs. This follows because the majority of drugs have at least minor side effects, which probably result from binding of the drug or a metabolite thereof to undesirable targets; all of these target proteins need to be identified in order to define the criteria necessary for the design of improved drugs. Currently, however, no simple method exists to generate this information, and a number of potential multi-million dollar drugs fall by the wayside simply for lack of knowledge of the target of action.
Protein-protein interactions are increasingly recognized as being of critical importance in governing cellular responses to both internal and external stresses. Specific protein-protein interactions therefore represent potential targets for drug-mediated intervention in infections and other disease states. Currently the yeast two-hybrid assay is the only reliable method for assessing protein-protein interactions, but in vivo assays of this type will not be readily compatible, even in a non-high throughput format, with the identification of specific agonists or antagonists of protein-protein interactions. Functional proteome expression arrays, or “proteome chips”, will enable the specificity of protein-protein interactions and the specificity of any drug-mediated effect to be determined in an in vitro format. They therefore have enormous potential to transform this area of research.
One way in which functional proteome arrays could be generated is to individually clone, express, purify and immobilize all proteins expressed in the specific proteome. Here though, an important initial consideration concerns the absolute size of the genome of interest, together with the availability of sequence data for the entire genome.
By way of illustration of these points, a typical bacterial genome is ~5 Mbp and a small number have now been completely sequenced (for example Helicobacter pylori, Escherichia coli, and Mycobacterium tuberculosis); fungal genomes are typically ~40 Mbp, mammalian genomes ~3 Gbp and plant genomes ~10 Gbp. Current estimates are that the human genome sequence will be finished around 2003, although how much of this information will be in the public domain is very much open to question. Clearly it will be completely impractical to expect that the genomes of anything other than representative model organisms will become available in a realistic time frame, yet from the perspective of functional proteomics, model organisms are of only limited value. So, whilst in principle within the next four years it may be possible to design and synthesize primers to clone each of the ~100,000 genes in the human genome from cDNA libraries, in practice this will be both enormously expensive (the cost of primers alone would run into several millions of dollars) and a hugely laborious process, even if the necessary sequence data are available.
But what about those pharmaceutically relevant organisms for which the complete sequence data will not be available? These cannot simply be ignored by functional proteomics, so what are the alternatives? Expression cDNA libraries could in principle be used together with non-specific immobilization to create an array of proteins, but this technology is significantly limited by the fact that non-specific immobilization is usually associated with loss of function because the fold of the protein is disrupted. In addition, all host cell proteins will also be immobilized, which will at best markedly reduce signal-to-noise ratios and at worst obscure positive results. The ability to create a functional proteome array in which individual proteins are specifically immobilized and purified via a common motif or tag, without affecting function and without requiring knowledge of the entire genome sequence, would therefore represent a huge advance in the field of functional proteomics.