The study of proteins is a key endeavor of current biological research, as well as a focus of pharmaceutical research and development. The information revealed by sequenced genomes increases the pace and activity of protein research, for example in the development of a cell-based assay, the analysis of a pathway, the study of a single receptor, or the application of proteomics. Current technologies fail at several key points: they can miss entire protein families; they fail to identify protein pathways; they focus on a single protein at a time; and they are expensive, difficult and slow. Importantly, no current technology provides information on protein dynamics. In fact, results of current large-scale and high-throughput protein analysis are often delayed by days or weeks following an experiment, and are usually restricted to the form of a catalogue, tabulating those proteins of a database that have been putatively identified from the analyzed sample.
Genomics, Proteomics and the Barriers of Biological Knowledge
Proteomics is an emerging technology that attempts to study proteins on a large scale and in high throughput. It is not by chance that the term resembles “Genomics”. In the wake of successful technologies such as whole genome sequencing, DNA chips and SNP cataloging, a search started for similar paradigms in the realm of proteins. This search is worthwhile since proteins are the main vehicles of life processes: they are the biochemical enzymes, form the signal pathways, control the cellular processes, underpin the cell scaffolding, transport molecules and so on. They are also potentially more valuable than DNA in terms of human benefit, due to their importance in human disease: most known drugs are either proteins themselves, or else operate by binding to a protein target. Unfortunately, proteins are also much more complex and more difficult to study than DNA. They are more complex for a number of reasons. For example, there are many more proteins than there are genes; protein expression is complex and has a high dynamic range, from single copies to millions per cell; the proteome of one cell type may be very different from that of another, even though their DNA is identical; and proteins may undergo dramatic changes in their structure through cleavage, modification, and interaction. Proteins are more difficult to study than DNA, since protein extraction, separation and identification are difficult; there is no amplification technique that parallels PCR; three-dimensional protein structure is hard to obtain and use; protein expression has a high dynamic range; protein modifications, cleavages and interactions are to a large extent unknown; and, finally, both as cause and effect, protein databases are thin and sparsely populated, encompassing a small fraction of all theoretical proteins, especially in higher organisms, such as Homo sapiens.
In one aspect, though, Proteomics and Genomics are similar: both raised high hopes of creating a paradigm shift, a breakthrough that would yield a new understanding of cellular processes and human disease, and pave the way to a bounty of new drugs and therapeutics. Unfortunately, first for genomics and then for proteomics, it became abundantly clear that although genomic and proteomic data are extremely valuable, they are far from sufficient for achieving the breakthrough that was hoped for (Miklos, G. L. and Maleszka, R., Protein functions and biological contexts. Electrophoresis 22:169-178, 2001). So many pieces of the puzzle are still missing that the clear and complete view of cellular machinery remains hidden. One important piece of this puzzle is protein synthesis data: which proteins are produced at which times, under which conditions, and in which amounts. The ability to study and monitor this type of data would be a major breakthrough for all life science related research.
Proteomics Practice Today
Mainstream proteomic analysis today includes the processes of protein purification from culture, separation with two-dimensional gel or other chromatographic techniques, mass-spectrometry, and analysis of the resulting spectra for protein identification and characterization.
The extraction of proteins from bacterial or cell culture invariably involves lysis (and therefore death) of the cells. The procedure involves several stages and usually takes hours (Branca M A, Sannes L J. Proteomics: A Key Enabling Tool for Genomics? Cambridge Healthtech Institute's Genomic Reports. April 1999; Humphery-Smith I., Cordwell S. J., Blackstock W. P., Proteome research: complementarity and limitations with respect to the RNA and DNA worlds, Electrophoresis 18 (1997) 1217-1242). Protein separation with two-dimensional gels requires at least 24 hours and an expert human operator; analysis of the gels is often much more difficult, even with modern software (Smilansky, Z. Automatic registration for images of two-dimensional protein gels, Electrophoresis 2001, 22, 1616-1626). Even worse, two-dimensional gel technology is not applicable to very acidic or very basic proteins, to many membrane proteins, and most importantly, to proteins that are expressed in low amounts.
It is usually taken for granted that proteins that are expressed at fewer than 1,000 to 10,000 copies per cell cannot be visualized in two-dimensional gels (Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., Aebersold, R., Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999, 17 (10): 994-9). Almost no protein kinases, phosphatases, transcription factors, GPCRs, ion channels, or nuclear hormone receptors are found in standard human proteomic analyses, even though more than 5000 of these proteins are encoded by the human genome (Miklos, G. L., Maleszka, R. Protein functions and biological contexts, Electrophoresis 22:169-178, 2001). Thus, the proteins that can be analyzed by this method are only the most abundant ones.
Besides separating the sample, two-dimensional gel technology can measure three important protein parameters: mass, pI, and quantity. However, all three measurements are highly inaccurate. As for protein quantity, the most that can be obtained from gel technology is relative quantitation, and even that with errors exceeding 50%, so that only proteins with very strong up- or down-regulation can be identified. Moreover, quantitation at best means the quantity of protein in the extracted, processed sample, such as in a gel spot or in a chromatographic fraction; estimation of the number of protein copies in a cell at any given time is not even attempted today.
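To illustrate why quantitation errors of this size hide all but the strongest regulation, the following minimal sketch assumes, purely for illustration, that each of two compared measurements may deviate from the true quantity by a relative error of up to e. The error model and the function name apparent_ratio_bounds are assumptions introduced here, not taken from the cited literature.

```python
def apparent_ratio_bounds(rel_error):
    """Range of measured ratios for a protein whose true quantity is unchanged.

    Assumes (illustrative model only) that each of the two measurements may
    be off by a relative error of up to rel_error, with 0 <= rel_error < 1.
    """
    if not 0 <= rel_error < 1:
        raise ValueError("rel_error must satisfy 0 <= rel_error < 1")
    # Worst case low: first sample under-read, second sample over-read.
    lowest = (1 - rel_error) / (1 + rel_error)
    # Worst case high: first sample over-read, second sample under-read.
    highest = (1 + rel_error) / (1 - rel_error)
    return lowest, highest


if __name__ == "__main__":
    low, high = apparent_ratio_bounds(0.5)
    print(f"apparent ratio of an unchanged protein: {low:.2f} to {high:.2f}")
```

Under this simplified model, a 50% error lets an unchanged protein appear anywhere between one third and three times its reference level, so only changes beyond roughly threefold could be distinguished from measurement noise.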
Following protein separation, MS analysis may be performed, either with a MALDI-TOF or with an LC-MS-MS machine (Humphery-Smith I., Cordwell S. J., Blackstock W. P., Proteome research: complementarity and limitations with respect to the RNA and DNA worlds, Electrophoresis 18 (1997) 1217-1242; Yates J. R., Database searching using mass spectrometry data. Electrophoresis 1998, 19 (6): 893-900). The main stages are spot picking from the gel followed by destaining (or, alternatively, chromatographic prefractionation), protein digestion with a protease (almost invariably trypsin), mass-spectrometric analysis, and finally database searching. Surprisingly, database searching is performed only as a semi-automatic procedure, with expert supervision and decision making at stages such as peak extraction and candidate selection.
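The digestion and database-search stages described above rest on predicting, for each database protein, the peptides that trypsin would produce and their masses. The following is a minimal sketch of that prediction, assuming the commonly used simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function names tryptic_digest and peptide_mass are hypothetical, and real search software additionally handles missed cleavages, modifications and mass tolerances, none of which are modeled here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


if __name__ == "__main__":
    # Arbitrary example sequence, not a real protein.
    for peptide in tryptic_digest("AKRPGRA"):
        print(peptide, round(peptide_mass(peptide), 4))
```

In a peptide-mass-fingerprint style search, a list of such predicted masses would be compared against the measured peaks for each database entry; the expert decisions mentioned above arise in choosing which peaks and which candidate matches to trust.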
All in all, the standard technique for identifying proteins in a cell culture takes from weeks to months, is suitable for only a small part of the proteome, does a bad job of quantitating protein amounts, and provides no clue as to proteome dynamics.
Additional and Emerging Proteomics Technologies
An important older method for protein analysis is Edman degradation, a chemical analysis method where the N-terminal amino acids of a polypeptide are cleaved and analyzed one by one. The procedure requires a full day and provides no quantitative or dynamic information.
The shortcomings of two-dimensional gel technology have led many researchers to look for alternatives. Two important developments of the last few years are the techniques of ICAT (Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., Aebersold, R., Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999, 17 (10): 994-9) and MudPIT (Washburn, M. P., Wolters, D., Yates J R 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001 Mar. 19 (3): 242-7), which involve MS analysis of whole sample digestion products. The two methods allow better identification of rare proteins, and the first one even allows computation of differential expression. However, they are still difficult and expensive to carry out, require cell lysis, take days for complete analysis, and provide no dynamic information.
Protein chips are being developed in several labs (Jenkins, R. E. and Pennington, S. R. Arrays for protein expression profiling: towards a viable alternative to two-dimensional gel electrophoresis? Proteomics. 2001 Jan. 1 (1):13-29). They generally fall into one of three classes: surface chemistry chips, antibody chips, or protein chips for determining protein-protein interactions. All of these may aid protein analysis in some way, but none of them provides the data that the disclosed method provides.
The yeast two-hybrid technique (Y2H) is a feat of bioengineering that helps discover protein-protein interactions (Legrain, P. and Selig, L., Genome-wide protein interaction maps using two-hybrid systems. FEBS Lett. 2000 Aug. 25; 480 (1):32-6). The method is indirect in that the interactions occur in yeast or in bacteria, rather than in the original cells being analyzed. It is known to generate a large number of false positives and also cannot generate dynamic information. Thus, there are clearly a number of significant differences between this technique and the disclosed method.
High Throughput Screening and Cell Based Assays
High throughput screening (HTS) is the standard route for drug discovery in the pharmaceutical industry. Traditionally, HTS relies on a simple assay, such as receptor binding or enzyme activity. The assay itself measures a single parameter, e.g. receptor binding. This measurement is initially the only information available on the suitability of the candidate compound as a potential drug. The rest of the required information, ADME-TOX for example, is either presumed to be known or else its acquisition is deferred until later stages in the process (see also next section).
In contrast with simple assays, cell-based assays are newer to the pharmaceutical industry. They are usually used for lead optimization and predictive toxicology. To construct a cell-based assay, a measurable cell characteristic has to be developed: this can be a fluorescent-tagged protein, an antibody based marker, or some measurable phenotypic characteristic of the cell. Modern examples include cancer-specific dyes (http://www.zetiq.com/site/cama.html) and genetically engineered cell lines (Shen-Orr, S. S., Milo, R., Mangan, S., and Alon, U., Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 2002, 31 (1): 64-8; http://www.cellomics.com/).
Cell based assays have many advantages over receptor binding assays. Cells offer better representations of a disease. By screening against disease pathways in whole cells, no prior assumptions are made about what makes a good target. However, cell based screening suffers from certain disadvantages. These disadvantages include the need to engineer a specific cell line with the required reporting capability, and the lack of information about the would-be protein target. In both assay types, standard and cell-based assays, high-throughput screening provides a minimal amount of information on a large number of compounds. This of course limits the scope of information obtainable, and the entire cascade of events following administration of the compound under analysis remains hidden from the researcher.
Improved solutions for the above problems are clearly required, for example for pharmaceutical research and development. Despite the huge increase in investment and the enormous contributions of genomics and related technologies, the main difference between the pharmaceutical pipeline today and a decade ago is in the number of targets, while the number of successful drugs entering the market has more or less stayed the same. More discouraging yet is the fact that while advances in high-throughput screening, chemical compound library design and bioinformatics have helped multiply the number of “hits” in HTS assays, the number of “leads” has not increased at all. Thus, the pharmaceutical pipeline today has an abundance of targets on the one side and an abundance of candidate compounds on the other, but attempts to combine this information have yielded little.
Though there is more than one reason for this failure, one important point is that although the numbers of targets and candidates are huge, the complexity of the cellular machinery, not to mention that of tissues and whole organisms, is on a grander scale still. Thus, a better view of the function and context of a protein target in the cell, and of the complex effects, side effects, and after effects of a drug compound on the cell, is clearly missing.
In today's paradigm of drug development, once a target is found and a compound that binds to it is identified, drug development starts to proceed towards regulatory approval and market acceptance. While the process is long and very expensive, it is narrow in the sense that relatively little is known about the target protein, its function, its isoforms and look-alikes, and its roles in disease and in health. Even less is known about the drug candidate: how it affects proteins other than its specified target, how it affects other tissues, and what its immediate and long term effects are. Thus, information that may indicate that a compound cannot become a suitable drug candidate is revealed only at later stages and at a high cost, sometimes only after the drug has been distributed on the market. Among the medications which had to be recalled after market approval are the nighttime heartburn drug Propulsid (removed because of fatal heart rhythm abnormalities), diabetes drug Rezulin (removed after causing liver failure), and irritable-bowel-syndrome treatment Lotronex (removed for causing fatal constipation and colitis). All three were taken off the market in 2000.