The proteome is usually described as the entire complement of proteins found in a biological system, such as a cell, tissue, body fluid, organ or organism. The study of naturally occurring proteins is generally termed ‘proteomics’ and encompasses study of the proteome expressed at particular times and/or under particular internal or external conditions of interest. Proteomics approaches frequently aim at global analysis of the proteome and require that large numbers of proteins, e.g., hundreds or thousands, can be routinely resolved, identified and quantified from one or more samples.
Among the promises of proteomics is its ability to identify new biomarkers, i.e., proteins that serve as biological indicators of a changed physiological state, for example one caused by a disease or a therapeutic intervention. Biomarker discovery usually involves comparing the proteomes expressed in distinct physiological states and identifying proteins whose occurrence or expression levels consistently differ between those states (Schrattenholz A, Groebe K. Electrophoresis. (2007) June 28(12) 1970-9).
Proteins in blood are a particular target for the identification of markers of disease states and drug treatments. It is widely assumed that the amounts and/or conformations of blood proteins are statistically associated with such states strongly enough to be distinguished from intrinsic natural variability. Blood and other body fluids are attractive targets because they bathe affected tissues, transport vital proteins and can be obtained for testing by relatively cheap and straightforward procedures during a medical consultation.
However, proteins in blood span a very wide range of concentrations: a small number of proteins account for over 99.9% of the total protein content, while the remainder are distributed over concentrations from picograms to milligrams per milliliter, i.e., some nine orders of magnitude (Qian W. J. et al., Mol. Cell. Prot. (2006) 5(10) 1727-1744). Because of the limitations of existing proteomics techniques, the full extent of this abundance range remains hypothetical. Proteomics scientists have employed a variety of methods to reach into this range while also minimising disruption to the relative abundances of the proteins. This often requires the removal of the high-abundance proteins by selective purification. Attempts have also been made to reduce the complexity of the resulting peptide mixtures by focusing selectively on subsets of all peptides in the sample. These procedures are lengthy, and their reproducibility between samples, replicates, machines and laboratories has yet to be demonstrated to the standard that is a prerequisite for the statistical discovery of biomarker proteins.
Conventional Mass Spectrometry (MS)-based proteomics, commonly used in biomarker discovery, proceeds by fractionating biological samples to isolate individual proteins from the mixture under investigation. More recently, separation has advanced from two-dimensional (2D) gels to multi-dimensional column-based high performance liquid chromatography (HPLC). Proteins are typically broken down into shorter fragments, or peptides. Isolated peptides are then fed into a mass spectrometer that ionises the peptides and fragments them further, yielding a ladder of mass-to-charge (m/z) measurements. The abundances of the measured species can also be quantified under a variety of schemes, usually relative to a control. The resulting ladder of spectra is then interpreted either against known peptide sequences or de novo from the raw data, and the mass and sequence information obtained is used to search sequence databases to identify the proteins from which the respective peptides originated.
However, proteolysis of complex biological samples usually produces hundreds of thousands of peptides, which can overwhelm the resolving capacity of known chromatographic and MS systems, causing incomplete resolution and impaired identification of the constituent peptides. In typical MS-based proteomics experiments, as many as 80% of the spectra acquired from a sample cannot be accurately or consistently assigned to discriminatory peptides, and hence to proteins. The fragmentation behaviour and apparent abundance of peptides in the MS process can also be context dependent, further complicating reproducibility (Liu H, Sadygov R G, Yates J R 3rd., Anal. Chem. (2004) July 15 76(14) 4193-201).
One method to enable proteomic analysis of biological samples is to reduce the complexity of the peptide mixtures generated by proteolysis of such samples before subjecting those mixtures to downstream resolving and identification steps, such as chromatographic separation and/or Mass Spectrometry (MS). Ideally, such complexity reduction decreases the average number of distinct peptides representing each protein in the sample while maximising the fraction of the sample's proteins that remain represented in the peptide mixture.
The use of blood (serum or plasma) is further complicated by the biological processing of proteins in a variety of ways that confounds MS-based identification (Qian W. J. et al., Mol. Cell. Prot. (2006) 5(10) 1727-1744). Recent studies attempting to identify and count proteins in clinical samples have produced widely conflicting results. The very act of isolating, fragmenting and measuring proteins in conventional MS-based proteomics alters their relative abundance and chemical makeup.
The output of proteomics-based analysis is a “hit list” of proteins whose abundance is significantly correlated with the conditions under study. Typically, this list contains proteins of variable statistical significance. The list is then usually reviewed from a biological perspective, and a reasoned judgement is made about the potential biological significance of each member. The process of arriving at a list of candidate biomarkers, i.e., proteins of putative statistical significance, is generally called ‘Discovery’.
Efforts are then focused on verifying and validating a small number of chosen “hits”. This involves confirming the measured protein abundances in a broader population of clinical samples, with the objective of showing that the selected proteins are genuine rather than false positives and that they are specific to the disease or drug state. This process of confirming the quantitative significance of a proteomics measurement and putative biomarker in a more general population is usually referred to as ‘Validation’ (Zolg, Mol. Cell Prot. (2006) 5(10), 1720-1726). Validation often requires a technology different from that used in the Discovery phase.
Typically, this is achieved using antibodies raised against the purified candidate proteins. These antibodies can then be used in an enzyme-linked immunosorbent assay (ELISA) to test hundreds of samples, after which the relevant statistics are reapplied and the validity of the protein as a biomarker is established. A further aim of this process is for the antibody ultimately to form part of a clinical assay. However, antibodies are time-consuming and costly to produce, and it is not always possible to raise an antibody specific to a given protein. A further complication is the presence of protein variants or isoforms, which may carry attached sugar molecules and other modifications on their surface. Such isoforms may not be evident in Discovery-phase MS-based identification and thus complicate both the statistical significance of the measurements and the raising of specific antibodies (Zolg, Mol. Cell Prot. (2006) 5(10), 1720-1726; Rifai et al., Nature Biotech (2006) 24(8) 971-83).
Both Discovery and Validation have high failure rates and are time-consuming and costly. Success or failure can usually be judged only after the Validation stage, by which point considerable time and money have been spent. Therefore, recent efforts have focused on refining MS-based approaches for Discovery and on the bulk generation of antibodies, in an effort both to accelerate Discovery and Validation and to derive a clinical product (Anderson and Hunter, Mol. Cell Prot. (2006) 5(4) 573-588; Zangar et al., Exp. Rev. Prot. (2006) 3(1) 37-44).
Accordingly, there is a clear need for new methods to interrogate proteins, particularly complex mixtures of proteins derived from body fluids. Specifically, there is a need for a technology to identify and measure the abundance of proteins, especially the abundance of multiple proteins in a complex mixture, that overcomes the technical drawbacks and the scientific and cost constraints that hamper current technology. In particular, tools are needed to replace the costly and sometimes unreliable generation and use of antibodies to identify and validate target proteins. Such tools could also supplant existing Discovery-based technologies such as MS, provided that a sufficiently diverse set of antibody alternatives can be generated quickly and cheaply enough to probe the full range of native proteins in a biological sample.
A clear desideratum for such a tool is minimal manipulation of the mixture, so that a true representation of the proteins in the sample, in their native configuration and with any natural modifications and variations, may be obtained. Such a tool should also be accurate and highly reproducible.
Aptamers are short polymers, usually nucleic acids (DNA, RNA, PNA), that form well-defined three-dimensional shapes, allowing them to bind target molecules in a manner conceptually similar to antibodies. Aptamers combine many of the desirable characteristics of small molecules and antibodies, including high specificity and affinity, chemical stability, low immunogenicity and the ability to target protein-protein interactions. Aptamers generated against proteins typically have affinities (dissociation constants) in the picomolar to low nanomolar range. Unlike monoclonal antibodies, aptamers are chemically synthesised rather than biologically expressed, which offers a significant cost advantage.
While aptamers provide a useful and effective alternative to antibodies, the problem remains of how to quantify proteins using aptamers. Complex protein mixtures and the large dynamic range of protein concentrations in biological samples pose a particular challenge. Typically, aptamers are quantified using radiolabels. While radiolabel-based quantification is acceptable, and has been for a number of years, there remains a need for faster quantification with improved accuracy and reproducibility.