The human proteome, which is encoded by only about 25,000 genes, consists of millions of protein variants. These variants arise from RNA splicing, post-translational modifications (PTMs), somatic DNA rearrangements, and single nucleotide polymorphisms (SNPs). Despite the widespread interest in whole-genome sequencing, the genome is largely fixed. Sequencing of mRNA is more informative because it gives an average picture of the levels of gene expression. However, it reveals little about the post-translational modifications that generate millions of proteins from just thousands of genes. There is a need for a massively parallel method to read human proteomes with high throughput and at low cost. In contrast to the human genome, in which DNA is present in only two copies (diploid), the proteome has a wide dynamic range: for example, the abundance of proteins in human plasma spans more than 10 orders of magnitude. Some proteins are expressed at very low levels. Proteomics has no tool equivalent to the polymerase chain reaction (PCR) for amplifying protein samples, and there is no cost-effective way to faithfully reproduce a protein population from a source. Thus, protein analysis must be carried out on material extracted directly from human samples, which must therefore be collected in sufficient quantity.
There have been remarkable advances in sample preparation and sequencing techniques, most notably those based on mass spectrometry as a proteomic tool. However, mass spectrometers are large, costly machines. Their size is dictated by the need for very high mass resolution to identify the amino acid components accurately, and even then, readout is complicated by isobaric amino acids. Accordingly, there is a need for an alternative method for identifying amino acids, particularly in small quantities and ideally at the single-molecule level.
In a series of earlier disclosures, WO2009/117522A2, WO 2010/042514A1, WO 2009/117517, WO2008/124706A2, US2010/0084276A1, and US2012/0288948, each of which is incorporated herein by reference, a system was disclosed in which nucleic acid bases could be read from the electron tunneling current signals generated as the nucleobases pass through a tunnel gap functionalized with adaptor molecules. Huang et al.[1] demonstrated the ability of this system to read individual bases embedded in a polymer. This method is referred to as "Recognition Tunneling" (RT).[2] These earlier disclosures also recognized that, because the method is purely physical (it does not rely on the reactions of a DNA polymerase or ligase), any chemical residue should be recognizable, provided that it generates a distinctive tunneling current signal.
Yet another problem in determining the human proteome arises from the stereochemistry of amino acids. Amino acids can exist in two forms ("enantiomers") that are mirror images of each other. Nature generally uses the so-called "L" (levo, or left-handed) form, but the presence of "D" (dextro, or right-handed) amino acids is an important biomarker for diseases such as ALS and schizophrenia. Because these enantiomers are isobaric, they cannot be distinguished by mass spectrometry, so large amounts of additional sample are needed for optical identification. Furthermore, control of optical isomers is one of the most difficult problems in chemical synthesis and separation. Because the isomers are chemically identical apart from their handedness, it is very hard to extract pure samples of one isomer or the other. Thus, a method to read the relative concentrations of isomers from very small amounts of sample would represent a major advance, since present optical techniques require large amounts of sample. It would also be advantageous to provide a simple method for reading the identity of optical isomers at the single-molecule level.