The human proteome, although encoded by only about 25,000 genes, consists of millions of protein variants arising from single nucleotide polymorphisms (SNPs), somatic DNA rearrangements, RNA splicing, and post-translational modifications (PTMs).
In recent years, a need has emerged for a parallel method capable of reading human proteomes with high throughput and at low cost. In contrast to the human genome, in which DNA exists in diploid form, the proteome has a wide dynamic range; for example, the abundance of proteins in human plasma spans more than 10 orders of magnitude, and some proteins are expressed only in low quantities. Proteomics has no tool equivalent to the polymerase chain reaction (PCR) for amplifying protein samples, and there is no cost-effective way to faithfully reproduce a protein population from a source. Thus, protein analysis must be carried out on material extracted in some quantity from samples removed from humans.
There have been remarkable advances in sample preparation and sequencing techniques, most notably those based on mass spectrometry as a proteomic tool. However, mass spectrometers are large, costly machines. Their size is dictated by the need for very high mass resolution to obtain accurate identification of the amino acid components (and even then, readout is complicated by isobaric amino acids). Accordingly, there is a need for an alternative method for identifying amino acids, particularly in small quantities, and ideally at the single-molecule level. “Recognition Tunneling” (RT) has emerged as such a method: it is purely physical, does not rely on the reactions of a DNA polymerase or ligase, and is able to recognize any chemical residue provided that the residue generates a distinctive tunneling current signal.
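The difficulty posed by isobaric and near-isobaric amino acids can be illustrated with a short calculation. The following is a minimal sketch, not part of the disclosed method: it uses standard monoisotopic residue masses, and the function name `resolving_power_needed` is a hypothetical helper introduced only for this illustration. It estimates the resolving power (m/Δm) needed to separate two residues by mass alone, and returns `None` when the masses are identical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "Leu": 113.08406,
    "Ile": 113.08406,  # identical to leucine: a truly isobaric pair
    "Lys": 128.09496,
    "Gln": 128.05858,  # differs from lysine by about 0.036 Da
}


def resolving_power_needed(a, b):
    """Return m / delta_m required to separate residues a and b by mass.

    Returns None when the two masses are identical, i.e. when no mass
    resolution can distinguish them. (Hypothetical helper for illustration.)
    """
    mass_a, mass_b = RESIDUE_MASS[a], RESIDUE_MASS[b]
    delta = abs(mass_a - mass_b)
    if delta == 0:
        return None
    mean_mass = (mass_a + mass_b) / 2
    return mean_mass / delta


leu_ile = resolving_power_needed("Leu", "Ile")
lys_gln = resolving_power_needed("Lys", "Gln")
print("Leu vs Ile:", leu_ile)
print("Lys vs Gln:", round(lys_gln))
```

Leucine and isoleucine cannot be separated by mass at any resolution, while lysine and glutamine require a resolving power in the thousands even for a single residue; the requirement grows with the mass of the peptide ion in which the residue sits.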
More specifically, the mechanism of recognition tunneling for reading nucleic acid, sugar, and amino acid sequences is based on the trapping of an analyte (e.g., a molecule of a nucleic acid, a sugar, or an amino acid) by “reading molecules” that are chemically tethered to two closely spaced electrodes; the trapped analyte generates a distinct tunneling signal when a potential is applied across the electrodes. Specifically, each reading molecule is chemically bonded at one end to a metal electrode through a short linker and interacts non-covalently with the target molecule(s) at the other end. As target molecules such as nucleic acids, sugars, amino acids, or drug molecules pass through the tunnel gap while a potential is applied between the electrodes, the interaction of each such molecule with the reading molecules temporarily traps the analyte and produces tunneling signals, which comprise a particular current. The tunneling signal can be used to identify the analytes. Prior to the present disclosure, reading molecules for RT included imidazole-based reading molecules, namely, 4(5)-(2-mercaptoethyl)-1H-imidazole-2-carboxamide (ICA) (Liang, F.; Li, S.; Lindsay, S.; Zhang, P. Chem. Eur. J. 2012, 18, 5998-6007) and 5(6)-mercapto-1H-benzo[d]imidazole-2-carboxamide (see U.S. provisional patent application No. 61/829,229).
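The idea that transient trapping of an analyte produces a current signal above the background can be sketched in code. The following is a minimal sketch under stated assumptions and is not the analysis used in the cited work: the source does not describe any signal-processing procedure, the trace values are invented, and `find_spikes`, `baseline`, and `threshold` are hypothetical names. It simply locates runs of samples in which the current exceeds a baseline by more than a threshold, and reports the extent and peak of each run.

```python
def find_spikes(trace, baseline, threshold):
    """Return (start_index, end_index, peak) for each run of consecutive
    samples whose current exceeds baseline by more than threshold.

    Hypothetical helper for illustration only.
    """
    events = []
    start = None
    for i, current in enumerate(trace):
        above = (current - baseline) > threshold
        if above and start is None:
            start = i  # a trapping event begins
        elif not above and start is not None:
            events.append((start, i - 1, max(trace[start:i])))
            start = None  # the event has ended
    if start is not None:
        # The trace ended while an event was still in progress.
        events.append((start, len(trace) - 1, max(trace[start:])))
    return events


# Invented current samples in picoamperes; not measured data.
trace = [2, 3, 2, 25, 30, 27, 2, 3, 60, 65, 2]
events = find_spikes(trace, baseline=2.0, threshold=10.0)
for start, end, peak in events:
    print(f"samples {start}-{end}: peak {peak} pA")
```

In this invented trace, two events are found with different peak amplitudes. Distinguishing analytes in practice would depend on whatever features of the signal prove distinctive, which this sketch does not attempt to model.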