The rapid, reliable, and cost-effective analysis of polymer molecules, such as the sequencing of nucleic acids and polypeptides, is a major goal of researchers and medical practitioners. The ability to determine the sequence of a polymer, such as the nucleotide sequence of DNA or RNA or the amino acid sequence of a polypeptide, has additional importance in identifying genetic mutations and polymorphisms. Established DNA sequencing technologies have improved considerably in the past decade, but they still require substantial amounts of DNA and several lengthy steps, and they struggle to yield contiguous read lengths of greater than 100 nucleotides. The resulting sequence fragments must then be assembled “shotgun” style, an effort that depends non-linearly on the size of the genome and on the length of the fragments from which the full genome is constructed. These steps are expensive and time-consuming, especially when sequencing mammalian genomes.
Nanopore-based analysis methods have been investigated as an alternative to traditional polymer analysis approaches. These methods involve passing a polymeric molecule, for example single-stranded DNA (“ssDNA”), through a nanoscopic opening while monitoring a signal, such as an electrical signal, that is influenced by the physical properties of the polymer subunits as the polymer analyte passes through the nanopore opening. The nanopore optimally has a size or three-dimensional configuration that allows the polymer to pass only in sequential, single-file order. Under theoretically optimal conditions, the polymer molecule passes through the nanopore at a rate such that the passage of each discrete monomeric subunit of the polymer can be correlated with the monitored signal. Differences in the chemical and physical properties of the monomeric subunits that make up the polymer, for example the nucleotides that compose an ssDNA, result in characteristic electrical signals that can identify each monomeric subunit as it passes through the nanopore. Nanopores, such as protein nanopores held within lipid bilayer membranes and solid-state nanopores, have heretofore been used for analysis of DNA, RNA, and polypeptides, and thus provide the potential advantage of robust analysis of polymers even at low copy number.
However, challenges remain for the full realization of such benefits. For example, under ideal sequencing conditions, the passage of each potential monomeric subunit type through the nanopore would cause a distinct detectable signal that can be readily differentiated from the detectable signals caused by the passage of any other monomeric subunit type through the nanopore. In practice, depending on the structural characteristics of the nanopore and the particular polymer analyte, multiple monomeric subunit types can often produce detectable signals that are difficult to distinguish. For example, in the analysis of ssDNA using a protein nanopore based on Mycobacterium smegmatis porin A (MspA), the nucleotides in the constricted portion of the pore have the most influence on the ion current that flows through the pore. When monitoring the ion current, it has been found that the nucleotide residue adenine (A) results in the largest detectable current, whereas the residue thymine (T) results in the lowest detectable current. While the A and T residues can be readily distinguished, the nucleotide residues cytosine (C) and guanine (G) cause current levels that are similar to each other and that both lie between the current levels caused by A and T residues. Accordingly, C and G residues are often difficult to distinguish from each other. In another example, analysis of ssDNA in the protein pore α-hemolysin results in signals that are even more compressed, with signal overlap for all four nucleotide residue types, which makes base-calling uncertain.
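By way of illustration only, the effect of closely spaced current levels on base-calling can be sketched numerically. The following minimal Python sketch assumes that each residue type produces a Gaussian-distributed current reading about a mean level, and that a single reading is assigned to the nearer of two candidate mean levels. The mean levels, the noise value, and the names `MEAN_LEVEL`, `NOISE_SIGMA`, and `pairwise_error` are hypothetical; they are chosen only to mirror the qualitative ordering described above (A highest, T lowest, C and G close together in between) and are not measured values.

```python
import math

# Hypothetical normalized mean current levels (arbitrary units), chosen only
# to mirror the qualitative ordering described above: A highest, T lowest,
# C and G close together in between. These are NOT measured values.
MEAN_LEVEL = {"A": 1.00, "C": 0.62, "G": 0.58, "T": 0.30}
NOISE_SIGMA = 0.05  # assumed common Gaussian noise (same arbitrary units)


def pairwise_error(base_a, base_b, sigma=NOISE_SIGMA):
    """Probability of confusing two equally likely bases when a single
    current reading is assigned to the nearer of the two mean levels."""
    separation = abs(MEAN_LEVEL[base_a] - MEAN_LEVEL[base_b])
    # The decision threshold sits midway between the two means, so the
    # error is Phi(-separation / (2 * sigma)), written here with erfc.
    return 0.5 * math.erfc(separation / (2.0 * sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    for pair in [("A", "T"), ("A", "C"), ("G", "T"), ("C", "G")]:
        print(pair, f"{pairwise_error(*pair):.3g}")
```

Under these assumed values, the well-separated A and T levels are essentially never confused, whereas a single reading confuses C and G roughly one time in three, which is the kind of ambiguity described above.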
Accordingly, a need remains to facilitate production of consistent, clear, and distinguishable signals that can differentiate each potential subunit type of a polymer. The methods and compositions of the present disclosure address this and related needs of the art.