Identification and/or sequencing of biomolecules, such as nucleic acids or proteins, is essential for medical diagnostics, forensics, toxicology, pathology, biological warfare, public health and numerous other fields. Although a great deal of research is presently directed towards identification and/or sequencing of nucleic acids or proteins, other biomolecules such as carbohydrates, polysaccharides, lipids, fatty acids, etc. may be of importance. The methods, compositions and apparatus disclosed herein are not limited to identification and/or sequencing of nucleic acids, but are also of use for analysis of other types of biomolecules, including but not limited to proteins, lipids and polysaccharides.
Standard methods for nucleic acid detection, such as Southern blotting or binding to nucleic acid chips, rely on hybridization of a fluorescent or radioactive probe molecule with a target nucleic acid molecule. Known methods for nucleic acid sequencing typically utilize either the Sanger dideoxy technique or hybridization to nucleic acid chips.
Oligonucleotide hybridization-based assays are in wide use for detection of target nucleic acids. A probe oligonucleotide that is complementary in sequence to a target nucleic acid is attached to a fluorescent, radioactive or other detectable moiety and allowed to hybridize to the target nucleic acid through Watson-Crick base pair formation. Many variations on this technique are known. More recently, DNA chips have been designed that can contain hundreds or even thousands of oligonucleotide probes. Hybridization of a target nucleic acid to an oligonucleotide on a chip may be detected using fluorescence spectroscopy, radioactivity, etc. Problems with sensitivity and/or specificity may result from nucleic acid hybridization between sequences that are not precisely complementary. In addition, a target nucleic acid present at low levels in a sample may go undetected.
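The specificity problem described above can be illustrated with a minimal sketch. It assumes a simplified model in which a probe may bind any stretch of the target that differs from the probe's perfect Watson-Crick complement at no more than a tolerated number of positions. The function names, the example sequences and the `max_mismatches` parameter are hypothetical and chosen for illustration only; real hybridization depends on temperature, salt concentration and sequence context, none of which is modeled here.

```python
# Simplified model of probe/target hybridization (illustrative assumption only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(probe, target, max_mismatches=0):
    """List (position, mismatches) for target windows the probe could pair with.

    A window counts as a candidate site when it differs from the probe's
    perfect complement at no more than max_mismatches positions.
    """
    site = reverse_complement(probe)  # target sequence that pairs perfectly
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


probe = "TTGCA"                 # hypothetical probe
target = "GGATGCAACCTGCTACC"    # hypothetical target strand

strict = binding_sites(probe, target, max_mismatches=0)
relaxed = binding_sites(probe, target, max_mismatches=1)
print(strict)   # only the precisely complementary site
print(relaxed)  # also includes a site with one mismatched base
```

Under the strict setting only the precisely complementary site is reported, whereas tolerating a single mismatch also reports a second, imperfect site, which corresponds to the kind of cross-hybridization that may reduce specificity.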
Methods for Sanger dideoxy nucleic acid sequencing, based on detection of four-color fluorescently labeled or radioactively labeled nucleic acids that have been separated by size, are limited by the length of the nucleic acid that can be sequenced. Typically, only 500 to 1,000 bases of nucleic acid sequence can be determined at one time. Using current methods, determination of a complete gene sequence requires that many copies of the gene be produced, cut into overlapping fragments and sequenced, after which the overlapping DNA sequences may be assembled. This process is laborious, expensive, inefficient and time-consuming. It also typically requires the use of fluorescent or radioactive moieties, which can potentially pose safety and waste disposal problems. More recent methods for nucleic acid sequencing using hybridization to oligonucleotide chips may be used to infer short nucleic acid sequences or to detect the presence of a specific nucleic acid in a sample, but are not suited for identifying long nucleic acid sequences.
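The assembly of overlapping fragment sequences mentioned above may be sketched as follows. This is an illustrative example only, assuming error-free reads from a single strand and a simple greedy strategy that repeatedly merges the two fragments sharing the longest suffix-to-prefix overlap. The function names, the `min_len` parameter and the example fragments are hypothetical; practical assembly must also account for sequencing errors, repeated regions and reads from both strands.

```python
# Greedy overlap assembly of short reads (illustrative sketch, not a
# production assembler; assumes error-free, same-strand fragments).

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 when the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise by largest overlap until none remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]  # hypothetical overlapping reads
contigs = greedy_assemble(reads)
print(contigs)
```

Because every pair of fragments is compared in each round, the work grows rapidly with the number of fragments, which is consistent with the laborious and time-consuming nature of the fragment-based approach.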
A variety of techniques are available for identification of proteins, polypeptides and peptides. Commonly, these involve binding and detection of antibodies that can recognize one or more epitopic domains on the protein. Although antibody-based identification of proteins is fairly rapid, such assays may occasionally show unacceptably high levels of false positive or false negative results, due to cross-reactivity of the antibody with different antigens, low antigenicity of the target analyte (leading to low sensitivity of the assay), nonspecific binding of antibody to various surfaces, etc. They also require the preparation of antibodies that can recognize an individual protein or peptide. As such, they are not suitable for the identification of novel proteins that have not previously been characterized.
A need exists for rapid, accurate and sensitive methods for detection, identification and/or sequencing of biomolecules, such as nucleic acids or proteins.