Rapid, reliable, and inexpensive characterization of polymers, particularly nucleic acids, has become increasingly important. One notable project, known as the Human Genome Project, has as its goal sequencing the entire human genome, which is over three billion nucleotides.
Typical current nucleic acid sequencing methods depend either on chemical reactions that yield multiple length DNA strands cleaved at specific bases, or on enzymatic reactions that yield multiple length DNA strands terminated at specific bases. In each of these methods, the resulting DNA strands of differing length are then separated from each other and identified in strand length order. The chemical or enzymatic reactions, as well as the technology for separating and identifying the different length strands, usually involve tedious, repetitive work. A method that reduces the time and effort required would represent a highly significant advance in biotechnology.
The invention relates to a method for rapid, easy characterization of individual polymer molecules, for example polymer size or sequence determination. Individual molecules in a population may be characterized in rapid succession.
Stated generally, the invention features a method for evaluating a polymer molecule which includes linearly connected (sequential) monomer residues. Two separate pools of a medium and an interface between the pools are provided. The interface between the pools is capable of interacting sequentially with the individual monomer residues of a single polymer present in one of the pools. Interface-dependent measurements are continued over time, as individual monomer residues of a single polymer interact sequentially with the interface, yielding data suitable to infer a monomer-dependent characteristic of the polymer. Several individual polymers, e.g., in a heterogenous mixture, can be characterized or evaluated in rapid succession, one polymer at a time, leading to characterization of the polymers in the mixture.
The method is broadly useful for characterizing polymers that are strands of monomers which, in general (if not entirely), are arranged in linear strands. The method is particularly useful for characterizing biological polymers such as deoxyribonucleic acids, ribonucleic acids, polypeptides, and oligosaccharides, although other polymers may be evaluated. In some embodiments, a polymer which carries one or more charges (e.g., nucleic acids, polypeptides) will facilitate implementation of the invention.
The monomer-dependent characterization achieved by the invention may include identifying physical characteristics such as the number and composition of monomers that make up each individual molecule, preferably in sequential order from any starting point within the polymer or its beginning or end. A heterogenous population of polymers may be characterized, providing a distribution of characteristics (such as size) within the population. Where the monomers within a given polymer molecule are heterogenous, the method can be used to determine their sequence.
The interface between the pools is designed to allow passage of the monomers of one polymer molecule in single file order, that is, one monomer at a time. As described in greater detail below, the useful portion of the interface may be a passage in or through an otherwise impermeable barrier, or it may be an interface between immiscible liquids.
The medium used in the invention may be any fluid that permits adequate polymer mobility for interface interaction. Typically, the medium will be liquids, usually aqueous solutions or other liquids or solutions in which the polymers can be distributed. When an electrically conductive medium is used, it can be any medium which is able to carry electrical current. Such solutions generally contain ions as the current conducting agents, e.g., sodium, potassium, chloride, calcium, cesium, barium, sulfate, or phosphate. Conductance across the pore or channel is determined by measuring the flow of current across the pore or channel via the conducting medium. A voltage difference can be imposed across the barrier between the pools by conventional means. Alternatively, an electrochemical gradient may be established by a difference in the ionic composition of the two pools of medium, either with different ions in each pool, or different concentrations of at least one of the ions in the solutions or media of the pools. In this embodiment of the invention, conductance changes are measured and are indicative of monomer-dependent characteristics.
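The relationship between the measured current, the imposed voltage, and the conductance of the passage can be illustrated with a minimal sketch. It assumes simple ohmic behavior (G = I / V); the function name and the numerical values are hypothetical and are not taken from the specification.

```python
def conductance_siemens(current_amperes: float, voltage_volts: float) -> float:
    """Return the ohmic conductance G = I / V of a passage (assumed model)."""
    if voltage_volts == 0:
        raise ValueError("applied voltage must be non-zero")
    return current_amperes / voltage_volts


# Hypothetical readings: ionic current through an open passage and through
# the same passage while a monomer occupies it, both at 120 mV.
open_g = conductance_siemens(120e-12, 0.120)
blocked_g = conductance_siemens(12e-12, 0.120)

# Fractional conductance change attributable to the occupying monomer.
fractional_change = (open_g - blocked_g) / open_g
```

In such a sketch, the fractional change rather than the absolute current would be the quantity compared between monomers, since it does not depend on the particular voltage chosen.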
The term "ion permeable passages" used in this embodiment of the invention includes ion channels, ion-permeable pores, and other ion-permeable passages, and all are used herein to include any local site of transport through an otherwise impermeable barrier. For example, the term includes naturally occurring, recombinant, or mutant proteins which permit the passage of ions under conditions where ions are present in the medium contacting the channel or pore. Synthetic pores are also included in the definition. Examples of such pores can include, but are not limited to, chemical pores formed, e.g., by nystatin, ionophores, or mechanical perforations of a membranous material. Proteinaceous ion channels can be voltage-gated or voltage independent, including mechanically gated channels (e.g., stretch-activated K+ channels), or recombinant engineered or mutated voltage dependent channels (e.g., Na+ or K+ channels constructed as is known in the art).
Another type of channel is a protein which includes a portion of a bacteriophage receptor which is capable of binding all or part of a bacteriophage ligand (either a natural or functional ligand) and transporting bacteriophage DNA from one side of the interface to the other. The polymer to be characterized includes a portion which acts as a specific ligand for the bacteriophage receptor, so that it may be injected across the barrier/interface from one pool to the other.
The protein channels or pores of the invention can include those translated from one or more natural and/or recombinant DNA molecule(s) which includes a first DNA which encodes a channel or pore forming protein and a second DNA which encodes a monomer-interacting portion of a monomer polymerizing agent (e.g., a nucleic acid polymerase or exonuclease). The expressed protein or proteins are capable of non-covalent association or covalent linkage (any linkage herein referred to as forming an "assemblage" of "heterologous units"), and when so associated or linked, the polymerizing portion of the protein structure is able to polymerize monomers from a template polymer, close enough to the channel forming portion of the protein structure to measurably affect ion conductance across the channel. Alternatively, assemblages can be formed from unlike molecules, e.g., a chemical pore linked to a protein polymerase; these assemblages fall under the definition of a "heterologous" assemblage.
The invention also includes the recombinant fusion protein(s) translated from the recombinant DNA molecule(s) described above, so that a fusion protein is formed which includes a channel forming protein linked as described above to a monomer-interacting portion of a nucleic acid polymerase. Preferably, the nucleic acid polymerase portion of the recombinant fusion protein is capable of catalyzing polymerization of nucleotides. Preferably, the nucleic acid polymerase is a DNA or RNA polymerase, more preferably T7 RNA polymerase.
The polymer being characterized may remain in its original pool, or it may cross the passage. Either way, as a given polymer molecule moves in relation to the passage, individual monomers interact sequentially with the elements of the interface to induce a change in the conductance of the passage. The passages can be traversed either by polymer transport through the central opening of the passage so that the polymer passes from one of the pools into the other, or by the polymer traversing across the opening of the passage without crossing into the other pool. In the latter situation, the polymer is close enough to the channel for its monomers to interact with the passage and bring about the conductance changes which are indicative of polymer characteristics. The polymer can be induced to interact with or traverse the pore, e.g., as described below, by a polymerase or other template-dependent polymer replicating catalyst linked to the pore which draws the polymer across the surface of the pore as it synthesizes a new polymer from the template polymer, or by a polymerase in the opposite pool which pulls the polymer through the passage as it synthesizes a new polymer from the template polymer. In such an embodiment, the polymer replicating catalyst is physically linked to the ion-permeable passage, and at least one of the conducting pools contains monomers suitable to be catalytically linked in the presence of the catalyst. A "polymer replicating catalyst," "polymerizing agent" or "polymerizing catalyst" is an agent that can catalytically assemble monomers into a polymer in a template dependent fashion, i.e., in a manner that uses the polymer molecule originally provided as a template for reproducing that molecule from a pool of suitable monomers. Such agents include, but are not limited to, nucleotide polymerases of any type, e.g., DNA polymerases, RNA polymerases, tRNA and ribosomes.
The characteristics of the polymer can be identified by the amplitude or duration of individual conductance changes across the passage. Such changes can identify the monomers in sequence, as each monomer will have a characteristic conductance change signature. For instance, the volume, shape, or charges on each monomer will affect conductance in a characteristic way. Likewise, the size of the entire polymer can be determined by observing the length of time (duration) that monomer-dependent conductance changes occur. Alternatively, the number of monomers in a polymer (also a measure of size) can be determined as a function of the number of monomer-dependent conductance changes for a given polymer traversing a passage. The number of monomers may not correspond exactly to the number of conductance changes, because there may be more than one conductance level change as each monomer of the polymer passes sequentially through the channel. However, there will be a proportional relationship between the two values which can be determined by preparing a standard with a polymer of known sequence.
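The calibration against a standard of known sequence can be sketched as follows. This is an illustrative sketch only: it assumes that a conductance change is any step between consecutive samples of at least a chosen magnitude, and all function names, the threshold, and the trace values are hypothetical.

```python
from typing import Sequence


def count_level_changes(trace: Sequence[float], min_step: float) -> int:
    """Count steps between consecutive samples of magnitude >= min_step."""
    return sum(1 for a, b in zip(trace, trace[1:]) if abs(b - a) >= min_step)


def changes_per_monomer(standard_trace: Sequence[float],
                        standard_length: int,
                        min_step: float) -> float:
    """Calibrate: conductance changes observed per monomer of a known standard."""
    changes = count_level_changes(standard_trace, min_step)
    if changes == 0 or standard_length <= 0:
        raise ValueError("standard produced no usable calibration")
    return changes / standard_length


def estimate_monomer_count(trace: Sequence[float],
                           ratio: float,
                           min_step: float) -> int:
    """Estimate polymer size from its change count and the calibrated ratio."""
    return round(count_level_changes(trace, min_step) / ratio)


# Hypothetical standard of 3 monomers, each producing two level changes
# (a drop and a recovery) in normalized conductance.
standard = [1.0, 0.4, 1.0, 0.4, 1.0, 0.4, 1.0]
ratio = changes_per_monomer(standard, standard_length=3, min_step=0.3)

# Hypothetical unknown polymer measured under the same conditions.
unknown = [1.0, 0.4, 1.0, 0.4, 1.0, 0.4, 1.0, 0.4, 1.0, 0.4, 1.0]
size = estimate_monomer_count(unknown, ratio, min_step=0.3)
```

Real traces would need noise filtering before step counting; the sketch omits this to keep the proportional relationship visible.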
The mixture of polymers used in the invention does not need to be homogenous. Even when the mixture is heterogenous, only one molecule interacts with a passage at a time, yielding a size distribution of molecules in the mixture, and/or sequence data for multiple polymer molecules in the mixture.
In other embodiments, the channel is a natural or recombinant bacterial porin molecule that is relatively insensitive to an applied voltage and does not gate. Preferred channels for use in the invention include the α-hemolysin toxin from S. aureus and maltoporin channels.
In other preferred embodiments, the channel is a natural or recombinant voltage-sensitive or voltage gated ion channel, preferably one which does not inactivate (whether naturally or through recombinant engineering as is known in the art). "Voltage sensitive" or "gated" indicates that the channel displays activation and/or inactivation properties when exposed to a particular range of voltages.
In an alternative embodiment of the invention, the pools of medium are not necessarily conductive, but are of different compositions so that the liquid of one pool is not miscible in the liquid of the other pool, and the interface is the immiscible surface between the pools. In order to measure the characteristics of the polymer, a polymer molecule is drawn through the interface of the liquids, resulting in an interaction between each sequential monomer of the polymer and the interface. The sequence of interactions as the monomers of the polymer are drawn through the interface is measured, yielding information about the sequence of monomers that characterize the polymer. The measurement of the interactions can be by a detector that measures the deflection of the interface (caused by each monomer passing through the interface) using reflected or refracted light, or a sensitive gauge capable of measuring intermolecular forces. Several methods are available for measurement of forces between macromolecules and interfacial assemblies, including the surface forces apparatus (Israelachvili, Intermolecular and Surface Forces, Academic Press, New York, 1992), optical tweezers (Ashkin et al., Opt. Lett., 11:288, 1986; Kuo and Sheetz, Science, 260:232, 1993; Svoboda et al., Nature 365:721, 1993), and atomic force microscopy (Quate, F. Surf. Sci. 299:980, 1994; Mate et al., Phys. Rev. Lett. 59:1942, 1987; Frisbie et al., Science 265:71, 1994; all hereby incorporated by reference).
The interactions between the interface and the monomers in the polymer are suitable to identify the size of the polymer, e.g., by measuring the length of time during which the polymer interacts with the interface as it is drawn across the interface at a known rate, or by measuring some feature of the interaction (such as deflection of the interface, as described above) as each monomer of the polymer is sequentially drawn across the interface. The interactions can also be sufficient to ascertain the identity of individual monomers in the polymer.
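The size determination from interaction time at a known draw rate reduces to a single multiplication, sketched below. The function name, the units, and the numerical values are assumptions chosen for illustration.

```python
def polymer_length_from_duration(duration_s: float,
                                 rate_monomers_per_s: float) -> int:
    """Estimate monomer count as interaction time multiplied by draw rate."""
    if duration_s < 0 or rate_monomers_per_s <= 0:
        raise ValueError("duration must be >= 0 and rate must be > 0")
    return round(duration_s * rate_monomers_per_s)


# Hypothetical case: the polymer interacts with the interface for 0.5 ms
# while being drawn across it at one million monomers per second.
length = polymer_length_from_duration(0.0005, 1_000_000)
```

The estimate is only as good as the assumption that the draw rate is constant over the whole interaction.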
The invention further features a method for sequencing a nucleic acid polymer, which can be double stranded or single stranded, by (1) providing two separate, adjacent pools of a medium and an interface (e.g., a lipid bilayer) between the two pools, the interface having a channel (e.g., bacterial porin molecules) so dimensioned as to allow sequential monomer-by-monomer passage from one pool to another of only one nucleic acid polymer at a time; (2) placing the nucleic acid polymer to be sequenced in one of the two pools; and (3) taking measurements (e.g., ionic flow measurements, including measuring duration or amplitude of ionic flow blockage) as each of the nucleotide monomers of the nucleic acid polymer passes through the channel, so as to determine the sequence of the nucleotides in the nucleic acid polymer. The interface can include more than one channel in this method. In some cases, the nucleic acid polymer can interact with an inner surface of the channel. The sequencing of a nucleic acid, as used herein, is not limited to identifying specific nucleotide monomers, but can include distinguishing one type of monomer from another type of monomer (e.g., purines from pyrimidines), or distinguishing one polymer from another polymer, where the two polymers differ in their nucleotide sequence.
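The duration and amplitude of ionic flow blockages mentioned in step (3) can be extracted from a sampled current trace with a simple threshold scan, sketched below. This is not the measurement procedure of the invention; the threshold approach, the `Blockade` record, the function name, and all trace values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Blockade:
    start_index: int       # sample index at which the blockage begins
    duration_s: float      # time the current stays below the threshold
    mean_amplitude: float  # mean current drop below the open-channel level


def find_blockades(current: Sequence[float],
                   open_level: float,
                   threshold: float,
                   sample_interval_s: float) -> List[Blockade]:
    """Return every completed run of samples whose current is below threshold."""
    events: List[Blockade] = []
    start: Optional[int] = None
    for i, value in enumerate(current):
        blocked = value < threshold
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = current[start:i]
            mean_blocked = sum(segment) / len(segment)
            events.append(Blockade(start,
                                   (i - start) * sample_interval_s,
                                   open_level - mean_blocked))
            start = None
    # A blockage still in progress at the end of the trace is ignored.
    return events


# Hypothetical trace in pA, sampled every 10 microseconds.
trace = [100, 100, 20, 20, 20, 100, 100, 40, 40, 100]
events = find_blockades(trace, open_level=100.0, threshold=60.0,
                        sample_interval_s=1e-5)
```

Each record pairs a duration with an amplitude, the two quantities the method names as candidates for distinguishing one type of monomer or polymer from another.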
The invention also features a method for detecting a single-stranded or double-stranded region in a nucleic acid by (1) providing two separate, adjacent pools of a medium and an interface (e.g., a lipid bilayer) between the two pools, the interface having a channel (e.g., a bacterial porin molecule) so dimensioned as to readily allow sequential monomer-by-monomer passage of a single-stranded nucleic acid, but not of a double-stranded nucleic acid, from one pool to another; (2) placing the nucleic acid to be sequenced in one of the two pools; and (3) taking measurements (e.g., ionic flow measurements, including measuring duration or magnitude of ionic flow blockage) as each of the nucleotide monomers of the single-stranded nucleic acid polymer passes through the channel so as to differentiate between nucleotide monomers that are hybridized to another nucleotide monomer before entering the channel and nucleotide monomers that are not hybridized to another nucleotide monomer before entering the channel. The interface can include more than one channel in this method. In some cases, the nucleic acid polymer can interact with an inner surface of the channel. The double-stranded region detected can be intermolecular (i.e., hybridization between two nucleic acid molecules) or intramolecular (i.e., hybridization between portions of the same nucleic acid). In addition, the method can be facilitated by varying the applied voltage across the interface, e.g., between the predetermined voltages of 120 mV and 240 mV.
The method described immediately above is especially useful for detecting hybridization, or lack thereof, of a probe to a target nucleic acid that differs from the sequence of the probe by only one nucleotide. In other words, the method can be used to detect single nucleotide alterations or mutations in the target by detecting hybridization of a probe to a target, such measurements being able to distinguish between a sequence that is exactly complementary to a probe (or a portion of the target) and a sequence that is not. To facilitate this level of sensitivity, the temperature of the two pools can be set to lie half-way between the Tm of perfectly complementary probe and target and the Tm of the imperfectly complementary probe and target (e.g., between about 26° C. and 30° C. [see FIG. 12]) to achieve the necessary level of performance. Consequently, the invention also includes a method for evaluating a polymer (e.g., a nucleic acid) by (1) providing two separate pools of a medium and an interface between the two pools; (2) placing a first and second polymer in one of the two pools; (3) taking a first interface-dependent measurement over time at a first temperature as individual monomer residues of the first polymer interact with the interface, yielding data suitable to determine a monomer-dependent characteristic of the polymer molecule; (4) adjusting the temperature of at least one of the two pools to a second temperature; and (5) taking a second interface-dependent measurement over time at the second temperature as individual monomer residues of the second polymer interact with the interface, yielding data suitable to determine a monomer-dependent characteristic of the polymer molecule. In addition, the first and second interface-dependent measurements can be compared.
When taking the second interface-dependent measurement, the polymer interacting with the interface can be the same molecule (i.e., have the same chemical structure) from which the first interface-dependent measurement was taken, or a different molecule (i.e., having a different chemical structure).
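The half-way temperature used to discriminate a perfectly complementary duplex from an imperfectly complementary one is the arithmetic mean of the two melting temperatures. A minimal sketch follows; the function name and the Tm values are hypothetical, chosen only so that the result falls within the range mentioned above.

```python
def midpoint_temperature_c(tm_perfect_c: float, tm_mismatch_c: float) -> float:
    """Return the temperature half-way between the two melting temperatures."""
    return (tm_perfect_c + tm_mismatch_c) / 2.0


# Hypothetical melting temperatures in degrees Celsius for a probe bound to
# a fully matched target and to a target with a single mismatch.
pool_temperature = midpoint_temperature_c(tm_perfect_c=32.0, tm_mismatch_c=24.0)
```

At such a temperature the matched duplex would be expected to remain largely hybridized while the mismatched duplex is largely melted, which is what makes the two measurements comparable.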
The two pools can contain an electrically conductive medium (e.g., an aqueous solution), in which case a voltage can be optionally applied across the interface to facilitate movement of the nucleic acid polymer through the channel and the taking of measurements. Such measurements are interface-dependent, i.e., the measurements are spatially or temporally related to the interface. For example, ionic measurements can be taken when the polymer traverses an internal limiting (in size or conductance) aperture of the channel. In this case, the flow of ions through the channel, and especially through the limiting aperture of the channel, is affected by the size or charge of the polymer and the inside surface of the channel. These measurements are spatially related to the interface because one measures the ionic flow through the interface as specific monomers pass a specific portion (the limiting aperture) of the interface channel.
To maximize the signal to noise ratio when ionic flow measurements are taken, the interface surface area facing a chamber is preferably less than 0.02 mm². In general, the interface containing the channels should have a design which minimizes the total access resistance to less than 20% of the theoretical (calculated) minimal convergence resistance. The total access resistance is the sum of the resistance contributed by the electrode/electrolyte interface, salt bridges, and the medium in the channel. The resistance of the medium in the channel includes the bulk resistance, the convergence resistance at each end of the channel, and the intra-channel resistance.
In addition, measurements can be temporally related to the interface, such as when a measurement is taken at a pre-determined time or range of times before or after each monomer passes into or out of the channel.
As an alternative to voltage, a nucleic acid polymerase or exonuclease can be provided in one of the chambers to draw the nucleic acid polymer through the channel as discussed below.
This invention offers advantages in nucleotide sequencing, e.g., reduced number of sequencing steps, higher speed of sequencing, and increased length of the polymer to be sequenced. The speed of the method and the size of the polymers it can sequence are particular advantages of the invention. The linear polymer may be very large, and this advantage will be especially useful in reducing template preparation time, sequencing errors and analysis time currently needed to piece together small overlapping fragments of a large gene or stretch of polymer.
Other features and advantages of the invention will be apparent from the following description of the preferred embodiments thereof, and from the claims.