The present invention relates generally to matrices for conducting nucleic acid affinity chromatography. More specifically, the present invention relates to methods of preparing affinity chromatography matrices that bind a plurality of different preselected nucleic acids. The matrices, for example, can bind to substantially every known nucleic acid message in a sample.
Affinity chromatography has become a valuable tool for separating biological materials from fluid (typically aqueous) media. Examples include biologically active molecules such as small ligands, proteins, nucleic acids, enzymes, etc.
The basic principle of affinity chromatography involves immobilization of a binding moiety (e.g., a ligand) to an insoluble support. The immobilized binding moiety can then be used to selectively adsorb, e.g., from a fluid medium, the target component(s) (e.g., an enzyme) with which the binding moiety specifically interacts, thereby forming a binding moiety/target complex. Elution of the adsorbed component can then be achieved by any one of a number of procedures which result in dissociation of the complex. Thus the specific biological properties of biological macromolecules can be exploited for purification. The process can be used to isolate specific substances such as enzymes, hormones, specific proteins, inhibitors, antigens, antibodies, etc. on the basis of their biologically specific interactions with immobilized ligands.
Nucleic acid affinity chromatography is based on the tendency of complementary, single-stranded nucleic acids to form a double-stranded or duplex structure through complementary base pairing. A nucleic acid (either DNA or RNA) can easily be attached to a solid substrate (matrix) where it acts as an immobilized ligand that interacts with and forms duplexes with complementary nucleic acids present in a solution contacted to the immobilized ligand. Unbound components can be washed away from the bound complex to either provide a solution lacking the target molecules bound to the affinity column, or to provide the isolated target molecules themselves. The nucleic acids captured in a hybrid duplex can be separated and released from the affinity matrix by denaturation either through heat, adjustment of salt concentration, or the use of a destabilizing agent such as formamide, TWEEN™-20 denaturing agent, or sodium dodecyl sulfate (SDS).
Hybridization (the formation of duplex structure) between two nucleic acid sequences is highly sequence dependent. Sequences have the greatest affinity with each other where, for every purine in one sequence (nucleic acid) there exists a corresponding pyrimidine in the other nucleic acid and vice versa. This sequence dependency confers exquisite specificity on hybridization reactions and permits the preparation of affinity columns that are highly selective for particular target nucleic acids.
Affinity columns (matrices) are typically used to isolate a single nucleic acid by providing a single species of affinity ligand. Alternatively, affinity columns bearing a single affinity ligand (e.g., oligo dT columns) have been used to isolate a multiplicity of nucleic acids where the nucleic acids all share a common sequence (e.g., a polyA).
This invention provides pools (solutions) of nucleic acids, and nucleic acid affinity matrices that bear a large number of different nucleic acid affinity ligands allowing the simultaneous selection and blocking or removal of a large number of different preselected nucleic acids from a sample. This invention additionally provides methods and devices for the preparation of such affinity matrices.
In one embodiment, this invention provides a method of making a nucleic acid pool (solution of nucleic acids) comprising a plurality of different nucleic acids. The method includes first, providing a nucleic acid amplification template array comprising a surface to which are attached at least 20 oligonucleotides having different predetermined (known) nucleic acid sequences; and second, amplifying the multiplicity of oligonucleotides at least about 10 fold to provide the nucleic acid pool. The oligonucleotides, or subsequences thereof, preferably encode “capture probes” which can be incorporated into an affinity matrix. In a preferred embodiment, each different oligonucleotide is localized in a predetermined region of the surface, the density of the oligonucleotides is preferably greater than about 60 different oligonucleotides per 1 cm², and the different oligonucleotides preferably have an identical terminal 3′ nucleic acid subsequence and an identical terminal 5′ nucleic acid subsequence. The 3′ and 5′ nucleic acid subsequences can be the same as each other or can differ in length and/or nucleotide sequence. The 3′ and 5′ subsequences preferably flank “unique” central subsequences encoding the capture probes.
The method can further involve attaching the pool of nucleic acids to a solid support to form a nucleic acid affinity matrix.
The template nucleic acids comprising the amplification template can be synthesized entirely using light-directed polymer synthesis or channel methods. Alternatively the template nucleic acids can be synthesized using a combination of methods. For example, in one embodiment, the 3′ segments (subsequences) of the template nucleic acids can be synthesized using standard phosphotriester (e.g., phosphoramidite) chemistry. A middle (unique) portion of the template nucleic acids can then be synthesized using light-directed polymer synthesis or mechanically-directed synthesis methods. Finally, the 5′ segments (subsequences) of the template nucleic acids can be synthesized using phosphotriester chemistry.
The template nucleic acids can be amplified using any nucleic acid amplification method (e.g. polymerase chain reaction, ligase chain reaction, transcription amplification, etc.). In a preferred embodiment, amplification is by PCR. The template nucleic acids can be released into solution prior to the amplification (e.g. by cleavage of a linker joining the template nucleic acids to the substrate) thereby allowing the amplification to be performed in solution. Alternatively, and in a preferred embodiment, the amplification is performed without releasing the template nucleic acids from the substrate.
In a preferred embodiment, the amplification templates include primer binding regions (e.g., 3′ and 5′ subsequences flanking the region encoding the capture probe). Preferred amplification templates include identical 3′ and 5′ primers. The primer binding regions of the amplification template oligonucleotides, and hence the corresponding complementary PCR primers, preferably range in length from about 4 to about 30 nucleotides. The primer binding regions can be identical to each other or can differ in nucleotide sequence and/or in length.
In a particularly preferred embodiment, the region of the amplification templates encoding the capture probes (the non-identical portion of the amplification template(s)) ranges in length from about 6 to about 50 nucleotides. Where it is desired to remove the primer binding regions, they can include a recognition site of a nuclease to facilitate cleavage. In a particularly preferred embodiment, the thermal melting points of the template nucleic acid sequences encoding the capture probes with their complementary sequences vary by less than about 20° C.
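By way of illustration only, the template layout described above (a unique capture-probe-encoding region flanked by shared 5′ and 3′ primer binding regions) can be sketched in Python. The function names `wallace_tm` and `build_templates`, the example sequences, and the use of the Wallace rule of thumb (2° C. per A/T, 4° C. per G/C) to estimate melting points are assumptions made for this sketch; the specification does not prescribe a particular Tm calculation.

```python
def wallace_tm(seq):
    """Rough Tm estimate (deg C) for a short oligo: 2 per A/T plus 4 per G/C.
    Assumption: the Wallace rule stands in for whatever Tm model is actually used."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def build_templates(unique_regions, primer_5, primer_3, max_tm_spread=20):
    """Return amplification templates written 5'->3' as primer_5 + unique + primer_3.

    Enforces the preferred ranges stated in the text:
    primer binding regions of about 4-30 nt, unique regions of about 6-50 nt,
    and unique-region melting points varying by less than max_tm_spread deg C.
    """
    for primer in (primer_5, primer_3):
        if not 4 <= len(primer) <= 30:
            raise ValueError("primer binding region outside 4-30 nucleotides")
    for region in unique_regions:
        if not 6 <= len(region) <= 50:
            raise ValueError("unique region outside 6-50 nucleotides")
    tms = [wallace_tm(region) for region in unique_regions]
    if max(tms) - min(tms) >= max_tm_spread:
        raise ValueError("melting points of unique regions vary too widely")
    return [primer_5 + region + primer_3 for region in unique_regions]


# Hypothetical example sequences, chosen only to exercise the function.
templates = build_templates(["ACGTACGTAC", "GGCATTACGA"], "AAGCTT", "GAATTC")
```

Because every template shares the same two flanking regions, a single primer pair suffices to amplify the whole array in one reaction.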
In another embodiment, this invention provides for nucleic acid amplification template arrays for practice of the above-described method. In a preferred embodiment, the template arrays comprise a predetermined multiplicity of at least 20 oligonucleotides having different nucleic acid sequences. Each different oligonucleotide is preferably localized in a predetermined region of said surface. The density of the oligonucleotides is preferably greater than about 60 different oligonucleotides per 1 cm², and the different oligonucleotides have identical terminal 3′ nucleic acid subsequences (e.g., primer binding region) and identical terminal 5′ nucleic acid subsequences (e.g., primer binding region). The 3′ and 5′ subsequences can be identical to each other or differ in length and/or nucleotide sequence. The subsequences (primer binding regions) of the oligonucleotides, and hence the corresponding complementary PCR primers, preferably range in length from about 4 to about 30 nucleotides.
The region of the template nucleic acids comprising the amplification template array encoding the capture probe (the “unique” non-terminal subsequence) preferably ranges in length from about 6 to about 50 nucleotides. Where it is desired to remove the primer binding regions, the 3′ and/or 5′ subsequences can include a recognition site of a nuclease to facilitate cleavage. In a particularly preferred embodiment, the thermal melting points of the template nucleic acid sequences encoding the capture probes with their complementary sequences vary by less than about 20° C.
In another embodiment, this invention provides an affinity matrix that removes substantially all known nucleic acid messages in a sample and methods of making such an affinity matrix. In a preferred embodiment, the affinity matrix comprises a multiplicity of at least 20 different predetermined oligonucleotides where, for each nucleic acid message, there exists in the affinity matrix an oligonucleotide complementary to the nucleic acid message or a subsequence thereof. The matrix, however, does not include every possible oligonucleotide having the same length as the predetermined oligonucleotides. The oligonucleotides can be selected such that the affinity matrix includes fewer than 80% of the total number of possible oligonucleotides, preferably fewer than 60% of the total number of possible oligonucleotides, more preferably fewer than 40% of the total number of possible oligonucleotides, and most preferably less than about 30% or even 20% or even 10% or even 5% of the total possible number of oligonucleotides having the same length as the predetermined oligonucleotides. Oligonucleotides comprising preferred nucleic acid matrices range in length from about 6 to about 50 nucleotides.
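For orientation, the percentages above are taken relative to the 4^K distinct sequences that exist for oligonucleotides of length K. The short Python sketch below computes that share; the function name `fraction_of_possible` and the example figures (20,000 probes of length 10) are hypothetical and serve only to illustrate the arithmetic.

```python
def fraction_of_possible(num_probes, length):
    """Share of all 4**length possible oligonucleotides of the given length
    represented by a matrix holding num_probes distinct probes."""
    return num_probes / 4 ** length


# Hypothetical matrix: 20,000 distinct 10-mers out of 4**10 = 1,048,576 possible.
share = fraction_of_possible(20_000, 10)
print(f"{share:.2%} of all possible 10-mers")
```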
Oligonucleotides for inclusion in such affinity matrices can be selected as described herein by the steps of: i) determining an allowable Tm interval; ii) determining a mismatch Tm threshold; iii) identifying all nucleic acid sequences complementary to a known message whose Tm to said message is within the allowable Tm interval; iv) determining the likelihood of each of the nucleic acid sequences complementary to the known message also occurring in an unknown message; v) sorting the sequences in order of likelihood with the least likely sequence first to produce a sorted sequence list; vi) selecting the first nucleic acid sequence in the list whose Tm to all other known messages in the sample is below the mismatch Tm; vii) repeating step vi) until a desired number of nucleic acids that specifically hybridize, under stringent conditions, to the known message are obtained; and viii) repeating steps iii) through vii) until at least one nucleic acid sequence that hybridizes specifically under stringent conditions to each known nucleic acid message is selected. Step (vi) can further comprise selecting the probe that additionally has a Tm to all already selected nucleic acids below the mismatch Tm.
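Steps i) through viii) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the selection procedure as actually practiced: the Tm of a perfect duplex is estimated with the Wallace rule, the Tm of a probe to a non-target message is crudely approximated by the Wallace Tm of the longest stretch the two share, the likelihood of a sequence occurring in an unknown message is modeled with independent base frequencies, and probes are restricted to a single fixed length `k`. All function and parameter names are hypothetical. The optional check of step (vi) against already selected probes is omitted.

```python
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Assumed Tm model: 2 deg C per A/T, 4 deg C per G/C."""
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def cross_tm(site, message):
    """Crude mismatch Tm (assumption): Wallace Tm of the longest stretch of
    `site` that also occurs in `message`; 0 if nothing is shared."""
    for length in range(len(site), 0, -1):
        hits = [site[i:i + length] for i in range(len(site) - length + 1)
                if site[i:i + length] in message]
        if hits:
            return max(wallace_tm(h) for h in hits)
    return 0


def select_probes(messages, k, tm_interval, mismatch_tm, per_message=1, base_freq=None):
    """messages: dict of name -> sequence (5'->3'). Returns name -> list of probes."""
    base_freq = base_freq or dict.fromkeys("ACGT", 0.25)
    low, high = tm_interval                                    # step i (step ii: mismatch_tm)
    selected = {}
    for name, message in messages.items():                     # step viii: every known message
        sites = {message[i:i + k] for i in range(len(message) - k + 1)}
        candidates = [s for s in sites if low <= wallace_tm(s) <= high]   # step iii
        # steps iv and v: least likely first; ties broken alphabetically
        candidates.sort(key=lambda s: (prod(base_freq[b] for b in s), s))
        others = [m for n, m in messages.items() if n != name]
        chosen = []
        for site in candidates:                                # steps vi and vii
            if len(chosen) == per_message:
                break
            if all(cross_tm(site, other) < mismatch_tm for other in others):
                chosen.append(revcomp(site))                   # probe is complementary to the site
        selected[name] = chosen
    return selected


# Hypothetical toy messages, chosen only to exercise the function.
toy = {"m1": "ATGGCGCGCCATTA", "m2": "ATGAAATTTAAATA"}
probes = select_probes(toy, k=8, tm_interval=(15, 40), mismatch_tm=15)
```

Sorting the least likely sequence first favors probes that are unlikely to capture messages not yet characterized, which is the stated purpose of steps iv) and v).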
In one embodiment the allowable Tm interval ranges from about 30° C. to about 80° C. In another preferred embodiment, the mismatch Tm is at least 5° C. lower than the allowable Tm interval. The likelihood can be determined by calculating the probability of occurrence of each of the nucleic acid sequences of step (iii) in a calculated nucleic acid probability distribution. The oligonucleotides can be produced by amplification from a nucleic acid amplification template array as described above and further herein. Further details on the selection of oligonucleotides in the matrix are provided herein.
In still yet another embodiment, this invention provides a nucleic acid affinity matrix that binds to N previously unknown nucleic acid messages and methods of making such nucleic acid matrices. The method involves the steps of first, providing a multiplicity of at least N different predetermined oligonucleotides, each oligonucleotide complementary to an unknown nucleic acid message predicted to be present in a nucleic acid sample or complementary to a subsequence of the unknown nucleic acid message; and second, attaching the nucleic acids to a solid support. The oligonucleotides can be selected by: i) providing a list of all possible oligonucleotides of length K; ii) deleting from the list all of the oligonucleotides that hybridize to known nucleic acid messages; iii) calculating a probability of occurrence in a nucleic acid distribution of each of the probes remaining in the list; iv) sorting the list from highest probability to lowest probability; v) selecting the highest probability oligonucleotide for inclusion in the affinity matrix; and vi) repeating steps (iii) through (v) until N oligonucleotides are selected. The selection of step (vi) can further comprise recalculating the probability on the condition that the probability distribution contains no nucleic acids complementary to those oligonucleotides already selected. Selection step (v) can further include selecting an allowable Tm interval and selecting the highest probability oligonucleotide whose Tm lies within the allowable Tm interval. The oligonucleotides can be amplified from the nucleic acid amplification template arrays described above. In a particularly preferred embodiment, the oligonucleotides are attached to a solid support (e.g., glass beads) by a covalent linkage to a biotin which is joined to a streptavidin which is covalently joined to the solid support.
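The selection steps i) through vi) for unknown messages can be sketched in Python as follows. The sketch rests on simplifying assumptions that are not part of the specification: a probe is treated as hybridizing to a known message only when its exact reverse complement occurs in that message, and the probability of occurrence is modeled with independent base frequencies. The optional conditional recalculation and the optional Tm filter are not modeled, so repeating steps (iii) through (v) here reduces to taking the N most probable remaining oligonucleotides. All names are hypothetical.

```python
from itertools import product
from math import prod

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def select_unknown_probes(known_messages, k, n, base_freq):
    """Pick up to n probes of length k aimed at messages not yet known."""
    # step i: every possible oligonucleotide of length k
    all_oligos = ("".join(p) for p in product("ACGT", repeat=k))
    # step ii: drop probes whose target (reverse complement) occurs in a known message
    remaining = [o for o in all_oligos
                 if not any(revcomp(o) in m for m in known_messages)]

    def target_probability(probe):
        # step iii: probability of the probe's target under the assumed base frequencies
        return prod(base_freq[b] for b in revcomp(probe))

    selected = []
    while remaining and len(selected) < n:                         # step vi
        remaining.sort(key=lambda o: (-target_probability(o), o))  # step iv
        selected.append(remaining.pop(0))                          # step v
    return selected


# Hypothetical toy input: one known message and an AT-rich base distribution.
picked = select_unknown_probes(
    ["ACGTAC"], k=2, n=3,
    base_freq={"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
)
```

Enumerating all 4^K oligonucleotides is practical only for small K; for longer probes the list would have to be generated or filtered lazily.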
Finally, in still yet another embodiment, this invention provides a method to enrich a nucleic acid sample for previously unknown expressed RNA sequences. The method includes the steps of: i) providing an affinity matrix having at least one oligonucleotide complementary to each known expressed RNA present in a sample; ii) hybridizing RNA from an undifferentiated control cell and a differentiated or activated test cell respectively to the affinity matrix, thereby removing known expressed RNAs from the control cell and the differentiated or activated test cell; iii) reverse transcribing the RNA from each of the control cell and the differentiated or activated test cell to produce a cDNA, wherein the reverse transcription adds a polymerase chain reaction primer binding region to the cDNAs from the differentiated or activated test cell; iv) combining the cDNAs from the differentiated or activated test cell with the cDNA from the control cell such that there is more cDNA from the control cell than cDNA from the differentiated or activated test cell; and v) amplifying the mixture of cDNAs using primers complementary to the primer binding regions such that the amplification results in an enrichment of nucleic acid sequences transcribed in the differentiated or activated test cell at a significantly higher level than in the control cell. In a preferred embodiment, the ratio of cDNA from the control cell to cDNA from the test cell in step (iv) is at least about 5:1, more preferably at least about 10:1, most preferably at least about 20:1.
Definitions
As used herein, an “oligonucleotide” refers to a single stranded nucleic acid having a length greater than 2 nucleotides, more preferably greater than about 5 nucleotides, and most preferably greater than about 10, 15, 20, or 50 nucleotides. The oligonucleotides of this invention can range in length up to about 1000 nucleotides, but preferred lengths range up to a maximum of about 500, more preferably up to about 250 nucleotides, and most preferably up to about 150 nucleotides (bases). An oligonucleotide can include natural (i.e., A, G, C, T or U) or modified bases (e.g., 7-deazaguanosine, inosine, etc.). In addition, the bases in an oligonucleotide can be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization of the oligonucleotide. Thus, oligonucleotides can be peptide nucleic acids in which one or more of the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
The term nucleic acid “affinity matrix”, as used herein, refers to a solid support or gel to which is attached a multiplicity of different oligonucleotides. It is recognized that a nucleic acid template array itself can act as an affinity matrix. However, in a preferred embodiment, where greater loading (binding) capacity is preferred, the affinity matrix is fabricated using nucleic acids amplified from the template array. Preferred matrix materials do not interfere with subsequent hybridization of attached oligonucleotides. Suitable matrix materials include, but are not limited to, paper, glasses, ceramics, metals, metalloids, polyacryloylmorpholide, various plastics and plastic copolymers such as Nylon™, Teflon™, polyethylene, polypropylene, poly(4-methylbutene), polystyrene, polystyrene/latex, polymethacrylate, poly(ethylene terephthalate), rayon, nylon, poly(vinyl butyrate), polyvinylidene difluoride (PVDF), silicones, polyformaldehyde, cellulose, cellulose acetate, nitrocellulose, and controlled-pore glass (Controlled Pore Glass, Inc., Fairfield, N.J.), aerogels (see, e.g., Ruben et al., J. Materials Science 27, 4341-4349 (1992); Rao et al., J. Material. Science 28, 3021 (1993); Back et al., J. Phys. D. Appl. Phys. 22, 730-734 (1989); Kim and Jang, J. Am. Ceram. Soc. 74, 1987-92 (1991)) and the like, and other materials generally known to be suitable for use in affinity columns (e.g., HPLC columns).
The term “target nucleic acid” refers to a nucleic acid (often derived from a biological sample), to which the oligonucleotide probe is designed to specifically hybridize. It is the target nucleic acid(s) that the affinity matrices of this invention are designed to capture (bind). The target nucleic acid(s) have sequences that are complementary to the nucleic acid sequence of the oligonucleotide affinity ligand in the affinity matrix. The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the oligonucleotide is complementary or to the overall sequence (e.g., gene, cDNA or mRNA) that it is desired to capture. The difference in usage will be apparent from context.
The term “subsequence” refers to a partial sequence of a longer nucleic acid.
The term “affinity ligand” as used herein refers to a molecule present in the affinity matrix that specifically binds to, and thereby captures, a target molecule. Oligonucleotides are preferred affinity ligands in the affinity matrices of this invention.
The terms “nucleic acid template” or “template”, as used herein, refer to a nucleic acid that acts as a template for a nucleic acid amplification method. Nucleic acid templates of the present invention serve as templates for the amplification of nucleic acid pools comprising capture probes that are used either in solution or bound to a solid support to provide nucleic acid affinity matrices. Preferred nucleic acid templates additionally include primer binding regions to facilitate amplification. A particularly preferred nucleic acid template comprises a unique sequence (subsequence) that encodes the nucleic acid capture probe, flanked on the 5′ and 3′ ends by subsequences that act as primer binding regions.
The term “nucleic acid pool” as used herein, refers to a heterogeneous collection of nucleic acids. For example, a nucleic acid pool can comprise at least 100, 1000, or 10,000 different nucleic acids. The nucleic acids within a pool often lack an imposed relationship. For example, a pool can be formed from nucleic acids lacking substantial sequence identity with each other (e.g., less than 50% or 75% sequence identity). Sequence identity is determined between optimally aligned sequences by standard algorithms such as GAP, BESTFIT, FASTA, and TFASTA (Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, Wis.). Nucleic acids within the pool typically range in size from 5-100 bases, preferably 10-50 bases. Typically the nucleic acid pools are prepared by amplification of a heterogeneous collection of template nucleic acids (e.g., as found in a template array).
The term “blocking reagent”, when used herein in reference to a nucleic acid pool, refers to a pool or solution of one or more nucleic acids that specifically bind to preselected target sequences. The duplexes thus formed are typically incapable of further hybridization.
The term “template array” or “amplification template array” refers to a collection of oligonucleotides that act as templates for simultaneous amplification of a collection of nucleic acids. Preferred template arrays are used in the fabrication of affinity ligands for incorporation into an affinity matrix.
The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, would encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.
The phrase “nucleic acid message”, as used herein, refers to a nucleic acid or subsequence thereof that is transcribed when a gene is activated. Thus, nucleic acid messages typically include mRNAs and subsequences thereof. However, nucleic acid messages are used herein to refer to nucleic acids indicative of the presence, absence, or amount of such transcribed sequences. Thus, nucleic acid messages also include nucleic acids derived from such transcripts including, but not limited to, cDNA, cRNA, amplification products, and so forth.
The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence (or about 5° C. lower than the sequence with the highest melting point for a group of sequences) at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which half the duplex molecules (i.e., half the base pairs) are dissociated, or the point where the denaturation rate equals the renaturation rate under given conditions. Typically, stringent conditions will be those in which the salt concentration is less than about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 16 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
The term “capture probe”, as used herein, refers to a nucleic acid that is complementary to a target nucleic acid. The capture probe, when incorporated into an affinity matrix, acts as an affinity ligand that can specifically hybridize to and thereby capture its respective target nucleic acid. It is recognized that capture probes can also exist in solution (e.g., in nucleic acid pools) where they may act as blocking probes or where they can be subsequently bound to a solid support to produce an affinity matrix.