The invention relates to nucleic acid pools useful in hybridization studies. In particular, the invention relates to methods for preparing such nucleic acid pools, arrays and kits produced from such pools, and an improved method for identifying a nucleic acid and/or its representation in a sample.
The rate of determining the sequence of the four nucleotides in nucleic acid samples is a major technical obstacle for further advancement of molecular biology, medicine, and biotechnology. Nucleic acid sequencing methods which involve separation of nucleic acid molecules in a gel have been in use since 1978. The other proven method for sequencing nucleic acids is sequencing by hybridization (SBH).
The traditional method of determining a sequence of nucleotides (i.e., the order of the A, G, C and T nucleotides in a sample) is performed by preparing a mixture of randomly-terminated, differentially labelled nucleic acid fragments by degradation at specific nucleotides, or by dideoxy chain termination of replicating strands. Resulting nucleic acid fragments in the range of 1 to 500 bp are then separated on a gel to produce a ladder of bands wherein the adjacent samples differ in length by one nucleotide.
The array-based approach of SBH does not require single base resolution in separation, degradation, synthesis or imaging of a nucleic acid molecule. Using mismatch discriminative hybridization of short oligonucleotides K bases in length, lists of constituent K-mer oligonucleotides may be determined for target nucleic acid. Sequence for the target nucleic acid may be assembled by uniquely overlapping scored oligonucleotides.
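The overlap-based assembly described above can be illustrated with a minimal sketch. It assumes idealized hybridization (every K-mer present in the target scores positive and no other probe does) and a known starting probe; the function names kmer_spectrum and assemble are hypothetical and are not taken from the specification.

```python
from typing import Optional, Set


def kmer_spectrum(target: str, k: int) -> Set[str]:
    """Return the set of K-mers contained in the target (idealized positive probes)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def assemble(spectrum: Set[str], start: str) -> Optional[str]:
    """Extend from `start` by unique (K-1)-base overlaps.

    Returns None when the overlap is not unique (a branch point) or when no
    unused probe overlaps the current end. Assumes K >= 2.
    """
    k = len(start)
    sequence = start
    remaining = set(spectrum) - {start}
    while remaining:
        suffix = sequence[-(k - 1):]
        candidates = [probe for probe in remaining if probe.startswith(suffix)]
        if len(candidates) != 1:
            return None
        next_probe = candidates[0]
        sequence += next_probe[-1]  # each overlapping probe adds one base
        remaining.remove(next_probe)
    return sequence
```

In this sketch a repeated (K-1)-mer in the target produces more than one candidate extension, which is the branch point situation addressed elsewhere in this description.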
There are several approaches available to achieve sequencing by hybridization. In a process called SBH Format 1, nucleic acid samples are arrayed, and labeled probes are hybridized with the samples. Replica membranes with the same sets of sample nucleic acids may be used for parallel scoring of several probes and/or probes may be multiplexed. Nucleic acid samples may be arrayed and hybridized on nylon membranes or other suitable supports. Each membrane array may be reused many times. Format 1 is especially efficient for batch processing large numbers of samples.
In SBH Format 2, probes are arrayed at locations on a substrate which correspond to their respective sequences, and a labelled nucleic acid sample fragment is hybridized to the arrayed probes. In this case, sequence information about a fragment may be determined in a simultaneous hybridization reaction with all of the arrayed probes. For sequencing other nucleic acid fragments, the same oligonucleotide array may be reused. The arrays may be produced by spotting or by in situ synthesis of probes.
In Format 3 SBH, two sets of probes are used. In one embodiment, a set may be in the form of arrays of probes with known positions, and another, labelled set may be stored in multiwell plates. In this case, target nucleic acid need not be labelled. Target nucleic acid and one or more labelled probes are added to the arrayed sets of probes. If one attached probe and one labelled probe both hybridize contiguously on the target nucleic acid, they are covalently ligated, producing a detected sequence equal to the sum of the length of the ligated probes. The process allows for sequencing long nucleic acid fragments, e.g. a complete bacterial genome, without nucleic acid subcloning in smaller pieces.
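The contiguity requirement of Format 3 can be sketched as follows. For simplicity the sketch writes each probe as the stretch of target sequence it reads (strand orientation and complementarity are ignored), and it treats hybridization as an exact match; both are simplifying assumptions, and the function name ligation_products is hypothetical.

```python
from typing import List, Tuple


def ligation_products(
    target: str, fixed_probes: List[str], labelled_probes: List[str]
) -> List[Tuple[str, str, str]]:
    """List (fixed, labelled, read) triples for probes that abut on the target.

    A labelled probe is ligated only if it matches the target immediately
    after the end of a matching fixed probe, so the read sequence has a
    length equal to the sum of the two probe lengths.
    """
    products = []
    for fixed in fixed_probes:
        position = target.find(fixed)
        while position != -1:
            end = position + len(fixed)
            for labelled in labelled_probes:
                if target.startswith(labelled, end):
                    products.append((fixed, labelled, fixed + labelled))
            position = target.find(fixed, position + 1)
    return products
```

For example, two 5-mers that hybridize contiguously yield a 10-base read, whereas a labelled probe that matches elsewhere on the target produces no ligated product at that array position.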
In the present invention, SBH is applied to the efficient identification and sequencing of one or more nucleic acid samples. The procedure has many applications in nucleic acid diagnostics, forensics, and gene mapping. It also may be used to identify mutations responsible for genetic disorders and other traits, to assess biodiversity and to produce many other types of data dependent on nucleic acid sequence.
The present invention provides a method for detecting a target nucleic acid species including the steps of providing an array of probes affixed to a substrate and a plurality of labeled probes wherein each labeled probe is selected to have a first nucleic acid sequence which is complementary to a first portion of a target nucleic acid and wherein the nucleic acid sequence of at least one probe affixed to the substrate is complementary to a second portion of the nucleic acid sequence of the target, the second portion being adjacent to the first portion; applying a target nucleic acid to the array under suitable conditions for hybridization of probe sequences to complementary sequences; introducing a labeled probe to the array; hybridizing a probe affixed to the substrate to the target nucleic acid; hybridizing the labeled probe to the target nucleic acid; affixing the labeled probe to an adjacently hybridized probe in the array; and detecting the labeled probe affixed to the probe in the array. According to preferred methods of the invention the array of probes affixed to the substrate comprises a universal set of probes. According to other preferred aspects of the invention at least two of the probes affixed to the substrate define overlapping sequences of the target nucleic acid sequence and more preferably at least two of the labelled probes define overlapping sequences of the target nucleic acid sequences. 
Still further, according to another aspect of the invention a method is provided for detecting a target nucleic acid of known sequence comprising the steps of: contacting a nucleic acid sample with a set of immobilized oligonucleotide probes attached to a solid substrate under hybridizing conditions wherein the immobilized probes are capable of specific hybridization with different portions of said target nucleic acid sequence; contacting the target nucleic acid with a set of labelled oligonucleotide probes in solution under hybridizing conditions wherein the labeled probes are capable of specific hybridization with different portions of said target nucleic acid sequence adjacent to the immobilized probes; covalently joining the immobilized probes to labelled probes that are immediately adjacent to the immobilized probe on the target sequence (e.g., with ligase); removing any non-ligated labelled probes; detecting the presence of the target nucleic acid by detecting the presence of said labelled probe attached to the immobilized probes. The invention also provides a method of determining expression of a member of a set of partially or completely sequenced genes in a cell type, a tissue or a tissue mixture comprising the steps of: defining pairs of fixed and labeled probes specific for the sequenced gene; hybridizing unlabeled nucleic acid sample and corresponding labeled probes to one or more arrays of fixed probes; forming covalent bonds between adjacent hybridized labeled and fixed probes; removing unligated probes; and determining the presence of the sequenced gene by detection of labeled probes bound to prespecified locations in the array. In a preferred embodiment of this aspect of the invention, the target nucleic acid will identify the presence of an infectious agent.
Further, the present invention provides for an array of oligonucleotide probes comprising a nylon membrane; a plurality of subarrays of oligonucleotide probes on the nylon membrane, the subarrays comprising a plurality of individual spots wherein each spot is comprised of a plurality of oligonucleotide probes of the same sequence; and a plurality of hydrophobic barriers located between the subarrays on the nylon membrane, whereby the plurality of hydrophobic barriers prevents cross contamination between adjacent subarrays.
Still further, the present invention provides a method for sequencing a repetitive sequence, having a first end and a second end, in a target nucleic acid comprising the steps of: (a) providing a plurality of spacer oligonucleotides of varying lengths wherein the spacer oligonucleotides comprise the repetitive sequence; (b) providing a first oligonucleotide that is known to be adjacent to the first end of the repetitive sequence; (c) providing a plurality of second oligonucleotides one of which is adjacent to the second end of the repetitive sequence, wherein the plurality of second oligonucleotides is labeled; (d) hybridizing the first and the plurality of second oligonucleotides, and one of the plurality of spacer oligonucleotides to the target nucleic acid; (e) ligating the hybridized oligonucleotides; (f) separating ligated oligonucleotides from unligated oligonucleotides; and (g) detecting label in the ligated oligonucleotides.
Still further, the present invention provides a method for sequencing a branch point sequence, having a first end and a second end, in a target nucleic acid comprising the steps of: (a) providing a first oligonucleotide that is complementary to a first portion of the branch point sequence wherein the first oligonucleotide extends from the first end of the branch point sequence by at least one nucleotide; (b) providing a plurality of second oligonucleotides that are labeled, and are complementary to a second portion of the branch point sequence wherein the plurality of second oligonucleotides extend from the second end of the branch point sequence by at least one nucleotide, and wherein the portion of the second oligonucleotides that extend from the second end of the branch point sequence comprise sequences that are complementary to a plurality of sequences that arise from the branch point sequence; (c) hybridizing the first oligonucleotide, and one of the plurality of second oligonucleotides to the target DNA; (d) ligating the hybridized oligonucleotides; (e) separating ligated oligonucleotides from unligated oligonucleotides; and (f) detecting label in the ligated oligonucleotides.
Still further, the present invention provides a method for confirming a sequence by using probes that are predicted to be negative for the target nucleic acid. The sequence of a target is then confirmed by hybridizing the target nucleic acid to the “negative” probes to confirm that these probes do not form perfect matches with the target nucleic acid.
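One way to picture confirmation with negative probes is sketched below. The sketch chooses, as predicted-negative probes, the single-base variants of the K-mers in the putative sequence that do not themselves occur in that sequence; this selection rule, the caller-supplied hybridizes function standing in for an experimental readout, and all names are assumptions made for illustration only.

```python
from typing import Callable, Set


def predicted_negative_probes(putative: str, k: int) -> Set[str]:
    """Single-base variants of the putative sequence's K-mers that it does not contain."""
    present = {putative[i:i + k] for i in range(len(putative) - k + 1)}
    negatives = set()
    for kmer in present:
        for position in range(k):
            for base in "ACGT":
                variant = kmer[:position] + base + kmer[position + 1:]
                if variant not in present:
                    negatives.add(variant)
    return negatives


def sequence_confirmed(putative: str, hybridizes: Callable[[str], bool], k: int) -> bool:
    """The putative sequence is confirmed only if no predicted-negative probe scores positive."""
    return not any(hybridizes(probe) for probe in predicted_negative_probes(putative, k))
```

A predicted-negative probe that does hybridize indicates that the putative sequence differs from the actual target at the position that probe covers.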
Still further, the present invention provides a method for analyzing a nucleic acid using oligonucleotide probes that are complexed with different labels so that the probes may be multiplexed in a hybridization reaction without a loss of sequence information (i.e., different probes have different labels so that hybridization of the different probes to the target can be distinguished). In a preferred embodiment, the labels are radioisotopes, or fluorescent molecules, or enzymes, or electrophore mass labels. In a more preferred embodiment, the differently labeled oligonucleotide probes are used in format III SBH, and multiple probes (more than two, with one probe being the immobilized probe) are ligated together.
Still further, the present invention provides a method for detecting the presence of a target nucleic acid having a known sequence when the target is present in very small amounts compared to homologous nucleic acids in a sample. In a preferred embodiment, the target nucleic acid is an allele present at very low frequency in a sample that has nucleic acids from a large number of sources. In an alternative preferred embodiment, the target nucleic acid has a mutated sequence, and is present at very low frequency within a sample of nucleic acids.
Still further, the present invention provides a method for confirming the sequence of a target nucleic acid by using single pass gel sequencing. Primers for single pass gel sequencing are derived from the sequence obtained by SBH, and these primers are used in standard Sanger sequencing reactions to provide gel sequence information for the target nucleic acid. The sequence obtained by single pass gel sequencing is then compared to the SBH derived sequence to confirm the sequence.
Still further, the present invention provides a method for solving branch points by using single pass gel sequencing. Primers for the single pass gel sequencing reactions are identified from the ends of the Sfs obtained after a first round of SBH sequencing, and these primers are used in standard Sanger-sequencing reactions to provide gel sequencing information through the branch points of the Sfs. Sfs are then aligned by comparing the Sanger-sequencing results through the branch points to the Sfs to identify adjoining Sfs.
Still further, the present invention provides for a method of preparing a sample containing target nucleic acids by PCR, without purifying the PCR products prior to the SBH reactions. In Format I SBH, crude PCR products are applied to a substrate without prior purification, and the substrate may be washed prior to introduction of the labeled probes.
Still further, the present invention provides a method and an apparatus for analyzing a target nucleic acid. The apparatus comprises two arrays of nucleic acids that are mixed together at the desired time. In a preferred embodiment, the nucleic acids in one of the arrays are labeled. In a more preferred embodiment, a material is disposed between the two arrays and this material prevents the mixing of nucleic acids in the arrays. When this material is removed, or rendered permeable, the nucleic acids in the two arrays are mixed together. In an alternative preferred embodiment, the nucleic acids in one array are target nucleic acids and the nucleic acids in the other are oligonucleotide probes. In another preferred embodiment, the nucleic acids in both arrays are oligonucleotide probes. In another preferred embodiment, the nucleic acids in one array are oligonucleotide probes and target nucleic acids, and nucleic acids in the other array are oligonucleotide probes. In another preferred embodiment, the nucleic acids in both arrays are oligonucleotide probes and target nucleic acids.
One method of the present invention using the apparatus described above comprises the steps of providing an array of nucleic acids fixed to a substrate, providing a second array of nucleic acids, providing conditions that allow the nucleic acids in the second array to come into contact with the nucleic acids of the fixed array wherein one of the arrays of nucleic acids are target nucleic acids and the other array is oligonucleotide probes, and analyzing the hybridization results. In a preferred embodiment, the fixed array is target nucleic acid and the second array is labeled oligonucleotide probes. In a more preferred embodiment, there is a material disposed between the two arrays that prevents mixing of the nucleic acids until the material is removed or rendered permeable to the nucleic acids.
A second method of the present invention using the apparatus described above comprises the steps of providing two arrays of nucleic acid probes, providing conditions that allow the two arrays of probes to come into contact with each other and a target nucleic acid, ligating together probes that are adjacent on the target nucleic acid, and analyzing the results. In a preferred embodiment, the probes in one array are fixed and the probes in the other array are labeled. In a more preferred embodiment, there is a material disposed between the two arrays that prevents mixing of the probes until the material is removed or rendered permeable to the probes.
Still further, the present invention provides substrates on which arrays of oligonucleotide probes are fixed, wherein each probe is separated from its neighboring probes by a physical barrier that is resistant to the flow of the sample solution. In a preferred embodiment, the physical barrier is made of a hydrophobic material.
Still further, the present invention provides a method for making the arrays of oligonucleotide probes that are separated by physical barriers. In a preferred embodiment, a grid is applied to the substrate using an ink-jet head that applies a material which reduces the reaction volume of the array.
Still further, the present invention provides substrates on which oligonucleotides are fixed to form a three-dimensional array. The three-dimensional array combines high resolution for reading probe results (each level has a relatively low density of probes per cm²), with high information content in three dimensional space (multiple levels or probes).
Still further, the present invention provides a substrate to which oligonucleotide probes are fixed, wherein the oligonucleotide probes have spacers, and wherein the spacers increase the distance between the substrate and the informational portion of the oligonucleotide probe (e.g., the portion of the oligonucleotide probe which binds to the target and gives sequence information). In a preferred embodiment, the spacer comprises ribose sugars and phosphates, wherein the phosphates covalently bind the ribose sugars into a polymer by forming esters with the ribose sugars through their 5′ and 3′ hydroxyl groups.
Still further, the present invention provides a method for clustering cDNA clones into groups of similar or identical sequences, so that single representative clones may be selected from each group for sequencing. In a preferred embodiment, the method for clustering is used in the sequencing of a plurality of clones, comprising the steps of: interrogating each clone with a plurality of oligonucleotide probes; determining which probes bind to each clone and the signal intensity for each probe; clustering clones into a plurality of groups by identifying clones that bind to similar probes with similar intensities; and sequencing at least one clone from each group. In a more preferred embodiment, the plurality of probes comprises from about 50 to about 500 different probes. In another more preferred embodiment, the plurality of probes comprises about 300 different probes. In a most preferred embodiment, the plurality of clones is a plurality of cDNA clones.
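The clustering step can be sketched as a simple greedy grouping of clones by their probe intensity profiles. The similarity rule used here (every probe intensity within a fixed tolerance of the profile of the first clone in the group) is an illustrative assumption, as are the names cluster_clones and representatives; the specification does not prescribe a particular clustering algorithm.

```python
from typing import Dict, List


def similar(profile_a: List[float], profile_b: List[float], tolerance: float) -> bool:
    """Two clones are similar if every probe intensity differs by at most `tolerance`."""
    return all(abs(a - b) <= tolerance for a, b in zip(profile_a, profile_b))


def cluster_clones(intensities: Dict[str, List[float]], tolerance: float) -> List[List[str]]:
    """Greedily place each clone in the first group whose first member it resembles."""
    clusters: List[List[str]] = []
    for clone, profile in intensities.items():
        for group in clusters:
            if similar(profile, intensities[group[0]], tolerance):
                group.append(clone)
                break
        else:
            clusters.append([clone])  # no similar group found: start a new one
    return clusters


def representatives(clusters: List[List[str]]) -> List[str]:
    """Pick one clone per group as the candidate for sequencing."""
    return [group[0] for group in clusters]
```

With about 300 probes per clone, each profile in this sketch would simply be a list of about 300 intensity values.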
Still further, the invention relates to oligonucleotide probes complexed (covalent or noncovalent) to discrete particles wherein the particles can be grouped into a plurality of sets based on a physical property. In a preferred embodiment, a different probe is attached to the discrete particles of each set, and the identity of the probe is determined by identifying the physical property of the discrete particles. In an alternative embodiment, the probe is identified on the basis of a physical property of the probe. The physical property includes any that can be used to differentiate the discrete particles, and includes, for example, size, fluorescence, radioactivity, electromagnetic charge, or absorbance, or label(s) may be attached to the particle such as a dye, a radionuclide, or an EML. In a preferred embodiment, discrete particles are separated by a flow cytometer which detects the size, charge, fluorescence, or absorbance of the particle.
The invention also relates to methods using the probes complexed with the discrete particles to analyze target nucleic acids. These probes may be used in any of the methods described above, with the modification of identifying the probe by the physical property of the discrete particle. These probes may also be used in a format III approach where the “free” probe is identified by a label, and the probe complexed to the discrete particle is identified by the physical property. In a preferred embodiment, the probes are used to sequence a target nucleic acid using SBH.
The invention also relates to methods using agents which destabilize the binding of complementary polynucleotide strands (decrease the binding energy), or increase stability of binding between complementary polynucleotide strands (increase the binding energy). In preferred embodiments, the agent is a tetraalkyl ammonium salt, sodium chloride, a phosphate salt, a borate salt, an organic solvent such as formamide, glycol, dimethylsulfoxide, and dimethylformamide, urea, guanidinium, an amino acid analog such as betaine, a polyamine such as spermidine and spermine, or other positively charged molecules which neutralize the negative charge of the phosphate backbone, a detergent such as sodium dodecyl sulfate, and sodium lauryl sarcosinate, a minor/major groove binding agent, a positively charged polypeptide, an intercalating agent such as acridine, ethidium bromide, and anthracene, and a polyanion such as an alkyl polysulphonic acid. In a preferred embodiment, an agent is used to reduce or increase the Tm of a pair of complementary polynucleotides. In a more preferred embodiment, a mixture of the agents is used to reduce or increase the Tm of a pair of complementary polynucleotides. In a preferred embodiment, the agent or agents are added so that the binding energy from an AT base pair is approximately equivalent to the binding energy of a GC base pair. The energy of binding of these complementary polynucleotides may be increased by adding an agent that neutralizes or shields the negative charges of the phosphate groups in the polynucleotide backbone. In a most preferred embodiment, the agent or agents are used to enhance the discrimination of perfect matches from mismatches for complementary polynucleotides.
The invention also relates to methods of increasing the discrimination of perfect matches from mismatches for complementary polynucleotides. In preferred embodiments, this discrimination is increased by changing a physical property in the method, e.g., the temperature, and/or adding an agent which increases discrimination, e.g., spermidine or formamide. In a more preferred embodiment, a mixture of agents and/or physical conditions is used to increase the discrimination of perfect matches from mismatches between a probe and a target nucleic acid. In a most preferred embodiment, the change in physical condition or addition of an agent enhances discrimination in a number of ways, for example, the physical condition or agent may increase the difference in the on rates or off rates between a perfect match product and a mismatch product (a kinetic effect); or the reaction time may be decreased so that binding of the probe to a perfect match site and/or a mismatch site does not reach equilibrium; or the physical condition or agent may increase the binding energy difference between a perfect match and a mismatch (a free energy [ΔG] effect); or the physical condition or agent may enhance the discrimination effect of another agent or physical condition (ΔG or kinetic effect); or the physical condition or agent may preferentially modify the perfect match or mismatch complexes formed between complementary polynucleotides; or the physical condition or agent may enhance the discrimination of the physical condition or agent which physically modifies the complexed polynucleotides (ΔG, kinetic, or conformational effect); or some combination of these and other factors. In a preferred embodiment, the agent, agents or physical condition(s) modify the activity of a protein which binds to and/or modifies the complexed or uncomplexed nucleic acids. In a preferred embodiment, the agent is one of those recited supra.
In a preferred embodiment, the physical condition is selected from the group comprising temperature, pH, ionic strength, time, and/or others such as, e.g., those listed in The Handbook of Chemistry and Physics, CRC Press.
The invention also relates to methods for enhancing the activity of a nucleic acid modifying polypeptide on a target nucleic acid, comprising the steps of contacting the target nucleic acid with at least one polynucleotide under conditions which allow a perfect match to be discriminated from a mismatch, wherein an agent is added to enhance the discrimination of the perfect match from the mismatch; and contacting the complex formed between the polynucleotide and the target nucleic acid with the nucleic acid modifying polypeptide, wherein the activity of the nucleic acid modifying polypeptide is enhanced by the enhanced discrimination. In preferred embodiments, the nucleic acid modifying polypeptide is selected from the group comprising a ligase, a nucleic acid polymerase, an integrase, a gyrase, a nuclease, a helicase, a methylase, and a capping enzyme. In an alternative preferred embodiment, the methods are used to enhance the binding of nucleic acid binding proteins, such as, for example, transcription factors, repressors, and structural polypeptides such as, for example, histones. In a most preferred embodiment, the nucleic acid modifying polypeptide is a ligase that has been modified to enhance its discrimination of perfect matches from mismatches.
In one embodiment, the invention includes methods for preparing nucleic acid pools useful in the hybridization studies described herein. This embodiment allows hybridization conditions, such as time, temperature, ionic strength, etc., to be adjusted to increase the likelihood that hybridization to the nucleic acids within each pool is within the linear range of detection (i.e., detectable but not saturating).
The methods of this embodiment rely on pooling nucleic acids derived from a sample, based on the degree of representation within the sample, i.e., nucleic acids having similar degrees of representation within a sample are combined into a pool. As used herein, the term “combined” can refer to physical mixing of nucleic acids, but also encompasses the classification of nucleic acids as belonging to a particular pool without physical mixing of nucleic acids. Two, three, four, five, or more pools can be produced from each sample. Conveniently, three pools are produced: one pool containing nucleic acids having “low” representation, one pool containing nucleic acids having “intermediate” representation, and one pool containing nucleic acids having “high” representation. The terms “low,” “intermediate,” and “high” are used in this context to define relationships among the pools, rather than to refer to absolute degrees of representation. In other words, “high representation” refers to a higher degree of representation than “intermediate representation,” and those skilled in the art understand that what constitutes high representation can vary from sample to sample.
The nucleic acid pools can be prepared from any nucleic acids, including genomic DNA, DNA produced by amplification, cDNA, and RNA. Samples from which the nucleic acids are typically derived include, for example, a tissue, cell (eukaryotic or prokaryotic), and nucleic acid library (e.g., a genomic or cDNA library). Nucleic acid pools are conveniently prepared from cDNA, such as a cDNA library. In this instance, cDNA clones can be pooled based on the degree of representation in the cDNA library. Such pools can then be used in hybridization studies. Nucleic acid pools according to the invention are particularly useful for determining the degree of representation of one or more target nucleic acids in a sample. For example, pools of cDNA clones can be employed in an expression monitoring study where RNA or cDNA from a tissue or cell of interest is contacted with the pooled cDNAs to determine the presence and expression levels of the RNAs in the tissue or cell.
Thus, the invention also provides an improved method for identifying a nucleic acid and/or its representation in a sample. In this method, the nucleic acids in each pool can be contacted with one or more target nucleic acids and/or oligonucleotide probes under conditions suitable for hybridization, and hybridization is detected. Suitable hybridization conditions are different for each pool. Conveniently, a shorter hybridization time is used when hybridizing nucleic acids from a “high representation” pool relative to the hybridization time for nucleic acids from a lower representation pool. Alternatively (or in addition), one or more factors affecting the rate of association of nucleic acid strands can be adjusted to help ensure linearity.
Hybridization can be carried out in any of a variety of formats. In particular, the invention includes a method in which nucleic acids subjected to pooling (hereafter “pooled nucleic acids”) are affixed to one or more substrates and hybridized with soluble target nucleic acid(s) and/or oligonucleotide probe(s) (i.e., sequencing by hybridization [SBH] Format 1), a method in which target nucleic acid(s) and/or oligonucleotide probe(s) are affixed to one or more substrates and hybridized with the pooled nucleic acids (i.e., SBH Format 2), and a method in which both the pooled nucleic acids and the target nucleic acid(s) and/or oligonucleotide probe(s) are in solution. The invention further encompasses methods in which pooled nucleic acids are employed in SBH Format 3 hybridization studies in which the pooled nucleic acids are either affixed to one or more substrates or are in solution. In all of these formats, where the pooled nucleic acids are in solution, they are generally labeled.
In methods wherein the pooled nucleic acids are affixed to one or more substrates, the nucleic acids can conveniently be arrayed such that, for each dot in the array, the corresponding pool is known, and thus the degree of representation of the nucleic acid in the dot is known. Hybridization conditions are then adjusted to facilitate hybridization with nucleic acids of a given pool, and hybridization at all dots corresponding to that pool is then determined. In a variation of this embodiment, each pool of representative nucleic acids is arrayed on a substrate to form a separate array of nucleic acids. This step produces multiple arrays, each containing nucleic acids having a degree of representation in the sample that is within a predetermined range, wherein the range differs for each array so produced. The arrays produced from nucleic acids derived from a given sample can be combined to form kits. Thus, in addition to methods, the present invention provides arrays and kits produced from pooled nucleic acids.
Another aspect of the invention is a method in which nucleic acids are selected from a sample and assigned to pools based on an SBH study in which nucleic acids are clustered into groups having the same or similar nucleotide sequences. More specifically, nucleic acids from a sample are contacted with a plurality of oligonucleotide probes under suitable conditions for hybridization of oligonucleotide probes to nucleic acids. Nucleic acids that bind to the same sets of oligonucleotide probes are identified and clustered into a plurality of groups. Each such group includes nucleic acids that share a common “oligonucleotide probe signature” and therefore is expected to include nucleic acids having the same or similar nucleotide sequences. The number of nucleic acids in each group is determined as an indication of their degree of representation in the sample. A representative nucleic acid is selected from each group to obtain a series of representative nucleic acids, which are then combined into a plurality of pools based on degree of representation in the sample.
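The grouping and pooling just described can be sketched as follows. The sketch groups clones that share an identical probe signature, uses the group size as the degree of representation, and assigns one representative per group to a low, intermediate, or high pool. The cut-off parameters low_max and high_min and the function name pool_by_representation are hypothetical; because the pools are defined relative to one another, suitable cut-offs would be chosen for each sample.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def pool_by_representation(
    signatures: Dict[str, FrozenSet[str]], low_max: int, high_min: int
) -> Dict[str, List[Tuple[str, int]]]:
    """Group clones by probe signature and pool one representative per group."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for clone, signature in signatures.items():
        groups[signature].append(clone)

    pools: Dict[str, List[Tuple[str, int]]] = {"low": [], "intermediate": [], "high": []}
    for members in groups.values():
        count = len(members)          # group size indicates representation in the sample
        representative = members[0]   # any member could serve as the representative
        if count <= low_max:
            pool = "low"
        elif count >= high_min:
            pool = "high"
        else:
            pool = "intermediate"
        pools[pool].append((representative, count))
    return pools
```

Each pool can then be hybridized under its own conditions, for example with a shorter hybridization time for the high representation pool.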
In another variation of this embodiment, the degree of representation in a sample is determined by any conventional method, such as, for example, intensity of hybridization signal in a hybridization study.
Format 1 SBH is appropriate for the simultaneous analysis of a large set of samples. Parallel scoring of thousands of samples on large arrays may be performed in thousands of independent hybridization reactions using small pieces of membranes. The identification of DNA may involve 1-20 probes per reaction and the identification of mutations may in some cases involve more than 1000 probes specifically selected or designed for each sample. For identification of the nature of the mutated DNA segments, specific probes may be synthesized or selected for each mutation detected in the first round of hybridizations.
DNA samples may be prepared in small arrays which may be separated by appropriate spacers, and which may be simultaneously tested with probes selected from a set of oligonucleotides which may be arrayed in multiwell plates. Small arrays may consist of one or more samples. DNA samples in each small array may include mutants or individual samples of a sequence. Consecutive small arrays may be organized into larger arrays. Such larger arrays may include replication of the same small array or may include arrays of samples of different DNA fragments. A universal set of probes includes sufficient probes to analyze a DNA fragment with prespecified precision, e.g. with respect to the redundancy of reading each base pair (“bp”). These sets may include more probes than are necessary for one specific fragment, but may include fewer probes than are necessary for testing thousands of DNA samples of different sequence.
DNA or allele identification and a diagnostic sequencing process may include the steps of:
1) Selection of a subset of probes from a dedicated, representative or universal set to be hybridized with each of a plurality of small arrays;
2) Adding a first probe to each subarray on each of the arrays to be analyzed in parallel;
3) Performing hybridization and scoring of the hybridization results;
4) Stripping off previously used probes;
5) Repeating hybridization, scoring and stripping steps for the remaining probes which are to be scored;
6) Processing the obtained results to obtain a final analysis or to determine additional probes to be hybridized;
7) Performing additional hybridizations for certain subarrays; and
8) Processing complete sets of data and obtaining a final analysis.
This approach provides fast identification and sequencing of a small number of nucleic acid samples of one type (e.g. DNA, RNA), and also provides parallel analysis of many sample types in the form of subarrays by using a presynthesized set of probes of manageable size. Two approaches have been combined to produce an efficient and versatile process for the determination of DNA identity, for DNA diagnostics, and for identification of mutations.
For the identification of known sequences, a small set of shorter probes may be used in place of a longer unique probe. In this approach, although there may be more probes to be scored, a universal set of probes may be synthesized to cover any type of sequence. For example, a full set of 6-mers includes only 4,096 probes, and a complete set of 7-mers includes only 16,384 probes.
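The set sizes quoted above follow from there being four possible bases at each of K positions, i.e. 4^K probes. The following short sketch, offered only as an illustration, computes the count and enumerates such a set; the function names are hypothetical.

```python
from itertools import product


def universal_probe_count(k):
    """Number of distinct K-mers over the four bases A, C, G and T."""
    return 4 ** k


def universal_probes(k):
    """Lazily enumerate every K-mer in lexicographic order."""
    return ("".join(bases) for bases in product("ACGT", repeat=k))


six_mers = universal_probe_count(6)
seven_mers = universal_probe_count(7)
```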
Full sequencing of a DNA fragment may be performed with two levels of hybridization. One level is hybridization of a sufficient set of probes that cover every base at least once. For this purpose, a specific set of probes may be synthesized for a standard sample. The results of hybridization with such a set of probes reveal whether and where mutations (differences) occur in non-standard samples. Further, this set of probes may include “negative” probes to confirm the hybridization results of the “positive” probes. To determine the identity of the changes, additional specific probes may be hybridized to the sample. This additional set of probes will have both “positive” (the mutant sequence) and “negative” probes, and the sequence changes will be identified by the positive probes and confirmed by the negative probes.
In another embodiment, all probes from a universal set may be scored. A universal set of probes allows scoring of a relatively small number of probes per sample in a two step process without an undesirable expenditure of time. The hybridization process may involve successive probings, in a first step of computing an optimal subset of probes to be hybridized first and, then, on the basis of the obtained results, a second step of determining additional probes to be scored from among those in a universal set. Both sets of probes have “negative” probes that confirm the positive probes in the set. Further, the sequence that is obtained may then be confirmed in a separate step by hybridizing the sample with a set of “negative” probes identified from the SBH results.
In SBH sequence assembly, K-1 oligonucleotides which occur repeatedly in analyzed DNA fragments due to chance or biological reasons may be subject to special consideration. If there is no additional information, relatively small fragments of DNA may be fully assembled inasmuch as every base pair is read several times.
In the assembly of relatively longer fragments, ambiguities may arise due to the repeated occurrence in a set of positively-scored probes of a K-1 sequence (i.e., a sequence shorter than the length of the probe). This problem does not exist if mutated or similar sequences have to be determined (i.e., the K-1 sequence is not identically repeated). Knowledge of one sequence may be used as a template to correctly assemble a sequence known to be similar (e.g. by its presence in a database) by arraying the positive probes for the unknown sequence to display the best fit on the template.
The use of an array of samples avoids consecutive scoring of many oligonucleotides on a single sample or on a small set of samples. This approach allows the scoring of more probes in parallel by manipulation of only one physical object. Subarrays of DNA samples 1000 bp in length may be sequenced in a relatively short period of time. If the samples are spotted at 50 subarrays in an array and the array is reprobed 10 times, 500 probes may be scored. In screening for the occurrence of a mutation, enough probes may be used to cover each base three times. If a mutation is present, several covering probes will be affected. The use of information about the identity of negative probes may map the mutation with a two base precision. To solve a single base mutation mapped in this way, an additional 15 probes may be employed. These probes cover any base combination for two questionable positions (assuming that deletions and insertions are not involved). These probes may be scored in one cycle on 50 subarrays which contain a given sample. In the implementation of a multiple label color scheme (i.e., multiplexing), two to six probes, each having a different label such as a different fluorescent dye, may be used as a pool, thereby reducing the number of hybridization cycles and shortening the sequencing process.
In more complicated cases, there may be two close mutations or insertions. They may be handled with more probes. For example, a three base insertion may be solved with 64 probes. The most complicated cases may be approached by several steps of hybridization, and the selecting of a new set of probes on the basis of results of previous hybridizations.
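The probe counts given in the two preceding paragraphs can be reproduced with simple arithmetic, as sketched below. The sketch assumes one reading of the figures: that the 15 probes for two questionable positions are the 16 possible base combinations less the one already known from the standard sequence, and that each subarray receives one probe per hybridization cycle. The function names are hypothetical.

```python
def substitution_probes(questionable_positions):
    """Base combinations at the questionable positions, excluding the
    single combination already known from the standard sequence
    (an assumed reading of the figure of 15 for two positions)."""
    return 4 ** questionable_positions - 1


def insertion_probes(inserted_bases):
    """All possible sequences for an insertion of the given length."""
    return 4 ** inserted_bases


def probes_scored(subarrays, reprobings):
    """Probes scored when every subarray receives one probe per cycle."""
    return subarrays * reprobings


two_position_case = substitution_probes(2)
three_base_insertion = insertion_probes(3)
scored = probes_scored(subarrays=50, reprobings=10)
```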
If subarrays to be analyzed include tens or hundreds of samples of one type, then several of them may be found to contain one or more changes (mutations, insertions, or deletions). For each segment where mutation occurs, a specific set of probes may be scored. The total number of probes to be scored for a type of sample may be several hundreds. The scoring of replica arrays in parallel facilitates scoring of hundreds of probes in a relatively small number of cycles. In addition, compatible probes may be pooled. Positive hybridizations may be assigned to the probes selected to check particular DNA segments because these segments usually differ in 75% of their constituent bases.
By using a larger set of longer probes, longer targets may be analyzed. These targets may represent pools of fragments such as pools of exon clones.
A specific hybridization scoring method may be employed to define the presence of mutants in a genomic segment to be sequenced from a diploid chromosomal set. Two variations are where: i) the sequence from one chromosome represents a known allele and the sequence from the other represents a new mutant; or, ii) both chromosomes contain new, but different mutants. In both cases, the scanning step designed to map changes gives a maximal signal difference of two-fold at the mutant position. Further, the method can be used to identify which alleles of a gene are carried by an individual and whether the individual is homozygous or heterozygous for that gene.
Scoring two-fold signal differences required in the first case may be achieved efficiently by comparing corresponding signals with homozygous and heterozygous controls. This approach allows determination of a relative reduction in the hybridization signal for each particular probe in a given sample. This is significant because hybridization efficiency may vary more than two-fold for a particular probe hybridized with different nucleic acid fragments having the same full match target. In addition, different mutant sites may affect more than one probe depending upon the number of oligonucleotide probes. Decrease of the signal for two to four consecutive probes produces a more significant indication of a mutant site. Results may be checked by testing with small sets of selected probes, among which one or a few probes are selected to give a full match signal which is on average eight-fold stronger than the signals coming from mismatch-containing duplexes.
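By way of illustration only, the comparison against a control and the search for a signal decrease over consecutive probes might be sketched as follows. The ratio threshold of 0.75 and the minimum run length of two probes are assumptions chosen for the example, not values taken from this description, and the function name is hypothetical.

```python
def reduced_signal_runs(sample, control, threshold=0.75, min_run=2):
    """Return (first, last) index pairs for runs of consecutive probes
    whose sample/control signal ratio falls below threshold.

    sample and control are per-probe signals in tiling order. Both the
    threshold and min_run defaults are illustrative assumptions.
    """
    n = min(len(sample), len(control))
    runs = []
    start = None
    for i in range(n):
        low = control[i] > 0 and sample[i] / control[i] < threshold
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and n - start >= min_run:
        runs.append((start, n - 1))
    return runs


control_signal = [100] * 8
sample_signal = [100, 98, 52, 49, 55, 101, 40, 97]
candidate_sites = reduced_signal_runs(sample_signal, control_signal)
```

Here the isolated low value at the seventh probe is ignored, while the run of three roughly half-strength signals is reported as a candidate heterozygous site.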
Partitioned membranes allow a very flexible organization of experiments to accommodate relatively larger numbers of samples representing a given sequence type, or many different types of samples represented with relatively small numbers of samples. A range of 4-256 samples can be handled with particular efficiency. Subarrays within this range of numbers of dots may be designed to match the configuration and size of standard multiwell plates used for storing and labeling oligonucleotides. The size of the subarrays may be adjusted for different number of samples, or a few standard subarray sizes may be used. If all samples of a type do not fit in one subarray, additional subarrays or membranes may be used and processed with the same probes. In addition, by adjusting the number of replicas for each subarray, the time for completion of identification or sequencing process may be varied.
As used herein, “intermediate fragment” means an oligonucleotide between 5 and 1000 bases in length, and preferably between 10 and 40 bp in length.
In Format 3, a first set of oligonucleotide probes of known sequence is immobilized on a solid support under conditions which permit them to hybridize with nucleic acids having respectively complementary sequences. A labeled, second set of oligonucleotide probes is provided in solution. Both within the sets and between the sets the probes may be of the same length or of different lengths. A nucleic acid to be sequenced or intermediate fragments thereof may be applied to the first set of probes in double-stranded form (especially where a recA protein is present to permit hybridization under non-denaturing conditions), or in single-stranded form and under conditions which permit hybrids of different degrees of complementarity (for example, under conditions which allow discrimination between full match and one base pair mismatch hybrids). The nucleic acid to be sequenced or intermediate fragments thereof may be applied to the first set of probes before, after or simultaneously with the second set of probes. Probes that bind to adjacent sites on the target are bound together (e.g., by stacking interactions or by a ligase or other means of causing chemical bond formation between the adjacent probes). After permitting adjacent probes to be bound, fragments and probes which are not immobilized to the surface by chemical bonding to a member of the first set of probes are washed away, for example, using a high temperature (up to 100 degrees C.) wash solution which melts hybrids. The bound probes from the second set may then be detected using means appropriate to the label employed (which may, for example, be chemiluminescent, fluorescent, radioactive, enzymatic, densitometric, or electrophore mass labels).
Herein, nucleotide bases “match” or are “complementary” if they form a stable duplex by hydrogen bonding under specified conditions. For example, under conditions commonly employed in hybridization assays, adenine (“A”) matches thymine (“T”), but not guanine (“G”) or cytosine (“C”). Similarly, G matches C, but not A or T. Other bases which will hydrogen bond in less specific fashion, such as inosine or the Universal Base (“M” base, Nichols et al., 1994), or other modified bases, such as methylated bases, for example, are complementary to those bases with which they form a stable duplex under specified conditions. A probe is said to be “perfectly complementary” or is said to be a “perfect match” if each base in the probe forms a duplex by hydrogen bonding to a base in the nucleic acid to be sequenced according to the Watson and Crick base pairing rules (i.e., absent any surrounding sequence effects, the duplex formed has the maximal binding energy for a particular probe). “Perfectly complementary” and “perfect match” are also meant to encompass probes which have analogs or modified nucleotides. A “perfect match” for an analog or modified nucleotide is judged according to a “perfect match rule” selected for that analog or modified nucleotide (e.g., the binding pair that has maximal binding energy for a particular analog or modified nucleotide). Each base in a probe that does not form a binding pair according to the “rules” is said to be a “mismatch” under the specified hybridization conditions.
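For the four standard bases, the definitions above may be illustrated by the following sketch, which counts mismatches between a probe and a target site of equal length. It assumes that both sequences are written 5′ to 3′ and pair in antiparallel orientation, and it does not model analogs, modified bases or sequence-context effects. The names PAIR and mismatches are hypothetical.

```python
# Watson and Crick pairing for the four standard bases only.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatches(probe, site):
    """Count probe positions that do not pair with the target site.

    Both strings are written 5' to 3'; the probe is assumed to bind the
    site in antiparallel orientation, so the site is read in reverse.
    """
    if len(probe) != len(site):
        raise ValueError("probe and site must have the same length")
    return sum(PAIR[p] != s for p, s in zip(probe, reversed(site)))


def is_perfect_match(probe, site):
    """True when every probe base pairs according to the rules above."""
    return mismatches(probe, site) == 0
```

For example, the probe AAGT is a perfect match for the site ACTT, and has a single mismatch against the site ACGT.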
A list of probes may be assembled wherein each probe is a perfect match to the nucleic acid to be sequenced. The probes on this list may then be analyzed to order them in maximal overlap fashion. Such ordering may be accomplished by comparing a first probe to each of the other probes on the list to determine which probe has a 3′ end which has the longest sequence of bases identical to the sequence of bases at the 5′ end of a second probe. The first and second probes may then be overlapped, and the process may be repeated by comparing the 5′ end of the second probe to the 3′ end of all of the remaining probes and by comparing the 3′ end of the first probe with the 5′ end of all of the remaining probes. The process may be continued until there are no probes on the list which have not been overlapped with other probes. Alternatively, more than one probe may be selected from the list of positive probes, and more than one set of overlapped probes (“sequence nucleus”) may be generated in parallel. The list of probes for either such process of sequence assembly may be the list of all probes which are perfectly complementary to the nucleic acid to be sequenced or may be any subset thereof.
The 5′ and 3′ ends of the probes may be overlapped to generate longer stretches of sequence. This process of assembling probes continues until an ambiguity arises because of a branch point (a probe is repeated in the fragment), repetitive sequences longer than the probes, or an uncloned segment. The stretches of sequence between any two ambiguities are referred to as fragment of a subclone sequence (Sfs). Where ambiguities arise in sequence assembly due to the availability of alternative proper overlaps with probes, hybridization with longer probes spanning the site of overlap alternatives, competitive hybridization, ligation of alternative end to end pairs of probes spanning the site of ambiguity or single pass gel analysis (to provide an unambiguous ordering of Sfs) may be used.
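One simple way to locate candidate branch points in a list of positive probes is to look for a (K-1)-base sequence that begins, or ends, more than one probe, since such a sequence admits alternative overlaps. The following sketch illustrates this under the assumption that all probes have the same length K; the function name is hypothetical.

```python
def branch_points(probes):
    """Return the sorted (K-1)-mers that admit more than one overlap.

    A (K-1)-mer is reported when more than one probe starts with it
    (alternative extensions) or more than one probe ends with it
    (alternative predecessors). All probes are assumed to share length K.
    """
    starts_with = {}
    ends_with = {}
    for probe in set(probes):
        starts_with.setdefault(probe[:-1], set()).add(probe)
        ends_with.setdefault(probe[1:], set()).add(probe)
    junctions = set(starts_with) | set(ends_with)
    return sorted(
        j for j in junctions
        if len(starts_with.get(j, ())) > 1 or len(ends_with.get(j, ())) > 1
    )


# 4-mers read from ACGTACGA, in which ACG occurs twice.
repeated = ["ACGT", "CGTA", "GTAC", "TACG", "ACGA"]
ambiguous = branch_points(repeated)
```

The stretches between reported branch points correspond to the Sfs described above, whose order must then be settled by additional information.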
By employing the above procedures, one may obtain any desired level of sequence, from a pattern of hybridization (which may be correlated with the identity of a nucleic acid sample to serve as a signature for identifying the nucleic acid sample) to overlapping or non-overlapping probes up through assembled Sfs and on to complete sequence for an intermediate fragment or an entire source DNA molecule (e.g. a chromosome).
Sequencing may generally comprise the following steps:
(a) contacting an array of immobilized oligonucleotide probes with a nucleic acid fragment under conditions effective to allow the fragment to form a primary complex with an immobilized probe having a complementary sequence;
(b) contacting this primary complex with a set of labeled oligonucleotide probes in solution under conditions effective to allow the primary complex to hybridize to the labeled probe, thereby forming secondary complexes wherein the fragment is hybridized with both an immobilized probe and a labeled probe;
(c) removing from a secondary complex any labeled probe that has not hybridized adjacent to an immobilized probe;
(d) detecting the presence of adjacent labeled and unlabeled probes by detecting the presence of the label; and
(e) determining a nucleotide sequence of the fragment by connecting the known sequence of the immobilized and labeled probes.
Hybridization and washing conditions may be selected to detect substantially perfect match hybrids (such as those wherein the fragment and probe hybridize at six out of seven positions), may be selected to allow differentiation of perfect matches and one base pair mismatches, or may be selected to permit detection only of perfect match hybrids.
Suitable hybridization conditions may be routinely determined by optimization procedures or pilot studies. Such procedures and studies are routinely conducted by those skilled in the art to establish protocols for use in a laboratory. See e.g., Ausubel et al., Current Protocols in Molecular Biology, Vol. 1-2, John Wiley and Sons (1989); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vols. 1-3, Cold Spring Harbor Press (1989); and Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982), all of which are incorporated by reference herein. For example, conditions such as temperature, concentration of components, hybridization and washing times, buffer components, and their pH and ionic strength may be varied.
In embodiments wherein the labeled and immobilized probes are not physically or chemically linked, detection may rely solely on washing steps of controlled stringency. Under such conditions, adjacent probes have increased binding affinity because of stacking interactions between the adjacent probes. Conditions may be varied to optimize the process as described above.
In embodiments wherein the immobilized and labeled probes are ligated, ligation may be implemented by a chemical ligating agent (e.g. water-soluble carbodiimide or cyanogen bromide), or a ligase enzyme, such as the commercially available T4 DNA ligase, may be employed. The washing conditions may be selected to distinguish between adjacent and nonadjacent labeled and immobilized probes, exploiting the difference in stability between adjacent probes and nonadjacent probes.
Oligonucleotide probes may be labeled with fluorescent dyes, chemiluminescent systems, radioactive labels (e.g., 35S, 3H, 32P or 33P) or with isotopes detectable by mass spectrometry.
Where a nucleic acid molecule of unknown sequence is longer than about 45 or 50 bp, the molecule may be fragmented and the sequences of the fragments determined. Fragmentation may be accomplished by restriction enzyme digestion, shearing or NaOH treatment. Fragments may be separated by size (e.g. by gel electrophoresis) to obtain a preferred fragment length of about ten to forty bp.
Oligonucleotides may be immobilized by a number of methods known to those skilled in the art, such as laser-activated photodeprotection attachment through a phosphate group using reagents such as a nucleoside phosphoramidite or a nucleoside hydrogen phosphorate. Glass, nylon, silicon and fluorocarbon supports may be used.
In a preferred embodiment, oligonucleotides are attached to a glass surface using a modified protocol from Zhen Guo et al., Nucl. Acids Res. (1994) 22:5456-5465. In this protocol, the glass surface is activated by adding an amino-silane functional group, which is then coupled with a phenyldiisothiocyanate (DITC). 5′-amino oligonucleotides are attached to this glass substrate by spotting onto the DITC activated glass surface and incubating for one hour at 37° C. in a humid chamber.
Oligonucleotides may be organized into arrays, and these arrays may include all or a subset of all probes of a given length, or sets of probes of selected lengths. Hydrophobic partitions may be used to separate probes or subarrays of probes. Arrays may be designed for various applications (e.g. mapping, partial sequencing, sequencing of targeted regions for diagnostic purposes, mRNA sequencing and large scale sequencing). A specific chip may be designed to be dedicated to a particular application by selecting a combination and arrangement of probes on a substrate.
For example, 1024 immobilized probe arrays of all oligonucleotide probes 5 bases in length (each array containing 1024 distinct probes) may be constructed. The probes in this example are 5-mers in an informational sense (they may actually be longer probes). A second set of 1024 5-mer probes may be labeled, and one of each labeled probe may be applied to an array of immobilized probes along with a fragment to be sequenced. In this example, 1024 arrays would be combined in a large superarray, or “superchip.” In those instances where an immobilized probe and one of the labeled probes hybridize end-to-end along a nucleic acid fragment, the two probes are joined, for example by ligation, and, after removing unbound label, 10-mers complementary to the sample fragment are detected by the correlation of the presence of a label at a point in an array having an immobilized probe of known sequence to which was applied a labeled probe of known sequence. The sequence of the sample fragment is simply the sequence of the immobilized probe continued in the sequence of the labeled probe. In this way, all one million possible 10-mers may be tested by a combinatorial process which employs only 5-mers and which thus involves one thousandth of the amount of effort for oligonucleotide synthesis.
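The combinatorial scheme of this example may be sketched as follows. For simplicity the sketch represents each probe by the target-strand sequence it reads rather than by its complement, and it treats hybridization and ligation as error-free; both are assumptions of the illustration, and the function names are hypothetical.

```python
def superchip_positives(target, k=5):
    """Return the (immobilized, labeled) probe pairs expected to be joined.

    A pair is positive when the two K-mers lie end-to-end on the target,
    so that together they read a 2K-mer of the fragment.
    """
    return {
        (target[i:i + k], target[i + k:i + 2 * k])
        for i in range(len(target) - 2 * k + 1)
    }


def synthesis_counts(k=5):
    """Probes synthesized for the two sets versus 2K-mers interrogated."""
    per_set = 4 ** k
    return {"probes_synthesized": 2 * per_set, "kmers_tested": per_set ** 2}


fragment = "ACGTACGTTGCAAT"
positives = superchip_positives(fragment)
read_10mers = sorted(immobilized + labeled for immobilized, labeled in positives)
counts = synthesis_counts()
```

Each positive cell yields a 10-mer by continuing the immobilized probe's sequence in that of the labeled probe, as stated above.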
In a preferred embodiment, the substrate which supports the array of oligonucleotide probes is partitioned into sections so that each probe in the array is separated from adjacent probes by a physical barrier which may be, for example, a hydrophobic material. In a preferred embodiment, the physical barrier has a width of from 100 μm to 30 μm. In a more preferred embodiment, the distance from the center of each probe to the center of any adjacent probes is 325 μm. These arrays of probes may be “mass-produced” using a nonmoving, fixed substrate or a substrate fixed to a rotating drum or plate with an ink-jet deposition apparatus, for example, a microdrop dosing head; and a suitable robotic system, for example, an Anorad gantry.
In an alternative preferred embodiment, the oligonucleotide probes are fixed to a three-dimensional array. The three-dimensional array is comprised of multiple layers, and each layer may be analyzed separate and apart from the other layers. The three-dimensional array may take a number of forms. For example, the array may be disposed on a substrate having multiple depressions with probes located at different depths within the depressions (each level is made up of probes at similar depths within the depression); or the array may be disposed on a substrate having depressions of different depths with the probes located at the bottom of the depression, or at the peaks separating the depressions, or some combination of peaks and depressions may be used (each level is made up of all the probes at a certain depth); or the array may be disposed on a substrate comprised of multiple sheets that are layered to form a three-dimensional array.
The probes in these arrays may include spacers that increase the distance between the surface of the substrate and the informational portion of the probes. The spacers may be comprised of atoms capable of forming at least two covalent bonds such as carbon, silicon, oxygen, sulfur, phosphorus, and the like, or may be comprised of molecules capable of forming at least two covalent bonds such as sugar-phosphate groups, amino acids, peptides, nucleosides, nucleotides, sugars, carbohydrates, aromatic rings, hydrocarbon rings, linear and branched hydrocarbons, and the like.
A nucleic acid sample to be sequenced may be fragmented or otherwise treated (for example, by the use of recA) to avoid hindrance to hybridization from secondary structure in the sample. The sample may be fragmented by, for example, digestion with a restriction enzyme such as Cvi JI, physical shearing (e.g. by ultrasound), or by NaOH treatment. The resulting fragments may be separated by gel electrophoresis and fragments of an appropriate length, such as between about 10 bp and about 40 bp, may be extracted from the gel. In a preferred embodiment, the “fragments” of the nucleic acid sample cannot be ligated to other fragments in the pool. Such a pool of fragments may be obtained by treating the fragmented nucleic acids with a phosphatase (e.g., calf intestinal phosphatase). Alternatively, nonligatable fragments of the sample nucleic acid may be obtained by using random primers (e.g., N5-N9, where N=A, G, T, or C) in a Sanger-dideoxy sequencing reaction with the sample nucleic acid. This will produce fragments of DNA that have a complementary sequence to the target nucleic acid and that are terminated in a dideoxy residue that cannot be ligated to other fragments.
A reusable Format 3 SBH array may be produced by introducing a cleavable bond between the fixed and labeled probes and then cleaving this bond after a round of Format 3 analysis is finished. The labeled probes may be ribonucleotides or a ribonucleotide may be used as the joining base in the labeled probe so that this probe may subsequently be removed, e.g., by RNAse or uracil-DNA glycosylase treatment, or NaOH treatment. In addition, bonds produced by chemical ligation may be selectively cleaved.
Other variations include the use of modified oligonucleotides to increase specificity or efficiency, and cycling hybridizations to increase the hybridization signal, for example by performing a hybridization cycle under conditions (e.g. temperature) optimally selected for a first set of labeled probes followed by hybridization under conditions optimally selected for a second set of labeled probes. Shifts in reading frame may be determined by using mixtures (preferably mixtures of equimolar amounts) of probes ending in each of the four nucleotide bases A, T, C and G.
Branch points produce ambiguities as to the ordered sequence of a fragment. Although the sequence information is determined by SBH, either: (i) long read length, single-pass gel sequencing at a fraction of the cost of complete gel sequencing; or (ii) comparison to related sequences, may be used to order hybridization data where such ambiguities (“branch points”) occur. Primers for single pass gel sequencing through the branch points are identified from the SBH sequence information or from known vector sequences, e.g., the flanking sequences to the vector insert site, and standard Sanger-sequencing reactions are performed on the sample nucleic acid. The sequence obtained from this single pass gel sequencing is compared to the Sfs that read into and out of the branch points to identify the order of the Sfs. Alternatively, the Sfs may be ordered by comparing the sequence of the Sfs to related sequences and ordering the Sfs to produce a sequence that is closest to the related sequence.
In addition, the number of tandem repetitive nucleic acid segments in a target fragment may be determined by single-pass gel sequencing. As tandem repeats occur rarely in protein-encoding portions of a gene, the gel-sequencing step will be performed only when one of these noncoding regions is identified as being of particular interest (e.g., if it is an important regulatory region).
Obtaining information about the degree of hybridization exhibited for a set of only about 200 oligonucleotide probes (about 5% of the effort required for complete sequencing) defines a unique signature of each gene and may be used for sorting the cDNAs from a library to determine if the library contains multiple copies of the same gene. By such signatures, identical, similar and different cDNAs can be distinguished and inventoried.
Nucleic acids and methods for isolating, cloning and sequencing nucleic acids are well known to those of skill in the art. See e.g., Ausubel et al., Current Protocols in Molecular Biology, Vol. 1-2, John Wiley and Sons (1989); and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vols. 1-3, Cold Spring Harbor Press (1989), both of which are incorporated by reference herein.
SBH is a well developed technology that may be practiced by a number of methods known to those skilled in the art. Specifically, the techniques related to sequencing by hybridization described in the following documents are incorporated by reference herein: Drmanac et al., U.S. Pat. No. 5,202,231, issued Apr. 13, 1993; Drmanac et al., Genomics, 4, 114-128 (1989); Drmanac et al., Proceedings of the First Int'l. Conf. Electrophoresis Supercomputing Human Genome, Cantor et al. eds, World Scientific Pub. Co., Singapore, 47-59 (1991); Drmanac et al., Science, 260, 1649-1652 (1993); Lehrach et al., Genome Analysis: Genetic and Physical Mapping, 1, 39-81 (1990), Cold Spring Harbor Laboratory Press; Drmanac et al., Nucl. Acids Res., 4691 (1986); Stevanovic et al., Gene, 79, 139 (1989); Panusku et al., Mol. Biol. Evol., 1, 607 (1990); Nizetic et al., Nucl. Acids Res., 19, 182 (1991); Drmanac et al., J. Biomol. Struct. Dyn., 5, 1085 (1991); Hoheisel et al., Mol. Gen., 4, 125-132 (1991); Strezoska et al., Proc. Nat'l. Acad. Sci. (USA), 88, 10089 (1991); Drmanac et al., Nucl. Acids Res., 19, 5839 (1991); and Drmanac et al., Int. J. Genome Res., 1, 59-79 (1992).
The term “expression modulating fragment,” EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.
As used herein, a sequence is said to “modulate the expression of an operably linked sequence” when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs is fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.
As used herein, an “uptake modulating fragment,” UMF, means a series of nucleotide molecules which mediate the uptake of a linked DNA fragment into a cell. UMFs can be readily identified using known UMFs as a target sequence or target motif with the computer-based systems described above.