This invention relates in general to methods and apparatus for nucleic acid analysis and, in particular, to methods and apparatus for nucleic acid sequencing.
The rate of determining the sequence of the four nucleotides in nucleic acid samples is a major technical obstacle for further advancement of molecular biology, medicine, and biotechnology. Nucleic acid sequencing methods which involve separation of nucleic acid molecules in a gel have been in use since 1978. The other proven method for sequencing nucleic acids is sequencing by hybridization (SBH).
The traditional method of determining a sequence of nucleotides (i.e., the order of the A, G, C and T nucleotides in a sample) is performed by preparing a mixture of randomly-terminated, differentially labelled nucleic acid fragments by degradation at specific nucleotides, or by dideoxy chain termination of replicating strands. Resulting nucleic acid fragments in the range of 1 to 500 bp are then separated on a gel to produce a ladder of bands wherein the adjacent samples differ in length by one nucleotide.
The array-based approach of SBH does not require single base resolution in separation, degradation, synthesis or imaging of a nucleic acid molecule. Using mismatch discriminative hybridization of short oligonucleotides K bases in length, lists of constituent K-mer oligonucleotides may be determined for target nucleic acid. Sequence for the target nucleic acid may be assembled by uniquely overlapping scored oligonucleotides.
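The assembly step described above can be illustrated with a minimal sketch. It assumes ideal hybridization data (every K-mer present in the target scores positive and no others do) and a known starting K-mer. The function names `kmers_of` and `assemble_from_kmers` are hypothetical and are introduced only for illustration.

```python
def kmers_of(sequence, k):
    """Return the set of all k-mers in `sequence` (the probes an ideal experiment would score positive)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def assemble_from_kmers(kmers, start):
    """Extend `start` one base at a time through unique (k-1)-base overlaps.

    Assumes k >= 2. Extension stops when no unused k-mer overlaps the current
    end, or when more than one does (a branch point). Returns the assembled
    string and a flag that is True when a branch point stopped the extension.
    """
    k = len(start)
    unused = set(kmers) - {start}
    assembled = start
    while True:
        suffix = assembled[-(k - 1):]
        candidates = [m for m in unused if m.startswith(suffix)]
        if len(candidates) != 1:
            return assembled, len(candidates) > 1
        next_kmer = candidates[0]
        assembled += next_kmer[-1]  # each overlapping k-mer contributes one new base
        unused.remove(next_kmer)


if __name__ == "__main__":
    target = "ATGCCGTAAGT"
    scored = kmers_of(target, 4)
    print(assemble_from_kmers(scored, "ATGC"))
```

When a (K-1)-base overlap occurs more than once in the target, the extension is ambiguous and the sketch stops there rather than guessing.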
There are several approaches available to achieve sequencing by hybridization. In a process called SBH Format 1, nucleic acid samples are arrayed, and labeled probes are hybridized with the samples. Replica membranes with the same sets of sample nucleic acids may be used for parallel scoring of several probes and/or probes may be multiplexed. Nucleic acid samples may be arrayed and hybridized on nylon membranes or other suitable supports. Each membrane array may be reused many times. Format 1 is especially efficient for batch processing large numbers of samples.
In SBH Format 2, probes are arrayed at locations on a substrate which correspond to their respective sequences, and a labelled nucleic acid sample fragment is hybridized to the arrayed probes. In this case, sequence information about a fragment may be determined in a simultaneous hybridization reaction with all of the arrayed probes. For sequencing other nucleic acid fragments, the same oligonucleotide array may be reused. The arrays may be produced by spotting or by in situ synthesis of probes.
In Format 3 SBH, two sets of probes are used. In one embodiment, a set may be in the form of arrays of probes with known positions, and another, labelled set may be stored in multiwell plates. In this case, target nucleic acid need not be labelled. Target nucleic acid and one or more labelled probes are added to the arrayed sets of probes. If one attached probe and one labelled probe both hybridize contiguously on the target nucleic acid, they are covalently ligated, producing a detected sequence equal to the sum of the length of the ligated probes. The process allows for sequencing long nucleic acid fragments, e.g. a complete bacterial genome, without nucleic acid subcloning in smaller pieces.
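The ligation logic of Format 3 can be sketched as follows. This is a simplified model, not the claimed procedure: each probe is written as the stretch of target sequence it reads (complementarity and strand orientation are ignored), only perfect matches are considered, and the fixed probe is assumed to lie upstream of the labelled probe. The function name `ligation_signals` is hypothetical.

```python
def ligation_signals(target, fixed_probes, labeled_probes):
    """Return {(fixed, labeled): detected_length} for probe pairs that would ligate.

    A pair is reported only when the fixed probe and the labeled probe match
    the target contiguously, with no gap between them. The detected length is
    the sum of the two probe lengths.
    """
    signals = {}
    for fixed in fixed_probes:
        for labeled in labeled_probes:
            joined = fixed + labeled
            if joined in target:
                signals[(fixed, labeled)] = len(joined)
    return signals


if __name__ == "__main__":
    target = "ATGCCGTAAGT"
    fixed = ["ATGCC", "CCGTA", "GGGGG"]
    labeled = ["GTAAG", "AGTTT"]
    print(ligation_signals(target, fixed, labeled))
```

Two 5-mers that ligate are read as a single 10-base sequence, which is why the combined probe sets behave like a much larger set of longer probes.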
In the present invention, SBH is applied to the efficient identification and sequencing of one or more nucleic acid samples. The procedure has many applications in nucleic acid diagnostics, forensics, and gene mapping. It also may be used to identify mutations responsible for genetic disorders and other traits, to assess biodiversity and to produce many other types of data dependent on nucleic acid sequence.
The present invention provides a method for detecting a target nucleic acid species including the steps of providing an array of probes affixed to a substrate and a plurality of labeled probes wherein each labeled probe is selected to have a first nucleic acid sequence which is complementary to a first portion of a target nucleic acid and wherein the nucleic acid sequence of at least one probe affixed to the substrate is complementary to a second portion of the nucleic acid sequence of the target, the second portion being adjacent to the first portion; applying a target nucleic acid to the array under suitable conditions for hybridization of probe sequences to complementary sequences; introducing a labeled probe to the array; hybridizing a probe affixed to the substrate to the target nucleic acid; hybridizing the labeled probe to the target nucleic acid; affixing the labeled probe to an adjacently hybridized probe in the array; and detecting the labeled probe affixed to the probe in the array. According to preferred methods of the invention the array of probes affixed to the substrate comprises a universal set of probes. According to other preferred aspects of the invention at least two of the probes affixed to the substrate define overlapping sequences of the target nucleic acid sequence and more preferably at least two of the labelled probes define overlapping sequences of the target nucleic acid sequences. 
Still further, according to another aspect of the invention a method is provided for detecting a target nucleic acid of known sequence comprising the steps of: contacting a nucleic acid sample with a set of immobilized oligonucleotide probes attached to a solid substrate under hybridizing conditions wherein the immobilized probes are capable of specific hybridization with different portions of said target nucleic acid sequence; contacting the target nucleic acid with a set of labelled oligonucleotide probes in solution under hybridizing conditions wherein the labeled probes are capable of specific hybridization with different portions of said target nucleic acid sequence adjacent to the immobilized probes; covalently joining the immobilized probes to labelled probes that are immediately adjacent to the immobilized probe on the target sequence (e.g., with ligase); removing any non-ligated labelled probes; detecting the presence of the target nucleic acid by detecting the presence of said labelled probe attached to the immobilized probes. The invention also provides a method of determining expression of a member of a set of partially or completely sequenced genes in a cell type, a tissue or a tissue mixture comprising the steps of: defining pairs of fixed and labeled probes specific for the sequenced gene; hybridizing unlabeled nucleic acid sample and corresponding labeled probes to one or more arrays of fixed probes; forming covalent bonds between adjacent hybridized labeled and fixed probes; removing unligated probes; and determining the presence of the sequenced gene by detection of labeled probes bound to prespecified locations in the array. In a preferred embodiment of this aspect of the invention, the target nucleic acid will identify the presence of an infectious agent.
Further, the present invention provides for an array of oligonucleotide probes comprising a nylon membrane; a plurality of subarrays of oligonucleotide probes on the nylon membrane, the subarrays comprising a plurality of individual spots wherein each spot is comprised of a plurality of oligonucleotide probes of the same sequence; and a plurality of hydrophobic barriers located between the subarrays on the nylon membrane, whereby the plurality of hydrophobic barriers prevents cross contamination between adjacent subarrays.
Still further, the present invention provides a method for sequencing a repetitive sequence, having a first end and a second end, in a target nucleic acid comprising the steps of: (a) providing a plurality of spacer oligonucleotides of varying lengths wherein the spacer oligonucleotides comprise the repetitive sequence; (b) providing a first oligonucleotide that is known to be adjacent to the first end of the repetitive sequence; (c) providing a plurality of second oligonucleotides one of which is adjacent to the second end of the repetitive sequence, wherein the plurality of second oligonucleotides is labeled; (d) hybridizing the first and the plurality of second oligonucleotides, and one of the plurality of spacer oligonucleotides to the target nucleic acid; (e) ligating the hybridized oligonucleotides; (f) separating ligated oligonucleotides from unligated oligonucleotides; and (g) detecting label in the ligated oligonucleotides.
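Steps (a) through (g) can be modelled in a short sketch. The model rests on stated assumptions: the repetitive sequence is a tandem repeat of a known unit, oligonucleotides are written as the target sequence they read (complementarity is ignored), and a ligation product forms only when the first oligonucleotide, a spacer, and a second oligonucleotide match the target contiguously. The function name `detect_repeat_count` and its parameters are hypothetical.

```python
def detect_repeat_count(target, first_oligo, repeat_unit, second_oligos, max_repeats):
    """Return (copies, second_oligo) pairs that would give a labeled ligation product.

    Spacers of varying length are modelled as 1..max_repeats copies of
    `repeat_unit`. Only a spacer that exactly spans the repeat lets the first
    oligonucleotide, the spacer, and a labeled second oligonucleotide sit
    contiguously on the target.
    """
    hits = []
    for copies in range(1, max_repeats + 1):
        spacer = repeat_unit * copies
        for second in second_oligos:
            if first_oligo + spacer + second in target:
                hits.append((copies, second))
    return hits


if __name__ == "__main__":
    target = "GGATCC" + "CA" * 5 + "TTGACA"
    print(detect_repeat_count(target, "GGATCC", "CA", ["TTGAC", "GTGAC"], 8))
```

The spacer length that yields a labeled product gives the length of the repeat, and the identity of the labeled second oligonucleotide gives the sequence flanking its second end.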
Still further, the present invention provides a method for sequencing a branch point sequence, having a first end and a second end, in a target nucleic acid comprising the steps of: (a) providing a first oligonucleotide that is complementary to a first portion of the branch point sequence wherein the first oligonucleotide extends from the first end of the branch point sequence by at least one nucleotide; (b) providing a plurality of second oligonucleotides that are labeled, and are complementary to a second portion of the branch point sequence wherein the plurality of second oligonucleotides extend from the second end of the branch point sequence by at least one nucleotide, and wherein the portion of the second oligonucleotides that extend from the second end of the branch point sequence comprise sequences that are complementary to a plurality of sequences that arise from the branch point sequence; (c) hybridizing the first oligonucleotide, and one of the plurality of second oligonucleotides to the target DNA; (d) ligating the hybridized oligonucleotides; (e) separating ligated oligonucleotides from unligated oligonucleotides; and (f) detecting label in the ligated oligonucleotides.
Still further, the present invention provides a method for confirming a sequence by using probes that are predicted to be negative for the target nucleic acid. The sequence of a target is then confirmed by hybridizing the target nucleic acid to the “negative” probes to confirm that these probes do not form perfect matches with the target nucleic acid.
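The negative-probe check can be sketched as follows, assuming a candidate sequence already obtained by SBH, probes of a single length k, and a table of hybridization scores with a user-chosen positive threshold. Probes are again written as the target sequence they read. The function names and the score table are hypothetical.

```python
from itertools import product


def predicted_negative_probes(candidate, k, alphabet="ACGT"):
    """List every k-mer probe that does not occur in the candidate sequence."""
    present = {candidate[i:i + k] for i in range(len(candidate) - k + 1)}
    all_probes = ("".join(bases) for bases in product(alphabet, repeat=k))
    return [probe for probe in all_probes if probe not in present]


def unexpected_positives(negative_probes, hybridization_scores, threshold):
    """Return predicted-negative probes whose measured score reaches the threshold.

    Probes with no recorded score are treated as 0.0. An empty result is
    consistent with the candidate sequence; any entry flags a possible error.
    """
    return [p for p in negative_probes if hybridization_scores.get(p, 0.0) >= threshold]


if __name__ == "__main__":
    negatives = predicted_negative_probes("ACGT", 2)
    scores = {"AC": 0.9, "TT": 0.8, "GG": 0.1}
    print(unexpected_positives(negatives, scores, threshold=0.5))
```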
Still further, the present invention provides a method for analyzing a nucleic acid using oligonucleotide probes that are complexed with different labels so that the probes may be multiplexed in a hybridization reaction without a loss of sequence information (i.e., different probes have different labels so that hybridization of the different probes to the target can be distinguished). In a preferred embodiment, the labels are radioisotopes, or fluorescent molecules, or enzymes, or electrophore mass labels. In a more preferred embodiment, the differently labeled oligonucleotide probes are used in Format 3 SBH, and multiple probes (more than two, with one probe being the immobilized probe) are ligated together.
Still further, the present invention provides a method for detecting the presence of a target nucleic acid having a known sequence when the target is present in very small amounts compared to homologous nucleic acids in a sample. In a preferred embodiment, the target nucleic acid is an allele present at very low frequency in a sample that has nucleic acids from a large number of sources. In an alternative preferred embodiment, the target nucleic acid has a mutated sequence, and is present at very low frequency within a sample of nucleic acids.
Still further, the present invention provides a method for confirming the sequence of a target nucleic acid by using single pass gel sequencing. Primers for single pass gel sequencing are derived from the sequence obtained by SBH, and these primers are used in standard Sanger sequencing reactions to provide gel sequence information for the target nucleic acid. The sequence obtained by single pass gel sequencing is then compared to the SBH derived sequence to confirm the sequence.
Still further, the present invention provides a method for solving branch points by using single pass gel sequencing. Primers for the single pass gel sequencing reactions are identified from the ends of the Sfs obtained after a first round of SBH sequencing, and these primers are used in standard Sanger-sequencing reactions to provide gel sequencing information through the branch points of the Sfs. Sfs are then aligned by comparing the Sanger-sequencing results through the branch points to the Sfs to identify adjoining Sfs.
Still further, the present invention provides for a method of preparing a sample containing target nucleic acids by PCR, without purifying the PCR products prior to the SBH reactions. In Format 1 SBH, crude PCR products are applied to a substrate without prior purification, and the substrate may be washed prior to introduction of the labeled probes.
Still further, the present invention provides a method and an apparatus for analyzing a target nucleic acid. The apparatus comprises two arrays of nucleic acids that are mixed together at the desired time. In a preferred embodiment, the nucleic acids in one of the arrays are labeled. In a more preferred embodiment, a material is disposed between the two arrays and this material prevents the mixing of nucleic acids in the arrays. When this material is removed, or rendered permeable, the nucleic acids in the two arrays are mixed together. In an alternative preferred embodiment, the nucleic acids in one array are target nucleic acids and the nucleic acids in the other are oligonucleotide probes. In another preferred embodiment, the nucleic acids in both arrays are oligonucleotide probes. In another preferred embodiment, the nucleic acids in one array are oligonucleotide probes and target nucleic acids, and nucleic acids in the other array are oligonucleotide probes. In another preferred embodiment, the nucleic acids in both arrays are oligonucleotide probes and target nucleic acids.
One method of the present invention using the apparatus described above comprises the steps of providing an array of nucleic acids fixed to a substrate, providing a second array of nucleic acids, providing conditions that allow the nucleic acids in the second array to come into contact with the nucleic acids of the fixed array, wherein one of the arrays comprises target nucleic acids and the other array comprises oligonucleotide probes, and analyzing the hybridization results. In a preferred embodiment, the fixed array is target nucleic acid and the second array is labeled oligonucleotide probes. In a more preferred embodiment, there is a material disposed between the two arrays that prevents mixing of the nucleic acids until the material is removed or rendered permeable to the nucleic acids.
A second method of the present invention using the apparatus described above comprises the steps of providing two arrays of nucleic acid probes, providing conditions that allow the two arrays of probes to come into contact with each other and a target nucleic acid, ligating together probes that are adjacent on the target nucleic acid, and analyzing the results. In a preferred embodiment, the probes in one array are fixed and the probes in the other array are labeled. In a more preferred embodiment, there is a material disposed between the two arrays that prevents mixing of the probes until the material is removed or rendered permeable to the probes.
Still further, the present invention provides substrates on which arrays of oligonucleotide probes are fixed, wherein each probe is separated from its neighboring probes by a physical barrier that is resistant to the flow of the sample solution. In a preferred embodiment, the physical barrier is made of a hydrophobic material.
Still further, the present invention provides a method for making the arrays of oligonucleotide probes that are separated by physical barriers. In a preferred embodiment, a grid is applied to the substrate using an ink-jet head that applies a material which reduces the reaction volume of the array.
Still further, the present invention provides substrates on which oligonucleotides are fixed to form a three-dimensional array. The three-dimensional array combines high resolution for reading probe results (each level has a relatively low density of probes per cm²), with high information content in three-dimensional space (multiple levels of probes).
Still further, the present invention provides a substrate to which oligonucleotide probes are fixed, wherein the oligonucleotide probes have spacers, and wherein the spacers increase the distance between the substrate and the informational portion of the oligonucleotide probe (e.g., the portion of the oligonucleotide probe which binds to the target and gives sequence information). In a preferred embodiment, the spacer comprises ribose sugars and phosphates, wherein the phosphates covalently bind the ribose sugars into a polymer by forming esters with the ribose sugars through their 5′ and 3′ hydroxyl groups.
Still further, the present invention provides a method for clustering cDNA clones into groups of similar or identical sequences, so that single representative clones may be selected from each group for sequencing. In a preferred embodiment, the method for clustering is used in the sequencing of a plurality of clones, comprising the steps of: interrogating each clone with a plurality of oligonucleotide probes; determining which probes bind to each clone and the signal intensity for each probe; clustering clones into a plurality of groups by identifying clones that bind to similar probes with similar intensities; and sequencing at least one clone from each group. In a more preferred embodiment, the plurality of probes comprises from about 50 to about 500 different probes. In another more preferred embodiment, the plurality of probes comprises about 300 different probes. In a most preferred embodiment, the plurality of clones is a plurality of cDNA clones.
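The clustering step can be illustrated with a minimal sketch. The source does not specify a similarity measure or a clustering algorithm; the sketch assumes cosine similarity between per-clone vectors of probe intensities and a simple greedy grouping against the first member of each group. The names `cosine_similarity` and `cluster_clones` and the 0.95 threshold are hypothetical choices for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_clones(signatures, min_similarity=0.95):
    """Greedily group clones whose probe-intensity vectors are similar.

    `signatures` maps a clone name to its list of probe intensities (same
    probe order for every clone). Each clone joins the first group whose
    representative (first member) it resembles; otherwise it starts a new group.
    """
    clusters = []
    for name, vector in signatures.items():
        for cluster in clusters:
            representative = signatures[cluster[0]]
            if cosine_similarity(vector, representative) >= min_similarity:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    signatures = {
        "clone1": [1.0, 0.0, 0.9, 0.1],
        "clone2": [0.9, 0.1, 1.0, 0.0],
        "clone3": [0.0, 1.0, 0.1, 0.8],
    }
    groups = cluster_clones(signatures)
    representatives = [group[0] for group in groups]  # one clone per group to sequence
    print(groups, representatives)
```

Sequencing one representative per group, as in the final step of the method, avoids repeated sequencing of clones that carry the same or a closely similar insert.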