Specific hybridization of oligonucleotides and their analogs is a fundamental process that is employed in a wide variety of research, medical, and industrial applications, including the identification of disease-related polynucleotides in diagnostic assays, screening for clones of novel target polynucleotides, identification of specific polynucleotides in blots of mixtures of polynucleotides, therapeutic blocking of inappropriately expressed genes, and DNA sequencing. Sequence specific hybridization is critical in the development of high throughput multiplexed nucleic acid assays. As formats for these assays expand to encompass larger amounts of sequence information acquired through projects such as the Human Genome Project, sequence specific hybridization with high fidelity is becoming increasingly difficult to achieve.
In large part, the success of hybridization using oligonucleotides depends on minimizing the number of false positives and false negatives. Such problems have made the simultaneous use of multiple hybridization probes in a single experiment, i.e., multiplexing, very difficult, particularly in the analysis of multiple gene sequences on a gene microarray. For example, in certain binding assays, a number of nucleic acid molecules are bound to a chip with the aim that a given “target” sequence will bind selectively to its complement attached to the chip. Approaches have been developed that involve the use of oligonucleotide tags attached to a solid support that can be used to specifically hybridize to the tag complements that are coupled to probe sequences. Chetverin et al. (WO 93/17126) use sectioned, binary oligonucleotide arrays to sort and survey nucleic acids. These arrays have a constant nucleotide sequence attached to an adjacent variable nucleotide sequence, both bound to a solid support by a covalent linking moiety. These binary arrays have advantages compared with ordinary arrays in that they can be used to sort strands according to their terminal sequences so that each strand binds to a fixed location on an array. The design of the terminal sequences in this approach comprises the use of constant and variable sequences. See also U.S. Pat. Nos. 6,103,463 and 6,322,971, which issued to Chetverin et al. on Aug. 15, 2000 and Nov. 27, 2001, respectively.
This concept of using molecular tags to sort a mixture of molecules is analogous to molecular tags developed for bacterial and yeast genetics (Hensel et al., Science; 269, 400-403: 1995 and Shoemaker et al., Nature Genetics; 14, 450-456: 1996). Here, a method termed “signature tagged” mutagenesis, in which each mutant is tagged with a different DNA sequence, is used to recover mutant genes from a complex mixture of approximately 10,000 bacterial colonies. In the tagging approach of Barany et al. (WO 9731256), known as the “zip chip”, a family of nucleic acid molecules, the “zip-code addresses”, each different from the others, are set out on a grid. Target molecules are attached to oligonucleotide sequences complementary to the “zip-code addresses,” referred to as “zipcodes,” which are used to specifically hybridize to the address locations on the grid. While the selection of these families of polynucleotide sequences used as addresses is critical for correct performance of the assay, the performance has not been described.
Working in a highly parallel hybridization environment requiring specific hybridization imposes very rigorous selection criteria for the design of families of oligonucleotides that are to be used. The success of these approaches is dependent on the specific hybridization of a probe and its complement. Problems arise when members of the family of nucleic acid molecules cross-hybridize or hybridize incorrectly to the target sequences. While it is common to obtain incorrect hybridization resulting in false positives, or an inability to form hybrids resulting in false negatives, the frequency of such results must be minimized. In order to achieve this goal, certain thermodynamic properties of forming nucleic acid hybrids must be considered. The temperature at which oligonucleotides form duplexes with their complementary sequences, known as the Tm (the temperature at which 50% of the nucleic acid duplex is dissociated), varies according to a number of sequence dependent properties, including the hydrogen bonding energies of the canonical pairs A-T and G-C (reflected in GC or base composition), stacking free energy and, to a lesser extent, nearest neighbour interactions. These energies vary widely among oligonucleotides that are typically used in hybridization assays. For example, two probe sequences composed of 24 nucleotides, one with a 40% GC content and the other with a 60% GC content, each hybridized with its complementary target under standard conditions, may theoretically show a 10° C. difference in melting temperature (Mueller et al., Current Protocols in Mol. Biol.; 15, 5: 1993). Problems in hybridization occur when the hybrids are allowed to form under hybridization conditions that include a single hybridization temperature that is not optimal for correct hybridization of all oligonucleotide sequences of a set. Mismatch hybridization of non-complementary probes can occur, forming duplexes with measurable mismatch stability (Peyret et al., Biochemistry; 38: 3468-77, 1999).
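The size of this GC effect can be illustrated with a rough calculation. The sketch below uses a widely quoted empirical approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N. The formula, the function name `tm_gc_estimate`, and the 0.1 M sodium concentration are assumptions chosen for illustration; they are not taken from the cited reference and are no substitute for a nearest neighbour calculation.

```python
import math


def tm_gc_estimate(length: int, gc_percent: float, na_molar: float = 0.1) -> float:
    """Rough GC-content-based Tm estimate in degrees C (illustrative assumption).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
    """
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 675.0 / length
    )


# Two hypothetical 24-nucleotide probes, described only by their GC content.
tm_low_gc = tm_gc_estimate(length=24, gc_percent=40.0)
tm_high_gc = tm_gc_estimate(length=24, gc_percent=60.0)

# The salt and length terms cancel, so the gap depends only on the GC term.
tm_gap = tm_high_gc - tm_low_gc
print(f"Estimated Tm gap between 60% and 40% GC 24-mers: {tm_gap:.1f} C")
```

Under these assumptions the two probes differ by roughly 8° C., which is of the same order as the 10° C. difference cited above. A single hybridization temperature therefore cannot sit equally close to the Tm of both duplexes.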
Mismatched duplexes can form within a particular set of oligonucleotides under hybridization conditions where the mismatch decreases duplex stability but the mismatched duplex nevertheless retains a higher Tm than the least stable correct duplex of that set. For example, if hybridization is carried out under conditions that favor an AT-rich perfect match duplex, it is possible for a GC-rich duplex containing a mismatched base to form, because its melting temperature is still above that of the correctly formed AT-rich duplex. Therefore, the design of families of oligonucleotide sequences that can be used in multiplexed hybridization reactions must take into account the thermodynamic properties of oligonucleotides and duplex formation, so as to reduce or eliminate cross hybridization within the designed oligonucleotide set.
A phantom sequence may thus be generated from exemplary Sequence 1 and Sequence 2 as follows:
Sequence 1:        ATGTTTAGTGAAAAGTTAGTATTG   (SEQ ID NO: 211)
                       *       •
Sequence 2:        ATGTTAGTGAATAGTATAGTATTG   (SEQ ID NO: 212)
                              •   ♦
Phantom Sequence:  ATGTTAGTGAAAGTTAGTATTG     (SEQ ID NO: 215)
The phantom sequence generated from these two sequences is thus 22 bases in length. That is, Sequences 1 and 2 share 22 bases that are identical and occur in the same order. A total of three insertions/deletions and mismatches is present in the phantom sequence when compared with the sequences from which it was generated:
ATGT-TAGTGAA-AGT-TAGTATTG (SEQ ID NO: 215)
The dashes in this latter representation of the phantom sequence indicate the locations of the insertions/deletions and mismatches in the phantom sequence relative to the parent sequences from which it was derived. Thus, the “T” marked with an asterisk in Sequence 1, the “A” marked with a diamond in Sequence 2, and the “A-T” mismatch of Sequences 1 and 2 marked with two dots were deleted in generating the phantom sequence.
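The derivation above can be sketched computationally if one assumes that the phantom sequence corresponds to a longest common subsequence of the two parent sequences, i.e., the longest run of bases that appears in the same order in both. That reading is an interpretation of the example, not a definition given in the text, and the function names below are hypothetical.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (dynamic programming)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table, keeping matched bases.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def is_subsequence(sub: str, seq: str) -> bool:
    """True if sub can be obtained from seq by deleting zero or more bases."""
    remaining = iter(seq)
    return all(base in remaining for base in sub)


sequence_1 = "ATGTTTAGTGAAAAGTTAGTATTG"  # SEQ ID NO: 211
sequence_2 = "ATGTTAGTGAATAGTATAGTATTG"  # SEQ ID NO: 212

phantom = longest_common_subsequence(sequence_1, sequence_2)
print(len(phantom), phantom)
```

For these two 24-base parents the result is 22 bases long, in agreement with the length stated above, and it can be obtained from either parent by deleting two bases.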
A multiplex sequencing method has been described in U.S. Pat. No. 4,942,124, which issued to Church on Jul. 17, 1990. The method requires at least two vectors which differ from each other at a tag sequence. It is stated in the specification that a tag sequence in one vector will not hybridize under stringent hybridization conditions to a tag sequence in another vector, i.e., a complementary probe of a tag in one vector does not cross-hybridize with a tag sequence in another vector. Exemplary stringent hybridization conditions are given as 42° C. in 500-1000 mM sodium phosphate buffer. A set of 42 20-mer tag sequences, all of which lack G residues, is given in FIG. 3 of Church's specification. Details of how the sequences were obtained are not provided, although Church states that initially 92 were chosen on the basis of their having sufficient sequence diversity to ensure uniqueness.
There have been other attempts at the development of families of tags, and a number of different approaches exist for selecting sequences for use in multiplexed hybridization assays. The selection of sequences that can be used as zipcodes or tags in an addressable array has been described in the patent literature in an approach taken by Brenner and co-workers. U.S. Pat. No. 5,654,413 describes a population of oligonucleotide tags (and corresponding tag complements) in which each oligonucleotide tag includes a plurality of subunits, each subunit consisting of an oligonucleotide having a length of from three to six nucleotides and each subunit being selected from a minimally cross hybridizing set, wherein a subunit of the set would have at least two mismatches with any other sequence of the set. Table II of the Brenner patent specification describes exemplary groups of 4-mer subunits that are minimally cross hybridizing according to the aforementioned criteria. In the approach taken by Brenner, the construction of non-cross-hybridizing oligonucleotides relies on the use of subunits that form a duplex having at least two mismatches with the complement of any other subunit of the same set. The ordering of subunits in the construction of oligonucleotide tags is not specifically defined.
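The two-mismatch criterion can be illustrated with a short sketch. It assumes that "at least two mismatches" can be modelled as a gap-free, position-by-position comparison (Hamming distance) between subunits, and it uses a simple greedy selection. Neither the selection procedure nor the function names come from the Brenner patent, and the resulting set is not the set of Table II.

```python
from itertools import combinations, product


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length subunits differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_subunit_set(length: int = 4, min_mismatches: int = 2,
                       alphabet: str = "ACGT") -> list:
    """Greedily pick subunits that differ from every earlier pick
    at min_mismatches or more positions (illustrative procedure only)."""
    chosen = []
    for candidate in map("".join, product(alphabet, repeat=length)):
        if all(mismatches(candidate, s) >= min_mismatches for s in chosen):
            chosen.append(candidate)
    return chosen


subunits = greedy_subunit_set()

# Verify the criterion for every pair in the selected set.
worst_pair = min(mismatches(a, b) for a, b in combinations(subunits, 2))
print(len(subunits), "subunits; fewest mismatches between any pair:", worst_pair)
```

A check of this kind addresses only sequence differences. It says nothing about duplex stability, base composition, or the order in which subunits are joined, which the approach described above leaves undefined.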
Parameters used in the design of tags based on subunits are discussed in Barany et al. (WO 9731256). For example, polynucleotide sequences 24 nucleotides in length (24-mers) are derived from a set of four possible tetramers such that each 24-mer “address” differs from its nearest 24-mer neighbour by 3 tetramers. They discuss further that, if each tetramer differs from every other tetramer by at least two nucleotides, then each 24-mer will differ from the next by at least six nucleotides. This is determined without consideration for insertions or deletions when forming the alignment between any two sequences of the set. In this way a unique “zip code” sequence is generated. The zip code is ligated to a label in a target dependent manner, resulting in a unique “zip code” which is then allowed to hybridize to its address on the chip. To minimize cross-hybridization of a “zip code” to other “addresses”, the hybridization reaction is carried out at temperatures of 75-80° C. Due to the high temperature conditions for hybridization, 24-mers that have partial homology hybridize to a lesser extent than sequences with perfect complementarity and represent “dead zones”. This approach of implementing stringent hybridization conditions, for example high temperature hybridization, is also practiced by Brenner et al.
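The arithmetic behind the six-nucleotide figure is 3 differing tetramers multiplied by at least 2 differing nucleotides per tetramer. The sketch below checks this on a hypothetical set of four tetramers; the tetramer sequences and the two addresses are invented for illustration and are not taken from Barany et al. As in the text, the comparison is gap-free and ignores insertions and deletions.

```python
from itertools import combinations

# Hypothetical tetramers; every pair differs at two or more positions.
TETRAMERS = ["AACC", "AAGG", "CCAA", "GGAA"]


def mismatches(a: str, b: str) -> int:
    """Gap-free count of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_address(indices) -> str:
    """Concatenate six tetramers, chosen by index, into a 24-mer address."""
    return "".join(TETRAMERS[i] for i in indices)


min_tetramer_distance = min(
    mismatches(a, b) for a, b in combinations(TETRAMERS, 2)
)

# Two addresses that differ at exactly three of the six tetramer positions.
address_a = build_address([0, 0, 2, 2, 0, 2])
address_b = build_address([1, 0, 3, 2, 1, 2])

differing_tetramers = sum(
    address_a[k:k + 4] != address_b[k:k + 4] for k in range(0, 24, 4)
)
nucleotide_differences = mismatches(address_a, address_b)

print(differing_tetramers, "tetramers differ;",
      nucleotide_differences, "nucleotides differ")
```

In this example the lower bound is met exactly: three differing tetramers, each contributing two mismatches, give six nucleotide differences. Because the count ignores insertions and deletions, two addresses that satisfy it may still share a long common subsequence once gaps are allowed.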
The current state of technology for designing non-cross hybridizing tags based on subunits does not provide sufficient guidance to construct a family of sequences with practical value in assays that require stringent non-cross hybridizing behavior.
Thus, while it is desirable with such arrays to have, at once, a large number of address molecules, each address molecule should be highly selective for its own complement sequence. While such an array provides the advantage that the family of molecules making up the grid is entirely of design, and does not rely on sequences as they occur in nature, the provision of a family of molecules that is sufficiently large, and in which each individual member is sufficiently selective for its complement over all the other zipcode molecules (i.e., where there is sufficiently low cross-hybridization, or cross-talk), continues to elude researchers.