This invention relates to fabricated arrays of molecules, and to their analytical applications. In particular, this invention relates to the use of fabricated arrays in methods for obtaining genetic sequence information.
Advances in the study of molecules have been led, in part, by improvement in technologies used to characterise the molecules or their biological reactions. In particular, the study of nucleic acids, DNA and RNA, has benefited from developing technologies used for sequence analysis and the study of hybridisation events.
An example of the technologies that have improved the study of nucleic acids is the development of fabricated arrays of immobilised nucleic acids. These arrays typically consist of a high-density matrix of polynucleotides immobilised onto a solid support material. Fodor et al., Trends in Biotechnology (1994) 12:19-26, describes ways of assembling the nucleic acid arrays using a chemically sensitised glass surface protected by a mask, but exposed at defined areas to allow attachment of suitably modified nucleotides. Typically, these arrays may be described as “many molecule” arrays, as distinct regions are formed on the solid support comprising a high density of one specific type of polynucleotide.
An alternative approach is described by Schena et al., Science (1995) 270:467-470, where samples of DNA are positioned at predetermined sites on a glass microscope slide by robotic micropipetting techniques. The DNA is attached to the glass surface along its entire length by non-covalent electrostatic interactions. However, although hybridisation with complementary DNA sequences can occur, this approach may not permit the DNA to be freely available for interacting with other components such as polymerase enzymes, DNA-binding proteins etc.
Recently, the Human Genome Project determined the entire sequence of the human genome, all 3×10⁹ bases. The sequence information represents that of an average human. However, there is still considerable interest in identifying differences in the genetic sequence between different individuals. The most common form of genetic variation is the single nucleotide polymorphism (SNP). On average, one base in 1000 is a SNP, which means that there are 3 million SNPs for any individual. Some of the SNPs are in coding regions and produce proteins with different binding affinities or properties. Some are in regulatory regions and result in a different response to changes in levels of metabolites or messengers. SNPs are also found in noncoding regions, and these are also important as they may correlate with SNPs in coding or regulatory regions. The key problem is to develop a low-cost way of determining one or more of the SNPs for an individual.
The nucleic acid arrays may be used to determine SNPs, and they have been used to study hybridisation events (Mirzabekov, Trends in Biotechnology (1994) 12:27-32). Many of these hybridisation events are detected using fluorescent labels attached to nucleotides, the labels being detected using a sensitive fluorescent detector, e.g. a charge-coupled device (CCD). The major disadvantages of these methods are that it is not possible to sequence long stretches of DNA, and that repeat sequences can lead to ambiguity in the results. These problems are recognised in Automation Technologies for Genome Characterisation, Wiley-Interscience (1997), ed. T. J. Beugelsdijk, Chapter 10: 205-225.
In addition, the use of high-density arrays in a multi-step analysis procedure can lead to problems with phasing. Phasing problems result from a loss in the synchronisation of a reaction step occurring on different molecules of the array. If some of the arrayed molecules fail to undergo a step in the procedure, subsequent results obtained for these molecules will no longer be in step with results obtained for the other arrayed molecules. The proportion of molecules out of phase will increase through successive steps and consequently the results detected will become ambiguous. This problem is recognised in the sequencing procedure described in U.S. Pat. No. 5,302,509.
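The growth of the out-of-phase fraction can be illustrated with a simple model. The sketch below is not taken from the cited procedures: it assumes that every molecule completes each step independently and with the same probability (the hypothetical parameter `step_efficiency`), so that the fraction of molecules still in phase after n steps is that probability raised to the power n.

```python
def in_phase_fraction(step_efficiency: float, steps: int) -> float:
    """Fraction of arrayed molecules that have completed every step so far.

    Illustrative assumption: each molecule completes each step independently
    with probability `step_efficiency`.
    """
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must lie between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_efficiency ** steps


def out_of_phase_fraction(step_efficiency: float, steps: int) -> float:
    """Fraction of molecules that have missed at least one step."""
    return 1.0 - in_phase_fraction(step_efficiency, steps)


if __name__ == "__main__":
    # Print how the out-of-phase fraction grows with the number of steps.
    for n in (1, 5, 10, 20):
        print(n, round(out_of_phase_fraction(0.95, n), 3))
```

Under these assumptions, even a step that is 95% efficient leaves well over half of the molecules out of phase after 20 steps, which is why the combined signal from a many-molecule region becomes ambiguous.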
An alternative sequencing approach is disclosed in EP-A-0381693, which comprises hybridising a fluorescently-labelled strand of DNA to a target DNA sample suspended in a flowing sample stream, and then using an exonuclease to cleave repeatedly the end base from the hybridised DNA. The cleaved bases are detected in sequential passage through a detector, allowing reconstruction of the base sequence of the DNA. Each of the different nucleotides has a distinct fluorescent label attached which is detected by laser-induced fluorescence. This is a complex method, primarily because it is difficult to ensure that every nucleotide of the DNA strand is labelled and that this has been achieved with high fidelity to the original sequence.
WO-A-96/27025 is a general disclosure of single molecule arrays. Although sequencing procedures are disclosed, there is little description of the applications to which the arrays can be applied. There is also only a general discussion on how to prepare the arrays.
According to the present invention, a device comprises a high density array of molecules capable of interrogation and immobilised on a solid generally planar surface, wherein the array allows the molecules to be individually resolved by optical microscopy, and wherein each molecule is immobilised by covalent bonding to the surface, other than at that part of each molecule that can be interrogated.
According to a second aspect of the invention, a device comprises a high density array of relatively short molecules and relatively long polynucleotides immobilised on the surface of a solid support, wherein the polynucleotides are at a density that permits individual resolution of those parts that extend beyond the relatively short molecules. In this aspect, the shorter molecules can prevent non-specific binding of reagents to the solid support, and therefore reduce background interference.
According to a third aspect of the invention, a device comprises an array of polynucleotide molecules immobilised on a solid surface, wherein each molecule comprises a polynucleotide duplex linked via a covalent bond to form a hairpin loop structure, one end of which comprises a target polynucleotide, and the array has a surface density which allows the target polynucleotides to be individually resolved. In this aspect, the hairpin structures act to tether the target to a primer polynucleotide. This prevents loss of the primer-target during the washing steps of a sequencing procedure. The hairpins may therefore improve the efficiency of the sequencing procedures.
The arrays of the present invention comprise what are effectively single molecules. This has many important benefits for the study of the molecules and their interaction with other biological molecules. In particular, fluorescence events occurring on each molecule can be detected using an optical microscope linked to a sensitive detector, resulting in a distinct signal for each molecule.
When used in a multi-step analysis of a population of single molecules, the phasing problems that are encountered using the high density (multi-molecule) arrays of the prior art can be reduced or removed. Therefore, the arrays also permit a massively parallel approach to monitoring fluorescent or other events on the molecules. Such massively parallel data acquisition makes the arrays extremely useful in a wide range of analysis procedures which involve the screening or characterising of heterogeneous mixtures of molecules. The arrays can be used to characterise a particular synthetic chemical or biological moiety, for example in screening for particular molecules produced in combinatorial synthesis reactions.
The arrays of the present invention are particularly suitable for use with polynucleotides as the molecular species. The preparation of the arrays requires only small amounts of polynucleotide sample and other reagents, and can be carried out by simple means. Polynucleotide arrays according to the invention permit massively parallel sequencing chemistries to be performed. For example, the arrays permit simultaneous chemical reactions on and analysis of many individual polynucleotide molecules. The arrays are therefore very suitable for determining polynucleotide sequences.
An array of the invention may also be used to generate a spatially addressable array of single polynucleotide molecules. This is the simple consequence of sequencing the array. Particular advantages of such a spatially addressable array include the following:
1) Polynucleotide molecules on the array may act as identifier tags and may only need to be 10-20 bases long, and the efficiency required in the sequencing steps may only need to be better than 50%, as there will be no phasing problems.
2) The arrays may be reusable for screening once created and sequenced. All possible sequences can be produced in a very simple way, e.g. compared to a high density multi-molecule DNA chip made using photolithography.
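The scale of such identifier tags can be made concrete with a short calculation. The sketch below assumes only the four standard DNA bases and counts the distinct sequences available at a given tag length; the name `distinct_tags` is illustrative and does not come from the source.

```python
BASES = "ACGT"  # assumption: only the four standard DNA bases are used


def distinct_tags(length: int) -> int:
    """Number of distinct base sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return len(BASES) ** length


if __name__ == "__main__":
    # Tag lengths spanning the 10-20 base range mentioned above.
    for length in (10, 15, 20):
        print(length, distinct_tags(length))
```

On this count, a 10-base tag already distinguishes 1,048,576 sequences, and a 20-base tag about 1.1 × 10¹².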