Advances in the study of molecules have been driven, in part, by improvements in the technologies used to characterise the molecules or their biological reactions. In particular, the study of nucleic acids, DNA and RNA, has benefited from developments in the technologies used for sequence analysis and for the study of hybridisation events.
An example of the technologies that have improved the study of nucleic acids is the development of fabricated arrays of immobilised nucleic acids. These arrays typically consist of a high-density matrix of polynucleotides immobilised onto a solid support material. Fodor et al., Trends in Biotechnology (1994) 12:19–26, describe ways of assembling nucleic acid arrays using a chemically sensitised glass surface that is protected by a mask but exposed at defined areas to allow attachment of suitably modified nucleotides. These arrays may typically be described as “many molecule” arrays, because distinct regions are formed on the solid support, each comprising a high density of one specific type of polynucleotide.
An alternative approach is described by Schena et al., Science (1995) 270:467–470, in which samples of DNA are positioned at predetermined sites on a glass microscope slide by robotic micropipetting techniques. The DNA is attached to the glass surface along its entire length by non-covalent electrostatic interactions. Although hybridisation with complementary DNA sequences can occur, this approach may not leave the DNA freely available to interact with other components, such as polymerase enzymes and DNA-binding proteins.
Recently, the Human Genome Project determined the entire sequence of the human genome, all 3×10⁹ bases. This sequence represents that of an average human. However, there is still considerable interest in identifying differences in genetic sequence between individuals. The most common form of genetic variation is the single nucleotide polymorphism (SNP). On average, one base in 1000 is a SNP, which means that there are about 3 million SNPs in any individual. Some SNPs lie in coding regions and give rise to proteins with different binding affinities or other properties. Some lie in regulatory regions and result in a different response to changes in the levels of metabolites or messengers. SNPs are also found in non-coding regions, and these are also important because they may correlate with SNPs in coding or regulatory regions. The key problem is to develop a low-cost way of determining one or more of the SNPs of an individual.
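The figure of 3 million SNPs follows directly from the genome size and the average SNP frequency quoted above. A minimal sketch of that arithmetic, assuming SNPs are spread uniformly at one per 1000 bases (the function name is illustrative only):

```python
def expected_snp_count(genome_bases: int, bases_per_snp: int) -> int:
    """Expected number of SNPs if, on average, one base in `bases_per_snp` is a SNP."""
    return genome_bases // bases_per_snp


# Human genome of 3 x 10^9 bases with one SNP per 1000 bases on average.
print(expected_snp_count(3 * 10**9, 1000))  # 3000000
```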
Nucleic acid arrays may be used to determine SNPs, and they have been used to study hybridisation events (Mirzabekov, Trends in Biotechnology (1994) 12:27–32). Many of these hybridisation events are detected using fluorescent labels attached to nucleotides, the labels being detected with a sensitive fluorescence detector, e.g. a charge-coupled device (CCD). The major disadvantages of these methods are that long stretches of DNA cannot be sequenced and that repeat sequences can make the results ambiguous. These problems are recognised in Automation Technologies for Genome Characterisation, Wiley-Interscience (1997), ed. T. J. Beugelsdijk, Chapter 10, pages 205–225.
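Why repeats cause ambiguity can be seen in a simplified model of sequencing by hybridisation, in which the array reports every short probe (here, every 3-nucleotide substring) that the target hybridises to. The model and the two example sequences below are illustrative assumptions, not taken from the cited works. Both targets contain three copies of the repeat "TA", and swapping the segments between those copies leaves the set of hybridising probes unchanged, even when the number of copies of each probe is counted:

```python
from collections import Counter


def kmer_spectrum(sequence: str, k: int) -> Counter:
    """Multiset of all length-k substrings, i.e. the probes the target would hybridise to."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Two different targets built around three copies of the repeat "TA":
# G-TA-CC-TA-GG-TA-A versus G-TA-GG-TA-CC-TA-A.
s1 = "GTACCTAGGTAA"
s2 = "GTAGGTACCTAA"

print(s1 != s2)                                     # True: the sequences differ
print(kmer_spectrum(s1, 3) == kmer_spectrum(s2, 3))  # True: the probe readout is identical
```

Because the readout is the same, no analysis of the hybridisation signal alone can tell the two targets apart.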
In addition, the use of high-density arrays in a multi-step analysis procedure can lead to problems with phasing. Phasing problems result from a loss of synchronisation of a reaction step occurring on different molecules of the array. If some of the arrayed molecules fail to undergo a step in the procedure, subsequent results obtained for these molecules will no longer be in step with the results obtained for the other arrayed molecules. The proportion of molecules out of phase increases with each successive step, and consequently the detected results become ambiguous. This problem is recognised in the sequencing procedure described in U.S. Pat. No. 5,302,509.
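The growth of the out-of-phase population can be made concrete with a simple model. The sketch below assumes, as a hypothetical simplification not stated in the text, that every molecule completes each step independently with the same probability (the per-step efficiency) and that a missed step is never made up. Under that assumption the in-phase fraction after n steps is the efficiency raised to the power n, and the number of steps a molecule lags behind follows a binomial distribution:

```python
from math import comb


def in_phase_fraction(step_efficiency: float, steps: int) -> float:
    """Fraction of molecules that have completed every one of `steps` steps."""
    return step_efficiency ** steps


def lag_distribution(step_efficiency: float, steps: int) -> list[float]:
    """P(a molecule lags by k steps) for k = 0..steps, assuming each step fails
    independently with probability 1 - step_efficiency and is never made up later."""
    q = 1.0 - step_efficiency
    return [comb(steps, k) * q**k * step_efficiency ** (steps - k) for k in range(steps + 1)]


# Even a hypothetical 99% per-step efficiency leaves a shrinking in-phase population.
for n in (10, 50, 100):
    print(n, round(in_phase_fraction(0.99, n), 3))
```

With 99% efficiency per step, roughly 90% of molecules are still in phase after 10 steps but only about 37% after 100, so the signal from the in-phase majority is progressively swamped by lagging molecules.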
An alternative sequencing approach is disclosed in EP-A-0381693, which comprises hybridising a fluorescently labelled strand of DNA to a target DNA sample suspended in a flowing sample stream, and then using an exonuclease to repeatedly cleave the terminal base from the hybridised DNA. The cleaved bases pass sequentially through a detector, allowing the base sequence of the DNA to be reconstructed. Each of the four nucleotides carries a distinct fluorescent label, which is detected by laser-induced fluorescence. This is a complex method, primarily because it is difficult to ensure that every nucleotide of the DNA strand is labelled and that the labelling has been achieved with high fidelity to the original sequence.
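The reconstruction step, and the reason incomplete labelling is so damaging, can be sketched as follows. This is a hypothetical model, not the procedure of EP-A-0381693: the label names `dye_A` to `dye_T` are placeholders, the exonuclease is assumed to cleave from the 3' end (so bases arrive 3' first and the read must be reversed to give the 5'→3' sequence), and an unlabelled nucleotide is assumed simply to produce no signal:

```python
# Hypothetical mapping from fluorescent label to base; the actual dyes are not specified.
LABEL_TO_BASE = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}


def simulate_detection(sequence, unlabelled_positions=frozenset()):
    """Labels seen by the detector as a 3'->5' exonuclease cleaves `sequence` (written 5'->3').
    Nucleotides at `unlabelled_positions` carry no label and so produce no signal."""
    order = range(len(sequence) - 1, -1, -1)  # 3' end is cleaved first
    return [f"dye_{sequence[i]}" for i in order if i not in unlabelled_positions]


def reconstruct(detected_labels, cleaved_from_3prime=True):
    """Rebuild the 5'->3' sequence from labels in the order they reached the detector."""
    bases = [LABEL_TO_BASE[label] for label in detected_labels]
    if cleaved_from_3prime:
        bases.reverse()  # detection order was 3'->5'
    return "".join(bases)


print(reconstruct(simulate_detection("GATTACA")))       # GATTACA
print(reconstruct(simulate_detection("GATTACA", {2})))  # GATACA: one unlabelled T is lost
```

A single missing label silently becomes a deletion in the reconstructed read, which is why the method depends on labelling every nucleotide with high fidelity.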
WO-A-96/27025 is a general disclosure of single molecule arrays. Although sequencing procedures are disclosed, there is little description of how the arrays can be applied, and there is only a general discussion of how to prepare them.