The first and second aspects of the present invention relate to instruments for performing biochemical analysis of a sample, for example sequencing of polynucleotides and/or biochemical analysis using nanopores, which produces output data of plural parallel channels representing the results of the biochemical analysis. The third aspect of the present invention relates to the performance of biochemical analysis of a sample using nanopores, for example sequencing of polynucleotides.
Regarding the first and second aspects of the present invention, there are many types of biochemical analysis that produce output data of plural parallel channels. Instruments for performing such biochemical analysis in an automated manner are known and provide efficiencies in obtaining the large amounts of output data that are inherent in the biochemical analysis.
Merely by way of example, one such type of biochemical analysis that produces output data of plural parallel channels is DNA sequencing. Conventional DNA sequencing instruments, and laboratory instrumentation in general, are based on a model where an instrument operates as a standalone device. Typically, instruments perform one measurement task in a finite time with a predefined completion criterion. We can describe this design model as “monolithic”.
DNA sequencing, as an example, is an inherently high-throughput laboratory technique. Experiments cover a wide variety of data sizes and durations, and the data produced are very complex and heterogeneous and require intensive downstream processing. The nature of research around DNA sequencing makes it difficult to treat the core of the analysis, the instrument system, as a black box measuring device. There is an increasing need for scalable systems for DNA sequencing, capable of scaling both up and down. This is driven by a recent market demand to sequence more things, different things, and all more cheaply, quickly and effectively. Sequencing systems must therefore also be able to accommodate heterogeneous workflows and be able to pipeline samples of varying types and sizes in accordance with use-cases. This is desirably done efficiently and economically. Measurement artefacts associated with the substrate, or how it has been prepared, should not derail efficient processing on an instrument, leading to redundant down-time or wasted reagents. Institutes that can operate efficient factory-based sequencing processes will dominate low-cost and high-throughput applications. However, these desires are difficult to achieve.
Current monolithic DNA sequencing instruments are difficult to adapt to analysis at different scales. The instruments cannot be designed to suit very large factory operations whilst at the same time being accessible to unskilled laboratory staff with smaller projects. Scalability for current DNA sequencing instruments generally comes from increasing the amount of data they can produce in a run, that is, a single analysis performed by one instrument. However, modularity and flexibility are limited and, in order to achieve them, the user has to resort to breaking the substrates down, making the substrates individually addressable by adding labels, and breaking down the reaction chambers of the sequencers. In each case, artefacts are introduced and there are intrinsic limits on how much modularity can be accomplished without a complete redesign of the instrument itself. In other words, the basic design of the instrument has a built-in resource limit that hinders its ability to cope with the demands of real world workflows.
In many DNA sequencing instruments, individual strands or clonally amplified colonies of limited lengths of DNA are localised to a surface or to a bead. This surface/bead array is usually in a flow cell that enables reagents to be passed across it, thus applying chemistries of various types that allow the DNA to be decoded. The biochemical analysis process within most instruments uses a stepwise cyclical chemistry, followed by an imaging stage to detect the incorporation, annealing or removal of chemically labelled fluorescent probes that enable the DNA under study to be decoded.
During the base identification stages, in most systems a high resolution imaging device takes pictures of the entire flow cell surface as a sequential series of tiled arrays of images. In some technologies, a single region is imaged very quickly detecting chemistry cycles in real time as bases are incorporated asynchronously.
Generally, in the case of sequential imaging of synchronous chemistry based systems, the entire imaging step takes a significant amount of time and generally has to complete a preset number of chemistry cycles, or a preset run-time, before the user can take the data and analyse it, thereby judging whether the experiment has yielded enough useful information. Generally, only after the analysis can the user decide whether the experiment has been successful, and if not, then an entirely new analysis run has to be performed, and this is repeated until enough data of the required quality has been collected. In most cases each run has a fixed cost derived from the price of reagents. Hence the price of success is difficult to determine upfront, as is the time-to-result.
For many instruments one run takes at least several days, or often weeks, with a significant chance of failure by the instrument during the experiment, generally causing truncation or even complete loss of data. Higher outputs per run can be achieved by packing more DNA molecules into the flow cell; however, this tends to increase the time to take the images, depending on the device resolution and speed/sensitivity, with ultimately limited improvement in net throughput. For example, the company Helicos BioSciences market an instrument referred to as the Heliscope that has 600-800M DNA fragments attached to two flow cells, and the company Illumina market an instrument referred to as the Genome Analyser with 80M-100M DNA fragments. By way of comparison, it takes around 6 hours to incorporate and image a new base in every strand on the Heliscope compared to 1-2 hours per base on the Genome Analyser. Thus the two instruments are each best suited to tasks of different scales.
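The comparison above can be made concrete with a rough calculation. The following is a minimal sketch only, assuming an idealised cycle in which every attached fragment yields exactly one base per cycle; the function name `bases_per_hour` is hypothetical, and the pairing of the quoted fragment counts with the quoted cycle times is an assumption made for illustration.

```python
def bases_per_hour(fragments_millions: float, hours_per_base: float) -> float:
    """Idealised net throughput in millions of bases per hour.

    Assumes every fragment yields one base per cycle (an assumption,
    not a measured figure).
    """
    if hours_per_base <= 0:
        raise ValueError("hours_per_base must be positive")
    return fragments_millions / hours_per_base

# Figures quoted in the text: 600-800M fragments at about 6 hours per base.
heliscope = (bases_per_hour(600, 6), bases_per_hour(800, 6))

# Figures quoted in the text: 80-100M fragments at 1-2 hours per base.
genome_analyser = (bases_per_hour(80, 2), bases_per_hour(100, 1))

print(f"Heliscope:       {heliscope[0]:.0f}-{heliscope[1]:.0f} Mbases/hour")
print(f"Genome Analyser: {genome_analyser[0]:.0f}-{genome_analyser[1]:.0f} Mbases/hour")
```

Under these assumptions, an instrument carrying six to ten times as many fragments gains far less than that factor in bases per hour, because the longer cycle time offsets much of the increase.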
The vendors of such instrumentation have realised that users do not necessarily want a large output of data on one sample, as this substantially reduces the modularity, flexibility and utility, and so typically physically divide up the surface area into individually addressable sections (e.g. 8 sub-channels, or ‘lanes’, on the flow cell for the Genome Analyser, and 25 sub-channels per flow cell for the Heliscope) to enable the user to measure more than one sample per flow cell, albeit at concomitantly reduced data output per sample. One such area will still produce at least 250 Mb of DNA sequence, therefore generating a large over-sampling of a sample containing small genomes; for example, a typical bacterial genome of 0.5 Mb would be covered at least 500 times. This example illustrates the inefficient utilisation of the instrumentation and reagents, both in terms of time and cost for the user.
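The over-sampling figure quoted above follows from dividing the sequence output of one section by the genome size. A minimal sketch of that arithmetic is given below; the function name `fold_coverage` is hypothetical and introduced only for illustration.

```python
def fold_coverage(output_mb: float, genome_mb: float) -> float:
    """Average number of times each base of the genome is sequenced."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return output_mb / genome_mb

section_output_mb = 250.0  # minimum output of one section, as quoted in the text
genome_mb = 0.5            # small genome size, as quoted in the text

coverage = fold_coverage(section_output_mb, genome_mb)
print(f"{coverage:.0f}x coverage")  # prints "500x coverage"
```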
For the user, one further problem experienced with existing instrumentation is that no matter how few fragments/strands of DNA/samples are required to be sequenced, throughput is tied to the cycle time of measuring across the entire flow cell surface. Current instruments have only one processing unit (the camera/flow cell surface) and cannot divide up the task of measuring each sample sufficiently to give the desired output for the user.
A further problem for the user is that he must pay for the time of the processing unit by way of the depreciation of the upfront costs of the instrument, as well as the costs of reagents across the entire surface in order to achieve his result, without knowing upfront if success is guaranteed in a run.
A specific example of a further compounding problem is that bases do not get added evenly during the biochemical analysis process to each available fragment (some fragments will happen to have a disproportionate amount of A's over C's, for example, or consist of repeating homopolymers), and are not always measured with even accuracy (dephasing of clusters, out-of-focus areas on the flow cell, enzyme/polymerase breakdown, background signal build-up). This means that some areas of the flow cell will generate more data than others, but the nature of the single processing unit means that it cannot adapt to either maximise those areas that are generating useful and high quality information, or focus on areas that are failing to deliver sufficient data.
In summary, existing systems run for a defined period of time, and therefore at a defined cost, but produce information for a fixed number of bases for the user at variable measurement quality. The net result for the user is great inefficiency in time and cost when performing different DNA sequencing experiments, given the range of applications of interest to the user. This is particularly so when the user is trying to analyse, in parallel, multiple samples within a project on a given class of sequencing device.
Although a DNA sequencing instrument has been discussed as an example for illustration, difficulties of a similar nature may be encountered in designing instruments for a wide range of biochemical analyses that produce large amounts of output data of plural parallel channels.
The first and second aspects of the present invention seek to alleviate some of these problems in scaling an instrument for performing biochemical analysis.
Regarding the third aspect of the present invention, in recent years there has been considerable development of biochemical analysis of a sample using nanopores. A nanopore is a small hole in an electrically insulating layer and may be formed, for example, by protein pores or channels introduced into an amphiphilic membrane. The nanopores may allow a flow of ions to travel across the amphiphilic membrane, modulated by the nanopore on the basis of an analyte interaction, thus allowing the nanopore to provide a biochemical analysis. Various types of nanopore and analysis apparatus for using them have been developed for a range of types of biochemical analysis. One example of commercial interest is the use of nanopores for sequencing of polynucleotides such as DNA. One example of an analysis apparatus for performing biochemical analysis of a sample using nanopores is disclosed in WO-2009/077734.
As such, nanopores offer the potential of a platform for biochemical analysis on a commercial scale. However, in such a context it would be desirable to provide efficient handling of samples in the apparatus in order to maximise throughput and minimise costs of performing the biochemical analysis.